Hi everyone,

just a short notice for all of your that I’ve moved my blog to a new platform and I’ve finally made use of the domain in my possession. I must say I feel much comfy here with all those gadgets and settings. Yes, I’m running it on my own. Can’t help it, the geek in me likes to have control and fine-tune everything.

All the posts including code, pictures, comments, links, dates and other settings are transfered successfully. Some things still need to be done, but the blog is functional and running and my vacation is starting :-) .

The old blog is still on, but has its comments closed. Its content will be updated to include the corresponding links on this blog. Finally, to keep only one version of truth, posts in the old blog will be deleted after some time, but their titles will remain. Inside each post in the old blog I will put a direct link to corresponding post on this blog. In short, they will become shortcuts and stay such in future.

Please, update your links and favorites that refer to my old blog when you find the time. Thank you.

If you’re interested in performing the same operation I made, drop me an e-mail. I might also post an article about it.

Hope you’ll like the new environment. My apologies for any inconvenience I caused by this decision.

Today is the 2nd anniversary of my engagement in MSDN SSAS forum which is a good excuse to summarize activity because somehow I’ve had a feeling that I did much less while spending definitely equal amount of time if not more. Was it really so?

If we don’t measure, we … well, definitely can’t tell for sure – might be a free interpretation of the famous quote made from the technical perspective. In blog articles I usually try to perform an analysis which then results in a conclusion. In this one what I’ll do is a self-analysis with data gathered from various sites combined with some of my own.

Forum stats are as follows: 1250+ posts, 333+ answers, 6200+ points. As mentioned in advance, last year’s focus was supposed to be different – less forum activity, more of other types of engagement. And so it was. My forum contribution was reduced to only 25% of previous year’s posts (which had made me felt less active), but the percentage of correct answers increased relative to that (66% of previous year’s answers). Not bad. I wish to think it means only one thing – that I’m improving my BI skills by doing more with less. This observation, deduced from a quick measurement, helped instantly :-) .

So what were those other activities?

Well, I wrote about a dozen technical articles in my blog (300% increase, 150% if we track exactly the same period because I started blogging in October), 4 articles for local magazines two of which were published, volunteered at 1 foreign conference, had 1 presentation for the local user group, 3 sessions at local conferences and 1 lecture at local university. Again a surprise. I spent a lot of time on these activities, partially because they were not routine to me, partially because they required more time than forum did (in form of preparation). For example, to blog it takes one night (for a good technical article), to lecture or write a magazine article it takes few days up to a week, to prepare a session for a conference it takes three weeks. In contrast, to answer a question on forum it takes approximately few hours of engagement (a rough estimate, spans from few minutes to several days; it varies a lot). Once put together like this, it made sense and again caused my perception to become more positive.

Then, I travelled a lot. I’ve been precisely twice around the globe in last two years with YoY (year-over-year) index of 200% in mileage and 150% in number of days. I’ve been to my very first MVP Summit and I’ve visited 5 conferences in total where I learned a lot and met interesting people. What I’m most happy about is the fact that on some of those trips I had a pleasure of sharing the experience with my family, which, now that I come to mention it, will increase in September :-) .

There were some downs as well. PowerPivot and DAX appeared while Mosha said goodbye to BI at the same time. Yes, that casted a big shadow of uncertainty on future of SSAS and MDX, the things I like the most. Furthermore, as long as PowerPivot and DAX are tied to Excel with no APIs or similar, to a BI developer it is nothing but a peep show. Seriously, there are so many possibilities what we could do with it, programmatically! Anyway, I suppose things will inevitably change and therefore I’m learning this technology intensively. Even found a niche to blog about which amuses me – comparing DAX with MDX and T-SQL.

I also missed Microsoft BI conference in New Orleans and failed to present at some foreign conferences lately. I’ll try my luck again, but this won’t be an area I’ll concentrate the most. Next year that will be my blog.

In the following year I’ll continue blogging and I’ll increase the quantity of blog articles even more. That’s an activity I see myself in. Free topics, writing when inspired, cooking ideas in between the posts, fast delivery and not depending on anyone or anything else. I’ll probably change the platform soon. I’ve been delaying this for some time, but now that (already poor) stats are gone, it’s a clear case for me. As you can see, I like the stats very much. They are a measurement of silent feedback – how many people visited and perhaps read an article. And as we saw earlier, being able to see the results of that measurement is directly linked to feeling of worthiness.

Lower priorities are exams, learning and having a session or two at user group meeting or local conference. However, a recent opportunity to finally finish my master degree in economics might conflict that making those activities even less important because there might not be enough time for everything.

We’ll see which way the story went in a year from now review.

Boyan Penev recently blogged about a strange behavior which manifests when ordering two related sets thereby mentioning a thread in SSAS forum answered by Deepak Puri. I found this intriguing and decided to reply directly, but the explanation kept growing, so I finally cut everything out and put it as an article on my blog instead. After all, this way my chances of being enlisted in his blog roll should increase 8-).

Here’s the problem in short. The first query works, the second doesn’t. The difference? The second query has related hierarchies while the first one doesn’t.

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order( [RowSet], [ColumnSet].Item(0), BDESC ) ON ROWS,
     [ColumnSet]                                   ON COLUMNS
FROM [Adventure Works]

 

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
SELECT
     Order( [RowSet], [ColumnSet].Item(0), BDESC ) ON ROWS,
     [ColumnSet]                                   ON COLUMNS
FROM [Adventure Works]

 

Now, some of you may have already guessed this has to be a case of attribute relations and yes, I admit you’re right. Nevertheless, I think an analysis would be beneficial, after all Boyan asked for it.

First, let’s make a small but important distinction. Those queries look like they have been written by a T-SQL developer. Why am I saying it? Because in T-SQL, ORDER is typically (although not necessarily) based on a column, be it numeric, date or string. A set being returned is ordered based on the values of that column.

In MDX, ORDER() is based on an expression which means that the second argument is automatically converted to a tuple (ColumnSet.Item(0)), and then a value of it: (ColumnSet.Item(0)).Value. Let’s see how those two MDX queries might look like in T-SQL.

use AdventureWorksDW2008;

with
t as
(
select
    SubCategory = s.EnglishProductSubcategoryName,
    CalendarYear = d.CalendarYear,
    ResellerSales = sum(f.SalesAmount)
from dbo.DimProductSubcategory s
    join dbo.DimProduct p
        on p.ProductSubcategoryKey = s.ProductSubcategoryKey
    join dbo.FactResellerSales f
        on f.ProductKey = p.ProductKey
    join dbo.DimDate d
        on d.DateKey = f.OrderDateKey
group by
    s.EnglishProductSubcategoryName,
    d.CalendarYear
)
, pvt as
(
select * from t
pivot( sum(ResellerSales)
       for CalendarYear in ( [2001], [2002], [2003], [2004], [2006] ) ) as pvt
)
select * from pvt
order by
– sort A
[2001] desc;
– sort B
/*
(select sum(ResellerSales) from t
        where t.SubCategory = pvt.SubCategory and
              t.CalendarYear = 2001) desc;
*/

 

with
t as
(
select
    SubCategory = s.EnglishProductSubcategoryName,
    Category = c.EnglishProductCategoryName,
    ResellerSales = sum(f.SalesAmount)
from dbo.DimProductSubcategory s
    join dbo.DimProduct p
        on p.ProductSubcategoryKey = s.ProductSubcategoryKey
    join dbo.FactResellerSales f
        on f.ProductKey = p.ProductKey
    join dbo.DimProductCategory c
        on c.ProductCategoryKey = s.ProductCategoryKey
group by
    s.EnglishProductSubcategoryName,
    c.EnglishProductCategoryName
)
, pvt as
(
select * from t
pivot( sum(ResellerSales)
       for Category in ( [Bikes], [Components], [Clothes], [Accessories] ) ) AS pvt
)
select * from pvt
order by (select sum(ResellerSales) from t
          where –t.SubCategory = pvt.SubCategory and
                t.Category = ‘Accessories’
         ) desc;

 

They return exactly the same result as those two MDX queries. Btw, who says MDX is not elegant and easy?

Now, let’s analyze them.

The first query is ordered by column [2001]. We said that MDX uses expressions when sorting, so we’ll try to do the same using an expression. We can comment [2001] DESC and un-comment what’s below “sort B”, but the result won’t change.

In the second query I deliberately skipped sorting by a column (ORDER BY [Accessories] DESC) and used the version with an expression in order to get the false result. The column version would yield the correct, expected result. While we might have thought sorting in MDX behaves that way, the truth is it doesn’t. MDX has some specifics and we need to use the expression version in order to show them.

Important thing to notice is that in both queries there’s a join which binds subcategories (disregard it being commented in the second query at the moment, act like it’s still there). We can imagine this as being the first phase in setting the condition of those correlated sub-queries.

The next phase would be to expand this by specifying the argument of the Order() statement in MDX. In case of a Date dimension, that was year 2001. Since that is a different dimension, nothing has changed in our initial condition.

In the second query, there was a related hierarchy (same dimension) which came after the initial phase, so it forced everything so far to adjust to it (which btw might be seen as a third phase). You can imagine this third phase as limiting the join further by adding conditions like this:

and exists (select ProductCategoryKey from dbo.DimProductCategory
            where EnglishProductCategoryName = pvt.SubCategory)

 

When the related hierarchy is “below” the current one, the initial join could be considered as removed (instead of bringing this extra EXISTS which makes no sense if you analyze what’s inside). Therefore I commented it in the query. In fact, I manually did what SSAS engine implicitly does to joins from the initial phase when a related hierarchy comes in the second phase and when the relation is 1:N, or in OLAP terms – below granularity.

When it is “above”, the initial join is just more limited (now it makes sense to use EXISTS).

In DAX it would be something like the ALL() function (for analogies between MDX, T-SQL and DAX I suggest reading my previous blog article).

Err, one more thing, the join is based on keys rather than names, but for simplicity reasons I haven’t included them in queries.

Now let’s get back to OLAP and it’s multidimensional space.

Cube’s space is N-dimensional which means each coordinate should be N-di
mensional too. SSAS formula engine fills the missing parts using the current members of non-mentioned hierarchies. But who are they?

Based on members explicitly specified in a coordinate, the engine implicitly (the way a trigger fires in T-SQL) finds members on related hierarchies. They become current members of those hierarchies. For non-mentioned dimensions (not hierarchies), the engine picks current members which again are “current” because of some previous context set by an expression, function, slicer or the cube itself.

In the first example year 2001 sets the context of that expression. All the hierarchies of Date dimension implicitly shift their current members accordingly (which is not important in this case, but let it be known). Since the Product and others dimension were not mentioned in that expression, we traverse to previous contexts (backwards) in order to find their current members. The Filter() function sets the immediate previous context. For Product dimension, Product.Subcategory hierarchy to be precise. – How come? – Well, Filter() is sort of a loop, which means we iterate over members in its set (the first argument) which in turn become “current” in that expression, one by one. Other dimensions are not important in this case but the process of identifying current members might be visualized likewise.

In the second example Accessories category sets the context. Its presence automatically shifts Product.Subcategory current member to a related position. – Where? – To its root member. – Why? – Because relation is 1:N. If it were N:1, it would shift to a single related member. Since it can’t find one, it shifts to its root member which is a way of saying “I can not find a direct relation to a single member, therefore I’ll move to a place where I don’t stand in a way, a neutral position that doesn’t change the (explicit) coordinate, though complies to the obligation of me being present in the coordinate”.  – Let me think it through a bit … OK, I see. But how come the current member of Product.Subcategory hierarchy didn’t stay too? I mean, he was in Filter() function too, as the first argument. – Correct, but they are not treated the same. – Why’s that? – Layers! – What? – Calculation layers. In form of a stack. On top there’s only Accessories because only that was explicitly there in the expression and that expression is the last thing evaluated in the Filter() function. – Why is that the last thing? – Well, first you need to set a loop. Then you evaluate the expression for each member, right? As you would do in programming, loop is outside, assignments and calculations are inside the loop. Therefore, Product.Subcategory set (and its current members in a loop) is one layer below in this stack. Whoever is NOT on top, or shall we say BELOW the layer being currently evaluated might be implicitly shifted. Might, because it depends whether he’s related or not. If he is, as in the second example, he will shift, but if he isn’t as in the first example, nothing will happen. He’ll come unchanged. In turn, he might determine who’ll be shifted among his “relatives” in layers below his. And so on. The winner takes it all approach among “relatives” and the winner is the “relative” being explicitly in the coordinate, on top of the others. – Oh, I see. Current and lower layers you say … Wait, I got an idea! In order to make them equal, we should bring them on the same layer, right? – Exactly! – And how do you suggest we should do that? – The expression. That determines the top layer in this case so both of them should be there. – Ok, but how? – Simple, by providing a tuple. We should expand the existing condition with .CurrentMember of the first set.

Here are both the false and the correct MDX for the second query, to emphasize their differences.

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
SELECT
     Order( [RowSet], [ColumnSet].Item(0), BDESC ) ON ROWS,
     [ColumnSet]                                   ON COLUMNS
FROM [Adventure Works]
 

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
SELECT
     Order( [RowSet], ( [ColumnSet].Item(0),
                      
[RowSet].CurrentMember ), BDESC ) ON ROWS,
     [ColumnSet]                                          ON COLUMNS
FROM [Adventure Works]

 

Forcing the current member is like saying “I want my current member to stay and not get implicitly converted to other members. I’m forcing this coordinate and I know what I’m doing, return me the value of this explicitly requested intersection (aka coordinate)!”.

Now, return to the second T-SQL query and un-comment the join. You’ll get the correct result because this time you explicitly forced the join. Or shall we say you prevented this join from disappearing in T-SQL just as you prevented the current member in MDX from changing because of attributes relations in that dimension. Interesting?

Finally, it wouldn’t hurt to specify the same tuple in the first MDX query too. I mean, the result would not change, they would stay correct. So, if you, the reader of this article, are searching for the expression which works in any scenario, use the one with the tuple.

Important thing to remember. In MDX you can almost always omit current members in your expressions. Yet, when dealing with related hierarchies, as in this example, specifying them or not makes a world of difference.

The question remains – why there is this attribute relations mechanism in OLAP? Well, that might be a topic for some other article. For now let’s just say that it keeps the multidimensional space smaller and consistent. And although it behaves like a wormhole, it’s here to help. Without it, we might get lost out there.

Recently I delivered a session at local conference organized by Microsoft Croatia – WinDays¹º. The title of the presentation was: “DAX vs MDX vs T-SQL”.

As its name suggests, the presentation is oriented towards SQL and BI developers (for change) instead of Excel power-users and compares DAX calculations with T-SQL and MDX expressions. The idea behind, besides improving my T-SQL skills and learning DAX, was to build a mental model of what DAX functions and their combinations actually do by finding their best analogies in other languages. Since there are more than a hundred DAX functions, I’ve concentrated primarily on those that use relations. On one hand, they are probably the most difficult group of functions in DAX. On another, they should be the closest to SQL and BI developers’ mindset.

If you’re wondering why T-SQL and MDX, the reason is simple. PowerPivot uses in-memory column-oriented relational database technology and was developed by SSAS team (together with Excel team). “Relational” means we have every reason to compare DAX with T-SQL. And since SSAS team was involved, MDX comes naturally. Hereby I wish to emphasize that the comparison shown in the presentation was based on imaginary or potential analogies in T-SQL and MDX and not the real implementations built into PowerPivot engine. In the same manner, the summary highlights showing strengths and weaknesses of each language is based on personal impressions. Regardless of that, I believe the presentation might be of great help for many.

The presentation (translated in English) is published on my blog including all the related files. You can find it here.

Looking forward to your feedback.

 

Average among siblings, as a calculated measure, is typically used to compare the result of each member in a group where the group is formed from his siblings. In case of a known hierarchy, the core expression is relatively easy.

(
[<Dimension name>].[<Hierarchy name>].CurrentMember.Parent,
 
[Measures].[<Measure name>]
)
/
Count(
      NonEmpty(
               [<Dimension name>].[<Hierarchy name>].CurrentMember.Siblings,
               [Measures].[<Measure name>]
              )
     )

<> represent placeholders.

The expression says – take the parent’s value and divide it by the count of siblings that have results (only those, not all of them). Of course, we should add testing for zero as well as testing for empty values. Also, we could use sum of siblings instead of parent’s value because at least one member has no parent (root member). Or we should test whether current member has a parent and if not, then only use the sum of siblings version of expression. Might be faster on large dimensions. For simplicity, we won’t cover it here, not to complicate too much.

iif(
    IsEmpty( [Measures].[<Measure name>] ),
    null,
    iif(
        Count(
              NonEmpty(
                       [<Dimension name>].[<Hierarchy name>].CurrentMember.Siblings,
                       [Measures].[<Measure name>]
                      )
             ) = 0,
        null,

       Sum(
           [<Dimension name>].[<Hierarchy name>].CurrentMember.Siblings,
           [Measures].[<Measure name>]
          )
        /
       Count(
             NonEmpty(
                      [<Dimension name>].[<Hierarchy name>].CurrentMember.Siblings,
                      [Measures].[<Measure name>]
                     )
            )
        )
    )

But if the requirement is to make that expression work for any hierarchy (no matter whether that dimension only or for any hierarchy in a cube), we have to reach for the Axis() function.

In case of Excel 2007 as a front-end, the following code should be put into MDX script. Only one thing will be dependant on your cube and that is the first calculated measure. There you should provide a measure for calculation of average among siblings. In this code, Sales Amount from Adventure Works 2008 was used, but as I said, any measure will do. The rest of the code is cube independent, meaning, it will work in any cube and for any hierarchy.

Create Member CurrentCube.[Measures].[Original Measure] AS
    [Measures].[Sales Amount]
;

Create Member CurrentCube.[Measures].[Where are measures in Excel] AS
    case
        when NOT IsError(Extract( Axis(0), Measures ).Count) then 0
        when NOT IsError(Extract( Axis(1), Measures ).Count) then 1
        else -1
    end
;

Create Member CurrentCube.[Measures].[Where are non-measure rows] AS
    iif(
         NOT IsError(Axis(1).Count) AND
             [Measures].[Where are measures in Excel] < 1,
         1,
         iif(
              [Measures].[Where are measures in Excel] <> 0 AND
              NOT IsError(Axis(0).Count),
              0,
              null
            )
       )
;

Create Member CurrentCube.[Measures].[Average Measure] AS
    iif( IsEmpty([Measures].[Original Measure]),
         null,
         iif( IsEmpty([Measures].[Where are non-measure rows]),
              [Measures].[Original Measure],
              iif( Axis([Measures].[Where are non-measure rows]).Item(0).Item(0)
                       .Hierarchy.CurrentMember.Level.Ordinal = 0,
                   [Measures].[Original Measure],       
                   Sum(Axis([Measures].[Where are non-measure rows]).Item(0).Item(0)
                           .Hierarchy.CurrentMember.Siblings,
                       [Measures].[Original Measure])
                   /
                   Count(NonEmpty(Axis([Measures].[Where are non-measure rows]).Item(0).Item(0)
                                      .Hierarchy.CurrentMember.Siblings,
                                  [Measures].[Original Measure]))                               
                 )
            )
       )
, FORMAT_STRING = ‘Currency’
;

As you can see, we define a calculated measure Original measure to be used throughout the other part of the code. Next, we test where Excel 2007 has put the measures. They could be on rows, columns or slicer (WHERE part). Excel uses only 2 axes and a slicer, doesn’t use axis(2) aka pages like OWC.

We test by counting measures extracted from an axis and testing whether that produces an error. If so, measures are not there. If they are not on rows or columns, then they must be in slicer. When we have only one measure, Excel puts that measure in slicer. When there are more measures, they are either on rows or columns, depending on other hierarchies in query.

Once we know where the measures are, we test where is a hierarchy that we need an average for. Although it seems sometimes that hierarchy is on one particular axis (let say rows), Excel tricks us with display and generates MDX with that hierarchy on columns for example. Not always, but sometimes. That’s why we need to test where that hierarchy is. Btw, we’re interested in the first hierarchy that appears on rows, that is the mode of work. And we return the average among siblings result for its members.

The next calculation follows. If there is something on rows in query and measures are on columns or slicer, then we have the position of our hierarchy (it is positioned on rows). If the measures are not on columns but something else is (that’s the scenario where Excel swaps axes), then take position 0 as the place where our hierarchy is. As said, hierarchy is actually on columns although visually it looks like it’s on rows. In all other scenarios take null as result.

Finally, we come to the main expression. As explained in the simple example above, we must test whether the original measure was empty. If so, we must provide the same for our calculated measure not to include that row in result when NON EMPTY was used on axes. Then we test where that hierarchy of choice is. In case of null, which means there is no hierarchy in the query other than measures, we provide the original value since the average of a single member is the same member’s result. In the main part of this expression we test for root member and provide the same value (orig
inal value) since that’s an average for him. In all other cases the expression comes down to sum of siblings over count of siblings with result (hence NonEmpty() part). Without handling empty members, our average would be false.

In the end, we take care of format string and test if it works.

That’s it.

Here’s an example of it in action.

image

The article was inspired by this thread.

A short remark for those of you who read my blog and happen to be in the neighborhood this week.

I’ll be attending Sinergija 09 conference in Belgrade, Serbia, which starts on Tuesday, 17th of November and ends on Thursday, 19th of November, 2009. While I’m there, I’d love to meet some of you working on Microsoft BI platform and have a chat or two regarding that. The best chance to find me is, of course, one of the rooms with SQL Server topic. However, that’s not guaranteed, because there are few other areas (here’s the Agenda) that look tempting too. Anyway, the picture in my profile might help in recognition.

See you there.

Kratka obavijest za one koji prate moj blog i nalaze se ovoga tjedna negdje u blizini.

Ove godine posjećujem Sinergija 09 konferenciju u Beogradu, koja počinje 17.11., a završava 19.11.2009. godine. Volio bih upoznati neke od vas koji se bavite Microsoft BI platformom i razmijeniti riječ ili dvije na tu temu dok sam tamo. Najveća šansa da me nađete je, naravno, u nekoj od prostorija s temom SQL Server. Ipak, to nije garancija, jer neke od ostalih tema (evo rasporeda predavanja) zvuče više nego interesantno. U svakom slučaju, fotografija u mom profilu pomoći će u raspoznavanju.

Vidimo se tamo.

 

This year I’ll be volunteering at PASS Summit. Birds-of-a-Feather (BOF) Lunch is one of events you can find me at. Whether you want to discuss MDX or just meet me in person, search among 50 tables and join me for a chat. The event will take place on Tuesday, 3rd of November, between 11:45 and 12:45 (Expo Hall 4B).

Additionally, the company I work for, SoftPro Tetral, is one of exhibitors. If you’re interested in seeing our CubePlayer in action or just to have a chat with whoever is there at that moment (me included), feel free to drop by, either during breaks or Exhibitors Reception (Tuesday, 3rd of November 3, 6:00pm-8:00pm, Exhibit Hall 4A). We’re located in bottom-right corner on the map.

 

On 2nd of September, I watched a marathon of SQL Server sessions named “24 Hours of PASS”. At least one small part of it. One session that made me think was Steve Jones’ Building a better blog. I wanted to learn few tricks about blogging in order to make mine a better place. What surprised me was Steve’s advice to blog about everything you do, in order to show others what topics you covered so far, regardless whether it was covered by someone else before or not. Now, I simply couldn’t agree with that. For me, writing a blog article, and by that I don’t mean a short announcement or similar, is writing about something others haven’t done so far or at least, not in that way. It’s about creating something new, a challenge to convert ideas in practice and demonstrate it in public. Otherwise, what’s the point in duplicate content? Then again, maybe I just don’t understand the point in having a blog.

And so I thought to myself, surely there must be several types of bloggers. One, for instance, that act as messengers, concentrating mostly on what’s new and what’s considered to be best practice. News and knowledge aggregators, we might call them. Then, there are occasional practitioners, people who don’t blog often and rarely do so regarding news. When they blog, it’s mostly about something they’ve done recently and found interesting to share. Next, there are regular bloggers, who post weekly if not often, regardless of what, just to keep the pace. And so on. Which makes me think. Could we perform a real-life analysis on this topic? Could we analyze bloggers writing about Business Intelligence? Why not, I know a good place where this kind of information can be obtained. Our precious – SSAS-Info.com portal, hosted by Vidas Matelis.

STARTING THE PROJECT

Although the strategic decision to go in this project was done at the same time when the idea came to me, the tactical decision followed a month later.

I mean, I saw this right from the start as a community project, this being merely its initial phase. A project where we all could participate, explore possibilities, share the findings and project’s know-how and have fun at the same time. Because this time it would not be about Northwinds, Foodmarts or AdventureWorks, this time it would be about us. Just imagine, us being inside a cube. Not as analysts, but the one being analyzed for change. Feels weird, yet tempting.

Considering all circumstances, it had to be done before PASS Summit 2009, because that’s a great place to gather, at least some of us will be there, and discuss the findings and further possibilities. Therefore, upon receiving emergency note from tactical management, the operational team had to act fast. And so I did.

I’ve contacted Vidas who kindly sent me CSV dump of his site. One big table containing all the records, like articles, news, faq, etc. Here’s 5 for cooperativeness!

CSV - 1st export

Since my virtual environment wasn’t ready at the moment, I started in Excel 2007 and managed to parse, clean, distill and consolidate the data using Excel functions, formulas and copy-paste technique. Like a real BA :-) . I also transposed some data (tags) into M2M model using Query. Details from that experience in some other article, perhaps. I’m glad I did it in the end. Says enough?

Before going any further I’ve sent it back to Vidas. And I recommend it as a good practice for future users of PowerPivot – once they get what they want, loop-back to IT in order to clean/consolidate data at the very source. Don’t be mistaken, data will always be dirty. Yes, I know, it’s an extra step, but that way data becomes better and better for analysis. Love thy brother thou Gemini.

He gladly cleaned it following my findings inside. At the same time my virtual environment was ready, so I moved to Excel 2010 and Gemini add-in.

BI bloggers - source

I couldn’t make a model directly from the same Excel holding several tables for analysis, therefore, I saved that Excel and used it in a new one as an ODBC datasource.

Connecting went smoothly. I couldn’t see the data, just headers, but I assumed it had something to do with being CTP still (and perhaps something else). I just ignored that and returned to Excel, add a new pivot, filled it with some fields and the data was there. It works!

PERFORMING THE ANALYSIS

I made Four Charts view (under Pivot dropdown) and started to search for interesting and meaningful results (and also the ones where I’m in chart yet it doesn’t look suspicious :-) ). Finally, I settled with this. What do you think?

Gemini analysis of BI bloggers

I connected the orange Section slicer to charts on left (colored orange too). That slicer separates different part of SSAS-Info site, articles being the most fun par of it for analysis (hence selected). The upper radar graph shows the quantity of articles per category. Three most blogged categories (as seen in the radar graph) are: MDX, Management and Design. The bottom bar graph shows Top 10 bloggers.

The blue slicer, Category, represents topics being blogged about. Those top 3 are selected manually in order for the blue stacked bar graph to show their total as well as their share for each of Top 10 bloggers in those topics. As you can see, almost all of Top 10 bloggers are among Top 10 bloggers in these categories too. Vidas Matelis and Teo Lachev focus their attention primarily on other subjects, while Mi
crosoft and I blog above average in top 3 categories. Blue graph is linked to both orange and blue slicers and supports multi-select quite beautifully, because of stacked bar graph being used here.

Green pie graph is linked to green Author slicer (and orange slicer too). Here multi-select is not recommended, because this graph is not designed for that. However, it shows a distribution of categories for an author. Chris Webb, seen in this beautiful graph, appears as a true multi-vitamin for consultancy because of a wide variety of interests he blogs about and their nice distribution.

Which brings us to my initial dilemma. Using this graph it is relatively easy to analyze how versatile one chooses to be in his blog (because of categories he decides to write about). Mosha Pasumansky, me and many other BI bloggers are inclined to one or two areas. Partially because most of us are under 10 posts, which is a relatively small number compared to a number of categories.

The same analysis can be done in radar graph, either by hooking the orange radar graph to Author slicer, or by changing the bottom-right graph type from pie to radar, since it is already connected as required. That shows us some BI bloggers like Carl Rabeler and Dave Fackler have square in radar graph (focus on 4 categories). Many of Microsoft employees have a more or less wide wedge, with SSAS 2008 being the longest side. Other bloggers have triangles, etc.

Chris Webb, Vidas Matelis, Thomas Ivarsson, Dan English and others show more balanced pie covering many categories. And so on. And so on.

Btw, having so many celebrities in my article (not knowing about it too), I feel like Michael Jackson shooting Liberian Girl video. And I apologize in advance for not mentioning all of you (~150). Don’t worry, I have a plenty of ideas for the sequel ;-)

DRAWING THE CONCLUSION

Having seen many of these charts, I conclude that my assumption from the beginning of this article is true. Patterns do exist. Therefore, they should be analyzed.

My comparison with vitamins might not be bad at all. I mean, we both help the society. Next, each of us is targeting one or more areas. Finally, each of us, like vitamins from different vendors, have more or less better perception/absorption. That is, what we write about and how we write about it. Yet, unlike vitamins, we have the option to evolve.

My (not first, but second) impression regarding PowerPivot is that it is envisioned as a very pleasant analytical environment, featuring slicers as the most impressive component of it.

Although the same analysis could have been done in Excel 2007 or in SSAS 200*, I’ve chosen Gemini because I participate in CTP program, haven’t posted anything about it so far and wanted to learn. And what better way than practicing by setting up a task and then fulfilling it. Incrementally, this being the first step.

As for in-memory cube generation is concerned, that part is still to come. I made some tests, but that’s not it. A careful reader should remember that my standards prevent me from posting unless it is something valuable or unique. In CTP3 perhaps.

  

WHERE TO NEXT?

This experiment showed interesting results. At least to me. And it pointed out one important thing – data is everywhere! Yet it’s not being analyzed, just collected. Opportunities are numerous. Business Intelligence, at least its technology part, enables everyone in gaining fast insight into data, encouraging the creative process which then leads to better decision-making and actions.

Opportunities are numerous here too. This was only a small portion of what can be done. I see this project as a potentially flourishing area for the whole community where each of us could cover his area of interest. Btw, this is just one of them I have on my mind, and I intend to start them when the time comes.

For example, this data is static (captured a week ago). One possible project would make it come alive, enabling real-time (or daily) analysis (although weekly could be fine too).

The next one is Time Analysis. Records in the database behind SSAS-Info portal don’t have original article’s date but rather a date/time of when it was manually entered into database. I’ve omitted that in this example, I might make another one soon. Now, that difference means we have uncertainty of a couple of days for more recent items, and unpredictable offset for older items. Those dates could be collected and merged into DW.

Then there are numerous data-mining possibilities. For example, clustering of authors in order to obtain clustered groups based on categories or tags. We could also classify authors as nighthawks and daylight writers. Or by regularity in their posts (regular, occasional), day in week affinity, etc, etc.

This could also be a nice case for data-cleaning experts, to show their strength and ideas in that field, in Excel as well as on SQL Server platform. And perhaps by comparing good and bad sides of each path.

Finally, there are M2M challenges not to be neglected. All together, a nice playground for the upcoming CTP3.

Yes, I know what you’re thinking. OK, fine, no problem. Feedback would however be very appreciated.

If, on the other hand, you are interested in participating in this project, Vidas and me have no objection that you use raw or prepared data, which btw can be found in below.

 

Feel free to contact us regarding cooperation, advices, anything.

And again, don’t be shy, give thumb up or down, feedback, anything. Let me know what you’re thinking. That way I can make another analysis. Just kidding :-)

PS: A small notice to Redmond team – the same idea could be applied to MSDN forum. Extending the text-mining of tags and posts recently introduced there, if I’m not mistaken.

Tomorrow, on 29th of October, I’ll be holding an unplugged session named “OLAP” for local SQL&DEV user group. The term “unplugged” is literal – there will be no laptops, slides or demos. Just a (hopefully) good intro and discussion, with few key points to take away in the end. All together, an hour of specific subject, following the practice of last month’s gathering.

The event will take place in Microsoft’s building in Zagreb (Croatia) at 6 PM.

Beside a regular monthly newsletter, there’s also an announcement on BUG Online (web portal of local IT magazine) that I’ve managed to find regarding this event.

Sutra, 29. listopada, održat ću unplugged predavanje na temu OLAP-a za SQL&DEV user grupu. Ono “unplugged” je doslovno – nema laptopa, nema slajdova, nema demoa. Samo, nadam se, dobar uvod, diskusija i naravno, nekoliko istaknutih stvari, da ostanu u sjećanju. Dakle, jedan sat na odabranu temu, slijedeći dobru praksu prošlomjesečnog druženja.

Događaj će započeti u 18 sati u prostorijama Microsoft Hrvatske, Turinina 3 u Zagrebu.

Osim redovnog mjesečnog e-letka, postoji i obavijest na Bug Online portalu o ovom događaju.

 

Creating a Bit array (also known as Bitmap) is an useful technique when analyzing data. Especially (but not necessarily) in case of low cardinality (small number of distinct values in column compared to the number of records in a table). For example, a column having two distinct values (male/female or yes/no status) could be optimized in storage and represented using values 0 and 1 only (bits).

A collection of such bits forms a bit array or a bitmap. Bitmap indexes use those arrays performing bitwise operations on them.

Almost all DSS and analytical systems use bitmap indexes, and so does Analysis Services. An example where it can be seen is SQL Server Profiler. Query Subcube event represents the bitmap of hierarchies and attributes being used in a query. Yet, what’s fine for computers might not be for humans, hence there’s also Query Subcube Verbose event which explains that in plain English. Kind of.

Although SSAS engine intensively uses them, bitwise operations in MDX are in short supplies. In spite of that, the same idea can be borrowed and turned in action using string functions. After all, an alternate name for bit array is bitstring.

Recently I’ve participated in a case where a measure was expected to return the percentage of parent no matter which hierarchy is placed on an axis and no matter how many of them are there. Actually, a similar problem appeared on MSDN SSAS forum a year ago too. While the initial problem in this latter case was solved using special solution, general solutions emerged also. The offer ranged from heavy string manipulation to recursion and stored procedure. Taking a look at it now, from the perspective of this case, I believe bitmap indexes might have been an alternate MDX solution, a relatively fast approach that doesn’t require anything but a query and few calculated members aside.

Here’s a case study featuring the bitmap index as the key point in obtaining the general solution.

SITUATION

The idea was to have a general solution for percentage (ratio to parent). One calculated measure covering all scenarios, meaning, whatever combination of hierarchies appears in OWC (cube browser).

The problem that soon appeared was this – no matter how much improved the calculation in each iteration I posted in forum was (a previous thread and my blog article on that subject), there was always a case where it didn’t fit. Therefore I started focusing on getting as detailed explanation as possible. When I finally got it (as an Excel sheet) and analyze it, deduced the algorithm and describe it like this (as in integrams):

  •  
    • if there was a non-root member in the rightmost hierarchy, show ratio to parent
    • root member in the rightmost hierarchy combined with all non-root members from other hierarchies to the left should result in 100%
    • if all hierarchies were on root member, the result should be 100%
    • in all other situations, ratio to parent for non-root member should be the result

The idea in integrams (I couldn’t link for them in English, it’s an enigmatic game) is to mark certain events that hold true and other that can never be such. By doing so in iterations (reading the initial text over and over again, every time in new context), one should be able to deduce the final combination of entities in a puzzle and thereby solve it. The idea here was to list all possible combinations using a bitmap, where 0 represents non-root member, 1 root member and their sequence, well, …the sequence of hierarchies on axis. Translated, that would look like this (asterisk being any value):

CASE BITMAP VALUE REQUIRED RESULT
**0 0, 2, 4, 6, … (even numbers) parent of 3rd
001 1 parent of 3rd (self), special case for 100%, per request
011 3 parent of 1st
101 5 parent of 2nd
111 7 parent of 1st, special case for 100%

 

Now, see how this looks more like a solvable case? And so it was. The only thing necessary at this stage is to code it using MDX expressions in the most optimized way.

SOLUTION

A general MDX for this could be far more simpler than the one below. However, the requirement was that it should work in OWC and OWC has its specifics when it comes to MDX, about which I blogged here. Therefore, I had to make a lot of additional things, like testing where OWC put measures and everything related to that in later calculations. Nevertheless, I managed to make it work as required.

Create Member CurrentCube.[Measures].[Where are measures in OWC] AS
case
when NOT IsError(Extract( Axis(0), Measures ).Count) then 0
        when NOT IsError(Extract( Axis(1), Measures ).Count) then 1
        when NOT IsError(Extract( Axis(2), Measures ).Count) then 2
        else -1  -- should not be possible
end
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Ratio measure] AS
-- specify a measure you want to have a ratio for, or
-- use default measure, or
-- use dynamic expression, that selects    -- either first measure on an axis or
-- the default measure in case no measure is selected

    /* -- option 1 */
    [Measures].[Sales Amount]
    /* -- option 2
    [Measures].DefaultMember
    */
    /* -- option 3
      iif(Extract( Axis( [Measures].[Where are measures in OWC] ),
                   Measures ).Item(0).Item(0).UniqueName =
          '[Measures].[Universal ratio %]',
          [Measures].DefaultMember,
          Extract( Axis([Measures].[Where are measures in OWC]),                   Measures ).Item(0)
         )
    */
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Test Empty] AS
    IsEmpty( [Measures].[Ratio measure] )
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Count of columns] AS
    iif( IsError( Axis(1).Count ),
         0,
         iif( [Measures].[Where are measures in OWC] = 1,
              Axis(0).Item(0).Count,
              Axis(1).Item(0).Count )
       )
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Bitmap Index as String] AS
    iif( [Measures].[Where are measures in OWC] = 1,
        Generate( Head(Measures.AllMembers,                       [Measures].[Count of columns] ) AS L,
                  -(Axis(0).Item(0).Item(L.CurrentOrdinal - 1)                    .Hierarchy.CurrentMember.Level.Ordinal = 0)
              ),
        Generate( Head(Measures.AllMembers,                       [Measures].[Count of columns] ) AS L,
                  -(Axis(1).Item(0).Item(L.CurrentOrdinal - 1)                    .Hierarchy.CurrentMember.Level.Ordinal = 0)
          )
     )
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Reversed Bitmap Index as String] AS
    iif( [Measures].[Where are measures in OWC] = 1,
        Generate( Head(Measures.AllMembers,                       [Measures].[Count of columns] ) AS L,
                  -(Axis(0).Item(0).Item([Measures].[Count of columns] -
                     L.CurrentOrdinal)                    .Hierarchy.CurrentMember.Level.Ordinal = 0)
              ),
        Generate( Head(Measures.AllMembers,                       [Measures].[Count of columns] ) AS L,
                  -(Axis(1).Item(0).Item([Measures].[Count of columns] -
                     L.CurrentOrdinal)                    .Hierarchy.CurrentMember.Level.Ordinal = 0)
              )
       )
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Rightmost non root position] AS
    InStr([Measures].[Reversed Bitmap Index as String], '0')
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Inner 100%] AS
    CInt([Measures].[Bitmap Index as String]) = 1
    , Visible = 0
;

Create Member CurrentCube.[Measures].[Universal ratio %] AS
    iif( [Measures].[Test Empty],
         null,
         iif( [Measures].[Rightmost non root position] = 0 OR
              [Measures].[Inner 100%],
              1,
              iif( [Measures].[Where are measures in OWC] = 1,
                   -- check division by zero
                   iif( ( Axis(0).Item(0)                          .Item([Measures].[Count of columns] -
                           [Measures].[Rightmost non root position])                          .Hierarchy.CurrentMember.Parent,
                          [Measures].[Ratio measure] ) = 0,
                        0,
                        [Measures].[Ratio measure]
                        /
                        ( Axis(0).Item(0)                          .Item([Measures].[Count of columns] -
                           [Measures].[Rightmost non root position])                          .Hierarchy.CurrentMember.Parent,
                          [Measures].[Ratio measure] )),
                    -- check division by zero
                    iif( ( Axis(1).Item(0)                           .Item([Measures].[Count of columns] -
                            [Measures].[Rightmost non root position])                           .Hierarchy.CurrentMember.Parent,
                           [Measures].[Ratio measure] ) = 0,
                           0,
                           [Measures].[Ratio measure]
                           /
                           ( Axis(1).Item(0)                             .Item([Measures].[Count of columns] -
                              [Measures].[Rightmost non root position])                             .Hierarchy.CurrentMember.Parent,
                             [Measures].[Ratio measure] ) )
                 )
            )
       ),
    FORMAT_STRING = "percent"
;

 

The first measure detects axis that has measures when using OWC.

The second measure is our ratio measure. That is, the measure we want to calculate the ratio for. I provided the syntax for several scenarios. The first one, of course, is the obvious and easiest one – using any measure we want and hard-coding it there. The second option is more general one – the use of cube’s default measure. Meaning, it could work for any cube. Finally, the third and the most complex option is to use the first measure on axis, whatever it might be. That’s even more general solution. In that scenario, it is up to an end-user to put a measure of his choice as the first measure and to see ratios based on that measure. It should work in any cube/report/query. The calculation features a handle for no measures at all (except that ratio measure) in which case the default measure is taken.

Next we test whether the original, ratio measure is empty. We’ll use the result of this in the following calculations, in order to preserve empty rows.

Count of columns gives us count of hierarchies being placed on an axis of interest (that is rows or columns, depending on where measures are not).

Next thing we do is generate a bitmap, and right next to it – a reversed bitmap. Innermost logical test returns either –1 or 0 (True or False), while minus in front of it turns -1 into 1. Looping is performed on a set of measures (which might be a week point – it is up to BI designer to provide something better in case there are only few measures). From which axis is bitmap formed? Depends on measures position in OWC. Additionally, in the reversed bitmap, the reversed order is obtained as it is usually being done in MDX – using a combination of two MDX functions: Count and CurrentOrdinal.

Why do we needed reversed bitmap? It becomes obvious once we come to the next calculated measure – Rightmost non root position. Analysis should be performed backwards, from right to left. That is the order in which we search patterns in bitmaps and therefore we needed reversed bitmap (string search is left based). This new measure returns us the position of the first zero value. In the table featuring 5 distinct cases, those positions are: 1, 2, 3, 2, 0.

Next measure is a Boolean type measure that detects a special case (see the table above) and that special case is 001 combination, where, per request, the value should be 100%.

Finally, we come to the last and the only visible measure – Universal Ratio %. There, the first thing we should do is test for emptiness of original ratio measure. Then only we handle the cases in our table. After covering cases with 100% result, based on where the measures are, we enter one branch and there test denominator for zero. Finally, we perform the ratio calculation. Only by now it is in utmost relative form, covering wide varieties of cases with only one expression (notice the use of Rightmost non root position measure inside the innermost calculation). That’s the power and beauty of having and using bitmap indexes.

UniversalPercentageInOWC

BENEFITS

It’s worth reminding again that the final formula looks complicated only because of two reasons: OWC is being used as a front-end and, there were special cases in the bitmap table. Without them, the calculation would come down to always necessary checks for emptiness and division by zero, following by the core ratio expression. In case you’re using a client that keeps measures on the axis you’ve dropped them on in your pivot, you can consider stripping down all unnecessary calculations or part of it, as mentioned in the beginning of this article.

I also made tests with real bitmaps. Meaning, I tried to use numbers, 2 powered to N, where N was the number of hierarchies on axis. Yet that turned out to be not only a dead-end (because the reverse operation is not easy), but also time consuming, prolonging the query time significantly. String operations on the other hand were very fast and manipulations with them very easy. Relatively, of course.

All in all, bitmaps and their power in being flexible pointers proved to be an interesting approach in providing universal MDX solutions, calculations that, did I mention it before, work in any cube.

© 2010 Tomislav Piasevoli Suffusion WordPress theme by Sayontan Sinha