May 152010
 

Boyan Penev recently blogged about a strange behavior which manifests when ordering two related sets thereby mentioning a thread in SSAS forum answered by Deepak Puri. I found this intriguing and decided to reply directly, but the explanation kept growing, so I finally cut everything out and put it as an article on my blog instead. After all, this way my chances of being enlisted in his blog roll should increase 8-).

Here’s the problem in short. The first query works, the second doesn’t. The difference? The second query has related hierarchies while the first one doesn’t.

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order( [RowSet], [ColumnSet].Item(0), BDESC ) ON ROWS,
     [ColumnSet]                                   ON COLUMNS
FROM [Adventure Works]

 

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
SELECT
     Order( [RowSet], [ColumnSet].Item(0), BDESC ) ON ROWS,
     [ColumnSet]                                   ON COLUMNS
FROM [Adventure Works]

 

Now, some of you may have already guessed this has to be a case of attribute relations and yes, I admit you’re right. Nevertheless, I think an analysis would be beneficial, after all Boyan asked for it.

First, let’s make a small but important distinction. Those queries look like they have been written by a T-SQL developer. Why am I saying it? Because in T-SQL, ORDER is typically (although not necessarily) based on a column, be it numeric, date or string. A set being returned is ordered based on the values of that column.

In MDX, ORDER() is based on an expression which means that the second argument is automatically converted to a tuple (ColumnSet.Item(0)), and then a value of it: (ColumnSet.Item(0)).Value. Let’s see how those two MDX queries might look like in T-SQL.

use AdventureWorksDW2008;

with
t as
(
select
    SubCategory = s.EnglishProductSubcategoryName,
    CalendarYear = d.CalendarYear,
    ResellerSales = sum(f.SalesAmount)
from dbo.DimProductSubcategory s
    join dbo.DimProduct p
        on p.ProductSubcategoryKey = s.ProductSubcategoryKey
    join dbo.FactResellerSales f
        on f.ProductKey = p.ProductKey
    join dbo.DimDate d
        on d.DateKey = f.OrderDateKey
group by
    s.EnglishProductSubcategoryName,
    d.CalendarYear
)
, pvt as
(
select * from t
pivot( sum(ResellerSales)
       for CalendarYear in ( [2001], [2002], [2003], [2004], [2006] ) ) as pvt
)
select * from pvt
order by
— sort A
[2001] desc;
— sort B
/*
(select sum(ResellerSales) from t
        where t.SubCategory = pvt.SubCategory and
              t.CalendarYear = 2001) desc;
*/

 

with
t as
(
select
    SubCategory = s.EnglishProductSubcategoryName,
    Category = c.EnglishProductCategoryName,
    ResellerSales = sum(f.SalesAmount)
from dbo.DimProductSubcategory s
    join dbo.DimProduct p
        on p.ProductSubcategoryKey = s.ProductSubcategoryKey
    join dbo.FactResellerSales f
        on f.ProductKey = p.ProductKey
    join dbo.DimProductCategory c
        on c.ProductCategoryKey = s.ProductCategoryKey
group by
    s.EnglishProductSubcategoryName,
    c.EnglishProductCategoryName
)
, pvt as
(
select * from t
pivot( sum(ResellerSales)
       for Category in ( [Bikes], [Components], [Clothes], [Accessories] ) ) AS pvt
)
select * from pvt
order by (select sum(ResellerSales) from t
          where –t.SubCategory = pvt.SubCategory and
                t.Category = ‘Accessories’
         ) desc;

 

They return exactly the same result as those two MDX queries. Btw, who says MDX is not elegant and easy?

Now, let’s analyze them.

The first query is ordered by column [2001]. We said that MDX uses expressions when sorting, so we’ll try to do the same using an expression. We can comment [2001] DESC and un-comment what’s below “sort B”, but the result won’t change.

In the second query I deliberately skipped sorting by a column (ORDER BY [Accessories] DESC) and used the version with an expression in order to get the false result. The column version would yield the correct, expected result. While we might have thought sorting in MDX behaves that way, the truth is it doesn’t. MDX has some specifics and we need to use the expression version in order to show them.

Important thing to notice is that in both queries there’s a join which binds subcategories (disregard it being commented in the second query at the moment, act like it’s still there). We can imagine this as being the first phase in setting the condition of those correlated sub-queries.

The next phase would be to expand this by specifying the argument of the Order() statement in MDX. In case of a Date dimension, that was year 2001. Since that is a different dimension, nothing has changed in our initial condition.

In the second query, there was a related hierarchy (same dimension) which came after the initial phase, so it forced everything so far to adjust to it (which btw might be seen as a third phase). You can imagine this third phase as limiting the join further by adding conditions like this:

and exists (select ProductCategoryKey from dbo.DimProductCategory
            where EnglishProductCategoryName = pvt.SubCategory)

 

When the related hierarchy is “below” the current one, the initial join could be considered as removed (instead of bringing this extra EXISTS which makes no sense if you analyze what’s inside). Therefore I commented it in the query. In fact, I manually did what SSAS engine implicitly does to joins from the initial phase when a related hierarchy comes in the second phase and when the relation is 1:N, or in OLAP terms – below granularity.

When it is “above”, the initial join is just more limited (now it makes sense to use EXISTS).

In DAX it would be something like the ALL() function (for analogies between MDX, T-SQL and DAX I suggest reading my previous blog article).

Err, one more thing, the join is based on keys rather than names, but for simplicity reasons I haven’t included them in queries.

Now let’s get back to OLAP and it’s multidimensional space.

Cube’s space is N-dimensional which means each coordinate should be N-di
mensional too. SSAS formula engine fills the missing parts using the current members of non-mentioned hierarchies. But who are they?

Based on members explicitly specified in a coordinate, the engine implicitly (the way a trigger fires in T-SQL) finds members on related hierarchies. They become current members of those hierarchies. For non-mentioned dimensions (not hierarchies), the engine picks current members which again are “current” because of some previous context set by an expression, function, slicer or the cube itself.

In the first example year 2001 sets the context of that expression. All the hierarchies of Date dimension implicitly shift their current members accordingly (which is not important in this case, but let it be known). Since the Product and others dimension were not mentioned in that expression, we traverse to previous contexts (backwards) in order to find their current members. The Filter() function sets the immediate previous context. For Product dimension, Product.Subcategory hierarchy to be precise. – How come? – Well, Filter() is sort of a loop, which means we iterate over members in its set (the first argument) which in turn become “current” in that expression, one by one. Other dimensions are not important in this case but the process of identifying current members might be visualized likewise.

In the second example Accessories category sets the context. Its presence automatically shifts Product.Subcategory current member to a related position. – Where? – To its root member. – Why? – Because relation is 1:N. If it were N:1, it would shift to a single related member. Since it can’t find one, it shifts to its root member which is a way of saying “I can not find a direct relation to a single member, therefore I’ll move to a place where I don’t stand in a way, a neutral position that doesn’t change the (explicit) coordinate, though complies to the obligation of me being present in the coordinate”.  – Let me think it through a bit … OK, I see. But how come the current member of Product.Subcategory hierarchy didn’t stay too? I mean, he was in Filter() function too, as the first argument. – Correct, but they are not treated the same. – Why’s that? – Layers! – What? – Calculation layers. In form of a stack. On top there’s only Accessories because only that was explicitly there in the expression and that expression is the last thing evaluated in the Filter() function. – Why is that the last thing? – Well, first you need to set a loop. Then you evaluate the expression for each member, right? As you would do in programming, loop is outside, assignments and calculations are inside the loop. Therefore, Product.Subcategory set (and its current members in a loop) is one layer below in this stack. Whoever is NOT on top, or shall we say BELOW the layer being currently evaluated might be implicitly shifted. Might, because it depends whether he’s related or not. If he is, as in the second example, he will shift, but if he isn’t as in the first example, nothing will happen. He’ll come unchanged. In turn, he might determine who’ll be shifted among his “relatives” in layers below his. And so on. The winner takes it all approach among “relatives” and the winner is the “relative” being explicitly in the coordinate, on top of the others. – Oh, I see. Current and lower layers you say … Wait, I got an idea! In order to make them equal, we should bring them on the same layer, right? – Exactly! – And how do you suggest we should do that? – The expression. That determines the top layer in this case so both of them should be there. – Ok, but how? – Simple, by providing a tuple. We should expand the existing condition with .CurrentMember of the first set.

Here are both the false and the correct MDX for the second query, to emphasize their differences.

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
SELECT
     Order( [RowSet], [ColumnSet].Item(0), BDESC ) ON ROWS,
     [ColumnSet]                                   ON COLUMNS
FROM [Adventure Works]
 

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
SELECT
     Order( [RowSet], ( [ColumnSet].Item(0),
                      
[RowSet].CurrentMember ), BDESC ) ON ROWS,
     [ColumnSet]                                          ON COLUMNS
FROM [Adventure Works]

 

Forcing the current member is like saying “I want my current member to stay and not get implicitly converted to other members. I’m forcing this coordinate and I know what I’m doing, return me the value of this explicitly requested intersection (aka coordinate)!”.

Now, return to the second T-SQL query and un-comment the join. You’ll get the correct result because this time you explicitly forced the join. Or shall we say you prevented this join from disappearing in T-SQL just as you prevented the current member in MDX from changing because of attributes relations in that dimension. Interesting?

Finally, it wouldn’t hurt to specify the same tuple in the first MDX query too. I mean, the result would not change, they would stay correct. So, if you, the reader of this article, are searching for the expression which works in any scenario, use the one with the tuple.

Important thing to remember. In MDX you can almost always omit current members in your expressions. Yet, when dealing with related hierarchies, as in this example, specifying them or not makes a world of difference.

The question remains – why there is this attribute relations mechanism in OLAP? Well, that might be a topic for some other article. For now let’s just say that it keeps the multidimensional space smaller and consistent. And although it behaves like a wormhole, it’s here to help. Without it, we might get lost out there.

May 032010
 

Recently I delivered a session at local conference organized by Microsoft Croatia – WinDays¹º. The title of the presentation was: “DAX vs MDX vs T-SQL”.

As its name suggests, the presentation is oriented towards SQL and BI developers (for change) instead of Excel power-users and compares DAX calculations with T-SQL and MDX expressions. The idea behind, besides improving my T-SQL skills and learning DAX, was to build a mental model of what DAX functions and their combinations actually do by finding their best analogies in other languages. Since there are more than a hundred DAX functions, I’ve concentrated primarily on those that use relations. On one hand, they are probably the most difficult group of functions in DAX. On another, they should be the closest to SQL and BI developers’ mindset.

If you’re wondering why T-SQL and MDX, the reason is simple. PowerPivot uses in-memory column-oriented relational database technology and was developed by SSAS team (together with Excel team). “Relational” means we have every reason to compare DAX with T-SQL. And since SSAS team was involved, MDX comes naturally. Hereby I wish to emphasize that the comparison shown in the presentation was based on imaginary or potential analogies in T-SQL and MDX and not the real implementations built into PowerPivot engine. In the same manner, the summary highlights showing strengths and weaknesses of each language is based on personal impressions. Regardless of that, I believe the presentation might be of great help for many.

The presentation (translated in English) is published on my blog including all the related files. You can find it here.

Looking forward to your feedback.