*per se*, I’ve focused on the event-based histograms and explained how to achieve them using a step-by-step approach. The inspiration for that article originated from this thread.

**Buckets**!

**INTRODUCTION**

**Bucketization** is a common process when dealing with data. Normally, you want to be able to perform analysis on every possible value of an entity (for example product, store, or date) and their attributes as well. However, there are times when such a detailed analysis is either not advisable or not required. Loans, for example, are usually a huge dimension; customers too, in some industries. Implementing an attribute based on the primary key of the respective dimension table can have serious consequences for performance.

*bucketized*. Values are split into discrete regions, bands, groups, or ranges, which are various phrases for the same thing. Since the number of buckets is smaller than the number of distinct values, manipulation is easier and the performance vs. storage ratio is potentially better. Yet these buckets still represent the data in a satisfactory way, meaning we don't lose information about it; we just reduce the *resolution of analysis* on that entity, since it suits us that way.

**Typical examples** can be found in the Customer dimension (Yearly Income, Commute Distance) and the Employee dimension (Base Rate, Sick Leave Hours, Vacation Hours). Adventure Works, of course.
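Outside of SSAS, the same idea can be sketched in a few lines of Python; the band boundaries below are purely illustrative, not Adventure Works' actual discretization:

```python
# Discretize a continuous attribute (e.g. Yearly Income) into a few named bands.
# The boundaries are made up for illustration only.
def income_band(income):
    if income < 30000:
        return "Low"
    if income < 60000:
        return "Moderate"
    if income < 100000:
        return "High"
    return "Very High"

# Four customers collapse into three bands: resolution drops, information survives.
customers = [12000, 45000, 75000, 150000]
print([income_band(c) for c in customers])
```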

*virtual* buckets. Here's how.

**BUILDING BUCKETS**

**1. Open** a copy of the **Adventure Works 2008 database in BIDS**. Adjust the Deployment section of the project properties so that you don't overwrite your original database. Specify **Buckets** (or anything you like) for the name of the database.

**2. Open Adventure Works DW.dsv**. Add a **New Named Query**. Use **Bucket** for the name and then add this T-SQL script:

SELECT 0 AS KeyColumn, 'Bucket 0' AS NameColumn
UNION ALL
SELECT 1 AS KeyColumn, 'Bucket 1' AS NameColumn
UNION ALL
SELECT 2 AS KeyColumn, 'Bucket 2' AS NameColumn
UNION ALL
SELECT 3 AS KeyColumn, 'Bucket 3' AS NameColumn
UNION ALL
SELECT 4 AS KeyColumn, 'Bucket 4' AS NameColumn
UNION ALL
SELECT 5 AS KeyColumn, 'Bucket 5' AS NameColumn
UNION ALL
SELECT 6 AS KeyColumn, 'Bucket 6' AS NameColumn
UNION ALL
SELECT 7 AS KeyColumn, 'Bucket 7' AS NameColumn
UNION ALL
SELECT 8 AS KeyColumn, 'Bucket 8' AS NameColumn
UNION ALL
SELECT 9 AS KeyColumn, 'Bucket 9' AS NameColumn
UNION ALL
SELECT 10 AS KeyColumn, 'Bucket 10' AS NameColumn
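For what it's worth, the eleven UNION ALL branches can also be generated rather than typed; a small Python convenience sketch (not part of the original steps):

```python
# Generate the T-SQL for the Bucket named query, members 0 through 10.
branches = [
    f"SELECT {i} AS KeyColumn, 'Bucket {i}' AS NameColumn" for i in range(11)
]
query = "\nUNION ALL\n".join(branches)
print(query)
```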

**3. Build a new dimension** from that query. Name the dimension **Bucket**. Use KeyColumn and NameColumn for the one and only attribute, and **provide a MemberValue** as well (it should also be KeyColumn). **Order by key**.

**4. Add** that **dimension** to the cube. **Don't link** it to any measure group.

**5.** Go to the **Calculations tab**. Position the cursor just **below** the **Calculate;** command. **Paste** this script:

Create Member CurrentCube.[Measures].[Weighted Discount Buckets] As
    Int( [Measures].[Discount Amount] /
         [Measures].[Reseller Sales Amount] * 10 )
, Format_String = '#,##0'
, Associated_Measure_Group = 'Reseller Sales'
, Display_Folder = 'WA';

Scope
(
    [Bucket].[Bucket].[Bucket].MEMBERS
);
    This = Aggregate( Existing [Promotion].[Promotion].[Promotion].MEMBERS,
                      iif( [Bucket].[Bucket].CurrentMember.MemberValue =
                           (
                               [Measures].[Weighted Discount Buckets],
                               [Bucket].[Bucket].[All]
                           ),
                           (
                               [Measures].CurrentMember,
                               [Bucket].[Bucket].[All]
                           ),
                           null
                         )
                    );
End Scope;

**6. Build and deploy**.

**7. Jump to the Cube Browser** tab. Put the **Bucket dimension on rows**. Add the **Weighted Discount Buckets measure from the WA folder** first, then a couple of measures from the **Reseller Sales measure group**. I selected some having Sum aggregation and others that are calculated, in order to show how buckets behave. You can also try to slice per country in the Sales Territory dimension, or any other related dimension; the results adjust accordingly.

**EXPLANATION**

**weighted average of a discount percentage** (which is Discount Amount over Reseller Sales Amount; we already have those two in our cube :-)). The discount percentage is related to the **DimPromotion table**. These two queries show us that discounts can be found only in the reseller sales process; there are 12 distinct promotions with various discount percentages, ranging from 0 to 0.5 (0% – 50%).

select PromotionKey, UnitPriceDiscountPct
from dbo.FactResellerSales
group by PromotionKey, UnitPriceDiscountPct
order by PromotionKey, UnitPriceDiscountPct

select PromotionKey, UnitPriceDiscountPct
from dbo.FactInternetSales
group by PromotionKey, UnitPriceDiscountPct
order by PromotionKey, UnitPriceDiscountPct

Our buckets will be discount percentages, grouped in bands of ten (0–9% → 0, 10–19% → 1, …, 100% → 10). Altogether, **we need 11 members** in our utility dimension, and that's what the initial named query does.
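As a sanity check, the band assignment is plain integer truncation of the percentage; a quick Python sketch of the mapping (mirroring the Int(...) expression in the MDX, not SSAS itself):

```python
# Map a discount percentage (0.0 - 0.5 in Adventure Works) to a bucket key,
# mirroring Int([Discount Amount] / [Reseller Sales Amount] * 10) from the script.
def bucket_key(discount_pct):
    # 0.00-0.09 -> 0, 0.10-0.19 -> 1, ..., exactly 1.00 -> 10
    return int(discount_pct * 10)

assert bucket_key(0.05) == 0
assert bucket_key(0.15) == 1
assert bucket_key(0.50) == 5
assert bucket_key(1.00) == 10
```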

Since that dimension is not linked to any measure group, bringing it into the cube browser only shows the same value for each member – the total. In order to have correct values per member, we need to reach for the MDX script and write proper expressions. The important thing to notice is that these expressions should be written before any other statement in the MDX script, so that all the calculated measures that follow are aggregated the way they're supposed to be (after our scope!). What once was solve order is now the **order of statements in the MDX script**.

The first thing we do is **define** our **calculated measure that will be used to determine buckets**. It's the same definition as the [Measures].[Discount Percentage] calculated measure, except we rounded the values and grouped them into 11 buckets. That measure can later be hidden, if desired.

The following scope handles how each bucket is to be calculated for any measure – as **an aggregate of values across existing promotions where the bucket dimension key is equal to the bucket measure**. That expression is actually our **transformation of data**, which allocates values to buckets. It's a sort of **projection of the calculated bucket measure onto the bucket dimension**. That's how it can be visualized and understood. Inside, **it is important to refer to the root member of the bucket dimension**, because it's the only member that has data before the transformations get applied. That's why we used tuples.
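The transformation can be mimicked outside the cube to make it easier to visualize. A conceptual Python sketch with made-up promotion data (not the cube's actual numbers): compute each promotion's bucket key from the all-member values, then aggregate the measure into the matching bucket.

```python
# Each promotion carries its measures; bucket = int(discount / sales * 10),
# just like the calculated measure in the MDX script.
promotions = [
    {"discount": 5.0,  "sales": 100.0},   # 5%  -> bucket 0
    {"discount": 15.0, "sales": 100.0},   # 15% -> bucket 1
    {"discount": 12.0, "sales": 100.0},   # 12% -> bucket 1
    {"discount": 50.0, "sales": 100.0},   # 50% -> bucket 5
]

buckets = {k: 0.0 for k in range(11)}     # the 11-member utility dimension
for p in promotions:
    key = int(p["discount"] / p["sales"] * 10)
    buckets[key] += p["sales"]            # aggregate any measure per bucket

print({k: v for k, v in buckets.items() if v})   # {0: 100.0, 1: 200.0, 5: 100.0}
```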

**CONCLUSION**

We reached the end of this article. Let’s summarize it one more time.

**It is possible to analyze cube data based on dynamically calculated measures/buckets**. The solution is to build a utility dimension with a predetermined number of members (buckets) and then assign a value to each bucket. Values are assigned using a scope statement where the key of the bucket is compared to the value of the calculation (the calculated bucket measure). Effectively, that is a transformation of data from a measure to a dimension. Speaking of which, what better way to finish this article than to provide a vivid example.

We all love music and have our favorite media player, either a device or an application on a computer. Small and simple ones provide only basic operations like Play, Pause, etc. Other, more advanced ones feature a multimedia database (music catalog, tracking the number of times a song has been played) and visualization (spectrum analyzer), among other things.

**Last time, we focused on** the PlayedCount field, that is, on **the frequency of occurrence of an event** (a song being played). A histogram of that data would show how many songs (height of a vertical bar) have been played each number of times (points on the x axis). Not a very interesting thing, right? A drill-down/drill-through on top of that would be more interesting (which songs are played N times). But there wouldn't be a drill if it weren't for the histogram in the first place, to separate the data. Therefore, histograms are important, although not always used as the final result.

**In this article, we've focused on** the visualization. The **spectrum analyzer**, to be precise. Those small panels where several vertical bands show how loud the bass, middle, and treble are. Or more bands, depending on how many the panel displays (e.g. 100 Hz, 300 Hz, 1 kHz, 3 kHz, 10 kHz). Bands are ranges of frequency which display the amplitude for any given part of a song.

Well, did you know that a song can also be considered a small fact? The amplitude in that case is the measure. Dimensions? Well, there’s a Time dimension for sure. Others? Not likely. What about the Frequency dimension? Doesn’t exist there. It is however required to be present in the end result. How come? Well, bands are nothing but buckets of frequency range, where the number of them is predetermined in advance. The only thing that’s missing is the **calculation that projects the amplitude across buckets in spectrum analyzer**. Ring a bell?

FFT (Fast Fourier Transform). How about it?
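To make the analogy concrete, here is a small Python sketch (a naive DFT rather than a real FFT library, and a synthetic two-tone signal) that sums spectrum magnitudes into a handful of frequency bands. It is the same bucket projection once more:

```python
import math

# Naive DFT magnitudes of a real signal; fine for a tiny illustration
# (a real spectrum analyzer would use an optimized FFT).
def dft_magnitudes(signal):
    n = len(signal)
    mags = []
    for k in range(n // 2):  # keep positive frequencies only
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        mags.append(math.hypot(re, im))
    return mags

# Synthetic signal: one low-frequency and one quieter high-frequency tone.
n = 64
signal = [math.sin(2 * math.pi * 3 * i / n) + 0.5 * math.sin(2 * math.pi * 20 * i / n)
          for i in range(n)]

mags = dft_magnitudes(signal)          # 32 frequency bins
bands = 4                              # the "buckets" of the spectrum analyzer
band_width = len(mags) // bands
amplitude = [sum(mags[b * band_width:(b + 1) * band_width]) for b in range(bands)]
# Band 0 (bins 0-7) holds the 3-cycle tone; band 2 (bins 16-23) the 20-cycle tone.
```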

Great. It works! Now, I’ll review the MDX. I look forward to applying these MDX buckets. Thank you, Tomislav.

Here is this article, along with some others, on Attribute Discretization:

http://ssas-wiki.com/w/Articles#Attribute_Discretization

Rating Dimension:

(ID, MinVariance, MaxVariance, Rating, StartDate, EndDate)

Rating Dimension values:

SELECT 1 AS ID, 0 AS MinVariance, 0.10 AS MaxVariance, 1 AS Rating, '01/01/1900' AS StartDate, '06/06/2079' AS EndDate FROM ApplDim
UNION
SELECT 2 AS ID, 0.11 AS MinVariance, 0.30 AS MaxVariance, 2 AS Rating, '01/01/1900' AS StartDate, '06/06/2079' AS EndDate FROM ApplDim AS ApplDim_5
UNION
SELECT 3 AS ID, 0.31 AS MinVariance, 1 AS MaxVariance, 3 AS Rating, '01/01/1900' AS StartDate, '06/06/2079' AS EndDate FROM ApplDim AS ApplDim_4
UNION
SELECT 4 AS ID, 0 AS MinVariance, 0.20 AS MaxVariance, 1 AS Rating, '01/01/2011' AS StartDate, '12/31/2011' AS EndDate FROM ApplDim AS ApplDim_3
UNION
SELECT 5 AS ID, 0.21 AS MinVariance, 0.50 AS MaxVariance, 2 AS Rating, '01/01/2011' AS StartDate, '12/31/2011' AS EndDate FROM ApplDim AS ApplDim_2
UNION
SELECT 6 AS ID, 0.51 AS MinVariance, 1 AS MaxVariance, 3 AS Rating, '01/01/2011' AS StartDate, '12/31/2011' AS EndDate

——–

Created a set in the cube as "RatingSet":

RatingSet

{

[Rating].[Rating].[Rating] *

[Rating].[End Date].[End Date] *

[Rating].[Max Variance].[Max Variance] *

[Rating].[Min Variance].[Min Variance] *

[Rating].[Start Date].[Start Date]

}

Calculation which needs to be made dynamic:

IIF( ([Measures].[Ownership Variance Percent] >= [Rating].[Min Variance].[Min Variance].Members.Item(0)
      and [Measures].[Ownership Variance Percent] <= [Rating].[Max Variance].[Max Variance].Members.Item(0))
, [Rating].[Rating].[Rating].Members.Item(0)
, IIF( ([Measures].[Ownership Variance Percent] > [Rating].[Min Variance].[Min Variance].Members.Item(1)
        and [Measures].[Ownership Variance Percent] <= [Rating].[Max Variance].[Max Variance].Members.Item(1))
  , [Rating].[Rating].[Rating].Members.Item(1)
  , [Rating].[Rating].[Rating].Members.Item(2)
  )
)

——

Now, how can it be made dynamic as above?

Any help will be appreciated.

Please suggest asap.

Many Thanks in Advance.


Hi Puneet, it’s not clear what exactly you need, so I suggest we continue this via e-mail, ok?

Tomislav/Puneet,

Were you able to resolve this? I have a similar problem and was wondering if you could share the solution. I'm using SSAS 2005 Standard Edition.

No.

Calculation which needs to be made dynamic:-

IIF( ([Measures].[Variance Percent] >= 0 and [Measures].[Variance Percent] <= 0.10)
, 1
, IIF( ([Measures].[Variance Percent] > 0.10 and [Measures].[Variance Percent] <= 0.30)
  , 2
  , 3 -- Rating
  )
)

i.e. Variance Percent will be 0, 10, 30, 50, etc., and Rating will be 0, 1, 2, 3, etc.
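For what it's worth, the rating bands described in this comment amount to a range-table lookup; a minimal Python sketch of the idea (thresholds taken from the comment, everything else made up):

```python
# Rating bands from the comment: [0, 0.10] -> 1, (0.10, 0.30] -> 2, (0.30, 1] -> 3.
# Driving the thresholds from a table instead of hardcoding them is what
# makes the calculation "dynamic".
rating_table = [
    (0.00, 0.10, 1),
    (0.11, 0.30, 2),
    (0.31, 1.00, 3),
]

def rating(variance_pct):
    # Bands are contiguous, so the first matching upper bound wins.
    for _low, high, r in rating_table:
        if variance_pct <= high:
            return r
    return None  # out of range

assert rating(0.05) == 1
assert rating(0.25) == 2
```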

Tomislav,

Is this solution edition specific? I cannot get it to work in SSAS 2005 Standard Edition, using it just as specified. My result is the equivalent of a cross join. To try to get it to work, I added the Bucket dimension exactly as specified and worked through it, and it's still a cross join.

As far as I know, it shouldn’t be edition specific. Maybe version specific, because of potentially different AW sample databases on 2005 and 2008. Can you verify that?

Hi Tomislav,

Thanks! Your solution works for me! this was exactly what I was looking for.

What I think is missing here is the ability to relate another measure group to the bucket dimension.

Let's say I have one fact table with one measure group – so far so good. What will happen if I create another measure group from that fact table for a distinct count measure? How can this measure be "projected" onto the buckets?

Thanks,

Arnon

Hi Arnon, I haven’t tested it, but I guess it should be the same if there is a common dimension in both measure groups (Promotion in my case). Have you tried that?

How to implement a histogram using JavaScript? Please help.

I guess you just need to fire the corresponding MDX queries against the SSAS cube. First you need to modify the cube as explained in this article. Then, to communicate with your SSAS server, you will need to use XMLA. See Books Online for more info: http://msdn.microsoft.com/en-us/library/ms186604 .

Hi Tomislav, nice technique and thanks for sharing it.

I’m trying to apply it at scale (my equivalent of your Promotions dimension has 370,000 members). At query time it just chews up all the server RAM before failing with memory pressure errors. This happens even when the query is otherwise filtered to a small set of members. I can get the Calculated Measure back fairly quickly, but using the Buckets dimension is the trigger for the memory pressure error.

Do you have any ideas on how I could tune your MDX for this scenario? I’m imagining there is some magic NonEmpty function or similar I could apply to reduce the workload?

Thanks – Mike

I’ve got this performing to where it seems to be viable over my largish dimension. It runs a reasonable query (over a few hundred members) in around 10 secs, and can do the whole dimension in about 3mins.

The key seemed to be adding NonEmpty, e.g. Aggregate( NonEmpty( Existing …

I also added a Non_Empty_Behavior property inside the SCOPE, against the driving measure, which seems very important for performance.

My buckets are irregularly shaped, so I used a Member Property to pass in the lower and upper boundary of each bucket.

Again, thanks for the tip!

Thank you for sharing your improvements.

Hi Tomislav,

Great method and presentation! Thanks for sharing.

This code works great for me, but I feel that the names of the buckets ("Bucket 1", "Bucket 2", etc.) are not very user friendly and don't provide much information. I would like to provide the value ranges each bucket represents as its name. Since all of these values are dynamic, based on the MDX query and the set involved, they can't be hardcoded.

I’ve done some research and it looks like the undocumented MDX function “CreatePropertySet” may be the key. Chris Webb did something similar to this but I’m not sure if he’s doing the same thing as you.

http://cwebbbi.wordpress.com/2005/02/04/bucketisation/

http://www.sqlmag.com/article/quering/mdx-secret-code-revealed-16302

Thanks in advance,

Mick

I might have a look. I am not familiar with that function.

Thank you.

Hi Tomislav,

I tried your solution and I almost got it working, but not quite.

I followed your solution and I guess that I’m doing something wrong somewhere along the way, but I can’t figure out what.

I'm trying to get an employee count by length of employment, discretized by the above-mentioned solution, but all I get is a value for the 'All' member of my buckets.

When I use the following statement:

“SELECT

[Measures].[Employee – Count] ON COLUMNS

, [Length Of Employment].[Length Of Employment].MEMBERS ON ROWS

FROM [myCube]”

, the result I get is the following:

Employee – Count

All 66

0Y – <2Y (null)

2Y – <4Y (null)

4Y – <6Y (null)

6Y – <8Y (null)

8Y – <10Y (null)

+10Y (null)

N/A (null)

I placed the calculated member and the scope script immediately after the CALCULATE, as mentioned above, but apparently it doesn't do the aggregation as I expect.

I would expect a result like the following:

Employee – Count

All 66

0Y – <2Y 15

2Y – <4Y 20

4Y – <6Y 15

6Y – <8Y 10

8Y – <10Y 5

+10Y 1

N/A (null)

Can you share your thoughts on what I might be doing wrong?

Thanks,

Thomas

Hi Tomislav,

Great article; it looks like exactly what I am looking for.

But I just can't get it working. The screenshots in your article disappeared (HTTP 404 error). Maybe there is something I missed. Can you fix the screenshots?

Andrew

Hi Andrew, thank you for pointing out the problem with the pictures. It looks like something went wrong with them over time. I'll try to fix them.