Livio/ September 14, 2018/ Power BI, Power Query/ 0 comments

Slow Changing Dimensions

Imagine you have a simple data model like the one below composed of four tables. One fact table ( Sales ) and three dimension tables (Customers, Products, Dates).  If you wanted to analyze sales by the Country of Residence of each customer or by Marital Status, you may notice that it is not so straightforward. In fact, each customer may change either Marital Status or Country over time and simply replacing Country A with Country B in the Customers table may not be enough.

Slow Changing Dimensions

As an example, imagine you have Customer X that from 1st January 2018 to 31st January 2018 is living in USA and on February 1st moves to Canada. If on February 1st you go in the Customers table and replace USA with Canada and then try to analyze Sales by Country of Residence, all the purchases she has done in January will now be part of Canada instead of USA and you may get the wrong results. 

Slow Changing Dimensions

 

Slow Changing Dimensions allow us to deal with this kind of problem. In this kind of scenario one solution is add three extra columns to your Customers table. One Alternate Key column which now becomes the Primary Key of the table and that uniquely identifies each row, this can simply be sequential numbers. The other two columns would be “From” and “To” and identify the period in which a Customer had that particular set of Attributes (Country of Residence, Marital Status, Number of Children, Yearly Income ans so on). Each row can be thought as a version of each Customer during a certain time period. 

In our example, our customers table would become (notice that the “To” date for the latest version of a Customer is a very far Date, I used 1/31/2900. This will be clear when you see the PowerQuery code): 

Slow Changing Dimensions

The steps required to perform this modification to the Customers table may vary depending on how you are sourcing your data, how you are being notified about the changes etc. What can be shown here though is the changes required on the Sales table. In fact, now you will no longer be able to relate Sales and Customers by CustomerKey because this is not a Primary Key anymore in the Customers table. The modification can either be done via SQL while you are extracting your data or via PowerQuery. 

If you wanted to do it via SQL, the command would be something like:

SELECT s.customerkey,
           s.quantity,
           s.price,
           s.date,
           s.productkey,
           c.customeralternatekey
FROM   sales s
    INNER JOIN customers c
ON s.customerkey = c.customerkey
AND s.date BETWEEN c.from AND c.to

Whereas in Power Query you would do:

 

This will add an additional column to the Sales table:

Slow Changing Dimensions

Which you will use to create the relationship with the Customers table:

Slow Changing Dimensions

 

Now the breakdown by Country of Residence will yield the correct results:

Slow Changing Dimensions

As opposed to the wrong results below:

Slow Changing Dimensions

As you can see above, the latter pivot table totally ignores that some purchases were performed by customers living in China at the time of the purchase who then moved to another Country. 

 

Final Comments

It is worth going through all this process when the attributes changing over time of your dimension table are more than 1 or 2, otherwise it may be simpler to just store the country of residence within your fact table. Also notice that Slow Changing Dimensions are SLOW changing dimensions, which means that the attributes do not change very frequently: a customer can change Country once a year or Marital Status once each 10 years etc. If you have attributes that are changing rapidly, for instance once a day, then you are not dealing with Slow Changing Dimensions anymore and with this approach your dimension table could become exceptionally large.

Leave a Comment

Your email address will not be published.

*
*