Livio/ October 17, 2018/ Power BI, Power Query/ 0 comments

Power Query Table.FuzzyJoin and Table.FuzzyNestedJoin

With the October 2018 release, the team at Power BI introduced two new functions in Power Query Table.FuzzyJoin and Table.FuzzyNestedJoin. These two functions essentially do the same thing, the only difference being that the former function adds a new column to table1 where each ‘cell’ represents a table containing the found matches whereas the latter function performs a join SQL-like by adding the matching rows as new columns.

For this to work, you need to go into your Power BI options and enable it since it is still in preview:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

These new functions resemble their cousins Table.NestedJoin and Table.Join except for one addition argument called jointOptions which is passed in the form of a Record. The purpose of these functions is to join table1 with table2 based on key1 and key2 (key1 and key2 can either be a text value representing the column name from each table on which the join is performed or a list of text values if you wish to join the two tables on multiple columns). The joinKind will determine how the join is performed, you have several options:

FullOuter, Inner, LeftAnti, LeftOuter, RightAnti, RightOuter

The function syntax is as follows:

What makes this function shine is the new last argument. This argument, as anticipated before, is a record which can contain the following fields:

  1. ConcurrentRequests – This option determines how many parallel threads can be used to perform the join operation and can be any number from 1 to 8
  2. Culture – Allows the matching to occur based on different cultures
  3. IgnoreCase – Previously the matching was always case sensitive and you couldn’t have matched Dog with DOG. Now, by setting this option to true, you can allow Dog and DOG to be matched
  4. IgnoreSpace – If set to true, this will tell the function to ignore spaces present within key1 and key2, therefore ApplePie will be matched with Apple Pie.
  5. NumberOfMatches – With this option you can limit the number of matches returned. If, for instance, Apple from table1 has 20 matching rows in table2, by setting this argument to 5, you can limit the matching rows to 5, so not all are returned
  6. Threshold – This option I believe utilizes special algorithms. It can be any decimal number from 0 to 1, the more it is closer to 1 the more the matching criteria moves toward exact matching. For instance, you could have Grapes matching with Graes if you set it below 0.9
  7. TransformationTable – This has to be a table which contains two columns named “From” and “To”. This table represents some sort of mapping. If you had one item in table1 whose value was Banana and another item in table2 whose value was Ban3n6 and you wanted a match to occur between these two items, you could map them inside the TransformationTable by adding Banana in the ‘From’ column and Ban3n6 in the ‘To’ column. 

Examples

For these examples I will be using the three tables below:

Table1:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

Table2:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

MapTable:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

I added each table as a new query named Table1, Table2 and MapTable. For each example I will add a blank query and write the function with different arguments.

Example Number 1 (Ignore Case):

As I said earlier, before the matching was case sensitive and Dog would not have matched with DOG. You can see this from the below query:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

As you can see, Apple is being matched with all its corresponding rows in Table2, Kiwi is not being matched with K1w1 and Kiwe. Bok Choy is not being matched with BokChoy and Pineapple does not get matched with Pine apple, Pin3Appl3 and PINEAPPLE.

Now, look at how the result will change with the below query:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

Pineapple gets matched with PINEAPPLE. Terrific!

Example Number 2 (Ignore Space):

Still, Pineapple and Pine Apple, Bok Choy and BokChoy are not getting matched. No problem, we can adapt our query to:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

Pineapple and Pine Apple, Bok Choy and BokChoy finally match! Great! Let’s keep going!

Example Number 3 ( NumberOfMatches ):

Let’s say I do not want to return all the matching rows of Apple, I only want the first 2. No problem, this function has you covered:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

 

Example Number 4 (Threshold):

Now, let us try to play with the threshold and see if Power Query can match more items for us. Let’s start moving it towards zero, currently it is 1.

PowerQuery FuzzyJoin

What? Kiwi and Kiwe now match! Banana and Ban3n3 match as well! Incredible what these Microsoft guys can do…

 

Example Number 5 (TransformationTable):

We still have the problem of Kiwi not being matched with K1w1 though and we may want to provide some other custom matching the algorithm cannot solve for obvious reasons. It is time to introduce my favorite argument, TransformationTable. As I pointed out earlier, this table is to contain two columns ‘From’ and ‘ To’ which represent a custom mapping you want to apply during the matching. For this to work properly, I found that you need to lower the threshold and set it less than 1 (exact match). Let’s add this final argument to our query and see how it changes:

Power Query Table.FuzzyJoin

Power Query Table.FuzzyJoin

Now, thanks to the table we provided as mapping, everything is matching and we can go ahead with our data shaping. What do you think about this new function? I find it amazing!

 

 

Leave a Comment

Your email address will not be published. Required fields are marked *

*
*