How can we improve Power BI?

Fuzzy Comparison function for similar names

For either a Power Query function or a new DAX function, we could use a fuzzy string compare to provide a score like 1 to 10 of the similarity of a string. I believe that SSIS has some functionality like this. For example we often has Company Names that we need to match to an external source and would like to get a score to compare names like "Wal-Mart", "WalMart","Wal*Mart", "Division or Wal*Mart", "Wal-Mart Stroes", etc. Does anyone else need this type of compare - or found a simple way to do it in Power Query?

257 votes
Sign in
(thinking…)
Sign in with: facebook google
Signed in as (Sign out)

We’ll send you updates on this idea

Robert Erickson shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

27 comments

Sign in
(thinking…)
Sign in with: facebook google
Signed in as (Sign out)
Submitting...
  • Mike Honey commented  ·   ·  Flag as inappropriate

    Another useful feature would be to be able to make some of the column pairs do an "Exact Match" while other column pairs use a "fuzzy match". For example "Exact Match" on ISO Country Code, then "fuzzy match" on Company Name/Address etc.

  • Sherry commented  ·   ·  Flag as inappropriate

    This would be useful for my work as well. Gathering company names from multiple databases and being able to determine likelihood of match with a fuzzy match metric would be most useful

  • Bryan75 commented  ·   ·  Flag as inappropriate

    I'm a Fuzzy Lookup add-in user and It would be awesome if you could implement it in PBI.
    A real game changer as my camarades said !

  • mathew commented  ·   ·  Flag as inappropriate

    Microsoft Research created Fuzzy Lookup as a Excel add-in, this was/is a very good set of functions for Excel. It does seems rather odd that Power Query (more so than DAX), does not have this ability.

    Please resolve sooner rather than later

  • Anonymous commented  ·   ·  Flag as inappropriate

    All - until such time as the code is made available -- here's something that should work:

    let
    Table1 = #table(type table [value = text],
    { {"spruce"}, {"cedar"}, {"birch"}, {"pine"}, {"poplar"}, {"aspen"},
    {"baobab"}, {"larch"}, {"willow"} }),

    Table2 = #table(type table [word = text],
    { {"sprce"}, {"cidar"}, {"pane"}, {"aspern"}, {"babab"} }),

    AddCol = Table.AddColumn(Table2, "Custom", each Table1),
    Expand = Table.ExpandTableColumn(AddCol, "Custom", {"value"}, {"value"}),

    FuzzyCalc = Table.AddColumn(Expand, "Prct", each
    2 * List.Count( List.Intersect(
    { Text.ToList([word]), Text.ToList([value]) } ) )
    / (List.Count(Text.ToList([word])) + List.Count(Text.ToList([value])))),

    SortRows = Table.Sort(FuzzyCalc,{{"word", Order.Ascending}, {"Prct", Order.Descending}}),
    AddIndex = Table.AddIndexColumn(SortRows, "Index", 1, 1),
    RemoveDupls = Table.Distinct(AddIndex, {"word"})
    in
    RemoveDupls

  • David Vinson commented  ·   ·  Flag as inappropriate

    Please add this feature in Power BI - define Column A and Column B and provide a likeness score as a percentage.

  • pato lobos commented  ·   ·  Flag as inappropriate

    Guys there are 2 packages in R that can be used to solve this. (yes I know, it will be better natively in M language)
    The first is called Stringdist, runs open R. The second one run on Microsoft Server R rxGetFuzzyDist https://docs.microsoft.com/en-us/r-server/r-reference/revoscaler/rxgetfuzzydist

    Hope it helps, but still it should be in Power Query as it is a basic function and a crucial one for everyday tasks...

  • John commented  ·   ·  Flag as inappropriate

    This would also help in name matching. John Paul Doe, John Doe, John P. Doe, J. Doe, etc...

  • Deepak commented  ·   ·  Flag as inappropriate

    I support.

    This feature would be handy when stiching up customer account names columns from different sources.

  • Lee commented  ·   ·  Flag as inappropriate

    I'm also keen for this one. It'd be great for quick adhoc projects where SSIS is overkill and Excel FuzzyMatch is a poor fit for the workflow.

  • Ariel Fenrnadez commented  ·   ·  Flag as inappropriate

    Please consider this. In medicine databases I always find similar text describing the same patologies. It would be great to score the findings, like a similiarity check reportó too

← Previous 1

Feedback and Knowledge Base

Ready to get started?

Try new features of Power BI today by signing up and learn more about our powerful suite of apps.