Microsoft Idea

Power BI

Under Review

"R" Don't remove duplicates

87 votes

Gregory J. Deckler on 21 Apr 2016 00:52:48

Provide the option of not removing duplicates automatically when creating R visualizations, or provide the ability to create R datasets using the same syntax shown in the comments of an R visualization.

Comments (16)

George on 05 Jul 2020 22:53:58

RE: "R" Don't remove duplicates

I have no idea why this isn't the standard behaviour; it's trivial in R to remove duplicates from a dataset if that behaviour is desired.

Jo Varney on 05 Jul 2020 22:51:46

I want to use R to create a histogram. I add one column, and then it removes all the duplicates, which produces a completely inaccurate histogram. This is pretty silly, and potentially problematic if someone uses it without noticing.

Yes, I can add extra columns or create an ID column, but I don't want to. I don't want the program to remove duplicates just because it sees fit. There are times when that isn't appropriate, and as the analyst I want the choice.

Also, I want to write the simplest code, and for a histogram that should involve only one column.
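To illustrate the histogram problem, here is a minimal standalone R sketch (not Power BI itself). It assumes a hypothetical one-column data frame and simulates the automatic de-duplication with base R's `unique()`; the column name `Score` and its values are made up for illustration.

```r
# Hypothetical one-column dataset with repeated values
values <- data.frame(Score = c(1, 2, 2, 3, 3, 3))

# Simulate the automatic removal of duplicate rows
deduped <- unique(values)

# Fixed breaks so both histograms use the same bins: (0.5,1.5], (1.5,2.5], (2.5,3.5]
bins <- c(0.5, 1.5, 2.5, 3.5)

# plot = FALSE returns the histogram object (including counts) without drawing it
h_all <- hist(values$Score, breaks = bins, plot = FALSE)
h_deduped <- hist(deduped$Score, breaks = bins, plot = FALSE)

print(h_all$counts)      # 1 2 3
print(h_deduped$counts)  # 1 1 1: every bin collapses to a count of one
```

With a single column, every repeated value is a duplicate row, so the de-duplicated histogram can only ever show a count of one per distinct value.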

Earl Glynn on 05 Jul 2020 22:46:21

When connecting to a SQL Server Analysis Services cube, I don't have direct access to the keys that make records unique, so I can't use them to stop duplicates from being removed. As a result, with cubes it may not be possible to get accurate data into R in some cases. Many statistics and visualizations are worthless once duplicates have been removed. Can someone explain why removing duplicates was ever a good idea?

Earl Glynn on 05 Jul 2020 22:45:03

I shouldn't have to add an ID or key field to get all the data. If I want to remove duplicates in R, it's trivial with the unique() function.
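As the comment says, de-duplication is a one-liner when it is actually wanted. A minimal base R sketch, using a hypothetical two-column data frame, showing `unique()` and the equivalent `duplicated()` filter:

```r
# Hypothetical data in which the first two rows are genuine, separate sales
sales <- data.frame(
  Region = c("East", "East", "West"),
  Amount = c(100, 100, 250)
)

# Keep every row: the behaviour the commenters are asking for by default
nrow(sales)                          # 3

# Opt in to de-duplication only when it suits the analysis
deduped <- unique(sales)
nrow(deduped)                        # 2

# Equivalent: drop rows that repeat an earlier row
deduped_alt <- sales[!duplicated(sales), ]
identical(deduped, deduped_alt)      # TRUE
```

Leaving the choice in the script keeps the decision with the analyst, who knows whether identical rows are real observations or accidental repeats.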

Boris on 05 Jul 2020 22:40:04

The workaround is to add an ID column to the data (and simply not use it in the R script).
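A minimal sketch of that workaround, simulated in standalone R. It assumes the report author has added a unique row identifier (here a hypothetical `Index` column, e.g. an index column created in Power Query) alongside the value of interest, and it uses `unique()` to stand in for the automatic de-duplication:

```r
# Hypothetical dataset: Index is a unique row ID added only to keep rows distinct
dataset <- data.frame(
  Index = 1:6,
  Score = c(1, 2, 2, 3, 3, 3)
)

# Stand-in for the automatic de-duplication; no rows are dropped
# because every row has a different Index value
dataset <- unique(dataset)

# Use only the column of interest; Index is otherwise ignored
scores <- dataset$Score
length(scores)   # 6
table(scores)
```

The ID column carries no analytical meaning; it only makes every row distinct so nothing is collapsed, which is why several commenters consider it an unwelcome extra step.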

Gregory J. Deckler on 05 Jul 2020 22:19:58

The comments in an R visualization's script show:
#dataset <- data.frame(Column)

However, I cannot use the same syntax to create my own data frame.
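For reference, in standalone R the commented line works as ordinary code: `data.frame()` names each column after the variable passed in. The sketch below assumes two hypothetical vectors, `Column1` and `Column2`, standing in for fields placed on the visual, and follows the build step with a `unique()` call standing in for the automatic de-duplication, so the difference is visible:

```r
# Hypothetical field values, as if pulled from the report
Column1 <- c("A", "A", "B")
Column2 <- c(10, 10, 20)

# data.frame() takes the column names from the variable names
dataset <- data.frame(Column1, Column2)
names(dataset)         # "Column1" "Column2"
nrow(dataset)          # 3: all rows kept

# Stand-in for the automatic de-duplication applied by the visual
deduplicated <- unique(dataset)
nrow(deduplicated)     # 2
```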