Microsoft Idea

Data Factory

Completed

Please enable lakehouse schema to reflect changes in dataflows


Daniel Scott on 25 Jun 2023 17:58:01

Hello Fabric team,


Currently, when a change is made to a Dataflow Gen2 (such as adding a new column), the new schema is not picked up in the destination settings when using the Lakehouse option.


This means the only option is to delete the destination table in the lakehouse and then add it again from scratch, which would have downstream knock-on effects. This is unworkable and renders Dataflows Gen2 useful only if the schema never changes.


I would like to see behaviour similar to Dataflows Gen1, where the destination schema (CDM) reflects any changes.


Thank you.


Administrator on 21 Mar 2024 22:02:57

We just released an update to Output Destinations functionality in Dataflows Gen2 that should address this request: "Dataflows Gen2 data destinations and managed settings" (Microsoft Fabric Blog).

Managed settings for new tables

When loading into a new table, the automatic settings are on by default. With the automatic settings, Dataflows Gen2 manages the mapping for you. This gives you the following behavior:

  • Update method replace: Data will be replaced at every dataflow refresh. Any data in the destination will be removed. The data in the destination will be replaced with the output data of the dataflow.
  • Managed mapping: Mapping is managed for you. When you change your data or query to add a column or change a data type, the mapping is adjusted automatically when you republish your dataflow. You do not have to go into the data destination experience every time you make changes to your dataflow, which makes schema changes easy.
  • Drop and recreate table: To allow for these schema changes, on every dataflow refresh, the table will be dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

NOTE: Currently this is supported only for Lakehouse and Azure SQL database as data destinations.
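
To make the three behaviours above concrete, here is a toy in-memory model of a refresh under the automatic settings. It is only a sketch and not the Fabric API: `Table`, `RefreshError`, `refresh_with_automatic_settings` and the `dependents` list (standing in for relationships or measures) are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    columns: list[str]
    rows: list[dict]
    # Hypothetical stand-in for relationships or measures defined on the table.
    dependents: list[str] = field(default_factory=list)


class RefreshError(Exception):
    """Raised when the toy refresh cannot drop the existing table."""


def refresh_with_automatic_settings(
    destination: dict[str, Table], table_name: str, output_rows: list[dict]
) -> Table:
    existing = destination.get(table_name)
    # Drop and recreate: the refresh fails if anything depends on the table.
    if existing is not None and existing.dependents:
        raise RefreshError(
            f"cannot drop '{table_name}': it has {existing.dependents}"
        )
    destination.pop(table_name, None)
    # Managed mapping: the schema is taken from the dataflow output itself.
    columns = list(output_rows[0]) if output_rows else []
    # Update method replace: old rows are gone, only the new output remains.
    table = Table(columns=columns, rows=[dict(row) for row in output_rows])
    destination[table_name] = table
    return table


# Usage: a second refresh with an extra column changes the table schema.
lakehouse: dict[str, Table] = {}
refresh_with_automatic_settings(lakehouse, "sales", [{"id": 1, "amount": 10.0}])
refresh_with_automatic_settings(
    lakehouse, "sales", [{"id": 1, "amount": 10.0, "region": "EU"}]
)
print(lakehouse["sales"].columns)
```

In this model, appending anything to `dependents` before the next refresh raises `RefreshError`, which mirrors the limitation stated in the last bullet.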

Comments (3)

on 13 Apr 2024 10:40:13

RE: Please enable lakehouse schema to reflect changes in dataflows

Great to see improvements on this :D Still, I think this seems like a huge limitation: "Your dataflow refresh will fail if you have any relationships or measures added to your table."


Dean Evans on 14 Dec 2023 11:34:41

RE: Please enable lakehouse schema to reflect changes in dataflows

Update: When your query schema changes (e.g. you add or delete a column), before clicking 'Publish' to the lakehouse, go into the data destination settings (cog, bottom right) and click 'Next' to reach 'Choose Destination Target'. This is the critical bit: ensure 'New table' is selected, even though you know the table already exists in your Lakehouse. Ensure your destination lakehouse and table name are unchanged, then click 'Next', which should display a message to the effect of 'your schema has changed' (you might need to check the box next to your new column(s) to include them in the schema). Then save and publish. Please note this method does not work on my historical lakehouse tables, only on new ones (as of 13 Dec 2023). To get it to work on older tables you will need to delete them from the lakehouse and re-publish them from your Gen2 dataflow; from that point on you should be good for future schema changes.
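
Before redoing the destination step, it can help to list exactly which columns differ between the existing table and the changed query. The following is a minimal sketch in plain Python, assuming you have already obtained both column lists yourself; `diff_schema` is a hypothetical helper, not part of Fabric or Dataflows Gen2.

```python
def diff_schema(destination_columns, query_columns):
    """Return (added, removed) column names, preserving input order."""
    destination = set(destination_columns)
    query = set(query_columns)
    # Columns the query now produces that the destination table lacks.
    added = [c for c in query_columns if c not in destination]
    # Columns the destination table has that the query no longer produces.
    removed = [c for c in destination_columns if c not in query]
    return added, removed


added, removed = diff_schema(["id", "name"], ["id", "name", "region"])
print("added:", added, "removed:", removed)
```

The comparison is by column name only; a changed data type on an existing column would not show up in this sketch.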


Dean Evans on 06 Nov 2023 10:43:09

RE: Please enable lakehouse schema to reflect changes in dataflows

Great idea Daniel, although I suspect this feature is implemented but not working as expected (a bug). When a Dataflow Gen2 query schema changes, if you redo the destination table step before hitting publish, Fabric 'sees' the new column when you open the dropdown for an existing column (selecting it overwrites the existing column with the new column's details, which is obviously of little practical worth), but there is no way to make the new column appear as an addition to the schema (i.e. show up as its own line entry).