To use SQL Server Change Data Capture (CDC) in your Power BI ETL workflow and make refreshes more efficient:
1. Turn on CDC for SQL Server Tables
Enable Change Data Capture on the source SQL Server database, then on each relevant table.
CDC records inserts, updates, and deletes in change tables, along with metadata columns such as the LSN and the operation type.
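As a rough illustration of this step, the sketch below builds the T-SQL that enables CDC, using the system procedures `sys.sp_cdc_enable_db` and `sys.sp_cdc_enable_table`. The helper `enable_cdc_sql`, the database `SalesDb`, and the table `dbo.Orders` are hypothetical names chosen for the example, and passing `NULL` for `@role_name` (no gating role) is an assumption you may want to change.

```python
import re

# Plain identifiers only, because the names are spliced into the script text.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def enable_cdc_sql(database: str, schema: str, table: str) -> str:
    """Return a T-SQL script that enables CDC on one database and one table."""
    for name in (database, schema, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return "\n".join([
        f"USE [{database}];",
        "EXEC sys.sp_cdc_enable_db;",
        "EXEC sys.sp_cdc_enable_table",
        f"    @source_schema = N'{schema}',",
        f"    @source_name   = N'{table}',",
        "    @role_name     = NULL;  -- assumption: no gating role",
    ])


# Hypothetical database and table names.
print(enable_cdc_sql("SalesDb", "dbo", "Orders"))
```

The generated script would be run once by someone with sufficient permissions on the source database; it is not something Power BI runs during refresh.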
2. Use the CDC Metadata in Power Query
Instead of reading entire tables, create Power Query queries to read CDC change tables.
To obtain only newly added or altered rows since the last refresh, filter changes according to LSN (Log Sequence Number) or change time.
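The LSN filter can be sketched in a few lines. This is only a simulation in Python, with made-up rows and a hypothetical `lsn` helper; in SQL Server the same filter would normally be applied at the source, for example through the `cdc.fn_cdc_get_all_changes_<capture_instance>` function with a from/to LSN range. The column names `__$start_lsn` and `__$operation` are the ones CDC uses in its change tables.

```python
def lsn(n: int) -> bytes:
    """Build a 10-byte LSN-like value from an integer (illustration only)."""
    return n.to_bytes(10, "big")


# Made-up rows shaped like a CDC change table.
# __$start_lsn is binary(10) in SQL Server; Python bytes compare the same way.
CHANGES = [
    {"__$start_lsn": lsn(100), "__$operation": 2, "OrderID": 1, "Amount": 50},
    {"__$start_lsn": lsn(105), "__$operation": 4, "OrderID": 1, "Amount": 75},
    {"__$start_lsn": lsn(110), "__$operation": 2, "OrderID": 2, "Amount": 20},
]


def changes_since(rows: list, last_lsn: bytes) -> list:
    """Keep only the changes committed after the last processed LSN."""
    return [r for r in rows if r["__$start_lsn"] > last_lsn]


new_rows = changes_since(CHANGES, lsn(100))
print([r["OrderID"] for r in new_rows])  # [1, 2]
```

The comparison is strictly greater than, so the change at the last processed LSN is not loaded a second time.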
3. Use Power BI's Incremental Refresh Feature
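When combining CDC with incremental refresh in Power BI, filter on the RangeStart and RangeEnd parameters so that each refresh loads only the changes inside that window.
Because these are date/time parameters, the filter has to be applied to a date/time column, such as a change time mapped from the LSN (for example with sys.fn_cdc_map_lsn_to_time).
If the filter folds back to SQL Server, the filtering happens at the source and only the recent changes are transferred.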
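The window filter is worth getting right at the boundaries. The sketch below simulates it in Python with made-up change times and a hypothetical `in_window` helper. It assumes a half-open interval (start inclusive, end exclusive), so that a row falling exactly on a boundary belongs to one window only.

```python
from datetime import datetime


def in_window(change_time: datetime, range_start: datetime, range_end: datetime) -> bool:
    """Half-open test: the start is inclusive, the end is exclusive."""
    return range_start <= change_time < range_end


# Made-up change times, e.g. obtained by mapping each LSN to a time.
change_times = [
    datetime(2024, 1, 1, 23, 59),
    datetime(2024, 1, 2, 0, 0),    # exactly on the boundary between windows
    datetime(2024, 1, 2, 12, 30),
]

# Two adjacent windows standing in for RangeStart/RangeEnd pairs.
day1 = (datetime(2024, 1, 1), datetime(2024, 1, 2))
day2 = (datetime(2024, 1, 2), datetime(2024, 1, 3))

in_day1 = [t for t in change_times if in_window(t, *day1)]
in_day2 = [t for t in change_times if in_window(t, *day2)]
print(len(in_day1), len(in_day2))  # 1 2
```

If both comparisons were inclusive, the midnight row would be loaded into both windows and show up twice.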
4. Track the Last Processed Change
Store the most recent CDC LSN or timestamp that was processed (for example, in a control table or in parameters).
On the subsequent refresh, use this to filter the CDC data.
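A minimal sketch of that watermark pattern follows. It assumes a local JSON file standing in for the control table, LSNs represented as fixed-width lowercase hex strings (so string comparison matches byte order), and hypothetical helper names `load_watermark`, `save_watermark`, and `refresh`.

```python
import json
import os
import tempfile

ZERO_LSN = "0" * 20  # 10 bytes as hex: the lowest possible value


def load_watermark(path: str) -> str:
    """Return the last processed LSN, or the lowest value on the first run."""
    if not os.path.exists(path):
        return ZERO_LSN
    with open(path) as f:
        return json.load(f)["last_lsn"]


def save_watermark(path: str, lsn_hex: str) -> None:
    with open(path, "w") as f:
        json.dump({"last_lsn": lsn_hex}, f)


def refresh(path: str, change_rows: list) -> list:
    """Return changes newer than the watermark, then advance the watermark."""
    last = load_watermark(path)
    new_rows = [r for r in change_rows if r["lsn"] > last]
    if new_rows:
        save_watermark(path, max(r["lsn"] for r in new_rows))
    return new_rows


state_file = os.path.join(tempfile.mkdtemp(), "cdc_watermark.json")
rows = [{"lsn": f"{n:020x}", "OrderID": n} for n in (1, 2, 3)]

first = refresh(state_file, rows)                 # first run: everything is new
rows.append({"lsn": f"{4:020x}", "OrderID": 4})   # one more change arrives
second = refresh(state_file, rows)                # second run: only that change
print(len(first), len(second))  # 3 1
```

The watermark is advanced only after new rows were found, so a refresh with no changes leaves the stored state untouched.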
5. Deal with Deletes
CDC records every row deletion, but Power BI does not natively handle row deletions in incremental refresh.
To reconcile deletes, you might need to add custom logic to Power Query or SQL.
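One possible shape for that custom logic is sketched below in Python; the same idea could be written in Power Query or SQL. It relies on the CDC `__$operation` codes (1 = delete, 2 = insert, 3 = update before image, 4 = update after image). The `apply_changes` helper, the `OrderID` key, and the integer LSNs are simplifications made up for the example.

```python
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def apply_changes(base: dict, changes: list, key: str) -> dict:
    """Replay CDC changes over a snapshot keyed by primary key."""
    result = dict(base)
    # Replay in commit order. Real change tables also have __$seqval to
    # order changes within one transaction; that is omitted here.
    for row in sorted(changes, key=lambda r: r["__$start_lsn"]):
        op = row["__$operation"]
        data = {k: v for k, v in row.items() if not k.startswith("__$")}
        if op == DELETE:
            result.pop(data[key], None)
        elif op in (INSERT, UPDATE_AFTER):
            result[data[key]] = data
        # UPDATE_BEFORE rows carry the old values and are skipped.
    return result


# Made-up snapshot and changes; integer LSNs are used for brevity.
base = {
    1: {"OrderID": 1, "Amount": 50},
    2: {"OrderID": 2, "Amount": 20},
}
changes = [
    {"__$start_lsn": 12, "__$operation": DELETE, "OrderID": 2, "Amount": 20},
    {"__$start_lsn": 10, "__$operation": UPDATE_AFTER, "OrderID": 1, "Amount": 75},
    {"__$start_lsn": 11, "__$operation": INSERT, "OrderID": 3, "Amount": 5},
]

current = apply_changes(base, changes, "OrderID")
print(sorted(current))       # [1, 3]
print(current[1]["Amount"])  # 75
```

Order 2 disappears from the result because its delete row is replayed, which is the reconciliation that incremental refresh alone would not perform.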
In summary, combining CDC with incremental refresh and targeted filtering reduces the amount of data loaded into Power BI, which shortens refresh times.