Here are some important ETL restrictions and best practices to remember when working with large datasets (10M+ rows) in Power BI:
ETL Restrictions:
Memory Restrictions: Both Power BI Desktop and Power BI Service have memory limits, and big datasets may result in crashes or sluggish performance.
Data Refresh Times: Full refreshes of large datasets can run for hours, and the service enforces refresh timeouts (roughly two hours per refresh on shared capacity, five on Premium).
Query Folding Limitations: When a Power Query step breaks query folding, transformations can no longer be pushed back to the source, and the full dataset may have to be loaded into memory for processing.
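As a minimal sketch (the server, database, and table names here are hypothetical), filters and column selections applied directly after the source step typically fold back into a single SQL statement, while steps such as adding an index column break the fold:

```m
let
    // Connect to a relational source; names are illustrative.
    Source = Sql.Database("SqlServer01", "SalesDW"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // These steps fold: Power Query translates them into the source's SQL,
    // so only the filtered, narrowed rows ever leave the server.
    Filtered = Table.SelectRows(Sales, each [OrderDate] >= #date(2023, 1, 1)),
    Narrowed = Table.SelectColumns(Filtered, {"OrderDate", "CustomerKey", "Amount"})

    // A step such as Table.AddIndexColumn(Narrowed, "Index") inserted here
    // would break folding, forcing every later step to run in Power Query's
    // own engine against the full extracted data.
in
    Narrowed
```

Right-clicking a step and checking whether "View Native Query" is enabled is a quick way to confirm folding is still intact.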
Incremental Refresh Requirements: Without an incremental refresh policy configured (historically a Premium/PPU-only feature, now also available on Pro), every refresh reloads the complete dataset.
Model Size Restrictions: Dataset size limits depend on licensing: 1 GB per dataset for Pro, and up to 400 GB for Premium when the large dataset storage format is enabled.
Transformation Complexity: Heavy Power Query transformations increase refresh times and resource consumption.
Top Techniques:
Push ETL to the Source/SQL: Perform heavy filtering and transformations in the source system or in SQL views, so Power BI ingests data that is already shaped.
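One way to apply this (the server, database, and view names are hypothetical) is to build the heavy joins, filters, and aggregations into a database view and point Power Query at that view, so the source engine does the work before any data reaches Power BI:

```m
let
    Source = Sql.Database("SqlServer01", "SalesDW"),
    // vw_SalesSummary is a hypothetical SQL view that already joins,
    // filters, and aggregates the raw tables on the database server,
    // so Power BI only receives the shaped result set.
    Summary = Source{[Schema = "dbo", Item = "vw_SalesSummary"]}[Data]
in
    Summary
```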
Use Incremental Refresh: Reduce refresh time by loading only new or modified data.
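Incremental refresh in Power BI hinges on two datetime parameters that must be named exactly RangeStart and RangeEnd; a filter step like the following (the table and column names are illustrative) lets the service load only the partition being refreshed:

```m
let
    Source = Sql.Database("SqlServer01", "SalesDW"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // RangeStart and RangeEnd are the reserved datetime parameters that
    // incremental refresh substitutes per partition. Using >= on one bound
    // and < on the other avoids loading boundary rows twice.
    Filtered = Table.SelectRows(
        Sales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```

The refresh policy itself (how much history to keep, how much to refresh) is then configured on the table in Power BI Desktop before publishing.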
Optimize the Data Model: Use a star schema design, reduce column cardinality, and remove superfluous columns.
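As one illustration (the source and column names are assumptions), dropping unused columns and truncating high-cardinality datetimes to dates in Power Query shrinks the model before it is compressed:

```m
let
    Source = Sql.Database("SqlServer01", "SalesDW"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // Remove columns the report never uses; fewer columns load faster
    // and compress better.
    Trimmed = Table.RemoveColumns(Sales, {"RowGuid", "AuditComment"}),

    // A datetime with seconds precision has enormous cardinality; keeping
    // only the date portion lets the engine compress it far more effectively.
    Dated = Table.TransformColumns(
        Trimmed,
        {{"OrderTimestamp", DateTime.Date, type date}}
    )
in
    Dated
```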
Avoid Complex Calculations in Power Query: Where feasible, implement calculations as DAX measures instead.
Monitor and Adjust: To find bottlenecks, use Query Diagnostics and Performance Analyzer.
Divide Big Tables: Use aggregations or composite models to manage large data volumes effectively.