Yes, embedding drift may be a contributing factor, but in practical RAG systems, irrelevant retrieval after scaling is usually caused by a broader decline in retrieval quality.
Retrieval issues that were undetectable at 1k chunks become apparent at 1M chunks.
The Most Frequent Causes of RAG Retrieval Degradation at Scale
Chunking Strategy Breaks at Scale (Most Common)
Chunking techniques that are effective for small datasets frequently don't work for large corpora.
For instance:
- Fixed 1,000-token blocks with no semantic boundaries
- Too little overlap
- Tables and code split in the wrong places
Outcome:
Embedding vectors lose their discriminative power and become noisy.
Two unrelated chunks can end up close together in vector space.
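The loss of discriminative power can be illustrated with a toy example. The sketch below uses word counts as a stand-in for a real embedding model (an assumption made purely for illustration; real embeddings behave differently in detail), and the sample texts are made up. It compares a focused chunk with an oversized chunk that mixes three topics.

```python
import math
from collections import Counter

def bow(text):
    """Toy 'embedding': lowercase word counts, standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# A focused chunk versus an oversized chunk that spans three topics.
focused = "reset password via account settings email link"
mixed = (
    "reset password via account settings email link "
    "invoice billing cycle refund policy "
    "api rate limit quota retry header"
)

query_on_topic = bow("reset password email")
query_off_topic = bow("refund policy invoice")

on_focused = cosine(query_on_topic, bow(focused))
on_mixed = cosine(query_on_topic, bow(mixed))
off_focused = cosine(query_off_topic, bow(focused))
off_mixed = cosine(query_off_topic, bow(mixed))

print(f"on-topic query:  focused={on_focused:.2f}  mixed={on_mixed:.2f}")
print(f"off-topic query: focused={off_focused:.2f}  mixed={off_mixed:.2f}")
```

In this toy setup the mixed chunk scores lower than the focused chunk for the relevant query, yet still scores above zero for an unrelated one, so it separates the two queries poorly.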
Symptoms:
- Semantically broad chunks dominate the results
- Generic chunks are frequently retrieved
- Headers are retrieved rather than answers
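One way to check for these symptoms is to count how often each chunk shows up across the top-k results of many unrelated queries. The sketch below is a minimal diagnostic under stated assumptions: `hub_chunks`, the `min_share` threshold, and the chunk ids are hypothetical names and values chosen for illustration, and the result lists would come from your own retriever logs.

```python
from collections import Counter

def retrieval_frequency(results_per_query):
    """Count in how many queries each chunk id appears in the top-k list."""
    counts = Counter()
    for chunk_ids in results_per_query:
        counts.update(set(chunk_ids))  # count a chunk at most once per query
    return counts

def hub_chunks(results_per_query, min_share=0.75):
    """Return chunk ids retrieved for at least min_share of all queries."""
    total = len(results_per_query)
    if total == 0:
        return []
    counts = retrieval_frequency(results_per_query)
    return sorted(cid for cid, n in counts.items() if n / total >= min_share)

# Hypothetical top-3 results for four unrelated queries.
sample_results = [
    ["intro-header", "billing-12", "faq-3"],
    ["intro-header", "api-7", "api-9"],
    ["auth-2", "intro-header", "faq-3"],
    ["deploy-4", "deploy-5", "intro-header"],
]

suspects = hub_chunks(sample_results, min_share=0.75)
print(suspects)
```

Chunks that surface for most queries regardless of topic are good candidates for inspection; they often turn out to be headers or boilerplate.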
Fix:
- Use semantic chunking that is aware of paragraphs, sections, markdown, code, and tables
- Use better chunk sizes: 200–500 tokens for dense retrieval
- Prefer adaptive chunking over fixed chunking
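A structure-aware chunker along these lines could be sketched as follows. This is a minimal illustration, not a production splitter. It assumes markdown-like input, approximates token counts by whitespace-separated words (swap in your model's tokenizer), and uses hypothetical function names. Code fences and tables stay in one piece, headings stay attached to the text that follows them, and a single block larger than the budget is emitted whole rather than split.

```python
FENCE = "`" * 3  # markdown code-fence marker

def split_blocks(text):
    """Split on blank lines, but keep fenced code blocks in one piece."""
    blocks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
            current.append(line)
        elif in_fence or line.strip():
            current.append(line)  # blank lines inside code stay in the block
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks

def count_tokens(text):
    """Rough token estimate by whitespace words (an assumption)."""
    return len(text.split())

def is_heading(block):
    return block.lstrip().startswith("#")

def chunk(text, max_tokens=300):
    """Pack whole blocks into chunks, starting a new chunk at each heading."""
    chunks, current, size = [], [], 0
    for block in split_blocks(text):
        n = count_tokens(block)
        only_headings = all(is_heading(b) for b in current)
        over_budget = size + n > max_tokens
        # Never flush a chunk that so far holds only headings.
        if current and not only_headings and (is_heading(block) or over_budget):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Made-up sample document with a code fence and a table.
doc = "\n".join([
    "# Setup", "", "Install the package first.", "",
    FENCE + "bash", "pip install example", "", "example --init", FENCE, "",
    "# Limits", "", "| plan | quota |", "| --- | --- |", "| free | 100 |",
])
pieces = chunk(doc, max_tokens=50)
for piece in pieces:
    print(piece, end="\n---\n")
```

The default budget of 300 sits inside the 200–500 token range mentioned above. Table rows stay together here only because they contain no blank lines, so input with unusual table formatting would need extra handling.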