When a data set grows very large, a single machine can no longer store it or scale with it. Keeping rapidly growing data on one machine also becomes too expensive. Moreover, as the data grows, a single machine may no longer provide acceptable read and write throughput.
Sharding is the process of storing data records across multiple machines so that the database can keep up with data growth. It is not replication: each shard holds a different subset of the data rather than a copy of the same data. Sharding enables horizontal scaling. As the data volume and the read and write load grow, we can add more machines, and each added machine increases the number of read and write operations the database can support.
A sharded MongoDB cluster contains multiple shards. Each shard is a replica set of 3 or more mongod nodes. The app server does not talk to the shards directly. It communicates with the query router, mongos, which in turn communicates with each of the shards. This way the data is partitioned across the shards while the application still sees a single database.
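The routing path described above (app server to query router to shard) can be illustrated with a toy sketch in plain Python. It is not MongoDB's API or its actual placement logic: the `Shard` and `QueryRouter` classes, the in-memory dictionaries, and the modulo placement rule are all assumptions made for illustration only.

```python
class Shard:
    """Hypothetical stand-in for one shard (in MongoDB, a replica set of mongod nodes)."""

    def __init__(self, name):
        self.name = name
        self.documents = {}  # shard-key value -> document


class QueryRouter:
    """Hypothetical stand-in for mongos: forwards each request to one shard."""

    def __init__(self, shards):
        self.shards = shards

    def _shard_for(self, key):
        # Assumption: integer keys placed by simple modulo.
        # Real MongoDB placement works differently; this only shows the idea.
        return self.shards[key % len(self.shards)]

    def insert(self, key, document):
        self._shard_for(key).documents[key] = document

    def find(self, key):
        return self._shard_for(key).documents.get(key)


# The "app server" only ever talks to the router, never to a shard directly.
shards = [Shard(f"shard{i}") for i in range(3)]
router = QueryRouter(shards)
for employee_id in range(9):
    router.insert(employee_id, {"employee_id": employee_id})

found = router.find(4)
```

The point of the design is that the application code stays the same no matter how many shards sit behind the router; only the router needs to know where each document lives.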
For example, suppose there are 6 million employee documents and they exceed what a single machine can handle, either in storage capacity or in read and write throughput. In such a case, Sharding helps by storing and managing the data across multiple shards. If the data is divided horizontally across 6 shards based on employee id, and the ids are spread evenly, every shard will hold 1 million employee documents. This way, the large data set can be scaled easily.
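The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming the employee ids are consecutive integers from 1 to 6,000,000 and that each shard owns one contiguous range; the names `shard_for_employee` and `shard_range` are hypothetical helpers, not part of MongoDB.

```python
TOTAL_EMPLOYEES = 6_000_000
NUM_SHARDS = 6
IDS_PER_SHARD = TOTAL_EMPLOYEES // NUM_SHARDS  # 1,000,000 ids per shard


def shard_for_employee(employee_id: int) -> int:
    """Return the shard index (0 to 5) that owns the given employee id."""
    if not 1 <= employee_id <= TOTAL_EMPLOYEES:
        raise ValueError(f"employee_id out of range: {employee_id}")
    return (employee_id - 1) // IDS_PER_SHARD


def shard_range(shard_index: int) -> tuple[int, int]:
    """Return the first and last employee id stored on the given shard."""
    if not 0 <= shard_index < NUM_SHARDS:
        raise ValueError(f"shard_index out of range: {shard_index}")
    first = shard_index * IDS_PER_SHARD + 1
    last = first + IDS_PER_SHARD - 1
    return first, last


for index in range(NUM_SHARDS):
    first, last = shard_range(index)
    print(f"shard {index}: employee ids {first} to {last}")
```

With these assumptions, ids 1 to 1,000,000 land on shard 0, ids 1,000,001 to 2,000,000 on shard 1, and so on up to shard 5.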