Published on Feb 22, 2017

When a data set grows very large, it can no longer be stored and scaled on a single machine. Storing rapidly growing data on one machine becomes too expensive. Moreover, as the data grows, a single machine may not provide acceptable read and write throughput.

What is Sharding?

Sharding is the process of storing data records across multiple machines so that the database can keep up with data growth. It is not replication: instead of every machine holding a copy of the same data, each machine (shard) holds a different subset of it, and together the shards make up the full data set. Sharding therefore allows the data to scale horizontally. We can add more machines to meet the demands of growing data and of read and write operations. The more machines you add, the more read and write operations your database can support.

Why do we need Sharding?

  • In replication, all writes go to the primary (master) node, so write throughput is limited by a single machine and the primary is latency sensitive.
  • A single replica set has a limited number of members: 12 before MongoDB 3.0 and 50 since, of which at most 7 can vote.
  • When the active data set is large, it may not fit in memory. There is a limit to how far main memory can be increased on a single machine.
  • The local disk may not be big enough to store a large amount of data.
  • Vertical scaling, the usual approach with an RDBMS, is too expensive.

Sharding Architecture

A sharded MongoDB cluster contains multiple shards, each of which is typically a replica set of 3 or more mongod nodes. The app server communicates with the query router, mongos, which in turn communicates with each of the shards. This way the data is partitioned across the shards while the application talks to a single endpoint.
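The request path described above (app server to query router to shards) can be illustrated with a small in-memory sketch. This is a toy model, not MongoDB code: the `Shard` and `QueryRouter` classes, the `employee_id` field, and the modulo routing rule are all assumptions made for illustration, and a real mongos routes using cluster metadata rather than a fixed formula.

```python
class Shard:
    """Stands in for one shard (in MongoDB, typically a replica set)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # employee_id -> document


class QueryRouter:
    """Toy stand-in for mongos: forwards each operation to exactly one shard."""

    def __init__(self, shards):
        self.shards = list(shards)

    def _shard_for(self, employee_id):
        # Assumption: simple modulo routing, chosen only for illustration.
        return self.shards[employee_id % len(self.shards)]

    def insert(self, doc):
        key = doc["employee_id"]
        self._shard_for(key).docs[key] = doc

    def find(self, employee_id):
        # Only the shard that owns the key is consulted.
        return self._shard_for(employee_id).docs.get(employee_id)


# The "app server" only ever talks to the router.
router = QueryRouter(Shard(f"shard{i}") for i in range(3))
for emp_id in range(1, 10):
    router.insert({"employee_id": emp_id, "name": f"employee-{emp_id}"})

print(router.find(4))
print({shard.name: sorted(shard.docs) for shard in router.shards})
```

The application code never needs to know how many shards exist or which one holds a given document. That knowledge lives in the router.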

For example, suppose a collection of 6 million employee documents exceeds the storage capacity or the read and write throughput of a single machine. In such a case, Sharding helps by storing and managing the data across multiple shards. If the data is divided horizontally across 6 shards based on the employee id, and the ids are evenly distributed, every shard will hold about 1 million employee documents. This way, a large data set can be scaled out simply by adding shards.
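The arithmetic in the employee example can be written out as a short sketch. It assumes employee ids are the consecutive integers 1 to 6,000,000 and that each shard owns one contiguous range of ids. The function names are hypothetical, and MongoDB itself manages ranges as chunks that it can split and move between shards, which this sketch does not model.

```python
NUM_SHARDS = 6
TOTAL_EMPLOYEES = 6_000_000
RANGE_SIZE = TOTAL_EMPLOYEES // NUM_SHARDS  # 1,000,000 ids per shard


def shard_for(employee_id):
    """Return the 0-based index of the shard that owns this employee id."""
    if not 1 <= employee_id <= TOTAL_EMPLOYEES:
        raise ValueError(f"employee_id out of range: {employee_id}")
    return (employee_id - 1) // RANGE_SIZE


def shard_range(shard_index):
    """Return the inclusive (lowest_id, highest_id) range owned by a shard."""
    if not 0 <= shard_index < NUM_SHARDS:
        raise ValueError(f"no such shard: {shard_index}")
    low = shard_index * RANGE_SIZE + 1
    high = (shard_index + 1) * RANGE_SIZE
    return low, high


for index in range(NUM_SHARDS):
    low, high = shard_range(index)
    print(f"shard {index}: ids {low} to {high} ({high - low + 1} documents)")
```

Range-based partitioning like this keeps neighbouring ids on the same shard, which suits range queries. The trade-off is that steadily increasing ids send all new inserts to the last shard.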


About Author
edureka