Sharding Introduction
Sharding is a method for storing data across multiple machines.Sharding, or horizontal scaling, by contrast, divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.
Sharding in MongoDB
Components in Sharded Cluster :
- Shard
- Query Routers
- Config Servers
- Shards store the data. To provide high availability and data consistency, in a production sharded cluster, each shard is a replica set .
- Query Routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load. A client sends requests to one query router. Most sharded clusters have many query routers.
- Config servers store the cluster’s metadata. This data contains a mapping of the cluster’s data set to the shards. The query router uses this metadata to target operations to specific shards. Production sharded clusters have exactly 3 config servers.
Data Partitioning
MongoDB distributes data, or shards, at the collection level. Sharding partitions a collection’s data by the shard key.
A shard key is either an indexed field or an indexed compound field that exists in every document in the collection. MongoDB divides the shard key values into chunks and distributes the chunks evenly across the shards.
- Range Based Sharding
- Hash Based Sharding
- Customized Data Distribution with Tag Aware Sharding
- For range-based sharding, MongoDB divides the data set into ranges determined by the shard key values to provide range based partitioning.
- For hash based partitioning, MongoDB computes a hash of a field’s value, and then uses these hashes to create chunks.
- For Customized Data Distribution, MongoDB allows administrators to direct the balancing policy using tag aware sharding
Facts of Data Partitioning
- Range Based Sharding
- supports more efficient range queries
- router can easily determine which chunks overlap that range
- route the query to only those shards that contain these chunks
- uneven distribution of data(Unbalanced)
- Hash Based Sharding
- even distribution of data
- expense of efficient range queries
- Hashed key values results in random distribution of data across chunks
- every shard in order to return a result
- will not be able to target a few shards
- Customized Data Distribution with Tag Aware Sharding
- Administrators create and associate tags with ranges of the shard key
- assign those tags to the shards
- the balancer migrates tagged data to the appropriate shards
- cluster always enforces the distribution of data that the tags describe
- tag aware sharding serves to improve the locality of data for sharded clusters that span multiple data centers
Balancing In Data Distribution
Data distribution balancing has more importance within the cluster ,such as a particular shard contains significantly more chunks than another shard or a size of a chunk is significantly greater than other chunk sizes.
- Splitting
- Balancer
- Splitting
- Splitting is a background process that keeps chunks from growing too large
- MongoDB splits the chunk in half when beyond a specified chunk size
- Splits are an efficient meta-data change
- To create splits, MongoDB does not migrate any data or affect the shards
- Balancer
- Balancer is a background process that manages chunk migrations
- Balancer can run from any of the query routers in a cluster
- If there’s an error during the migration, the balancer aborts the process leaving the chunk unchanged on the origin shard
- MongoDB removes the chunk’s data from the origin shard after the migration completes successfully
More details of Sharding Methods For MongoDB are available on official site on mongodb.
Comments
Post a Comment