To set up a MongoDB sharding environment, we ideally need the following:

* A config server replica set with 3 or more members
* Multiple shard replica sets
* Multiple mongos instances

To set up all the config, shard and mongos instances on a single Linux server (in a real production deployment they should all run on different machines), follow the steps below:

1. Create the folders cfg0, cfg1, cfg2 for storing the data of the config server replica set members
mkdir cfg0 cfg1 cfg2

2. Create the folders a0, a1, a2, b0, b1, b2, c0, c1, c2, d0, d1, d2 for storing the replica sets of shards a, b, c and d
mkdir a0 a1 a2 b0 b1 b2 c0 c1 c2 d0 d1 d2

3. Start the config server instances
mongod --configsvr --dbpath cfg0 --port 26050 --fork --logpath log.cfg0 --logappend --replSet cfg
mongod --configsvr --dbpath cfg1 --port 26051 --fork --logpath log.cfg1 --logappend --replSet cfg
mongod --configsvr --dbpath cfg2 --port 26052 --fork --logpath log.cfg2 --logappend --replSet cfg

4. Initiate replication on the config servers
mongo --port 26050
>rs.initiate()
>rs.add("localhost:26051");
>rs.add("localhost:26052");
>rs.status()
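
As an alternative to calling rs.initiate() followed by two rs.add() calls, the whole member list can be passed to rs.initiate() as one configuration document. The sketch below is only an illustration: buildReplSetConfig is a hypothetical helper name, and the hosts and ports are the ones assumed in step 3. It should be run on a set that has not been initiated yet.

```javascript
// Build a replica set configuration document with explicit member hosts.
// buildReplSetConfig is a hypothetical helper, not part of the mongo shell.
function buildReplSetConfig(setName, ports, isConfigServer) {
  var config = {
    _id: setName,
    members: ports.map(function (port, index) {
      return { _id: index, host: "localhost:" + port };
    })
  };
  if (isConfigServer) {
    config.configsvr = true; // marks the set as a config server replica set
  }
  return config;
}

// Ports assumed from the mongod --configsvr commands in step 3.
var cfgConfig = buildReplSetConfig("cfg", [26050, 26051, 26052], true);

// In a mongo shell connected to port 26050, rs exists and the set is initiated.
// Anywhere else (for example under Node.js) the document is only printed.
if (typeof rs !== "undefined") {
  rs.initiate(cfgConfig);
} else {
  console.log(JSON.stringify(cfgConfig, null, 2));
}
```

Listing every member with an explicit localhost address keeps the host names consistent across the set, which matters when everything runs on one machine.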

5. Start the shard servers and mongos instances
# shard servers (mongod data servers)
# note: do not use --smallfiles or such a small --oplogSize in production
mongod --shardsvr --replSet a --dbpath a0 --logpath log.a0 --port 27000 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a1 --logpath log.a1 --port 27001 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a2 --logpath log.a2 --port 27002 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b0 --logpath log.b0 --port 27100 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b1 --logpath log.b1 --port 27101 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b2 --logpath log.b2 --port 27102 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c0 --logpath log.c0 --port 27200 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c1 --logpath log.c1 --port 27201 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c2 --logpath log.c2 --port 27202 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d0 --logpath log.d0 --port 27300 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d1 --logpath log.d1 --port 27301 --fork --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d2 --logpath log.d2 --port 27302 --fork --logappend --smallfiles --oplogSize 50


# mongos processes (the first one has no --port and so listens on the default port 27017)
mongos --configdb "cfg/localhost:26050,localhost:26051,localhost:26052" --fork --logpath log.mongos0
mongos --configdb "cfg/localhost:26050,localhost:26051,localhost:26052" --fork --logpath log.mongos1 --port 26061
mongos --configdb "cfg/localhost:26050,localhost:26051,localhost:26052" --fork --logpath log.mongos2 --port 26062
mongos --configdb "cfg/localhost:26050,localhost:26051,localhost:26052" --fork --logpath log.mongos3 --port 26063

6. Initiate each shard replica set (shown here for set a; repeat for sets b, c and d with their own ports)
mongo --port 27000
>rs.initiate()
>rs.add("localhost:27001");
>rs.add("localhost:27002");
>rs.status()
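
The same step has to be repeated for sets b, c and d. As a rough sketch of what that looks like, the plain JavaScript below (runnable with Node.js, for example) prints one mongo --eval command per shard replica set. SHARD_BASE_PORTS and shardReplSetConfig are hypothetical names, the port layout is the one assumed in step 5, and it passes a full configuration document to rs.initiate() instead of using rs.add().

```javascript
// Port layout taken from the mongod --shardsvr commands in step 5:
// set a starts at 27000, b at 27100, c at 27200, d at 27300.
var SHARD_BASE_PORTS = { a: 27000, b: 27100, c: 27200, d: 27300 };

// shardReplSetConfig is a hypothetical helper, not a mongo shell function.
function shardReplSetConfig(setName) {
  var base = SHARD_BASE_PORTS[setName];
  return {
    _id: setName,
    members: [0, 1, 2].map(function (i) {
      return { _id: i, host: "localhost:" + (base + i) };
    })
  };
}

// One shell command per set: connect to the first member and initiate the set.
var commands = Object.keys(SHARD_BASE_PORTS).map(function (setName) {
  return "mongo --port " + SHARD_BASE_PORTS[setName] +
    " --eval 'rs.initiate(" + JSON.stringify(shardReplSetConfig(setName)) + ")'";
});

commands.forEach(function (command) {
  console.log(command);
});
```

Each printed line can be pasted into the terminal. Running rs.status() afterwards on each set confirms that a primary has been elected.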

7. After initiating all the shard replica sets, connect to a mongos with the mongo shell and add the shards
mongo --port 27017
>sh.addShard("a/localhost:27000");
>sh.addShard("b/localhost:27100");
>sh.addShard("c/localhost:27200");
>sh.addShard("d/localhost:27300");
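
sh.addShard() also accepts the replica set name followed by a comma-separated list of several members, which is the same form sh.status() prints in its host field. The following is a minimal sketch of building those strings. shardSeedString is a hypothetical helper and the ports are the ones assumed in step 5.

```javascript
// Build a "setName/host1,host2,host3" string for sh.addShard().
// shardSeedString is a hypothetical helper, not part of the mongo shell.
function shardSeedString(setName, ports) {
  var hosts = ports.map(function (port) {
    return "localhost:" + port;
  });
  return setName + "/" + hosts.join(",");
}

// Ports assumed from the mongod --shardsvr commands in step 5.
var seeds = [
  shardSeedString("a", [27000, 27001, 27002]),
  shardSeedString("b", [27100, 27101, 27102]),
  shardSeedString("c", [27200, 27201, 27202]),
  shardSeedString("d", [27300, 27301, 27302])
];

seeds.forEach(function (seed) {
  if (typeof sh !== "undefined") {
    sh.addShard(seed); // inside a mongo shell connected to a mongos
  } else {
    console.log('sh.addShard("' + seed + '")'); // elsewhere, just show the call
  }
});
```

Naming a single member, as in the commands above, is enough because mongos discovers the rest of the set. Listing all members simply does not depend on that one member being reachable at the time.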

>sh.status();
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("4c90aebc081921bcddd87de8")
  }
  shards:
        {  "_id" : "a",  "host" : "a/localhost:27000,localhost:27001,localhost:27002",  "state" : 1 }
        {  "_id" : "b",  "host" : "b/localhost:27100,localhost:27101,localhost:27102",  "state" : 1 }
        {  "_id" : "c",  "host" : "c/localhost:27200,localhost:27201,localhost:27202",  "state" : 1 }
  active mongoses:
        "4.0.6" : 4
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                No recent migrations
  databases:

        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }

To enable sharding on a database:
mongos> sh.enableSharding("mydb")
{
        "ok" : 1,
        "operationTime" : Timestamp(1284550849, 4),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1284550849, 4),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

Now if we check sh.status(), the database is listed as sharded ("partitioned" : true) along with its primary shard:

  {  "_id" : "mydb",  "primary" : "b",  "partitioned" : true,  "version" : {  "uuid" : UUID("f35e426a-e540-439b-b9ad-7d3d042e17d2"),  "lastMod" : 1 } }


We can also shard a particular collection of that database. The third argument (true) makes the shard key unique:

mongos> sh.shardCollection("mydb.tester",{_id:1},true)
{
        "collectionsharded" : "mydb.tester",
        "collectionUUID" : UUID("b0027a1d-1e0c-4769-9b65-88eecb9808fc"),
        "ok" : 1,
        "operationTime" : Timestamp(1284551103, 10),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1284551103, 10),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

Now if we check sh.status(), we can see which shard holds the collection's chunk:

 mydb.tester
                        shard key: { "_id" : 1 }
                        unique: true
                        balancing: true
                        chunks:
                                b       1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : b Timestamp(1, 0)


You can also list the shards in the cluster by querying the shards collection of the config database:

mongos> db.getSiblingDB("config").shards.find()
{ "_id" : "a", "host" : "a/localhost:27000,localhost:27001,localhost:27002", "state" : 1 }
{ "_id" : "b", "host" : "b/localhost:27100,localhost:27101,localhost:27102", "state" : 1 }
{ "_id" : "c", "host" : "c/localhost:27200,localhost:27201,localhost:27202", "state" : 1 }
{ "_id" : "d", "host" : "d/localhost:27300,localhost:27301,localhost:27302", "state" : 1 }

After inserting data into a collection, we can check which shard received the write by using getLastErrorObj (see the singleShard field):

mongos> db.myCol.insert({a:-1})
WriteResult({ "nInserted" : 1 })
mongos> db.getLastErrorObj()
{
        "n" : 0,
        "singleShard" : "localhost:27200",
        "err" : null,
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1524570077, 611),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1524570077, 611)

}


Scatter/Gather

>db.myCol.createIndex({x:1})
>db.myCol.find({x:1}).explain()

Queries that filter on the shard key, or on a prefix of the shard key, are not scatter/gather. They are targeted only at the shards that can contain the matching documents. Any other query is sent to every shard (scatter) and mongos merges the results (gather).
Example:
For a collection people with shard key { friends: 1, name: -1 } and an index on { name: 1, phoneNumber: 1 }:

db.people.find({friends:"Bob",name:"Emily"}) is not a scatter/gather query
db.people.find({name:"Alice"}) is a scatter/gather query, because name alone is not a prefix of the shard key
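
The prefix rule can be illustrated with a few lines of plain JavaScript. This is a deliberately simplified sketch, not how mongos actually routes queries: it only checks which top-level fields appear in the filter and ignores operators such as $or, range conditions and hashed shard keys. shardKeyPrefixLength and classifyQuery are hypothetical helper names.

```javascript
// Count how many leading shard key fields appear in the query filter.
// Simplification: only top-level field names are inspected.
function shardKeyPrefixLength(shardKey, query) {
  var fields = Object.keys(shardKey); // keeps the declared field order
  var matched = 0;
  while (matched < fields.length &&
         Object.prototype.hasOwnProperty.call(query, fields[matched])) {
    matched++;
  }
  return matched;
}

// A query that constrains at least the first shard key field can be targeted.
function classifyQuery(shardKey, query) {
  return shardKeyPrefixLength(shardKey, query) > 0 ? "targeted" : "scatter/gather";
}

var peopleShardKey = { friends: 1, name: -1 };

console.log(classifyQuery(peopleShardKey, { friends: "Bob", name: "Emily" })); // targeted
console.log(classifyQuery(peopleShardKey, { friends: "Bob" }));                // targeted
console.log(classifyQuery(peopleShardKey, { name: "Alice" }));                 // scatter/gather
```

On a real cluster, the explain() output of a query run through mongos is the reliable way to see which shards were contacted.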
