Monday, 27 January 2014

MongoDB Aggregate

Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. In SQL, count(*) combined with GROUP BY is the equivalent of a MongoDB aggregation.

The aggregate() Method

To run an aggregation in MongoDB, use the aggregate() method.

SYNTAX:

The basic syntax of the aggregate() method is as follows:
>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

EXAMPLE:



> db.democol.find().pretty()
{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}
{
        "_id" : ObjectId("52963ce96f63f810a98b7a99"),
        "by" : "subramanian",
        "comments" : [
                {
                        "user" : "Tiara",
                        "message" : "Worth to read"
                }
        ],
        "likes" : 20,
        "title" : "Learn SQL"
}
{
        "_id" : ObjectId("52a62f01fdaa40a1db551d04"),
        "title" : "Learn HTML5",
        "by" : "KGSM",
        "likes" : 240
}
{
        "_id" : ObjectId("52a62f3efdaa40a1db551d05"),
        "title" : "Learn Java",
        "by" : "Faulkner",
        "likes" : 209
}
{
        "_id" : ObjectId("52a62f61fdaa40a1db551d06"),
        "title" : "Basics of Life",
        "by" : "Peter",
        "likes" : 218
}
{
        "_id" : ObjectId("52a65a92fdaa40a1db551d07"),
        "title" : "Basics of Life",
        "by" : "Peter",
        "likes" : 318
}


1. Sum of likes, grouped by the "by" field ($sum):

> db.democol.aggregate([{$group: {_id:"$by",likes:{$sum:"$likes"}}}])
{
        "result" : [
                {
                        "_id" : "Peter",
                        "likes" : 536
                },
                {
                        "_id" : "Faulkner",
                        "likes" : 209
                },
                {
                        "_id" : "KGSM",
                        "likes" : 240
                },
                {
                        "_id" : "subramanian",
                        "likes" : 20
                },
                {
                        "_id" : "mani",
                        "likes" : 100
                }
        ],
        "ok" : 1
}

2. Minimum likes, grouped by the "by" field ($min):

> db.democol.aggregate([{$group: {_id:"$by",likes:{$min:"$likes"}}}])
{
        "result" : [
                {
                        "_id" : "Peter",
                        "likes" : 218
                },
                {
                        "_id" : "Faulkner",
                        "likes" : 209
                },
                {
                        "_id" : "KGSM",
                        "likes" : 240
                },
                {
                        "_id" : "subramanian",
                        "likes" : 20
                },
                {
                        "_id" : "mani",
                        "likes" : 100
                }
        ],
        "ok" : 1
}

3. Maximum likes, grouped by the "by" field ($max):

> db.democol.aggregate([{$group: {_id:"$by",likes:{$max:"$likes"}}}])
{
        "result" : [
                {
                        "_id" : "Peter",
                        "likes" : 318
                },
                {
                        "_id" : "Faulkner",
                        "likes" : 209
                },
                {
                        "_id" : "KGSM",
                        "likes" : 240
                },
                {
                        "_id" : "subramanian",
                        "likes" : 20
                },
                {
                        "_id" : "mani",
                        "likes" : 100
                }
        ],
        "ok" : 1
}
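To make the grouping concrete, here is a minimal sketch in plain JavaScript (runnable outside the mongo shell) that reproduces the three results above from the sample documents. The groupBy helper is hypothetical and only imitates what $group with $sum, $min and $max computes; it is not how MongoDB implements aggregation.

```javascript
// The six sample documents from democol (only the fields used here).
const docs = [
  { title: "Learn mongo", by: "mani", likes: 100 },
  { title: "Learn SQL", by: "subramanian", likes: 20 },
  { title: "Learn HTML5", by: "KGSM", likes: 240 },
  { title: "Learn Java", by: "Faulkner", likes: 209 },
  { title: "Basics of Life", by: "Peter", likes: 218 },
  { title: "Basics of Life", by: "Peter", likes: 318 },
];

// Hypothetical helper: groups documents by keyField and folds valueField
// with combine, roughly what {$group: {_id: "$by", likes: {<op>: "$likes"}}} does.
function groupBy(documents, keyField, valueField, combine) {
  const groups = new Map();
  for (const doc of documents) {
    const key = doc[keyField];
    const value = doc[valueField];
    groups.set(key, groups.has(key) ? combine(groups.get(key), value) : value);
  }
  return Object.fromEntries(groups);
}

const sumLikes = groupBy(docs, "by", "likes", (a, b) => a + b); // like $sum
const minLikes = groupBy(docs, "by", "likes", Math.min);        // like $min
const maxLikes = groupBy(docs, "by", "likes", Math.max);        // like $max

console.log(sumLikes.Peter, minLikes.Peter, maxLikes.Peter); // 536 218 318
```

Only "Peter" has more than one document, so it is the only group where the three operators give different values.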

The following expressions can be used inside a $group stage:

Expression: $sum
Description: Sums the given value across all documents in each group.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])

Expression: $avg
Description: Calculates the average of the given values across all documents in each group.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])

Expression: $min
Description: Gets the minimum of the corresponding values across all documents in each group.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])

Expression: $max
Description: Gets the maximum of the corresponding values across all documents in each group.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])

Expression: $push
Description: Appends the value to an array in the resulting document.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])

Expression: $addToSet
Description: Appends the value to an array in the resulting document, but does not create duplicates.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])

Expression: $first
Description: Gets the value from the first document in each group. This typically only makes sense after a preceding $sort stage.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])

Expression: $last
Description: Gets the value from the last document in each group. This typically only makes sense after a preceding $sort stage.
Example: db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
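The difference between $push and $addToSet, and the reason $first and $last depend on a preceding sort, may be easier to see in a small plain JavaScript sketch. The sample documents and the accumulate helper are hypothetical; the by_user and url field names follow the examples above. This only imitates the semantics and makes no claim about how MongoDB implements them (in particular, MongoDB does not guarantee the order of elements produced by $addToSet).

```javascript
// Hypothetical documents, deliberately unsorted and with one duplicate url.
const posts = [
  { by_user: "peter", url: "/post-2" },
  { by_user: "mani", url: "/post-9" },
  { by_user: "peter", url: "/post-1" },
  { by_user: "peter", url: "/post-2" },
];

// Imitates a $sort stage on url, so that "first" and "last" are well defined.
const sorted = [...posts].sort((a, b) => (a.url < b.url ? -1 : a.url > b.url ? 1 : 0));

// Hypothetical helper: applies all four accumulators per by_user group.
function accumulate(documents) {
  const groups = new Map();
  for (const doc of documents) {
    if (!groups.has(doc.by_user)) {
      // The first document seen for a group fixes the $first value.
      groups.set(doc.by_user, { push: [], addToSet: [], first: doc.url, last: doc.url });
    }
    const group = groups.get(doc.by_user);
    group.push.push(doc.url); // $push keeps duplicates
    if (!group.addToSet.includes(doc.url)) {
      group.addToSet.push(doc.url); // $addToSet skips values already present
    }
    group.last = doc.url; // $last ends up as the value of the last document seen
  }
  return Object.fromEntries(groups);
}

const result = accumulate(sorted);
console.log(result.peter);
```

Running accumulate on the unsorted posts array instead would give "/post-2" as the first value for peter, which is why the sort order matters for $first and $last.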

MongoDB Sharding

Sharding

Sharding is the process of storing data records across multiple machines, and it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be able to store the data or provide acceptable read and write throughput. Sharding solves the problem with horizontal scaling: you add more machines to support data growth and the demands of read and write operations.

Sharding in MongoDB

The following diagram shows sharding in MongoDB using a sharded cluster.
[Figure: MongoDB Sharding]
The diagram has three main components, described below:
  • Shards: Shards store the data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
  • Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In a production environment, sharded clusters have exactly 3 config servers.
  • Query Routers: Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load, and most sharded clusters have many query routers. A client sends requests to one query router.
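The way a query router uses the config servers' metadata can be sketched in a few lines of plain JavaScript. This is a simplified illustration under stated assumptions, not MongoDB's actual routing code: the chunkMap structure, the shard names and both functions are hypothetical, and the mapping is modeled as ranges of a numeric key.

```javascript
// Hypothetical metadata as a config server might hold it:
// each entry maps a key range [min, max) to the shard that stores it.
const chunkMap = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// When the operation carries the key, the router can target a single shard.
function targetShard(keyValue) {
  const chunk = chunkMap.find((c) => keyValue >= c.min && keyValue < c.max);
  return chunk.shard;
}

// When it does not, the router has to send the operation to every shard
// and merge the results before returning them to the client.
function targetAllShards() {
  return [...new Set(chunkMap.map((c) => c.shard))];
}

console.log(targetShard(150));    // shard1
console.log(targetAllShards());   // all three shards
```

The point of the sketch is the division of labor: the shards hold the data, the metadata says where each range lives, and the router only looks up and forwards.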

MongoDB Replication

Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

Why Replication?

  • To keep your data safe
  • High (24/7) availability of data
  • Disaster Recovery
  • No downtime for maintenance (like backups, index rebuilds, compaction)
  • Read scaling (extra copies to read from)
  • The replica set is transparent to the application

How replication works in MongoDB

MongoDB achieves replication through replica sets. A replica set is a group of mongod instances that host the same data set. In a replica set, one node is the primary, which receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set. A replica set can have only one primary node.
  1. A replica set is a group of two or more nodes (generally, a minimum of 3 nodes is required).
  2. In a replica set, one node is the primary and the remaining nodes are secondaries.
  3. All data replicates from the primary to the secondary nodes.
  4. During automatic failover or maintenance, an election is held and a new primary node is elected.
  5. After a failed node recovers, it rejoins the replica set and works as a secondary node.
The diagram below shows a typical MongoDB replication setup, in which the client application always interacts with the primary node, and the primary node then replicates the data to the secondary nodes.
[Figure: MongoDB Replication]
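The numbered steps above (writes go to the primary, data replicates to the secondaries, a new primary is elected on failure, and a recovered node rejoins as a secondary) can be sketched as a toy model in plain JavaScript. The Node and ReplicaSet classes are hypothetical, and the "election" here simply picks the most up-to-date healthy node, which is a deliberate simplification and not MongoDB's actual election protocol.

```javascript
// Toy model of a replica set member.
class Node {
  constructor(name) {
    this.name = name;
    this.data = [];
    this.healthy = true;
  }
}

class ReplicaSet {
  constructor(names) {
    this.nodes = names.map((name) => new Node(name));
    this.primary = this.nodes[0]; // exactly one primary at a time
  }

  // All writes go to the primary, then replicate to healthy secondaries.
  write(value) {
    this.primary.data.push(value);
    for (const node of this.nodes) {
      if (node !== this.primary && node.healthy) node.data.push(value);
    }
  }

  // Simulated failover: the primary goes down and a new one is chosen.
  failPrimary() {
    this.primary.healthy = false;
    const candidates = this.nodes.filter((node) => node.healthy);
    // Simplified election: the healthy node with the most data wins.
    this.primary = candidates.reduce((best, node) =>
      node.data.length > best.data.length ? node : best
    );
  }

  // A recovered node catches up from the primary and rejoins as a secondary.
  recover(name) {
    const node = this.nodes.find((n) => n.name === name);
    node.data = [...this.primary.data];
    node.healthy = true;
  }
}

const replicaSet = new ReplicaSet(["node1", "node2", "node3"]);
replicaSet.write("a");
replicaSet.failPrimary();   // node1 fails, node2 takes over
replicaSet.write("b");      // node1 misses this write while it is down
replicaSet.recover("node1");
console.log(replicaSet.primary.name, replicaSet.nodes[0].data); // node2 [ 'a', 'b' ]
```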

Replica set features

  • A cluster of N nodes
  • Any one node can be primary
  • All write operations go to the primary
  • Automatic failover
  • Automatic Recovery
  • Consensus election of primary

Set up a replica set

In this tutorial we will convert a standalone mongod instance to a replica set. To do so, follow these steps:
  • Shut down the already running MongoDB server.
  • Start the MongoDB server again, specifying the --replSet option. The basic syntax of --replSet is given below:
mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"

EXAMPLE

mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0
This starts a mongod instance on port 27017 as a member of the replica set named rs0. Now open a command prompt and connect to this mongod instance. In the mongo client, issue the command rs.initiate() to initiate a new replica set. To check the replica set configuration, issue the command rs.conf(). To check the status of the replica set, issue the command rs.status().
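rs.initiate() also accepts an explicit configuration document. The sketch below builds one for the rs0 example; the buildReplicaSetConfig helper is hypothetical, and the host localhost:27017 is an assumption that should be replaced with the address of your own instance. The _id of the document has to match the name passed to --replSet. The guard at the end lets the same file run outside the mongo shell, where the rs helper does not exist.

```javascript
// Hypothetical helper: builds a replica set configuration document
// from a set name and a list of "host:port" strings.
function buildReplicaSetConfig(name, hosts) {
  return {
    _id: name, // must match the --replSet name
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

// Assumed host; adjust to where your mongod is actually listening.
const config = buildReplicaSetConfig("rs0", ["localhost:27017"]);

if (typeof rs !== "undefined") {
  // Inside the mongo shell: initiate the set with this configuration.
  rs.initiate(config);
} else {
  // Anywhere else: just show the document that would be used.
  console.log(JSON.stringify(config, null, 2));
}
```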

Add members to replica set

To add members to the replica set, start mongod instances on multiple machines. Then start a mongo client and issue the rs.add() command.

SYNTAX:

The basic syntax of the rs.add() command is as follows:
>rs.add(HOST_NAME:PORT)

EXAMPLE

Suppose your mongod instance's host name is mongod1.net and it is running on port 27017. To add this instance to the replica set, issue the rs.add() command in the mongo client.
>rs.add("mongod1.net:27017")
>
You can add a mongod instance to a replica set only when you are connected to the primary node. To check whether you are connected to the primary, issue the command db.isMaster() in the mongo client.
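The primary check and the rs.add() call can be combined into one small function. This is a minimal sketch: addMemberIfPrimary is a hypothetical helper, and the db and rs objects are passed in as parameters so that it can be tried outside the shell with stand-ins. It relies only on the ismaster field of the db.isMaster() result.

```javascript
// Hypothetical helper: adds a member only when connected to the primary.
// In the mongo shell, call it as addMemberIfPrimary(db, rs, "mongod1.net:27017").
function addMemberIfPrimary(db, rs, host) {
  const status = db.isMaster();
  if (!status.ismaster) {
    return { added: false, reason: "not connected to the primary" };
  }
  rs.add(host);
  return { added: true };
}

// Stand-ins for the shell objects, for illustration only.
const addedHosts = [];
const fakeRs = { add: (host) => addedHosts.push(host) };
const onSecondary = { isMaster: () => ({ ismaster: false }) };
const onPrimary = { isMaster: () => ({ ismaster: true }) };

console.log(addMemberIfPrimary(onSecondary, fakeRs, "mongod1.net:27017")); // refused
console.log(addMemberIfPrimary(onPrimary, fakeRs, "mongod1.net:27017"));   // added
```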

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...