Monday, 3 March 2014

DATA MART

A data mart is a subset of an organization's data, often taken from the data warehouse, that concentrates on a specific business unit. A data mart may or may not be derived from a data warehouse, and it is aimed at meeting an immediate requirement.

Data marts may or may not depend on other data marts in an organization. If data marts share conformed dimensions and facts, they are related to each other.

Benefits of a data mart:
· Frequently needed data can be accessed very easily.
· Improved performance, since queries run against a smaller, focused set of data.
· Data marts can be created easily.
· Lower implementation cost than a data warehouse.
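A data mart can be pictured as a filtered, pre-summarized slice of warehouse data. The sketch below is a minimal illustration in plain JavaScript (the language of the MongoDB shell examples elsewhere on this blog), assuming hypothetical warehouse rows with `unit`, `region`, `year` and `amount` fields and a hypothetical `buildDataMart` helper; real data marts are usually built with ETL tools or SQL rather than application code.

```javascript
// Hypothetical warehouse fact rows covering several business units.
const warehouse = [
  { unit: "sales", region: "north", year: 2013, amount: 120 },
  { unit: "sales", region: "south", year: 2013, amount: 80 },
  { unit: "finance", region: "north", year: 2013, amount: 300 },
  { unit: "sales", region: "north", year: 2012, amount: 95 },
];

// Build a data mart for one business unit: keep only that unit's rows and
// pre-aggregate them by region and year, so the totals the unit asks for
// most often can be read directly.
function buildDataMart(rows, unit) {
  const totals = new Map();
  for (const row of rows) {
    if (row.unit !== unit) continue; // the mart holds one business unit only
    const key = `${row.region}|${row.year}`;
    totals.set(key, (totals.get(key) || 0) + row.amount);
  }
  return [...totals].map(([key, amount]) => {
    const [region, year] = key.split("|");
    return { region, year: Number(year), amount };
  });
}

console.log(buildDataMart(warehouse, "sales"));
```

The finance row never reaches the sales mart, and the sales rows are already summed per region and year.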

DATA WAREHOUSE



A data warehouse is a relational database that is designed for query and business analysis rather than for transaction processing. It contains historical data derived from transaction data. Business analysts use this historical data to understand the business in detail.

A data warehouse should have the following characteristics:

Subject oriented: A data warehouse is organized around the subjects the business wants to analyze. For example, to learn about a company's sales, a data warehouse is built on sales data; using this data warehouse, we can find last year's sales. This ability to define a data warehouse by subject (sales) makes it subject oriented.

Integrated: Data from different sources is brought into a consistent format. This includes resolving differences in units of measure, naming conflicts, etc.

Non-volatile: Once data enters the data warehouse, it should not be updated.

Time variant: To analyze how the business changes over time, analysts need large amounts of data, so the data warehouse should contain historical data.
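The integrated characteristic can be illustrated with a small plain-JavaScript sketch. It assumes two hypothetical source systems: a CRM that stores `cust_name` and revenue in cents, and a billing system that stores `customer` and revenue in dollars. The hypothetical `integrate` function resolves the naming conflict and the unit conflict so that every row reaches the warehouse in one format.

```javascript
// Two hypothetical source systems that describe the same kind of record
// with different field names and different units.
const crmRows = [{ cust_name: "Acme", revenue_cents: 125000 }];
const billingRows = [{ customer: "Globex", revenue_usd: 980.5 }];

// Map both sources onto one consistent format: one field name for the
// customer and one unit (US dollars) for revenue.
function integrate(crm, billing) {
  const fromCrm = crm.map((r) => ({
    customer: r.cust_name,             // resolve the naming conflict
    revenueUsd: r.revenue_cents / 100, // resolve the unit-of-measure conflict
  }));
  const fromBilling = billing.map((r) => ({
    customer: r.customer,
    revenueUsd: r.revenue_usd,
  }));
  return [...fromCrm, ...fromBilling];
}

console.log(integrate(crmRows, billingRows));
```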

Monday, 27 January 2014

MongoDB Aggregate

Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result per group. In SQL, an aggregate such as count(*) combined with GROUP BY is the rough equivalent of MongoDB aggregation.

The aggregate() Method

For aggregation in MongoDB, use the aggregate() method.

SYNTAX:

The basic syntax of the aggregate() method is as follows, where the argument is an array of pipeline stages:
>db.COLLECTION_NAME.aggregate([AGGREGATE_OPERATION])

EXAMPLE:



> db.democol.find().pretty()
{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}
{
        "_id" : ObjectId("52963ce96f63f810a98b7a99"),
        "by" : "subramanian",
        "comments" : [
                {
                        "user" : "Tiara",
                        "message" : "Worth to read"
                }
        ],
        "likes" : 20,
        "title" : "Learn SQL"
}
{
        "_id" : ObjectId("52a62f01fdaa40a1db551d04"),
        "title" : "Learn HTML5",
        "by" : "KGSM",
        "likes" : 240
}
{
        "_id" : ObjectId("52a62f3efdaa40a1db551d05"),
        "title" : "Learn Java",
        "by" : "Faulkner",
        "likes" : 209
}
{
        "_id" : ObjectId("52a62f61fdaa40a1db551d06"),
        "title" : "Basics of Life",
        "by" : "Peter",
        "likes" : 218
}
{
        "_id" : ObjectId("52a65a92fdaa40a1db551d07"),
        "title" : "Basics of Life",
        "by" : "Peter",
        "likes" : 318
}


1.

> db.democol.aggregate([{$group: {_id:"$by",likes:{$sum:"$likes"}}}])
{
        "result" : [
                {
                        "_id" : "Peter",
                        "likes" : 536
                },
                {
                        "_id" : "Faulkner",
                        "likes" : 209
                },
                {
                        "_id" : "KGSM",
                        "likes" : 240
                },
                {
                        "_id" : "subramanian",
                        "likes" : 20
                },
                {
                        "_id" : "mani",
                        "likes" : 100
                }
        ],
        "ok" : 1
}

2.

> db.democol.aggregate([{$group: {_id:"$by",likes:{$min:"$likes"}}}])
{
        "result" : [
                {
                        "_id" : "Peter",
                        "likes" : 218
                },
                {
                        "_id" : "Faulkner",
                        "likes" : 209
                },
                {
                        "_id" : "KGSM",
                        "likes" : 240
                },
                {
                        "_id" : "subramanian",
                        "likes" : 20
                },
                {
                        "_id" : "mani",
                        "likes" : 100
                }
        ],
        "ok" : 1
}

3.

> db.democol.aggregate([{$group: {_id:"$by",likes:{$max:"$likes"}}}])
{
        "result" : [
                {
                        "_id" : "Peter",
                        "likes" : 318
                },
                {
                        "_id" : "Faulkner",
                        "likes" : 209
                },
                {
                        "_id" : "KGSM",
                        "likes" : 240
                },
                {
                        "_id" : "subramanian",
                        "likes" : 20
                },
                {
                        "_id" : "mani",
                        "likes" : 100
                }
        ],
        "ok" : 1
}
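To see what the $group stage computes, here is a minimal plain-JavaScript sketch that imitates it in memory on the democol sample documents. It is not how the MongoDB server implements aggregation, and the `group` helper and `accumulators` table are hypothetical names; it only reproduces the results of the three examples above, plus a `{$sum: 1}` count, which is the analogue of SQL's count(*) with GROUP BY.

```javascript
// The democol sample documents shown above (only the fields used here).
const democol = [
  { title: "Learn mongo", by: "mani", likes: 100 },
  { title: "Learn SQL", by: "subramanian", likes: 20 },
  { title: "Learn HTML5", by: "KGSM", likes: 240 },
  { title: "Learn Java", by: "Faulkner", likes: 209 },
  { title: "Basics of Life", by: "Peter", likes: 218 },
  { title: "Basics of Life", by: "Peter", likes: 318 },
];

// Reducers for the accumulator operators used in the examples.
const accumulators = {
  $sum: (vals) => vals.reduce((a, b) => a + b, 0),
  $min: (vals) => Math.min(...vals),
  $max: (vals) => Math.max(...vals),
  $avg: (vals) => vals.reduce((a, b) => a + b, 0) / vals.length,
};

// Imitates one stage { $group: { _id: "$keyField", outField: { op: value } } }.
// `value` is either a field name (like "likes") or a constant (like 1 in
// { $sum: 1 }, which counts the documents in each group).
function group(docs, keyField, outField, op, value) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(typeof value === "number" ? value : doc[value]);
  }
  return [...buckets].map(([key, vals]) => ({ _id: key, [outField]: accumulators[op](vals) }));
}

console.log(group(democol, "by", "likes", "$sum", "likes")); // Peter: 536
console.log(group(democol, "by", "likes", "$min", "likes")); // Peter: 218
console.log(group(democol, "by", "likes", "$max", "likes")); // Peter: 318
console.log(group(democol, "by", "count", "$sum", 1));       // Peter: 2
```

Note that $group does not guarantee the order of its output documents, which is why the groups here may appear in a different order from the shell output.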

Expression | Description | Example
$sum | Sums up the given value across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg | Calculates the average of the given values across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min | Gets the minimum of the corresponding values across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max | Gets the maximum of the corresponding values across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push | Inserts the value into an array in the resulting document. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet | Inserts the value into an array in the resulting document but does not create duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first | Gets the value from the first document in each group. This typically only makes sense after a preceding "$sort" stage. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last | Gets the value from the last document in each group. This typically only makes sense after a preceding "$sort" stage. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
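The $push, $addToSet, $first and $last rows are easiest to compare side by side. The plain-JavaScript sketch below imitates a $sort stage followed by a $group stage on the democol sample; because that sample has no `url` field, it collects `title` and `likes` instead. The `sortThenGroup` helper is a hypothetical in-memory illustration, not the server's implementation.

```javascript
// The democol sample documents (title and likes stand in for url).
const democol = [
  { title: "Learn mongo", by: "mani", likes: 100 },
  { title: "Learn SQL", by: "subramanian", likes: 20 },
  { title: "Learn HTML5", by: "KGSM", likes: 240 },
  { title: "Learn Java", by: "Faulkner", likes: 209 },
  { title: "Basics of Life", by: "Peter", likes: 218 },
  { title: "Basics of Life", by: "Peter", likes: 318 },
];

// Imitates the pipeline:
//   { $sort: { likes: -1 } },
//   { $group: { _id: "$by", titles: { $push: "$title" }, uniqueTitles: { $addToSet: "$title" },
//               firstLikes: { $first: "$likes" }, lastLikes: { $last: "$likes" } } }
function sortThenGroup(docs) {
  // $sort stage: highest likes first (sort a copy so the input is untouched).
  const sorted = [...docs].sort((a, b) => b.likes - a.likes);
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.by)) {
      // The first document seen for a group supplies the $first value.
      groups.set(doc.by, { _id: doc.by, titles: [], uniqueTitles: [], firstLikes: doc.likes, lastLikes: doc.likes });
    }
    const g = groups.get(doc.by);
    g.titles.push(doc.title); // $push keeps duplicates
    if (!g.uniqueTitles.includes(doc.title)) g.uniqueTitles.push(doc.title); // $addToSet skips them
    g.lastLikes = doc.likes; // $last: value from the latest document in sort order
  }
  return [...groups.values()];
}

console.log(sortThenGroup(democol).find((g) => g._id === "Peter"));
```

For Peter, `titles` holds "Basics of Life" twice while `uniqueTitles` holds it once; `firstLikes` is 318 and `lastLikes` is 218 because the documents were sorted by likes in descending order first. Without the $sort stage, $first and $last would depend on the order in which the documents happened to arrive.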

MongoDB Sharding

Sharding

Sharding is the process of storing data records across multiple machines, and it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be able to store the data or to provide acceptable read and write throughput. Sharding solves this problem with horizontal scaling: you add more machines to support data growth and the demands of read and write operations.

Sharding in MongoDB

The diagram below shows sharding in MongoDB using a sharded cluster.
[Figure: MongoDB Sharding]
The diagram shows three main components, which are described below:
  • Shards: Shards store the data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
  • Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In production, sharded clusters of this era (MongoDB 2.x) have exactly 3 config servers; later versions run the config servers as a replica set instead.
  • Query Routers: Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load; each client sends its requests to one query router. Generally, a sharded cluster has many query routers.
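How a query router can use the config servers' metadata is easier to see with a toy model. The sketch below assumes range-based sharding, where each chunk covers a half-open range of shard-key values; the chunk boundaries, the shard names and the `routeByKey`, `routeByRange` and `routeWithoutKey` helpers are hypothetical, and `-Infinity`/`Infinity` stand in for MongoDB's MinKey/MaxKey. Real mongos routing (including hashed shard keys and chunk migrations) is considerably more involved.

```javascript
// Hypothetical chunk metadata of the kind the config servers hold: each chunk
// covers the half-open shard-key range [min, max) and lives on one shard.
// -Infinity and Infinity stand in for MongoDB's MinKey and MaxKey.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 5000, shard: "shard0001" },
  { min: 5000, max: Infinity, shard: "shard0002" },
];

// Equality on the shard key: exactly one chunk, so exactly one shard.
function routeByKey(chunkMap, key) {
  return chunkMap.find((c) => key >= c.min && key < c.max).shard;
}

// Range query [lo, hi) on the shard key: every shard whose chunks overlap it.
function routeByRange(chunkMap, lo, hi) {
  const hits = chunkMap.filter((c) => c.min < hi && lo < c.max).map((c) => c.shard);
  return [...new Set(hits)];
}

// No shard key in the query: the router must ask every shard (scatter-gather).
function routeWithoutKey(chunkMap) {
  return [...new Set(chunkMap.map((c) => c.shard))];
}

console.log(routeByKey(chunks, 42));          // shard0000
console.log(routeByRange(chunks, 900, 1200)); // shard0000 and shard0001
console.log(routeWithoutKey(chunks));         // all three shards
```

An equality match on the shard key touches one shard, a range query touches only the shards whose chunks overlap it, and a query without the shard key has to be broadcast to every shard, which is one reason the choice of shard key matters.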

MongoDB Replication

Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability: with multiple copies of data on different database servers, it protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

Why Replication?

  • To keep your data safe
  • High (24x7) availability of data
  • Disaster Recovery
  • No downtime for maintenance (like backups, index rebuilds, compaction)
  • Read scaling (extra copies to read from)
  • A replica set is transparent to the application

How replication works in MongoDB

MongoDB achieves replication by using a replica set. A replica set is a group of mongod instances that host the same data set. In a replica set, one node is the primary node, which receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set. A replica set can have only one primary node.
  1. A replica set is a group of two or more nodes (generally, a minimum of 3 nodes is required).
  2. In a replica set, one node is the primary node and the remaining nodes are secondaries.
  3. All data replicates from the primary to the secondary nodes.
  4. During automatic failover or maintenance, an election is held and a new primary node is elected.
  5. After a failed node recovers, it rejoins the replica set and works as a secondary node.
A typical MongoDB replication setup is shown below: the client application always interacts with the primary node, and the primary node then replicates the data to the secondary nodes.
[Figure: MongoDB Replication]
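The replication flow can be sketched as a toy model: the primary applies each write and appends it to its operation log (oplog), and each secondary replays the oplog entries it has not seen yet, in order. This is a simplified plain-JavaScript illustration; the `Member` class and `syncFrom` function are hypothetical, and real oplog entries and sync logic differ (for example, updates are recorded in an idempotent form rather than as whole-document replacements).

```javascript
// A replica set member with its own copy of the data and an oplog.
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // _id -> document
    this.oplog = [];       // operations in the order they were applied
  }

  // Apply one operation to the local data and record it in the oplog.
  apply(op) {
    if (op.type === "delete") this.data.delete(op.id);
    else this.data.set(op.id, op.doc); // "insert" or "update" (whole-document replace)
    this.oplog.push(op);
  }
}

// A secondary copies every oplog entry it has not applied yet, in order.
function syncFrom(primary, secondary) {
  for (const op of primary.oplog.slice(secondary.oplog.length)) secondary.apply(op);
}

// Usage: clients write only to the primary; the secondary catches up by syncing.
const primary = new Member("A");
const secondary = new Member("B");
primary.apply({ type: "insert", id: 1, doc: { title: "Learn mongo", likes: 100 } });
primary.apply({ type: "update", id: 1, doc: { title: "Learn mongo", likes: 101 } });
syncFrom(primary, secondary);
console.log(secondary.data.get(1)); // likes is now 101 on the secondary too
```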

Replica set features

  • A cluster of N nodes
  • Any one node can be primary
  • All write operations go to the primary
  • Automatic failover
  • Automatic recovery
  • Consensus election of primary
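The consensus election of a primary can be illustrated with a deliberately simplified rule: a member can become primary only if a strict majority of the set's members are reachable, and among the reachable members the one with the most recent data (highest `optime`) wins. The `electPrimary` function and the member objects are hypothetical; real MongoDB elections also involve member priorities, votes, election terms and heartbeats, which this sketch ignores.

```javascript
// Simplified election rule: a primary can be elected only while a strict
// majority of the replica set's members are reachable; among the reachable
// members, the one with the most recent data (highest optime) wins.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  const majority = Math.floor(members.length / 2) + 1;
  if (reachable.length < majority) return null; // no majority: no primary, so writes are refused
  return reachable.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}

// Three-member set whose old primary A has just failed.
const replicaSet = [
  { name: "A", up: false, optime: 10 },
  { name: "B", up: true, optime: 9 },
  { name: "C", up: true, optime: 10 },
];
console.log(electPrimary(replicaSet)); // C
```

The majority rule also explains why at least 3 nodes are generally recommended: in a 2-node set, losing one node leaves 1 of 2 members, which is not a majority, so no primary can be elected.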

Set up a replica set

In this tutorial, we will convert a standalone mongod instance to a replica set. To do so, follow the steps below:
  • Shut down the already running MongoDB server.
  • Start the MongoDB server again, specifying the --replSet option. The basic syntax of --replSet is given below:
mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"

EXAMPLE

mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0
This starts a mongod instance on port 27017 as a member of a replica set named rs0. Now open a command prompt and connect to this mongod instance with the mongo client. In the mongo client, issue the command rs.initiate() to initiate a new replica set. To check the replica set configuration, issue the command rs.conf(). To check the status of the replica set, issue the command rs.status().

Add members to replica set

To add members to the replica set, start mongod instances on multiple machines, using the same --replSet name. Then start a mongo client and issue the rs.add() command.

SYNTAX:

The basic syntax of the rs.add() command is as follows:
>rs.add("HOST_NAME:PORT")

EXAMPLE

Suppose your mongod instance's host name is mongod1.net and it is running on port 27017. To add this instance to the replica set, issue the rs.add() command in the mongo client:
>rs.add("mongod1.net:27017")
>
You can add a mongod instance to the replica set only when you are connected to the primary node. To check whether you are connected to the primary, issue the command db.isMaster() in the mongo client.

Monday, 2 December 2013

MongoDB Queries - 2

MongoDB Update() method

MongoDB's update() method is used to update documents in a collection. The update() method updates values in the existing document.


> db.democol.find().pretty()

Result:

{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}
{
        "_id" : ObjectId("52963ce96f63f810a98b7a99"),
        "by" : "subramanian",
        "comments" : [
                {
                        "user" : "Tiara",
                        "message" : "Worth to read"
                }
        ],
        "likes" : 20,
        "title" : "Basic SQL"
}

> db.democol.update({'title':'Basic SQL'},{$set:{'title':'Learn SQL'}})
> db.democol.find().pretty()

Result:

{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}
{
        "_id" : ObjectId("52963ce96f63f810a98b7a99"),
        "by" : "subramanian",
        "comments" : [
                {
                        "user" : "Tiara",
                        "message" : "Worth to read"
                }
        ],
        "likes" : 20,
        "title" : "Learn SQL"
}

By default, MongoDB updates only a single document. To update multiple documents, you need to set the 'multi' parameter to true.

>db.democol.update({'title':'Basic SQL'},{$set:{'title':'Learn SQL'}},{multi:true})
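The difference between the default update and `{multi: true}` can be modelled in a few lines of plain JavaScript. This sketch handles only equality queries on top-level fields and a `$set`-style change; the `update` and `makeDocs` helpers and the two sample documents that share the title "Basic SQL" are hypothetical.

```javascript
// In-memory imitation of
//   db.collection.update(query, { $set: fields }, { multi: true|false })
// for equality-only queries on top-level fields.
function update(docs, query, setFields, options = {}) {
  let modified = 0;
  for (const doc of docs) {
    const isMatch = Object.entries(query).every(([k, v]) => doc[k] === v);
    if (!isMatch) continue;
    Object.assign(doc, setFields); // $set: overwrite only the listed fields
    modified += 1;
    if (!options.multi) break;     // default: stop after the first match
  }
  return modified;
}

// Two hypothetical documents that both still have the old title.
const makeDocs = () => [
  { title: "Basic SQL", by: "subramanian" },
  { title: "Basic SQL", by: "mani" },
];

const single = makeDocs();
console.log(update(single, { title: "Basic SQL" }, { title: "Learn SQL" }));                // 1
const multi = makeDocs();
console.log(update(multi, { title: "Basic SQL" }, { title: "Learn SQL" }, { multi: true })); // 2
```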



The remove() Method

MongoDB's remove() method is used to remove documents from a collection. The remove() method accepts two parameters: the deletion criteria and the justOne flag.

deletion criteria: (Optional) Documents that match these criteria will be removed.

justOne: (Optional) If set to true or 1, only one document is removed.

>db.mycol.find()

Result:

{ "_id" : ObjectId(5983548781331adf45ec5), "title":"MongoDB Overview"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"Tutorials Point Overview"}
The following example removes all documents whose title is 'MongoDB Overview':

>db.mycol.remove({'title':'MongoDB Overview'})
>db.mycol.find()

Result:

{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"Tutorials Point Overview"}


Remove only one

If multiple documents match and you want to delete only the first one, set the justOne parameter in the remove() method:

>db.COLLECTION_NAME.remove(DELETION_CRITERIA,1)
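Similarly, the effect of the deletion criteria and the justOne flag can be sketched in plain JavaScript. The `remove` helper below is a hypothetical in-memory imitation that supports only equality criteria on top-level fields; it works on the titles from the mycol example (the `_id` values are left out).

```javascript
// In-memory imitation of db.collection.remove(criteria, justOne) for
// equality-only criteria. Returns the remaining documents and the count removed.
function remove(docs, criteria = {}, justOne = false) {
  const kept = [];
  let removed = 0;
  for (const doc of docs) {
    const isMatch = Object.entries(criteria).every(([k, v]) => doc[k] === v);
    if (isMatch && !(justOne && removed > 0)) {
      removed += 1; // drop this document
    } else {
      kept.push(doc);
    }
  }
  return { kept, removed };
}

// The mycol titles from the example above.
const mycol = [
  { title: "MongoDB Overview" },
  { title: "NoSQL Overview" },
  { title: "Tutorials Point Overview" },
];

console.log(remove(mycol, { title: "MongoDB Overview" }).removed); // 1
console.log(remove(mycol, {}, 1).removed);                          // 1 (justOne)
console.log(remove(mycol, {}).removed);                             // 3 (empty criteria: remove all)
```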



MongoDB Projection

In MongoDB, projection means selecting only the necessary data rather than all of the data in a document. If a document has 5 fields and you need to show only 3, select only those 3 fields.

The find() Method

MongoDB's find() method, explained in MongoDB Query Document, accepts a second optional parameter: the list of fields that you want to retrieve. By default, find() displays all fields of a document. To limit this, set a list of fields with the value 1 or 0: 1 shows the field, while 0 hides it. The _id field is always shown unless you explicitly set it to 0, which is why the example below uses _id:0.

> db.democol.find({},{title:1,_id:0})

Result:

{ "title" : "Learn mongo" }
{ "title" : "Learn SQL" }
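The 1/0 projection rules, including the special treatment of `_id`, can be sketched in plain JavaScript as follows. The `project` helper is hypothetical, supports only top-level fields, and does not reject projections that mix inclusion and exclusion (MongoDB reports an error for those, except for `_id`). The `_id` is written as a plain string for simplicity.

```javascript
// Imitates the projection argument of find(), e.g. find({}, { title: 1, _id: 0 }).
// Inclusion mode (some non-_id field set to 1): keep only those fields, plus
// _id unless it is explicitly set to 0. Exclusion mode: drop fields set to 0.
function project(doc, projection) {
  const excluded = (k) => k in projection && !projection[k];
  const inclusive = Object.entries(projection).some(([k, v]) => k !== "_id" && Boolean(v));
  const out = {};
  for (const [k, val] of Object.entries(doc)) {
    const keep = inclusive
      ? (k === "_id" ? !excluded("_id") : Boolean(projection[k]))
      : !excluded(k);
    if (keep) out[k] = val;
  }
  return out;
}

const doc = { _id: "52963c2a6f63f810a98b7a98", title: "Learn mongo", by: "mani", likes: 100 };
console.log(project(doc, { title: 1, _id: 0 })); // only title
console.log(project(doc, { title: 1 }));         // _id and title
console.log(project(doc, { likes: 0 }));         // everything except likes
```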

Sunday, 1 December 2013

MongoDB Queries

The find() Method

To query data from a MongoDB collection, use MongoDB's find() method.

> db.democol.find()

Result:

{ "_id" : ObjectId("52963c2a6f63f810a98b7a98"), "title" : "Learn mongo", "by" : "mani", "likes" : 100 }
{ "_id" : ObjectId("52963ce96f63f810a98b7a99"), "title" : "Learn SQL", "by" : "subramanian", "likes" : 20, "comments" : [ { "user" : "Tiara", "message" : "Worth to read" } ] }


The pretty() Method

To display the results in a formatted way, use the pretty() method.

> db.democol.find().pretty()

Result:

{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}
{
        "_id" : ObjectId("52963ce96f63f810a98b7a99"),
        "title" : "Learn SQL",
        "by" : "subramanian",
        "likes" : 20,
        "comments" : [
                {
                        "user" : "Tiara",
                        "message" : "Worth to read"
                }
        ]
}


AND in MongoDB

In the find() method, if you pass multiple keys separated by commas, MongoDB treats them as an AND condition. The examples below first filter on a single key and then on two keys combined with AND:

> db.democol.find({by:"mani"}).pretty()

Result:

{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}

> db.democol.find({by:"mani",title:"Learn mongo"}).pretty()

Result:

{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}


OR in MongoDB

To query documents based on an OR condition, use the $or operator. An example is shown below:

> db.democol.find({$or:[{by:"mani"},{title:"Learn SQL"}]}).pretty()

Result:

{
        "_id" : ObjectId("52963c2a6f63f810a98b7a98"),
        "title" : "Learn mongo",
        "by" : "mani",
        "likes" : 100
}
{
        "_id" : ObjectId("52963ce96f63f810a98b7a99"),
        "title" : "Learn SQL",
        "by" : "subramanian",
        "likes" : 20,
        "comments" : [
                {
                        "user" : "Tiara",
                        "message" : "Worth to read"
                }
        ]
}
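The AND and OR rules can be modelled with a small recursive matcher in plain JavaScript: top-level keys must all match, and `$or` needs at least one of its sub-filters to match. The `matches` and `find` helpers are hypothetical and handle only equality conditions; operators such as `$gt` or `$in` are left out.

```javascript
// Imitates MongoDB's equality filters: top-level keys are combined with AND,
// and $or takes an array of sub-filters of which at least one must match.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, value]) => {
    if (key === "$or") return value.some((sub) => matches(doc, sub));
    return doc[key] === value;
  });
}

// The two democol documents used in the query examples.
const democol = [
  { title: "Learn mongo", by: "mani", likes: 100 },
  { title: "Learn SQL", by: "subramanian", likes: 20 },
];

const find = (filter) => democol.filter((doc) => matches(doc, filter));

console.log(find({ by: "mani", title: "Learn mongo" }).length);              // 1 (AND)
console.log(find({ $or: [{ by: "mani" }, { title: "Learn SQL" }] }).length); // 2 (OR)
```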
