As the world becomes more information-driven than ever before, a major
challenge is how to deal with the explosion of data. Big Data is most commonly
described in terms of the four V’s: Volume, Velocity, Variety, and Veracity.
Volume
- The size of the data.
Velocity
- The speed at which data is generated.
Variety
- The many forms data takes: structured, semi-structured, and unstructured.
Veracity
- How accurate or truthful the data is. The goal is to reduce
inconsistencies, incompleteness, and ambiguities.
Here is an overview of the important technologies to know for storing and
retrieving this information.
Traditional RDBMS
Relational databases (RDBMS), queried with SQL, have dominated the database
market, and they have done a lot of good. So what changed? Web technology
started the revolution. Today, many people shop on Amazon, and the RDBMS was
not designed to handle the number of transactions that take place on a site
like that every second. The primary constraining factor was the RDBMS’ schema:
the shape of every record must be declared up front, and changing it means
altering the tables themselves.
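
To make the schema constraint concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The table and column names (orders,
customer, total, gift_note) are hypothetical and chosen only for illustration;
the point is that a record with an undeclared field is rejected until the
schema itself is changed.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
)

# A row that matches the declared schema is accepted.
conn.execute(
    "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
    (1, "alice", 19.99),
)

# A row carrying a field the schema does not declare is rejected.
try:
    conn.execute(
        "INSERT INTO orders (id, customer, total, gift_note) VALUES (?, ?, ?, ?)",
        (2, "bob", 5.00, "Happy birthday"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The schema has to be migrated before the new field can be stored.
conn.execute("ALTER TABLE orders ADD COLUMN gift_note TEXT")
conn.execute(
    "INSERT INTO orders (id, customer, total, gift_note) VALUES (?, ?, ?, ?)",
    (2, "bob", 5.00, "Happy birthday"),
)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

On one small table the migration is trivial; the cost shows up when the same
change has to be applied to very large, heavily used tables.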
NoSQL Database Systems
NoSQL (commonly read as "Not Only SQL") represents a different family of
databases that allows for high-performance, agile processing of information at
a massive scale. It also offers an alternative to the rigid relational schema:
schemas are loosened or eliminated, at the expense of relaxing the ACID
guarantees.
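
As a rough illustration of what "no schema" means in practice, the sketch
below models a document collection as a plain Python dict of JSON strings.
This is not the API of any particular NoSQL product; the put and get helpers
and the key names are assumptions made for the example, and real systems add
distribution, indexing, and persistence on top.

```python
import json

# A stand-in for a document collection: JSON strings keyed by document id.
collection = {}

def put(doc_id, document):
    # Store a JSON-serialised copy; no schema is checked on write.
    collection[doc_id] = json.dumps(document)

def get(doc_id):
    # Return the stored document as a Python dict.
    return json.loads(collection[doc_id])

# Two documents in the same collection with different sets of fields.
put("order:1", {"customer": "alice", "total": 19.99})
put("order:2", {"customer": "bob", "total": 5.0, "gift_note": "Happy birthday"})

# Readers must cope with fields that may be absent from a given document.
notes = [get(key).get("gift_note", "") for key in sorted(collection)]
```

The flexibility comes with a trade-off: because nothing is validated on write,
the job of checking a record's shape moves from the database into the
application code.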
Hadoop
Hadoop is, at its core, a file system and not a database: a scalable,
inexpensive distributed file system (HDFS) with fault tolerance. Hadoop and
its associates (HBase, MapReduce, Hive, Pig, ZooKeeper) build on that
foundation and have turned it into a mighty platform for database-style
storage and processing.