Saturday 17 August 2013

MongoDB - A database for Big Data

The need to manage large, rapidly growing datasets has always been there. With social networking sites generating millions of lines of data every minute, the ability to store, process and analyse large datasets is not just desirable but critical for surviving in the race to remain relevant. This has given impetus to the rise of NoSQL databases. NoSQL, or "Not Only SQL", databases can be divided into 4 main categories:
  1. Key-Value databases : Voldemort
  2. Graph databases : InfiniteGraph, Neo4J
  3. Document databases : CouchDB, MongoDB
  4. Column Family stores : Cassandra, HBase, Hypertable
  • MongoDB is a non-relational JSON document store or database. It doesn't support the relational algebra that is most often expressed as SQL.
  • Documents are expressed as JSON and are stored within the MongoDB database in the BSON (Binary JSON) (bsonspec.org) format.
  • BSON supports all the data types available in JSON and a few more, including strings, floating-point numbers, arrays, objects and timestamps.
  • MongoDB allows documents in the same collection to have different schemas. This is referred to as a dynamic schema, or being schemaless.
  • There is no SQL, no multi-document transaction management and no JOINs.
  • The absence of JOINs and of transaction management across multiple documents makes MongoDB better suited to scalability and performance.
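
To make the dynamic schema point concrete, here is a minimal sketch that models a collection as a plain Python list of dicts. It does not talk to a MongoDB server; the `users` collection, its field names and the `field_names` helper are made up for illustration only.

```python
import json

# A hypothetical "users" collection modelled as a list of dicts.
# No MongoDB server is involved; this only illustrates the data shape.
users = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {
        "_id": 2,
        "name": "Ben",
        "tags": ["admin", "ops"],                       # array field
        "address": {"city": "Pune", "zip": "411001"},  # embedded sub-document
    },
]


def field_names(collection):
    """Return the set of top-level field names used by each document."""
    return [set(doc) for doc in collection]


schemas = field_names(users)

# Each document serialises to JSON on its own terms; no shared schema is needed.
for doc in users:
    print(json.dumps(doc, sort_keys=True))
```

The two documents sit side by side even though only the second one has `tags` and `address`, which is what "schemaless" means in practice.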


Problems with traditional RDBMSs and the need for MongoDB-type NoSQL databases :
Traditional RDBMSs are based on concepts that promote strong referential integrity, data normalization and transaction management. This means a typical data model is spread over several database tables, and satisfying a query may require a JOIN. JOINs are not your best friend when you are looking for speed and performance.
Transaction management is another key area that provides reliability and data consistency. However, it comes at a cost in performance.
MongoDB does not use JOINs or provide transaction management. This gives a performance boost, and data access can be very fast compared with traditional RDBMSs.
While transaction management is not supported within MongoDB, it does guarantee atomic operations. An atomic operation is an update that affects one document only. That one document may contain other sub-documents.
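
The difference between joining normalized tables and reading one document with sub-documents can be sketched as follows. This is an illustration under assumptions, not a benchmark: the relational side uses Python's built-in `sqlite3` module with invented `authors` and `posts` tables, and the document side is a plain dict standing in for a MongoDB document.

```python
import sqlite3

# Relational layout: two normalised tables, read back with a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Asha');
    INSERT INTO posts VALUES (10, 1, 'Hello');
    INSERT INTO posts VALUES (11, 1, 'Sharding');
""")
rows = conn.execute(
    "SELECT a.name, p.title FROM authors a "
    "JOIN posts p ON p.author_id = a.id ORDER BY p.id"
).fetchall()
conn.close()

# Document layout: the posts live inside the author document as sub-documents,
# so one read of one document returns everything and no JOIN is needed.
author_doc = {
    "_id": 1,
    "name": "Asha",
    "posts": [{"title": "Hello"}, {"title": "Sharding"}],
}
embedded = [(author_doc["name"], post["title"]) for post in author_doc["posts"]]

print(rows)
print(embedded)
```

Both layouts yield the same (author, title) pairs. In the document layout, a change to the author and its posts touches a single document, which is the unit the text describes as atomic.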
MongoDB uses binary-encoded JSON (BSON), which stores different data types efficiently, and no storage is allocated for fields that do not exist in a particular document. For documents that do not have a complete set of entries, this can yield significant storage savings compared with an RDBMS, where space is typically reserved in every row for every field, whether populated or null.
Large document sets can also be split (or sharded) over multiple servers and automatically redistributed when servers are added for additional scalability.
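
The idea of spreading documents over servers and redistributing them when a server is added can be sketched with a simple hash-modulo scheme. This is a deliberately simplified illustration, not MongoDB's actual mechanism (which splits a collection into chunks by shard key and migrates chunks between shards). The helper names `shard_for` and `distribute` are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index using a stable hash (toy scheme)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(doc_ids, shard_count):
    """Group document ids by the shard they would land on."""
    shards = [[] for _ in range(shard_count)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, shard_count)].append(doc_id)
    return shards


doc_ids = list(range(1000))
three = distribute(doc_ids, 3)  # cluster with three servers
four = distribute(doc_ids, 4)   # a fourth server is added

# Count how many documents end up on a different shard after the change.
moved = sum(1 for d in doc_ids if shard_for(d, 3) != shard_for(d, 4))

print([len(s) for s in three])
print([len(s) for s in four])
print("documents that moved:", moved)
```

Every document still lives on exactly one shard after the fourth server joins, and each server holds a smaller share. A modulo scheme like this moves many documents when the server count changes, which is one reason real systems use chunk-based or range-based placement instead.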
Real world use cases : SAP, Sourceforge, MTV, Twitter (http://www.mongodb.org/about/production-deployments/)

Thus, MongoDB is all about performance, scalability and speed. However, there are still some scenarios where MongoDB might not be the best fit. These are as follows:
1. Not suited for an application requiring transaction management.
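2. Designed to work behind firewalls, so it offers less built-in security than a typical RDBMS.
3. Documents in MongoDB are limited to 16MB. This limit applies to each individual document, not to a collection: data that would exceed it has to be broken up into several documents (for example using GridFS), while a large collection is what gets distributed across shards.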
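
The 16MB document limit from item 3 can be sketched as a size check plus a splitting helper. This is only an approximation under stated assumptions: the size is measured on the UTF-8 JSON encoding rather than on real BSON, and the names `encoded_size`, `fits_in_one_document` and `split_payload`, as well as the 8MB piece size, are made up for illustration.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # the 16MB per-document limit


def encoded_size(doc):
    """Approximate a document's size via its UTF-8 JSON encoding (not real BSON)."""
    return len(json.dumps(doc).encode("utf-8"))


def fits_in_one_document(doc):
    """Return True if the (approximate) encoded size is within the limit."""
    return encoded_size(doc) <= MAX_DOC_BYTES


def split_payload(payload, chunk_bytes):
    """Cut a large bytes payload into pieces that can each go in their own document."""
    return [payload[i:i + chunk_bytes] for i in range(0, len(payload), chunk_bytes)]


small = {"_id": 1, "body": "x" * 100}
large = {"_id": 2, "body": "x" * (MAX_DOC_BYTES + 1)}

# A 40MB payload cut into 8MB pieces, each small enough for one document.
chunks = split_payload(b"x" * (40 * 1024 * 1024), 8 * 1024 * 1024)

print(fits_in_one_document(small), fits_in_one_document(large), len(chunks))
```

The point of the sketch is that the limit is checked per document, so oversized data is handled by storing it as several smaller documents rather than by sharding a collection.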
