A couple of years back, a colleague and I did some research on RDBMS and NoSQL databases, focusing mainly on MongoDB. This is just a short blog post to note down our findings.
The main point to understand here is that NoSQL is not a replacement for RDBMS in any way. NoSQL is usually read as "Not Only SQL". So we should treat RDBMS and NoSQL as two different data storage technologies, and pick one only after a careful analysis of our business requirements.
Let's see some core features of RDBMS and NoSQL.
RDBMS (Relational Database Management System)
- Structured and organized data
- Structured query language (SQL)
- Data and its relationships are stored in separate tables.
- Data Manipulation Language, Data Definition Language
- Strong consistency
- ACID transactions
NoSQL (Not Only SQL)
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-Value pair storage, Column Store, Document Store, Graph databases
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP Theorem
- BASE Principles
- Prioritizes high-performance, high-availability and scalability
I think most of the above features are self-explanatory if you are familiar with at least one RDBMS.
Anyway, what are ACID and CAP?
RDBMS are based on ACID principles (Atomicity, Consistency, Isolation, Durability)
When designing a relational database we should consider the ACID principles thoroughly, because we are working with relational data and we have to maintain the constraints on that data properly to get the true benefits of the database system.
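To make the "A" in ACID a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the transfer function are my own illustration, not something from our research. The point is only that when one statement in a transaction breaks a constraint, the whole transaction is rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was undone too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would make alice negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to bob succeeds first and the debit from alice then fails, yet bob's balance is unchanged afterwards. That is the kind of guarantee we give up (at least partly) in the systems discussed next.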
But there are some cases where we don't need to care that much about the ACID principles. With the rapid growth of data in the modern world, storing and processing it efficiently has become crucial. That's where NoSQL comes into action.
Database systems use two main techniques to scale their storage capacity and to increase the performance of the system: Horizontal Scaling and Vertical Scaling.
Here is a very brief introduction, just to give a simple idea. There is a lot more to understand about these concepts.
Vertical Scaling (AKA Scale-Up) - Obtain more storage capacity and processing power by upgrading the physical server hardware. E.g. storage, RAM, multi-core processors etc.
Horizontal Scaling (AKA Scale-Out) - Provide more distributed power by adding more servers to the system. That means instead of having one powerful database server we can add several interconnected commodity servers, ideally without disrupting the running system.
So when achieving this scalability in a Horizontal way we have to sacrifice some of the features of traditional database systems. (Not 100%, but some.)
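To give a rough feel for what scaling out means for the data itself, here is a small sketch that spreads keys over several "servers" by hashing the key. Everything in it (the ShardedStore class, the node names, the customer keys) is made up for illustration, and the servers are just in-memory dictionaries.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads keys across several nodes."""

    def __init__(self, node_names):
        # Each "node" is only a dict here; in reality it is a separate server.
        self.nodes = {name: {} for name in node_names}
        self._names = sorted(self.nodes)

    def _node_for(self, key):
        # Hash the key and map it onto one of the nodes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(1000):
    store.put(f"customer:{i}", {"id": i})

# How many customers ended up on each node.
sizes = {name: len(data) for name, data in store.nodes.items()}
print(sizes)
print(store.get("customer:42"))
```

Note that with plain modulo hashing like this, adding a fourth node would move most keys to a different node, which is why real systems tend to use something like consistent hashing instead. It also hints at the sacrifice mentioned above: a transaction or a join that touches keys on different nodes now has to cross the network.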
According to the CAP theorem, a distributed system cannot satisfy all three of these guarantees at the same time: Consistency, Availability and Partition tolerance.
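The main idea of NoSQL is to work within the trade-offs of the CAP theorem rather than to insist on the ACID principles.
For that, they have introduced the BASE principles: Basically Available, Soft state, Eventual consistency.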
Basically, NoSQL databases are more focused on the scalability, availability and performance of the system than on the consistency of data. There is consistency in the data in NoSQL systems, but that consistency is not guaranteed as strictly as in RDBMS systems.
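Here is a toy simulation of what "eventual" consistency looks like from the reader's side. It assumes one replica accepts the write right away and the others catch up later. The class and method names are hypothetical and this is not how any particular NoSQL product implements replication.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy store: writes land on replica 0 first, other replicas catch up later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Acknowledge the write as soon as the first replica has it,
        # and queue the update for all the other replicas.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def replicate(self):
        # Apply all queued updates; in a real system this happens
        # asynchronously in the background.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value

store = EventuallyConsistentStore(replica_count=3)
store.write("cart:7", ["book"])

stale = store.read("cart:7", replica_index=2)  # replica 2 has not caught up yet
store.replicate()
fresh = store.read("cart:7", replica_index=2)  # now every replica agrees
print(stale, fresh)
```

The write returns quickly and the system stays available, but for a short window a reader can see old data. Whether that window is acceptable is exactly the business decision mentioned at the start of this post.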
One can argue that an RDBMS can also be highly scalable and available while providing consistency in data, but when it comes to handling very large amounts of data there is a major performance issue in a distributed RDBMS because of SQL JOINs.
Assume you have a distributed system with a customer base of a few million and several data centers in different geographical locations. How long will it take to return the result of a query joining 5-6 large tables that reside in different data centers?
So how does NoSQL handle this? NoSQL gains performance in such scenarios by avoiding typical SQL joins and storing data in a way that can be retrieved easily, for example by duplicating data, normalizing less, embedding related data etc.
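As a small side-by-side sketch of that idea, the snippet below fetches the same customer and orders twice: once from two normalized tables with a JOIN (using sqlite3), and once from a single document with the orders embedded in it. The table layout, the field names and the plain Python dict standing in for a document store are all assumptions for illustration only.

```python
import sqlite3

# Relational version: customers and orders live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1, 'keyboard');
    INSERT INTO orders VALUES (11, 1, 'mouse');
    """
)
joined = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "WHERE c.id = ? ORDER BY o.id",
    (1,),
).fetchall()

# Document version: the orders are embedded in the customer document,
# so a single lookup by key returns everything without a join.
documents = {
    1: {
        "name": "Alice",
        "orders": [
            {"id": 10, "item": "keyboard"},
            {"id": 11, "item": "mouse"},
        ],
    }
}
doc = documents[1]
embedded = [(doc["name"], order["item"]) for order in doc["orders"]]

print(joined)
print(embedded)
```

Both versions give the same answer, but the document version only needs to touch one record on one server. The price is that if the same data is duplicated into several documents, every copy has to be updated when it changes.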
As I mentioned before, NoSQL is based on a totally different architecture and set of concepts than RDBMS. Therefore we should not think about a NoSQL database in the traditional way.
I will write more about NoSQL in my next post.