Monday, August 26, 2013

What are Big Data, Hadoop, NoSQL and MapReduce?

Big Data, Hadoop, MapReduce, NoSQL: four buzzwords that we hear a lot these days. Here is a short description of each of these technologies.

Big Data: The name itself discloses the identity!! As we all know, this is an era of information overload. Organizations have hoards of unstructured data lying around, amounting to many petabytes or even exabytes. Analyzing this data with relational databases would be a costly affair. The main purpose of analyzing it is to recognize repeated patterns, associations (one event connected to another), classifications (looking for new patterns), and patterns in the data that could lead to reasonable predictions.

Hadoop: It is an Apache project that provides a framework for large-scale data processing in a distributed computing environment. It can handle terabytes to petabytes of data and work across thousands of nodes. It is often associated with cloud computing, since it needs a large number of servers and a lot of processing power, mainly on demand. It is based on Google's MapReduce, which enables parallel processing of large data sets. The Hadoop ecosystem consists of the Hadoop kernel (the core libraries, now called Hadoop Common), MapReduce, the Hadoop Distributed File System (HDFS) and Hadoop YARN (a framework for job scheduling and cluster resource management).

MapReduce: It is a software framework for processing large amounts of unstructured data in parallel across a distributed cluster of processors or standalone computers. It is divided into two steps:
Map - Distributes the work across multiple nodes, each of which processes its own share of the input
Reduce - Collects the results of that work and consolidates them into a single value
It provides a robust, fault-tolerant framework: the nodes in the cluster are expected to report back with completed work and status updates. If a node remains silent for too long, the master node redistributes its work to other nodes.
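
To make the Map and Reduce steps described above more concrete, here is a minimal single-machine sketch of the classic word-count example in Python. It is only an illustration of the idea, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample sentences are made up for this post, and everything runs in one process instead of on a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key (a real framework does this between the two steps)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: consolidate all the values for one key into a single value."""
    return key, sum(values)


def word_count(documents):
    pairs = []
    # On a real cluster each document (chunk) would be handed to a different node.
    for document in documents:
        pairs.extend(map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster the map calls would run on different nodes at the same time, and the master node would re-run any chunk whose node stopped reporting back.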

NoSQL: It is a class of databases tailored to handle unstructured data. A relational database management system (RDBMS) is better suited to predictable, structured data, so in some scenarios it is not practical to use one. NoSQL databases generally do not rely on SQL to manipulate data. With an RDBMS, we usually have to scale the server vertically (a bigger machine) when the data grows, and scaling horizontally is difficult. A NoSQL database, in contrast, can grow by distributing itself over a cluster of ordinary servers. It offers high performance with high availability and horizontal scaling and, most importantly, many NoSQL databases are open source!!
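
The idea of growing by spreading data over ordinary servers can be sketched roughly like this in Python. This is a toy under stated assumptions, not how any particular NoSQL product works: the ShardedStore class, the pick_server helper, the server names and the sample records are all hypothetical, the "servers" are plain dictionaries in memory, and real systems typically use more careful schemes (such as consistent hashing) so that adding a server does not move most of the keys.

```python
import hashlib


def pick_server(key, servers):
    """Choose a server for a key by hashing the key (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


class ShardedStore:
    """Toy key-value store that spreads schemaless records over several 'servers'."""

    def __init__(self, server_names):
        self.servers = list(server_names)
        # Each server is simulated by an in-memory dictionary.
        self.data = {name: {} for name in self.servers}

    def put(self, key, document):
        self.data[pick_server(key, self.servers)][key] = document

    def get(self, key):
        return self.data[pick_server(key, self.servers)].get(key)


store = ShardedStore(["server-a", "server-b", "server-c"])
# The records do not need to share the same fields, unlike rows in a relational table.
store.put("user:1", {"name": "Asha", "tags": ["hadoop"]})
store.put("user:2", {"name": "Ravi"})
print(store.get("user:1"))
```

Note that no SQL is involved: records are written and read by key, and each record can have its own shape.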

References:
http://searchcloudcomputing.techtarget.com/definition/Hadoop
http://searchcloudcomputing.techtarget.com/definition/MapReduce
