Skip to main content

What is Big data,Hadoop,NoSQL and MapReduce?

Big Data, Hadoop, MapReduce,NoSQl- Three buzz words that we hear a lot these days. here is a small description on these technologies.

Big Data: The name itself discloses the identity!! As we all know this is an era of information overload. Organizations have hoards of unstructured data lying around , amounting to many petabyte and exabytes. It would be a costly affair if we try to use relational databases for analyzing this data. The main purpose of analyzing this data is to recognize any repeated patterns, associations (one event connected to another),classification(looking for new patterns), and patterns in data that could lead to reasonable predictions

Hadoop: It is part of the Apache project, and provides a framework for large scale data processing in a distributed computing enviornment. It can handle upto Terabytes of data and work across thousands of nodes. This is always closely associated with cloud computing, considering the requirement for large number of servers as well as processing power mainly on demand.It is based on Google's MapReduce which enables  parallel processing of large data sets. A Hadoop ecosystem consists of Hadoop Kernel,MapReduce, Hadoop distributed file system(HDFS) and Hadoop YARN(a framework for job scheduling and cluster resource management)

MapReduce: It is software framework for processing large amount of unstructured data in parellel across distributed cluster of processors or standalone computers. It is divided into 
Map - Used to distribute work to multiple nodes
Reduce - Function for collecting the results of work and consolidating it into a single value
It provides a fault tolerant robust framework , the nodes in the cluster are is expected to report back with completed work and status updates.If node remains silent for long, the master node redistribute the work to other nodes

NoSQL:  It is a database which is tailored to handle unstructured data. In some scenarios, it is not possible to always implement the Relational database management system which is more suited for predictable , structured data. It does not uses SQL to manipulate Data. In case of RDMS we may have to horizontally or vertically scale the servers when the data grows. In case of NoSQl, it can grow by distributing itself over a cluster of ordinary servers. It offers high performance with high availability , horizontal scaling and most importantly Open-source!!

Reference:
http://searchcloudcomputing.techtarget.com/definition/Hadoop
http://searchcloudcomputing.techtarget.com/definition/MapReduce

Comments

Popular posts from this blog

Install nested KVM in VMware ESXi 5.1

In this blog, I will explain the steps required to run a nested KVM hypervisor on  Vmware ESXi. The installation of KVM is done on Ubuntu 13.10(64 bit). Note: It is assumed that you have already installed your Ubuntu 13.10 VM in ESXi, and hence we will not look into the Ubuntu installation part. 1) Upgrade VM Hardware version to 9. In my ESXi server, the default VM hardware version was 8. So I had to shutdown my VM and upgrade the Hardware version to 9 to get the KVM hypervisor working. You can right click the VM and select the Upgrade hardware option to do this. 2)In the ESXi host In /etc/vmware edit the 'config' file and add the following setting vhv.enable = "TRUE" 3)Edit the VM settings and go to VM settings > Options  > CPU/MMU Virtualization . Select the Intel EPT option 4) Go to Options->CPUID mask> Advanced-> Level 1, add the following CPU mask level ECX  ---- ---- ---- ---- ---- ---- --H- ---- 5) Open the vmx...

Virtual fibre channel in Hyper V

Virtual fibre channel option in Hyper V allows the connection to pass through from physical  fibre channel HBA to virtual fibre channel HBA, and still have the flexibilities like live migration. Pre-requisites: VM should be running Windows Server 2008, 2008 R2 or Windows Server 2012 Supported physical HBA with N_Port Virtualization(NPIV) enabled in the HBA. This can be enabled using any management utility provided by the SAN manufacturer. If you need to enable live migration, each host should be having two physical HBAs and each HBA should have two World Wide Names(WWN). WWN is used to established connectivity to FC storage.When you perform migration, the second node can use the second WWN to connect to the storage and then the first node can release its connection. Thereby the storage connectivity is maintained during live migration Configuring virtual fibre channel is a two step process Step 1: Create a Virtual SAN in the Hyper-V host First you need to click on Vir...

Windows server 2012: where is my start button??

If you have been using Windows Server OS for a while, the one thing that will strike you most when you login to a Windows server 2012 is that there is no start button!!.. What??..How am I going to manage it?? Microsoft feels that you really dont need a start button, since you can do almost everything from your server  manager or even remotely from your desktop. After all the initial configurations are done, you could also do away with the GUI and go back to server core option.(In server 2012, there is an option to add and remove GUI). So does that mean, you need to learn to live without a start button. Actually no, the start button is very much there .Lets start looking for it. Option 1: There is "charms" bar on the side of your deskop, where you will find a "start" option. You can use the "Windows +C" shortcut to pop out the charms bar Option 2: There is a hidden "start area"in  the bottom left corner of your desktop...