
What are Big Data, Hadoop, NoSQL and MapReduce?

Big Data, Hadoop, MapReduce and NoSQL: four buzzwords that we hear a lot these days. Here is a short description of each of these technologies.

Big Data: The name itself discloses the identity! As we all know, this is an era of information overload. Organizations have hoards of unstructured data lying around, amounting to many petabytes or even exabytes. Analyzing this data with relational databases would be a costly affair. The main purpose of analyzing it is to recognize repeated patterns, associations (one event connected to another), classifications (looking for new patterns), and patterns in the data that could lead to reasonable predictions.

Hadoop: It is part of the Apache project and provides a framework for large-scale data processing in a distributed computing environment. It can handle petabytes of data and work across thousands of nodes. It is often associated with cloud computing, given its need for a large number of servers and for processing power that is mainly required on demand. It is based on Google's MapReduce programming model, which enables parallel processing of large data sets. The Hadoop ecosystem consists of Hadoop Common (the core libraries, sometimes called the Hadoop kernel), MapReduce, the Hadoop Distributed File System (HDFS) and Hadoop YARN (a framework for job scheduling and cluster resource management).

MapReduce: It is a software framework for processing large amounts of unstructured data in parallel across a distributed cluster of processors or standalone computers. It is divided into two parts:
Map - Distributes the work to multiple nodes, each of which processes its share of the input and emits intermediate results
Reduce - Collects the results of that work and consolidates them into a single result (typically one value per key)
It provides a robust, fault-tolerant framework: the nodes in the cluster are expected to report back periodically with completed work and status updates. If a node remains silent for too long, the master node redistributes its work to other nodes.
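
To make the Map and Reduce steps more concrete, here is a minimal single-machine sketch of the classic word-count example in Python. It is only an illustration of the idea, not how Hadoop itself is implemented: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this post, and everything runs in one process, whereas a real cluster would run the map and reduce calls on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the intermediate pairs by key so each key reaches one reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: consolidate all values collected for one key into one value."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    intermediate = []
    for document in documents:
        # On a real cluster, each document (or block) would be mapped
        # on a different node in parallel.
        intermediate.extend(map_phase(document))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Here the reduce step produces one value per word, which is what "consolidating the results" usually means in practice.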

NoSQL: It is a class of databases tailored to handle unstructured data. A relational database management system (RDBMS) is better suited to predictable, structured data, so it is not always practical to use one. NoSQL databases typically do not use SQL as their main way to manipulate data. With an RDBMS, we usually have to scale the server vertically (add more CPU, memory or storage to one machine) when the data grows, and scaling horizontally is harder. A NoSQL database, in contrast, can grow by distributing itself over a cluster of ordinary servers. It offers high performance with high availability and horizontal scaling, and many of the popular NoSQL databases are open source!
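
As a rough illustration of what "distributing itself over a cluster of ordinary servers" can look like, the Python sketch below assigns each record key to a server by hashing the key. This is an assumption made for the sake of example: the function names and server names are hypothetical, and real NoSQL databases often use more elaborate schemes (such as consistent hashing) so that adding a server moves as little data as possible.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick the server that stores a key by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of server name -> list of keys placed on that server."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    servers = ["server-a", "server-b", "server-c"]  # hypothetical server names
    records = ["user:1", "user:2", "order:17", "order:42"]
    for server, keys in distribute(records, servers).items():
        print(server, keys)
```

Because the hash is deterministic, any client can work out which server holds a given key without asking a central coordinator.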

Reference:
http://searchcloudcomputing.techtarget.com/definition/Hadoop
http://searchcloudcomputing.techtarget.com/definition/MapReduce
