Hadoop is a framework for running applications on large clusters built of commodity hardware.
- Scalable: It can reliably store and process petabytes.
- Economical: It distributes the data and processing across clusters of commonly available computers, numbering in the thousands.
- Efficient: By distributing the data, it can process it in parallel on the nodes where the data is located.
- Reliable: It automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
Why Hadoop?
Challenge: read 1 TB of data.
- 1 machine, 4 I/O channels, 100 MB/s per channel: about 45 minutes
- 10 machines, 4 I/O channels each, 100 MB/s per channel: about 4.5 minutes
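A quick back-of-the-envelope check of those figures (treating 1 TB as 10^6 MB; the single-machine result is rounded up to 45 minutes in the comparison above):

```java
// Sketch of the read-time arithmetic, using the assumed values from the comparison above.
public class ReadTime {
    public static void main(String[] args) {
        double dataMB = 1_000_000;      // 1 TB expressed as 10^6 MB
        double channelMBps = 100;       // throughput per I/O channel
        int channelsPerMachine = 4;

        for (int machines : new int[] {1, 10}) {
            double seconds = dataMB / (machines * channelsPerMachine * channelMBps);
            System.out.printf("%2d machine(s): %.1f minutes%n", machines, seconds / 60);
        }
    }
}
```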
HDFS (Hadoop Distributed File System):
HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system.
Commodity hardware: commonly available hardware that can be obtained from multiple vendors.
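As a concrete illustration of the streaming access pattern, here is a minimal sketch that reads a file from HDFS with Hadoop's Java FileSystem API. The namenode URI and the path /user/demo/sample.txt are placeholders, not values from this post.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsStreamingRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address; in practice this usually comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/user/demo/sample.txt"))) {
            // Stream the file's bytes to stdout in 4 KB chunks.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}
```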
HDFS Files:
User data is divided into 64 MB blocks and replicated across the local disks of cluster nodes to address:
- Cluster network bottleneck
- Cluster node crashes
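Both the block size and the replication factor are configurable; cluster-wide defaults normally live in hdfs-site.xml, and a client can also override them. A sketch, assuming the Hadoop 2+ property names dfs.blocksize and dfs.replication and a placeholder file path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side overrides for files this client creates.
        conf.set("dfs.blocksize", String.valueOf(64L * 1024 * 1024)); // 64 MB, as in the post
        conf.set("dfs.replication", "3");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Inspect how an existing file was actually stored (path is a placeholder).
            FileStatus status = fs.getFileStatus(new Path("/user/demo/sample.txt"));
            System.out.println("Block size:  " + status.getBlockSize() + " bytes");
            System.out.println("Replication: " + status.getReplication());
        }
    }
}
```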
Master/Slave Architecture
- Master (Namenode): maintains the filesystem namespace and metadata
- Slaves (Datanodes): maintain replicas of each data block (three by default)
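To see this split in practice: a client asks the Namenode where each block of a file lives, then reads the bytes directly from the Datanodes. A minimal sketch using the Java API (the file path is again a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockLocations {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            FileStatus status = fs.getFileStatus(new Path("/user/demo/sample.txt"));
            // Answered from the Namenode's metadata; no file data is read here.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d datanodes=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }
}
```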
HDFS Architecture: