Level 1 : Big Data & Hadoop Development
11th January 2015

The Big Data and Hadoop course has been designed by a team of highly experienced industry professionals to give learners the in-depth knowledge and skills needed to become successful Hadoop developers. The curriculum extensively covers all the topics required to gain expertise in the Hadoop ecosystem.

Course Highlights

  • A single course covering all the Hadoop components plus Apache Spark
  • 80 hours of course time
  • 60+ assignments overall
  • Java refresher for non-Java candidates
  • No pre-configured VMs
  • 24-hour SLA for email support
  • Refresher classes
  • Vendor neutral, using Apache versions
  • A full Big Data course, not Hadoop alone
  • Cloudera certification assistance
  • Course delivered by trainers from a team of working industry professionals (www.datadotz.com) who have trained 800+ professionals

 

Course Objectives

After completing the Big Data & Hadoop Developer course at sparkhadoop.com, you will be equipped & self-reliant in the following areas:

  • Understanding Big Data
  • Understanding the various types of data that can be stored in Hadoop
  • Understanding how Big Data & Hadoop fit into the current environment and infrastructure
  • Mastering the core concepts of the Hadoop ecosystem: the HDFS & MapReduce frameworks
  • Writing complex MapReduce programs
  • Setting up a Hadoop cluster
  • Mastering the various other components of the Hadoop ecosystem
  • Performing data analytics using Pig & Hive
  • Understanding the ZooKeeper service: maintaining configuration information, naming, and providing distributed synchronization & group services
  • Implementing a Hadoop project
  • Working on a live/real-life POC on big data analytics using the Hadoop ecosystem
  • And much more…

 

Course Delivery Method

All our courses are live, instructor-led, interactive sessions handled by highly reputed and experienced professionals from industry majors such as CTS, TCS, and DataDotz.

Who can take up this course?

  • Data Architects
  • Data Integration Architects
  • Tech Managers
  • Decision Makers
  • Database Administrators
  • Java Developers/ Any other developers
  • Technical Infrastructure Team
  • Any working professional interested in knowing Hadoop
  • Any graduate/post-graduate with an urge to learn Hadoop

 

Prerequisites for this course

  • A laptop/PC with a 64-bit processor and at least 4 GB RAM (for programming practice alongside the sessions)
  • Familiarity with core Java is an advantage, but not mandatory
  • Familiarity with any database is an advantage, but not mandatory

 

Project & Certification

Towards the end of the course, there will be an assignment for you to work on. It may be a real-life, data-based assignment built around business problems, and on successful completion it will be reviewed by the instructor & an industry expert.

Here are some of the data sets you may work on as part of the project work:

Drug Data Set – contains the day-to-day records of all the drugs, providing information such as the opening rate, closing rate, etc. for each individual drug. This data is therefore highly valuable for people who have to make decisions based on market trends.

Cloudera Certification Assistance

 

Why take this course?

Big Data is a term used to describe the large sets/volumes of data that companies and organizations store, process & analyze to make better decisions for the overall organization & its stakeholders. These data sets have become so huge that companies face difficulties in storing and processing them, and the traditional systems once used for this have become all but obsolete when it comes to Big Data. This is where Hadoop comes in: companies working with Big Data have started adopting Hadoop for collecting, storing, processing & retrieving petabytes of data.

Gone are the days when decisions were made on gut feeling; today, decisions are made on the basis of historical data that is processed & analyzed, with forecasting done accordingly.

What companies and organizations are looking for is a professional with the right mix of excellent analytical skills & hands-on experience with an advanced technology like Hadoop. According to a recent McKinsey report, the industry will need more than 200,000 data scientists (2014-2016).

A huge opportunity awaits you in the market after successful completion of this course!


 

Syllabus

Course Outline

Introduction

Big Data (What, Why, Who) – The 3+ Vs – Overview of the Hadoop Ecosystem – Role of Hadoop in Big Data – Overview of Other Big Data Systems – Who Is Using Hadoop – Hadoop Integrations into Existing Software Products – Current Scenario in the Hadoop Ecosystem – Installation – Configuration – Use Cases of Hadoop (Healthcare, Retail, Telecom)

HDFS

Concepts – Architecture – Data Flow (File Read, File Write) – Fault Tolerance – Shell Commands – Java-Based API – Archives – Coherency – Data Integrity – Role of the Secondary NameNode
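
For reference, a minimal sketch of the Java API mentioned above, writing and reading a file through org.apache.hadoop.fs.FileSystem (the path is illustrative, and a Hadoop client configuration is assumed to be on the classpath):

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsApiDemo {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file (path is illustrative)
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read it back through the same FileSystem handle
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}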

MapReduce

Theory – Data Flow (Map – Shuffle – Reduce) – MapRed vs. MapReduce APIs – Programming (Mapper, Reducer, Combiner, Partitioner) – Writables – InputFormat – OutputFormat – Streaming API using Python – Inherent Failure Handling using Speculative Execution – Magic of the Shuffle Phase – File Formats – Sequence Files
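
To give a feel for the programming model covered above, here is a minimal word-count sketch against the newer org.apache.hadoop.mapreduce API (class names and the combiner choice are illustrative, not the course's own exercise):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the 1s emitted for this word
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);   // the reducer doubles as a combiner here
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}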

Advanced Mapreduce Programming

Counters (Built-In and Custom) – Custom InputFormat – Distributed Cache – Joins (Map-Side, Reduce-Side) – Sorting – Performance Tuning – GenericOptionsParser – ToolRunner – Debugging (LocalJobRunner)
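
A rough sketch of how ToolRunner, GenericOptionsParser-handled options, and a custom counter typically fit together (the counter enum and the malformed-record rule are purely illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CounterDemo extends Configured implements Tool {

    // Custom counter group and names (illustrative)
    public enum RecordQuality { GOOD, MALFORMED }

    public static class QualityMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().split(",").length < 3) {
                // Count bad records instead of failing the job
                context.getCounter(RecordQuality.MALFORMED).increment(1);
                return;
            }
            context.getCounter(RecordQuality.GOOD).increment(1);
            context.write(value, NullWritable.get());
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D options parsed by ToolRunner
        Job job = Job.getInstance(getConf(), "counter demo");
        job.setJarByClass(CounterDemo.class);
        job.setMapperClass(QualityMapper.class);
        job.setNumReduceTasks(0);                 // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner uses GenericOptionsParser to handle -D, -files, -libjars, etc.
        System.exit(ToolRunner.run(new Configuration(), new CounterDemo(), args));
    }
}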

Administration

Multi-Node Cluster Setup using AWS Cloud Machines – Hardware Considerations – Software Considerations – Commands (fsck, job, dfsadmin) – Schedulers in the JobTracker – Rack Awareness Policy – Balancing – NameNode Failure and Recovery – Commissioning and Decommissioning a Node – Compression Codecs

HBase

Introduction to NoSQL – CAP Theorem – Classification of NoSQL – HBase and RDBMS – HBase and HDFS – Architecture (Read Path, Write Path, Compactions, Splits) – Installation – Configuration – Role of ZooKeeper – HBase Shell – Java-Based APIs (Scan, Get, Other Advanced APIs) – Introduction to Filters – RowKey Design – MapReduce Integration – Performance Tuning – What's New in HBase 0.98 – Backup and Disaster Recovery – Hands-On
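
A minimal sketch of the Put/Get/Scan client calls listed above, using the classic HTable API of the HBase 0.9x line (the table, column family, and row key are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientDemo {
    public static void main(String[] args) throws Exception {
        // Reads the ZooKeeper quorum etc. from hbase-site.xml on the classpath
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "patients");   // table name is illustrative

        // Put: write one cell into column family "info"
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
        table.put(put);

        // Get: read the row back
        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

        // Scan: iterate over all rows of the column family
        ResultScanner scanner = table.getScanner(new Scan().addFamily(Bytes.toBytes("info")));
        for (Result row : scanner) {
            System.out.println(Bytes.toString(row.getRow()));
        }
        scanner.close();
        table.close();
    }
}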

Hive

Architecture – Installation – Configuration – Hive vs. RDBMS – Tables – DDL – DML – UDF – UDAF – Partitioning – Bucketing – MetaStore – Hive-HBase Integration – Hive Web Interface – Hive Server (JDBC, ODBC, Thrift) – File Formats (RCFile, ORCFile) – Other SQL-on-Hadoop Systems
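
Since the outline covers HiveServer access over JDBC, here is a minimal sketch of issuing DDL and a query from Java (the connection URL assumes HiveServer2 on its default port 10000; the table name and schema are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // URL assumes HiveServer2 running locally on the default port 10000
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = con.createStatement();

        // DDL: create a simple table (name and schema are illustrative)
        stmt.execute("CREATE TABLE IF NOT EXISTS drugs (name STRING, closing_rate DOUBLE) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

        // DML/query: Hive compiles the SELECT into MapReduce/Tez work under the hood
        ResultSet rs = stmt.executeQuery("SELECT name, closing_rate FROM drugs LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
        }

        rs.close();
        stmt.close();
        con.close();
    }
}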

Pig

Architecture – Installation – Hive vs. Pig – Pig Latin Syntax – Data Types – Functions (Eval, Load/Store, String, DateTime) – Joins – Pig Server – Macros – UDFs – Performance – Troubleshooting – Commonly Used Functions
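
The Pig Server item above refers to embedding Pig in a Java program; a minimal sketch in local mode (the input path and relation names are illustrative):

import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigEmbeddedDemo {
    public static void main(String[] args) throws Exception {
        // Local mode for quick testing; use ExecType.MAPREDUCE against a cluster
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Register Pig Latin statements (input path is illustrative)
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");

        // Pull results back into the Java program
        Iterator<Tuple> it = pig.openIterator("counts");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        pig.shutdown();
    }
}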

Sqoop

Architecture – Installation – Commands (Import, Hive Import, Eval, HBase Import, Import All Tables, Export) – Connectors to Existing DBs and DWs

Flume

Why Flume? – Architecture – Configuration (Agents) – Sources (Exec, Avro, NetCat) – Channels (File, Memory, JDBC, HBase) – Sinks (Logger, Avro, HDFS, HBase, FileRoll) – Contextual Routing (Interceptors, Channel Selectors) – Introduction to Other Aggregation Frameworks
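
An Avro source like the one above can also be fed programmatically through the Flume client SDK; a minimal sketch, assuming an agent whose Avro source listens on localhost:41414 (host, port, and payload are illustrative):

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeClientDemo {
    public static void main(String[] args) throws Exception {
        // Connects to a Flume agent whose Avro source listens on this host/port
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            // Build and send one event; the agent's channel and sink take it from here
            Event event = EventBuilder.withBody("sample log line", StandardCharsets.UTF_8);
            client.append(event);
        } finally {
            client.close();
        }
    }
}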

Oozie

Architecture – Installation – Workflow – Coordinator – Actions (MapReduce, Hive, Pig, Sqoop) – Introduction to Bundles – Mail Notifications
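
Workflows can also be submitted from Java through the Oozie client API; a minimal sketch, assuming an Oozie server at localhost:11000 and a workflow application already deployed to HDFS (URLs and paths are illustrative):

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitDemo {
    public static void main(String[] args) throws Exception {
        // Points at the Oozie server's REST endpoint
        OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

        // Job properties; the application path holds workflow.xml in HDFS
        Properties conf = oozie.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://localhost:8020/user/demo/wf-app");
        conf.setProperty("queueName", "default");

        // Submit and start the workflow, then poll its status
        String jobId = oozie.run(conf);
        while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10 * 1000);
        }
        System.out.println("Workflow finished: " + oozie.getJobInfo(jobId).getStatus());
    }
}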

Hadoop 2.0

Limitations in Hadoop 1.0 – HDFS Federation – High Availability in HDFS – HDFS Snapshots – Other Improvements in HDFS 2 – Introduction to YARN (aka MR2) – Limitations in MR1 – Architecture of YARN – MapReduce Job Flow in YARN – Introduction to the Stinger Initiative and Tez – Backward Compatibility for Hadoop 1.x

Apache Spark

Introduction to Apache Spark – Role of Spark in Big Data – Who Is Using Spark – Installation of the Spark Shell and a Standalone Cluster – Configuration – RDD Operations (Transformations and Actions)
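
A minimal sketch of RDD transformations and actions using Spark's Java API (the master setting runs Spark locally; the input path and the ERROR filter are illustrative):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkRddDemo {
    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM; point setMaster at spark://host:7077 for a standalone cluster
        SparkConf conf = new SparkConf().setAppName("rdd demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Transformations are lazy: nothing runs yet
        JavaRDD<String> lines = sc.textFile("input.txt");            // path is illustrative
        JavaRDD<Integer> lengths = lines.map(String::length);
        JavaRDD<String> errors = lines.filter(l -> l.contains("ERROR"));

        // Actions trigger execution of the lineage built above
        long errorCount = errors.count();
        int totalChars = lengths.reduce((a, b) -> a + b);
        System.out.println(errorCount + " error lines, " + totalChars + " characters");

        sc.stop();
    }
}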

Use Cases

Healthcare Management using the MapR Distribution – Legacy Modernization using Hortonworks and Teradata – Cloud-Based ETL using Amazon Elastic MapReduce for Manufacturing – IoT Use Case using Kafka, Storm, and Hortonworks – Data Archival using Cloudera

 
