Level 1: Big Data & Hadoop Development
21st May 2015

The Big Data and Hadoop course has been designed by a team of highly experienced industry professionals to give learners the in-depth knowledge and skills needed to become successful Hadoop Developers. The curriculum extensively covers all the topics required to gain expertise in the Hadoop ecosystem.

Course Highlights

  • 80 hours of instructor-led live sessions
  • 60+ assessment tests to track each learner's progress
  • Major project on a real data set
  • 7-day money-back guarantee, no questions asked
  • 24×7 expert support team available via email, phone or live chat for any issues you may face during the course
  • Access to recorded versions of any live sessions you miss
  • Vendor-neutral: we use Apache versions
  • Covers the full Big Data ecosystem, not Hadoop alone
  • Cloudera certification assistance
  • Lifetime access to the course material and session videos
  • Chat with the instructor any time; instructors are always ready to answer your questions

Course Objectives

After completing the Big Data & Hadoop Developer course @LearnSocial, you will be equipped & self-reliant with the following:

  • Understanding Big Data
  • Understanding the various types of data that can be stored in Hadoop
  • Understanding how Big Data & Hadoop fit into the current environment and infrastructure
  • Mastering the core concepts of the Hadoop ecosystem – the HDFS & MapReduce frameworks
  • Writing complex MapReduce programs
  • Setting up a Hadoop cluster
  • Mastering the various other components of the Hadoop ecosystem
  • Performing data analytics using Pig & Hive
  • Gaining a good understanding of ZooKeeper services such as maintaining configuration information, naming, and providing distributed synchronization & group services
  • Implementing a Hadoop project
  • Working on a live/real-life big data analytics project using the Hadoop ecosystem
  • And much more

Course Delivery Method

All our courses are live, instructor-led, interactive sessions handled by highly reputed and experienced professionals from industry leaders such as CTS, DataDotz, TCS, etc. All classes are conducted through LIVE video streaming, where learners can interact with the instructor by speaking, chatting and sharing screens. Instructors train learners by sharing their screen and through other technology tools.

All you need is a PC with a webcam, a microphone and a 1 Mbps internet connection to attend the LIVE classes. That said, we have seen people attend the classes successfully on much slower connections.
Who can take up this course?

  • Data Architects
  • Data Integration Architects
  • Tech Managers
  • Decision Makers
  • Database Administrators
  • Java developers / any other developers
  • Technical Infrastructure Team
  • Any working professional interested in knowing Hadoop
  • Any graduate/post-graduate with an urge to learn Hadoop

Prerequisites to take this course

  • A 64-bit laptop/PC with at least 4 GB RAM (for programming practice alongside the sessions)
  • Familiarity with core Java is an advantage, but not mandatory
  • Familiarity with any database is an advantage, but not mandatory

Project & Certification

Towards the end of the course, you will work on an assignment based on a real-life data set with business problems. The assignment will be reviewed by the instructor & an industry expert.

Here are some of the data sets on which you may work as part of the project:

Drug Data Set – contains the day-to-day records of all the drugs, providing information such as the opening rate, closing rate, etc. for each individual drug. This data is highly valuable for anyone who has to make decisions based on market trends.

In addition, we offer Cloudera certification assistance.

Why take this course?

Big Data is a term used to describe the large sets/volumes of data that companies/organizations store, process & analyze to make better decisions that benefit the overall organization & its stakeholders. These data sets have become so huge that companies face difficulties in storing & processing them; the traditional systems once used to store & process data have become almost obsolete when it comes to Big Data. This is where Hadoop comes in, & companies working with Big Data have started adopting Hadoop for collecting, storing, processing & retrieving petabytes of data.

Gone are the days when decisions were made on gut feeling; today, decisions are made on the basis of historical data that is processed & analyzed, with forecasting done accordingly.

Professionals with the right mix of excellent analytical skills & hands-on experience with an advanced technology like Hadoop are exactly what companies/organizations are looking for. According to the latest McKinsey report, the industry will need more than 200,000 data scientists (2014–2016).

A huge opportunity awaits you in the market after successful completion of this course!


Course Outline


Introduction to Big Data and Hadoop

Big Data (What, Why, Who) – 3+ Vs – Overview of the Hadoop Ecosystem – Role of Hadoop in Big Data – Overview of Other Big Data Systems – Who Is Using Hadoop – Hadoop Integrations into Existing Software Products – Current Scenario in the Hadoop Ecosystem – Installation – Configuration – Use Cases of Hadoop (Healthcare, Retail, Telecom)


HDFS

Concepts – Architecture – Data Flow (File Read, File Write) – Fault Tolerance – Shell Commands – Java Base API – Archives – Coherency – Data Integrity – Role of the Secondary NameNode
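
As a taste of the Java base API covered in this module, here is a minimal sketch that writes a file to HDFS and reads it back; the path /user/demo/notes.txt and its contents are illustrative only.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS read/write sketch; path and contents are illustrative.
public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml
        FileSystem fs = FileSystem.get(conf);       // handle to the configured file system

        Path file = new Path("/user/demo/notes.txt");

        // Write: the client streams data to DataNodes per the NameNode's block placement.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read: the client asks the NameNode for block locations, then reads from DataNodes.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}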


MapReduce

Theory – Data Flow (Map – Shuffle – Reduce) – MapRed vs MapReduce APIs – Programming [Mapper, Reducer, Combiner, Partitioner] – Writables – InputFormat – OutputFormat – Streaming API using Python – Inherent Failure Handling using Speculative Execution – Magic of the Shuffle Phase – File Formats – Sequence Files
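
The canonical first MapReduce program is word count. The sketch below uses the newer mapreduce API and wires the combiner to the same class as the reducer, illustrating the Mapper, Reducer and Combiner roles listed above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic word count using the org.apache.hadoop.mapreduce API.
public class WordCount {

    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer tok = new StringTokenizer(value.toString());
            while (tok.hasMoreTokens()) {
                word.set(tok.nextToken());
                ctx.write(word, ONE);   // emit (word, 1) for the shuffle phase
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // combiner = map-side mini-reduce
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}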

Advanced MapReduce Programming

Counters (Built-in and Custom) – Custom InputFormat – Distributed Cache – Joins (Map Side, Reduce Side) – Sorting – Performance Tuning – GenericOptionsParser – ToolRunner – Debugging (LocalJobRunner)
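
To illustrate custom counters together with ToolRunner/GenericOptionsParser, here is a sketch of a map-only job that counts malformed records; the Quality enum, job name and CSV layout are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner + custom counter sketch; enum and layout are illustrative.
public class CountBadRecords extends Configured implements Tool {

    // Custom counters are grouped by enum and show up in the job's counter report.
    public enum Quality { BAD_RECORDS }

    public static class ValidatingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                ctx.getCounter(Quality.BAD_RECORDS).increment(1); // track, don't emit
                return;
            }
            ctx.write(new Text(fields[0]), new LongWritable(1));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // GenericOptionsParser has already consumed -D/-files/-libjars options here.
        Job job = Job.getInstance(getConf(), "count bad records");
        job.setJarByClass(CountBadRecords.class);
        job.setMapperClass(ValidatingMapper.class);
        job.setNumReduceTasks(0);   // map-only job for brevity
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new CountBadRecords(), args));
    }
}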


Hadoop Cluster Administration

Multi-Node Cluster Setup using AWS Cloud Machines – Hardware Considerations – Software Considerations – Commands (fsck, job, dfsadmin) – Schedulers in the JobTracker – Rack Awareness Policy – Balancing – NameNode Failure and Recovery – Commissioning and Decommissioning a Node – Compression Codecs


HBase

Introduction to NoSQL – CAP Theorem – Classification of NoSQL – HBase and RDBMS – HBase and HDFS – Architecture (Read Path, Write Path, Compactions, Splits) – Installation – Configuration – Role of ZooKeeper – HBase Shell – Java-Based APIs (Scan, Get, other advanced APIs) – Introduction to Filters – RowKey Design – MapReduce Integration – Performance Tuning – What's New in HBase 0.98 – Backup and Disaster Recovery – Hands-On
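
Below is a small sketch of the Get and Scan client calls using the HBase 0.98-era Java API; the users table, info column family and rowkey scheme are assumptions for illustration, and narrow scan ranges are exactly why rowkey design matters.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Get vs Scan sketch against the HBase 0.98-era client API.
// Table 'users' with column family 'info' is an assumed example schema.
public class HBaseReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (HTable table = new HTable(conf, "users")) {

            // Get: single-row lookup by rowkey -- the fast path in HBase.
            Get get = new Get(Bytes.toBytes("user#1001"));
            Result row = table.get(get);
            System.out.println(Bytes.toString(
                    row.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

            // Scan: range read over a rowkey interval [start, stop).
            Scan scan = new Scan(Bytes.toBytes("user#1000"), Bytes.toBytes("user#2000"));
            try (ResultScanner rs = table.getScanner(scan)) {
                for (Result r : rs) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}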


Hive

Architecture – Installation – Configuration – Hive vs RDBMS – Tables – DDL – DML – UDF – UDAF – Partitioning – Bucketing – MetaStore – Hive-HBase Integration – Hive Web Interface – Hive Server (JDBC, ODBC, Thrift) – File Formats (RCFile, ORCFile) – Other SQL on Hadoop
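
Since this module covers Hive Server access over JDBC, here is a minimal sketch of querying Hive from Java through HiveServer2; the connection URL, credentials and the drugs table are illustrative assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hive over JDBC (HiveServer2); host, port, user and table are illustrative.
public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // DDL: a simple delimited table definition.
            stmt.execute("CREATE TABLE IF NOT EXISTS drugs "
                    + "(name STRING, open_rate DOUBLE, close_rate DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

            // Query: Hive compiles this into MapReduce (or Tez) jobs behind the scenes.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT name, AVG(close_rate) FROM drugs GROUP BY name")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
                }
            }
        }
    }
}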


Pig

Architecture – Installation – Hive vs Pig – Pig Latin Syntax – Data Types – Functions (Eval, Load/Store, String, DateTime) – Joins – Pig Server – Macros – UDFs – Performance – Troubleshooting – Commonly Used Functions
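
Pig Latin can also be embedded in Java via PigServer, as touched on under "Pig Server" above. Here is a minimal word-count sketch; the input file and aliases are illustrative.

import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

// Embedding Pig Latin in Java via PigServer; input file and aliases are illustrative.
public class PigServerDemo {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);  // or MAPREDUCE on a cluster

        // Each registerQuery adds one Pig Latin statement to the logical plan.
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS w;");
        pig.registerQuery("grouped = GROUP words BY w;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");

        // openIterator triggers execution and streams the result tuples back.
        Iterator<Tuple> it = pig.openIterator("counts");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}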


Sqoop

Architecture – Installation – Commands (Import, Hive-Import, Eval, HBase Import, Import All Tables, Export) – Connectors to Existing DBs and DWs


Flume

Why Flume? – Architecture – Configuration (Agents) – Sources (Exec, Avro, NetCat) – Channels (File, Memory, JDBC, HBase) – Sinks (Logger, Avro, HDFS, HBase, File Roll) – Contextual Routing (Interceptors, Channel Selectors) – Introduction to Other Aggregation Frameworks


Oozie

Architecture – Installation – Workflow – Coordinator – Actions (MapReduce, Hive, Pig, Sqoop) – Introduction to Bundle – Mail Notifications

Hadoop 2.0

Limitations in Hadoop 1.0 – HDFS Federation – High Availability in HDFS – HDFS Snapshots – Other Improvements in HDFS 2 – Introduction to YARN (aka MR2) – Limitations in MR1 – Architecture of YARN – MapReduce Job Flow in YARN – Introduction to the Stinger Initiative and Tez – Backward Compatibility for Hadoop 1.x

Apache Spark

Introduction to Apache Spark – Role of Spark in Big Data – Who Is Using Spark – Installation of the Spark Shell and a Standalone Cluster – Configuration – RDD Operations (Transformations and Actions)
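
A minimal sketch of the transformation/action distinction using the Spark 1.x Java API; the app name and numbers are illustrative. Note that transformations are lazy and only an action triggers execution.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// RDD transformations vs actions with the Spark Java API; values are illustrative.
public class SparkRddDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("rdd-demo")
                .setMaster("local[*]");   // local run for practice; pass a cluster URL otherwise
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Transformations (map, filter) are lazy: nothing runs yet.
        JavaRDD<Integer> oddSquares = nums.map(x -> x * x)
                                          .filter(x -> x % 2 == 1);

        // Actions (reduce, collect, count) trigger the actual computation.
        int sum = oddSquares.reduce((a, b) -> a + b);
        System.out.println("sum of odd squares = " + sum);   // prints 35

        sc.stop();
    }
}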


Industry Use Cases

Healthcare Management using the MapR Distribution – Legacy Modernization using Hortonworks and Teradata – Cloud-Based ETL using Amazon Elastic MapReduce for Manufacturing – IoT Use Case using Kafka, Storm, and Hortonworks – Data Archival using Cloudera
