Basic
Unit 1: Introduction and Overview of Hadoop
- What is Hadoop?
- History of Hadoop
- Building Blocks – Hadoop Ecosystem
- Who is behind Hadoop?
- What Hadoop is and is not good for
Unit 2: Hadoop Distributed FileSystem (HDFS)
- HDFS Overview and Architecture
- HDFS Installation
- HDFS Use Cases
- Hadoop File System Shell
- File System Java API
- Hadoop Configuration
Unit 3: HBase – The Hadoop Database
- HBase Overview and Architecture
- HBase Installation
- HBase Shell
- Java Client API
- Java Administrative API
- Filters
- Scan Caching and Batching
- Key Design
- Table Design
Unit 4: Map/Reduce 2.0/YARN
- Decomposing Problems into MapReduce Workflow
- Using JobControl
- Oozie Introduction and Architecture
- Oozie Installation
- Developing, Deploying, and Executing Oozie Workflows
Unit 5: Pig
- Pig Overview
- Installation
- Pig Latin
- Developing Pig Scripts
- Processing Big Data with Pig
- Joining Data Sets with Pig
Unit 6: Hive
- Hive Overview
- Installation
- HiveQL
Unit 7: Sqoop
- Introduction
- Sqoop Tools
- Sqoop Import
- Sqoop Import All Tables
- Sqoop Export
- Sqoop Job
- Sqoop Metastore
- Sqoop Eval
- Sqoop Codegen
- Sqoop List Databases and List Tables
- Sqoop Create Hive Table
Advanced
Unit 1: Integrating Hadoop Into The Workflow
- Relational Database Management Systems
- Storage Systems
- Importing Data from RDBMSs With Sqoop
- Hands-On Exercise
- Importing Real-Time Data with Flume
- Accessing HDFS Using FuseDFS and Hoop
Unit 2: Delving Deeper Into The Hadoop API
- More about ToolRunner
- Testing with MRUnit
- Reducing Intermediate Data With Combiners
- The configure and close methods for Map/Reduce Setup and Teardown
- Writing Partitioners for Better Load Balancing
- Hands-On Exercise
- Directly Accessing HDFS
- Using the Distributed Cache
Unit 3: Common Map Reduce Algorithms
- Sorting and Searching
- Indexing
- Machine Learning With Mahout
- Term Frequency – Inverse Document Frequency
- Word Co-Occurrence
Unit 4: Using Hive and Pig
- Hive Basics
- Pig Basics
Unit 5: Practical Development Tips and Techniques
- Debugging MapReduce Code
- Using LocalJobRunner Mode For Easier Debugging
- Retrieving Job Information with Counters
- Logging
- Splittable File Formats
- Determining the Optimal Number of Reducers
- Map-Only MapReduce Jobs
Unit 6: More Advanced Map Reduce Programming
- Custom Writables and WritableComparables
- Saving Binary Data using SequenceFiles and Avro Files
- Creating InputFormats and OutputFormats
Unit 7: Joining Data Sets in Map Reduce
- Map-Side Joins
- The Secondary Sort
- Reduce-Side Joins
Unit 8: Graph Manipulation in Hadoop
- Introduction to Graph Techniques
- Representing Graphs in Hadoop
- Implementing a Sample Algorithm: Single-Source Shortest Path
Unit 9: Creating Workflows With Oozie
- The Motivation for Oozie
- Oozie’s Workflow Definition Format