Certificate in Software Development Life Cycle in Big Data and Business Intelligence (SDLC-BD & BI)

Program Outline

The Certificate in Software Development Life Cycle in Big Data, Business Intelligence and Tableau program prepares students for employment in the IT industry. Successful candidates should be able to ingest data into the Hadoop Distributed File System (HDFS), extract, transform and load data using Hadoop applications, work in workflow and resource negotiator environments, identify the various file formats, and analyze data using Spark programs. The program is intended for professionals with little or no knowledge of Big Data environments, Hadoop applications, Oracle SQL, Business Intelligence and Tableau.


  • Big Data Engineer, Hadoop Engineer, Hadoop Data Analyst, Hadoop Programmer, Spark Analyst Programmer, Hadoop Administrator.

  • 155
  • $4860
Three months / 55 hours each month approximately / 13 to 14 hours each week.
Grading Scale: Class attendance and exercises (75%); exams, questions and answers (25%)
Week/Day Topic/Reading
Week 1/Day 1 Session 1: Introduction
  • Course Outline and Introduction
  • Introduction to Big Data.
  • What is Hadoop?
  • Limitations of the existing solution (Distributed System)
  • Motivation and the need of Hadoop
  • Common terminologies used in Hadoop
  • Introduction to Hadoop Eco System tools and their uses
  • RDBMS vs Hadoop
Week 1/Day 2 Session 2: HDFS basics and Data Storage
  • What is HDFS?
  • HDFS Architecture
  • How is data stored?
  • HDFS Daemons
  • HDFS Commands
  • HDFS Classic Mode
  • HDFS HA Mode
Week 1/Day 3 Session 3: Job Execution and YARN
  • Gen 1 and Gen 2 Hadoop
  • Job Execution in Classic Hadoop (Job Tracker and Task Tracker)
  • What is YARN (MR2)?
  • YARN Architecture
  • YARN Daemons
Week 2/Day 1 Session 4: HUE and MR
  • Introduction to HUE
  • HUE capabilities
  • Introduction to MapReduce
  • How MapReduce Works
  • Classic Word Count Example
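The classic word count exercise can be previewed without a cluster. The following is a minimal sketch in plain Python, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are illustrative assumptions, and the three steps only imitate the map, shuffle and reduce stages that a MapReduce job runs across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))
```

Calling word_count(["big data is big", "hadoop stores big data"]) returns a dictionary of per-word totals. In a real job the framework performs the shuffle itself, so only the map and reduce functions are written by hand.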
Week 2/Day 2 Session 5: SQOOP
  • What is Sqoop?
  • Requirement of SQOOP
  • Ingesting relational data with Sqoop
  • Working with SQOOP tools
  • list-tables, list-databases, import-all-tables, import, export, job, eval
  • More SQOOP Queries and Hands On
  • Sqoop 2 Introduction
Week 2/Day 3 Session 6: HIVE and IMPALA
  • Introduction to Impala and Hive
  • Difference between Impala, Hive and RDBMS
  • Hive Architecture
  • The Metastore
Week 3/Day 1 Session 7: Hive and Impala continued, Data Formats
  • Creating more tables
  • Inserting data into Hive tables
  • Creating partitioned tables
  • Impala Architecture
  • How Impala works
  • Impala Queries
Week 3/Day 2, 3 Session 8: Data Formats
  • What are the different data formats?
  • Text
  • Avro
  • Parquet
  • Storing Data in Avro and Parquet format using SQOOP
  • Creating tables on Avro and Parquet data using Impala and Hive
  • More examples on HIVE and IMPALA tables
Week 4/Day 1 Session 9: FLUME
  • Recap of last day and Queries
  • Data formats continued
  • What is Flume?
  • What are Sources, Sinks and Channels?
  • How Flume Works
  • Flume Use cases
  • Writing and executing a Flume configuration file
  • Hands On Examples: Retrieve Twitter Data using Flume
Week 4/Day 2 Session 10: PIG
  • Recap of last day and Queries
  • Introduction to Pig
  • Pig components
  • Working with Pig
  • Interacting with Pig
  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data output
  • Viewing the schema
  • Hands On Exercise
Week 4/Day 3 Session 11: PIG Continued
  • Recap of last day and Queries
  • Filtering and Sorting Data
  • Commonly Used Functions
  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built in Functions for Complex Data
  • Iterating Grouped Data
  • Hands On Exercise
Week 5/Day 1, 2 Session 12: SCALA BASICS
  • Recap of last day and Queries
  • Introduction to Scala
  • Writing programs in Scala
  • Variables; declaring and defining functions
  • Writing case classes
  • Hands On Exercise
Week 5/Day 3, Week 6/Day 1 Session 13: PYTHON BASICS
  • Recap of last day and Queries
  • Introduction to Python
  • Writing programs in Python
  • Variables; declaring and defining functions
  • Hands On Exercise
Week 6/Day 2 Session 14: CORE SPARK
  • What is Spark?
  • How is Spark different from MapReduce?
  • Setting up the environment
  • Spark RDD
  • Basic RDD Operations
  • Hands On Exercise
  • Pair RDD Operations
  • Running Spark Application on Cluster
  • Hands On Exercise
  • Spark Parallel Processing
  • RDD Persistence
Week 6/Day 3 Session 15: SPARK USE CASES
  • Hands-on exercises on writing Spark programs for processing data
  • Other Use cases like airport, census data processing
  • Other Use cases like Temperature Data
  • Email processing
Week 7/Day 1 Session 16: Spark Machine Learning
  • Basics of Machine learning
  • Developing the PageRank algorithm (an iterative algorithm)
  • How to build a recommender engine
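To give a sense of what "iterative" means for PageRank, here is a minimal sketch in plain Python rather than Spark. The function name page_rank, the damping value of 0.85, the fixed iteration count and the even redistribution of rank from pages without outgoing links are all assumptions made for illustration; the course's own Spark version may differ.

```python
def page_rank(links, damping=0.85, iterations=20):
    """Estimate PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages so the total stays at 1.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks
```

For example, page_rank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}) returns one score per page, and the scores sum to 1. Each pass reads only the previous pass's ranks, which is why the algorithm maps naturally onto repeated transformations of a distributed data set.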
Week 7/Day 2 Session 17: SPARK SQL
  • What are Spark SQL and SQLContext?
  • Difference between SparkContext and SQLContext
  • What is a DataFrame?
  • Spark SQL operations on DataFrames
  • DataFrame to RDD conversion
  • Defining a schema and creating a DataFrame programmatically
Week 7/Day 3 Session 18: Spark Streaming Basics and Kafka
  • What is Kafka?
  • What is Spark Streaming?
  • Working with Spark Streaming and the required libraries
  • First streaming application
  • Configuring IntelliJ to work with Spark Streaming
Week 8/Day 1 Session 19: Spark Streaming in Action
  • Sliding Window Operation
  • Spark Streaming for Twitter Data
  • Processing Tweets using Spark Streaming
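The sliding window idea can be tried out before touching a streaming cluster. The sketch below is plain Python, not the Spark Streaming API: the function name sliding_window_counts is hypothetical, and the window length and slide interval are measured in batches here, whereas Spark Streaming expresses them as durations that are multiples of the batch interval.

```python
from collections import Counter, deque


def sliding_window_counts(batches, window_length, slide_interval):
    """Count items over the last window_length batches.

    A result is produced once every slide_interval batches.
    """
    # The deque silently drops the oldest batch once it is full.
    window = deque(maxlen=window_length)
    results = []
    for index, batch in enumerate(batches, start=1):
        window.append(Counter(batch))
        if index % slide_interval == 0:
            total = Counter()
            for counts in window:
                total.update(counts)
            results.append(dict(total))
    return results
```

With batches of hashtags such as [["#spark"], ["#spark", "#kafka"], ["#kafka"], ["#hadoop"]], a window length of 2 and a slide interval of 1, each result combines the current batch with the one before it, so older hashtags drop out as the window moves forward.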
Week 8/Day 2 Session 20: Introduction to OOZIE
  • Managing workflow using OOZIE
  • Hands On Exercise
Week 8/Day 3 Session 21: Introduction to Solr and Cloudera Search
  • What are Solr and Cloudera Search?
  • Indexing Data with Cloudera Search
  • Hands On Exercise
Week 9/Day 1 Session 22: Introduction to HBASE
Week 9/Day 2 Session 23: Administering Hadoop
  • Installation and configuration of Hadoop components
  • How to set up your own cluster
Week 9/Day 3 Session 24: Cluster Setup continued
Week 10/Day 1 Session 25: Working with the cluster using Cloudera Manager and Cloudera Navigator
Week 10/Day 2 Session 26: Introduction to the Data Sets Used
  • Throughout the course we work with data from a fictitious mobile company that has:
  • RDBMS for accounts info (MySQL)
  • Apache server logs for customer service data
  • HTML Files – Knowledge Base Articles
  • XML Files – Device activation records
  • Real time device status logs
  • Base stations – cell tower locations
  • Other Sample Data Sets: Airport, Names, Temperature, Census Data, Amazon Reviews etc.
Week 10/Day 3 Introduction to Tableau and Discussion of SDLC in Tableau
  • Tableau introduction
  • Charts
  • Filters/Parameters
  • Table Calculations
  • Aggregations and Functions
  • Tableau File Types
Week 11/Day 1 Dashboards
  • Data Blending and Joins
  • Server Roles and Publishing Dashboards
  • Issues during Development
  • Performance Tips and Other techniques
Week 11/Day 2 Projects – Tableau – Domain-specific Hands-on Projects
  • Requirement Gathering, Data Modelling
  • Use case preparation and wireframe design
  • Implementing the final dashboard
  • Testing the dashboard
Week 11/Day 3
Week 12/Day 1
A Practical Training on Oracle SQL
  • Introduction to RDBMS and SQL
  • ER and Data Diagrams
  • Installation of Oracle Database and SQL Developer
  • Working with essential SQL commands
  • Working with Text Data, Numerical Data and Dates
  • Working with Grouping Functions and Analytical Functions
  • Data acquisition and Data verification
  • Understanding SQL Threats
  • Testing Relational DB applications
Week 12/Day 2
Week 12/Day 3
A Practical Training on Oracle SQL
  • Working with Grouping Functions and Analytical Functions
  • Data acquisition and Data verification
  • Understanding SQL Threats
  • Testing Relational DB applications
Certification Examinations for Hadoop Certification