
The Ultimate Hands-On Hadoop - Tame your Big Data!

By: Frank Kane

  • Rating: 5 (9 reviews)
  • Duration: 14:31:05
  • Lectures: 95
  • Language: English

Price: 449 (original price: 4990)

Course Features

  • 15 days money-back guarantee
  • Unlimited access
  • Android, iPhone, and iPad access
  • Certificate of completion

Course Summary

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this course, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems.

Target Audience

  • Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend "big data" at scale.
  • Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
  • Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
  • System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.

Prerequisites

  • You will need access to a PC running 64-bit Windows, macOS, or Linux with an Internet connection, if you want to participate in the hands-on activities and exercises. You must have at least 8GB of RAM on your system; 10GB or more is recommended. If your PC does not meet these requirements, you can still follow along in the course without doing hands-on activities.
  • Some activities will require some prior programming experience, preferably in Python or Scala.
  • A basic familiarity with the Linux command line will be very helpful.

Curriculum

  • [Activity] Introduction, and install Hadoop on your desktop!
    16:59
  • Hadoop Overview and History
    07:44
  • Overview of the Hadoop Ecosystem
    01:26
  • Tips for Using This Course
    16:46
  • HDFS: What it is, and how it works
    13:53
  • [Activity] Install the MovieLens dataset into HDFS using the Ambari UI
    06:20
  • [Activity] Install the MovieLens dataset into HDFS using the command line
    07:50
  • MapReduce: What it is, and how it works
    10:40
  • How MapReduce distributes processing
    12:57
  • MapReduce example: Break down movie ratings by rating score
    11:35
  • [Activity] Installing Python, MRJob, and nano
    07:33
  • [Activity] Code up the ratings histogram MapReduce job and run it
    07:36
  • [Exercise] Rank movies by their popularity
    07:06
  • [Activity] Check your results against mine!
    08:23
  • Introducing Ambari
    09:49
  • Introducing Pig
    06:25
  • Example: Find the oldest movie with a 5-star rating using Pig
    15:07
  • [Activity] Find old 5-star movies with Pig
    09:40
  • More Pig Latin
    07:34
  • [Exercise] Find the most-rated one-star movie
    01:56
  • Pig Challenge: Compare Your Results to Mine!
    05:37
  • Why Spark?
    10:06
  • The Resilient Distributed Dataset (RDD)
    10:13
  • [Activity] Find the movie with the lowest average rating - with RDDs
    15:33
  • Datasets and Spark 2.0
    06:28
  • [Activity] Find the movie with the lowest average rating - with DataFrames
    10:00
  • [Activity] Movie recommendations with MLlib
    12:16
  • [Exercise] Filter the lowest-rated movies by number of ratings
    02:51
  • [Activity] Check your results against mine!
    06:40
  • What is Hive?
    06:32
  • [Activity] Use Hive to find the most popular movie
    10:46
  • How Hive works
    09:11
  • [Exercise] Use Hive to find the movie with the highest average rating
    01:56
  • Compare your solution to mine
    04:11
  • Integrating MySQL with Hadoop
    08:00
  • [Activity] Install MySQL and import our movie data
    07:36
  • [Activity] Use Sqoop to import data from MySQL to HDFS/Hive
    07:31
  • [Activity] Use Sqoop to export data from Hadoop to MySQL
    07:17
  • Why NoSQL?
    13:55
  • What is HBase?
    12:55
  • [Activity] Import movie ratings into HBase
    13:29
  • [Activity] Use HBase with Pig to import data at scale
    11:20
  • Cassandra overview
    14:51
  • [Activity] Installing Cassandra
    11:44
  • [Activity] Write Spark output into Cassandra
    11:01
  • MongoDB overview
    16:54
  • [Activity] Install MongoDB, and integrate Spark with MongoDB
    12:45
  • [Activity] Using the MongoDB shell
    07:48
  • Choosing a database technology
    15:59
  • [Exercise] Choose a database for a given problem
    05:00
  • Overview of Drill
    07:56
  • [Activity] Setting up Drill
    11:19
  • [Activity] Querying across multiple databases with Drill
    07:07
  • Overview of Phoenix
    08:55
  • [Activity] Install Phoenix and query HBase with it
    07:08
  • [Activity] Integrate Phoenix with Pig
    11:45
  • Overview of Presto
    06:39
  • [Activity] Install Presto, and query Hive with it
    12:26
  • [Activity] Query both Cassandra and Hive using Presto
    09:01
  • YARN explained
    10:01
  • Tez explained
    04:56
  • [Activity] Use Hive on Tez and measure the performance benefit
    08:35
  • Mesos explained
    07:13
  • ZooKeeper explained
    13:10
  • [Activity] Simulating a failing master with ZooKeeper
    06:47
  • Oozie explained
    11:56
  • [Activity] Set up a simple Oozie workflow
    16:39
  • Zeppelin overview
    05:01
  • [Activity] Use Zeppelin to analyze movie ratings, part 1
    12:28
  • [Activity] Use Zeppelin to analyze movie ratings, part 2
    09:46
  • Hue overview
    08:07
  • Other technologies worth mentioning
    04:35
  • Kafka explained
    09:48
  • [Activity] Setting up Kafka, and publishing some data
    07:24
  • [Activity] Publishing web logs with Kafka
    10:21
  • Flume explained
    10:16
  • [Activity] Set up Flume and publish logs with it
    07:46
  • [Activity] Set up Flume to monitor a directory and store its data in HDFS
    09:12
  • Spark Streaming: Introduction
    14:27
  • [Activity] Analyze web logs published with Flume using Spark Streaming
    14:20
  • [Exercise] Monitor Flume-published logs for errors in real time
    02:02
  • Exercise solution: Aggregating HTTP access codes with Spark Streaming
    04:24
  • Apache Storm: Introduction
    09:27
  • [Activity] Count words with Storm
    14:35
  • Flink: An Overview
    06:53
  • [Activity] Counting words with Flink
    10:20
  • The Best of the Rest
    09:24
  • Review: How the pieces fit together
    06:29
  • Understanding your requirements
    08:02
  • Sample application: consume webserver logs and keep track of top-sellers
    10:07
  • Sample application: serving movie recommendations to a website
    11:18
  • [Exercise] Design a system to report web sessions per day
    02:53
  • Exercise solution: Design a system to count daily sessions
    04:24
  • Books and online resources
    05:33
  • Bonus lecture: Discounts on my other big data / data science courses!
    02:26

About the Author

Frank Kane, Founder of Sundog Software, LLC

Founder & CEO of Sundog Software, makers of the SilverLining Sky, Cloud, and Weather SDK and the Triton Ocean SDK. Broad and deep experience in software engineering, computer graphics, technical leadership, and machine learning.

Reviews

Anjali Mehta (5/5)

Very good course for learning big data and Hadoop.
