
Taming Big Data with Apache Spark and Python - Hands On!!

By: Frank Kane

  • Rating: 5 (7 ratings)
  • Duration: 05:08:15
  • Lectures: 46
  • 19
  • Language: English
Price: 599 (regular price 4990)

Course Summary

"Big data" analysis is a hot and highly valuable skill, and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and …


Target Audience

  • People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint; we introduce some machine learning and data mining concepts along the way, but that's not the focus. If you want to learn how to use Spark to carve up huge datasets and extract meaning from them, then this course is for you.
  • If you've never written a computer program or script before, this course isn't for you yet. If programming is new to you, I suggest starting with a Python course first.
  • If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.
  • If you're training for a new career in data science or big data, Spark is an important part of it.

Pre-Requisites

  • Access to a personal computer. This course uses Windows, but the sample code will work fine on Linux as well.
  • Some prior programming or scripting experience. Python experience will help a lot, but you can pick it up as we go.

Curriculum

  • Introduction
    02:16
  • How to Use This Course
    01:41
  • [Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies
    14:51
  • [Activity] Installing the MovieLens Movie Rating Dataset
    03:35
  • [Activity] Run your first Spark program! Ratings histogram example.
    04:53
  • Introduction to Spark
    10:12
  • The Resilient Distributed Dataset (RDD)
    12:17
  • Ratings Histogram Walkthrough
    13:34
  • Key/Value RDDs, and the Average Friends by Age Example
    16:13
  • [Activity] Running the Average Friends by Age Example
    05:39
  • Filtering RDDs, and the Minimum Temperature by Location Example
    08:10
  • [Activity] Running the Minimum Temperature Example, and Modifying it for Maximums
    05:09
  • [Activity] Running the Maximum Temperature by Location Example
    03:22
  • [Activity] Counting Word Occurrences using flatMap()
    07:28
  • [Activity] Improving the Word Count Script with Regular Expressions
    04:45
  • [Activity] Sorting the Word Count Results
    07:45
  • Customer Orders Assignment
    04:01
  • Customer Order Solution
    05:08
  • [Activity] Find the Most Popular Movie
    05:53
  • [Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
    08:24
  • Find the Most Popular Superhero in a Social Graph
    04:29
  • [Activity] Run the Script - Discover Who the Most Popular Superhero is!
    06:00
  • Superhero Degrees of Separation: Introducing Breadth-First Search
    07:54
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
    06:44
  • [Activity] Superhero Degrees of Separation: Review the Code and Run it
    09:14
  • Item-Based Collaborative Filtering in Spark, cache(), and persist()
    10:13
  • [Activity] Running the Similar Movies Script using Spark's Cluster Manager
    10:55
  • [Exercise] Improve the Quality of Similar Movies
    02:58
  • Introducing Elastic MapReduce
    05:08
  • [Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
    09:56
  • Partitioning
    04:21
  • Create Similar Movies from One Million Ratings - Part 1
    05:12
  • [Activity] Create Similar Movies from One Million Ratings - Part 2
    11:27
  • Create Similar Movies from One Million Ratings - Part 3
    03:28
  • Troubleshooting Spark on a Cluster
    03:43
  • More Troubleshooting, and Managing Dependencies
    05:47
  • Introducing SparkSQL
    06:08
  • Executing SQL commands and SQL-style functions on a DataFrame
    08:16
  • Using DataFrames instead of RDDs
    05:52
  • Introducing MLlib
    08:10
  • [Activity] Using MLlib to Produce Movie Recommendations
    02:56
  • Analyzing the ALS Recommendations Results
    04:53
  • Using DataFrames with MLlib
    07:31
  • Spark Streaming and GraphX
    07:36
  • Downloadable Material and Exercises
  • Learning More about Spark and Data Science
    04:09

About the Author

Frank Kane, Founder of Sundog Software, LLC

Founder & CEO of Sundog Software, makers of the SilverLining Sky, Cloud, and Weather SDK and the Triton Ocean SDK. Broad and deep experience in software engineering, computer graphics, technical leadership, and machine learning.


Reviews

Anjali Mehta (rating: 5)

good course to start learning about Spark

Course Includes

  • 15-day money-back guarantee
  • Unlimited access
  • Android, iPhone, and iPad access
  • Certificate of completion
