Course Summary:
Prerequisites: Working with Pig requires some basic knowledge of the SQL query language and a basic understanding of the Hadoop ecosystem and MapReduce.
Taught by a team which includes two Stanford-educated ex-Googlers and two ex-Flipkart lead analysts. This team has decades of practical experience in working with large-scale data processing jobs.
Pig is aptly named: it is omnivorous, will consume any data that you throw at it, and will bring home the bacon!
Let's parse that:
omnivorous: Pig works with unstructured data. It has many operations which are very SQL-like but Pig can perform these operations on data sets which have no fixed schema. Pig is great at wrestling data into a form which is clean and can be stored in a data warehouse for reporting and analysis.
bring home the bacon: Pig allows you to transform data in a way that makes it structured, predictable and useful, ready for consumption.
Pig Basics: Scalar and complex data types (Bags, Maps, Tuples), basic operations such as Load, Dump, Store, Filter, Foreach, Distinct, Limit and Order by, plus other built-in functions.
Advanced Data Transformations and Optimizations: The mind-bending Nested Foreach; Joins and their optimizations using "parallel", "merge", "replicated" and other keywords; Co-groups and Semi-joins; debugging using the Explain and Illustrate commands.
Real-world example: Clean up server logs using Pig
What am I going to get from this course?
- Work with unstructured data to extract information, transform it and store it in a usable form
- Write intermediate-level Pig scripts to munge data
- Optimize Pig operations which work on large data sets
Prerequisites:
- A basic understanding of SQL and working with data
- A basic understanding of the Hadoop ecosystem and MapReduce tasks
Target Audience:
- Yep! Analysts who want to wrangle large, unstructured data into shape
- Yep! Engineers who want to parse and extract useful information from large datasets
Section 1 - You, This Course and Us
You, this course and Us
Section 2 - Where does Pig fit in?
Pig and the Hadoop ecosystem
Install and set up
How does Pig compare with Hive?
Pig Latin as a data flow language
Pig with HBase
DOWNLOAD 1 SECTION 2 PigDeck1
DOWNLOAD 2 SECTION 2 EexampleDatasets
Section 3 - Pig Basics
Operating modes, running a Pig script, the Grunt shell
Loading data and creating our first relation
Scalar data types
Complex data types - The Tuple, Bag and Map
Partial schema specification for relations
Displaying and storing relations - The dump and store commands
DOWNLOAD SECTION 3 PigDeck2
Section 4 - Pig Operations And Data Transformations
Selecting fields from a relation
Using the distinct, limit and order by keywords
Filtering records based on a predicate
DOWNLOAD SECTION 4 PigDeck3
Section 5 - Advanced Data Transformations
Group by and aggregate transformations
Combining datasets using Join
Concatenating datasets using Union
Generating multiple records by flattening complex fields
Using Co-Group, Semi-Join and Sampling records
The nested Foreach command
Debug Pig scripts using Explain and Illustrate
DOWNLOAD SECTION 5 PigDeck4
Section 6 - Optimizing Data Transformations
Parallelize operations using the Parallel keyword
Join Optimizations: Multiple relations join, large and small relation join
Join Optimizations: Skew join and sort-merge join
Common sense optimizations
Section 7 - A real-world example
Parsing server logs
Summarizing error logs
DOWNLOAD SECTION 7 PigDeck5
Section 8 - Installing Hadoop in a Local Environment
Hadoop Install Modes
Set up a Virtual Linux Instance (For Windows users)
Hadoop Standalone mode Install
Hadoop Pseudo-Distributed mode Install
DOWNLOAD SECTION 8 Install-Guides
Loonycorn: a four-person team; ex-Google.
Loonycorn is us: Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi and Navdeep Singh. Between the four of us, we have studied at Stanford, IIM Ahmedabad and the IITs, and have spent years (decades, actually) working in tech, in the Bay Area, New York, Singapore and Bangalore.
- Janani: 7 years at Google (New York, Singapore); studied at Stanford; also worked at Flipkart and Microsoft
- Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too
- Swetha: Early Flipkart employee, IIM Ahmedabad and IIT Madras alum
- Navdeep: Longtime Flipkart employee too, and IIT Guwahati alum
We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Unanth! We hope you will try our offerings, and think you'll like them :-)