Pig For Wrangling Big Data

SGD 12 | 599
SGD 50

Lectures: 41
Language: English
Students: 2
Category: Business
Sub-Category: Data Analytics

15-day Money-back Guarantee

Unlimited Access for 1 year

Android, iPhone and iPad Access

Certificate of Completion

Course Summary :

Prerequisites: Working with Pig requires some basic knowledge of the SQL query language and a basic understanding of the Hadoop eco-system and MapReduce.

Taught by a team which includes 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with large-scale data processing jobs.

Pig is aptly named: it is omnivorous, will consume any data that you throw at it and bring home the bacon!

Let's parse that:

Omnivorous: Pig works with unstructured data. It has many operations which are very SQL-like, but Pig can perform these operations on data sets which have no fixed schema. Pig is great at wrestling data into a form which is clean and can be stored in a data warehouse for reporting and analysis.

Bring home the bacon: Pig allows you to transform data in a way that makes it structured, predictable and useful, ready for consumption.
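To make the idea concrete, here is a rough sketch in plain Python (not Pig Latin, and not taken from the course material) of the kind of flow described above: load records that have no guaranteed schema, drop the ones that do not fit, cast the rest into a fixed shape, and summarize them. The sample lines, field layout and function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw input: tab-separated lines with no guaranteed schema.
RAW_LINES = [
    "alice\t34\tSingapore",
    "bob\tunknown\tBangalore",   # age is not numeric
    "carol\t29",                 # city is missing
    "",                          # empty line
    "dave\t41\tSingapore",
]

def load(lines):
    """Split each non-empty line into a tuple of string fields."""
    return [tuple(line.split("\t")) for line in lines if line]

def clean(records):
    """Keep records with exactly three fields and a numeric age; cast the age."""
    cleaned = []
    for rec in records:
        if len(rec) == 3 and rec[1].isdigit():
            name, age, city = rec
            cleaned.append((name, int(age), city))
    return cleaned

def count_by_city(records):
    """Group the cleaned records by city and count each group."""
    counts = defaultdict(int)
    for _name, _age, city in records:
        counts[city] += 1
    return dict(counts)

records = clean(load(RAW_LINES))
summary = count_by_city(records)
```

In Pig the same steps would be written as a chain of relations (load, filter, foreach, group) rather than as function calls, and would run over data far too large for one machine.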

What's Covered: 

Pig Basics: Scalar and Complex data types (Bags, Maps, Tuples), basic transformations such as Filter, Foreach, Load, Dump, Store, Distinct, Limit, Order by and other built-in functions.

Advanced Data Transformations and Optimizations: The mind-bending Nested Foreach, Joins and their optimizations using "parallel", "merge", "replicated" and other keywords, Co-groups and Semi-joins, and debugging using the Explain and Illustrate commands.

Real-world example: Clean up server logs using Pig
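As a loose illustration of what cleaning up server logs involves, the sketch below uses plain Python rather than Pig. The log format (date, time, level, message), the sample lines and the helper names are assumptions made up for this example; the course's actual logs and scripts may look quite different.

```python
import re
from collections import Counter

# Assumed log layout: "YYYY-MM-DD HH:MM:SS LEVEL message".
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)

# Hypothetical sample input, including one line that does not fit the layout.
SAMPLE_LOG = [
    "2016-03-01 10:15:02 INFO Server started",
    "2016-03-01 10:15:09 ERROR Disk quota exceeded",
    "garbage line without structure",
    "2016-03-01 10:16:44 ERROR Connection timed out",
    "2016-03-01 10:17:00 WARN Slow response",
]

def parse_logs(lines):
    """Return one dict per line that matches the assumed layout; skip the rest."""
    parsed = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            parsed.append(match.groupdict())
    return parsed

def count_levels(records):
    """Count how many parsed records exist for each log level."""
    return Counter(record["level"] for record in records)

parsed = parse_logs(SAMPLE_LOG)
level_counts = count_levels(parsed)
```

The shape is the same as any log-cleaning job: parse each line against an expected pattern, discard what cannot be parsed, then aggregate the structured records.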

 

What am I going to get from this course?

  • Work with unstructured data to extract information, transform it and store it in a usable form
  • Write intermediate level Pig scripts to munge data
  • Optimize Pig operations which work on large data sets

Pre-Requisites :

  • A basic understanding of SQL and working with data
  • A basic understanding of the Hadoop eco-system and MapReduce tasks

Target Audience :

  • Yep! Analysts who want to wrangle large, unstructured data into shape
  • Yep! Engineers who want to parse and extract useful information from large datasets
Curriculum
Section 1 - You, This Course and Us
    1 : You, this course and Us (01:46)
Section 2 - Where does Pig fit in?
    2 : Pig and the Hadoop ecosystem
    3 : Install and set up
    4 : How does Pig compare with Hive?
    5 : Pig Latin as a data flow language
    6 : Pig with HBase
    7 : DOWNLOAD 1 SECTION 2 PigDeck1
    8 : DOWNLOAD 2 SECTION 2 EexampleDatasets
Section 3 - Pig Basics
    9 : Operating modes, running a Pig script, the Grunt shell
    10 : Loading data and creating our first relation
    11 : Scalar data types
    12 : Complex data types - The Tuple, Bag and Map
    13 : Partial schema specification for relations
    14 : Displaying and storing relations - The dump and store commands
    15 : DOWNLOAD SECTION 3 PigDeck2
Section 4 - Pig Operations And Data Transformations
    16 : Selecting fields from a relation
    17 : Built-in functions
    18 : Evaluation functions
    19 : Using the distinct, limit and order by keywords
    20 : Filtering records based on a predicate
    21 : DOWNLOAD SECTION 4 PigDeck3
Section 5 - Advanced Data Transformations
    22 : Group by and aggregate transformations
    23 : Combining datasets using Join
    24 : Concatenating datasets using Union
    25 : Generating multiple records by flattening complex fields
    26 : Using Co-Group, Semi-Join and Sampling records
    27 : The nested Foreach command
    28 : Debug Pig scripts using Explain and Illustrate
    29 : DOWNLOAD SECTION 5 PigDeck4
Section 6 - Optimizing Data Transformations
    30 : Parallelize operations using the Parallel keyword
    31 : Join Optimizations: Multiple relations join, large and small relation join
    32 : Join Optimizations: Skew join and sort-merge join
    33 : Common sense optimizations
Section 7 - A real-world example
    34 : Parsing server logs
    35 : Summarizing error logs
    36 : DOWNLOAD SECTION 7 PigDeck5
Section 8 - Installing Hadoop in a Local Environment
    37 : Hadoop Install Modes
    38 : Setup a Virtual Linux Instance (For Windows users)
    39 : Hadoop Standalone mode Install
    40 : Hadoop Pseudo-Distributed mode Install
    41 : DOWNLOAD SECTION 8 Install-Guides

Instructor :

Loonycorn: a 4-person team; ex-Google.

Biography

Loonycorn is us: Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi and Navdeep Singh. Between the four of us, we have studied at Stanford, IIM Ahmedabad and the IITs, and have spent years (decades, actually) working in tech in the Bay Area, New York, Singapore and Bangalore.

  • Janani: 7 years at Google (New York, Singapore); studied at Stanford; also worked at Flipkart and Microsoft
  • Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too
  • Swetha: Early Flipkart employee, IIM Ahmedabad and IIT Madras alum
  • Navdeep: Longtime Flipkart employee too, and IIT Guwahati alum

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Unanth! We hope you will try our offerings, and think you'll like them :-)
