IGTC01DA - Big Data Analytics with Hadoop and Apache Spark (EN)
01 - Introduction
01 - The combined power of Spark and Hadoop Distributed File System (HDFS) (1:20)
Ex_Files_Big_Data_Hadoop_Apache_Spark
02 - 1. Introduction and Setup
01 - Apache Hadoop overview (1:50)
02 - Apache Spark overview (0:45)
03 - Integrating Hadoop and Spark (1:19)
04 - Setting up the environment (3:20)
05 - Using exercise files (4:06)
03 - 2. HDFS Data Modeling for Analytics
01 - Storage formats (2:20)
02 - Compression (2:05)
03 - Partitioning (2:03)
04 - Bucketing (1:17)
05 - Best practices for data storage (1:19)
04 - 3. Data Ingestion with Spark
01 - Reading external files into Spark (2:33)
02 - Writing to HDFS (2:00)
03 - Parallel writes with partitioning (1:11)
04 - Parallel writes with bucketing (1:27)
05 - Best practices for ingestion (0:55)
05 - 4. Data Extraction with Spark
01 - How Spark works (3:00)
02 - Reading HDFS files with schema (1:44)
03 - Reading partitioned data (1:32)
04 - Reading bucketed data (0:55)
05 - Best practices for data extraction (1:08)
06 - 5. Optimizing Spark Processing
01 - Pushing down projections (1:45)
02 - Pushing down filters (1:50)
03 - Managing partitions (2:42)
04 - Managing shuffling (2:35)
05 - Improving joins (2:04)
06 - Storing intermediate results (2:39)
07 - Best practices for data processing (1:18)
07 - 6. Use Case Project
01 - Problem definition (1:57)
02 - Data loading (1:38)
03 - Total score analytics (1:31)
04 - Average score analytics (1:18)
05 - Top student analytics (1:48)
08 - Conclusion
01 - Next steps (0:44)
05 - Improving joins