Cognitive Class training, MOOC (2020). This learning path covers the fundamentals of Apache Spark, an open-source engine for large-scale data processing that is widely used in analytics and big data. The training is an opportunity to learn from industry leaders about Spark, which is built around speed, ease of use, and analytics, and it includes hands-on exercises and projects to build confidence with the Spark toolset.
Course 1: Spark fundamentals I
- Introduction to Spark;
- Resilient Distributed Dataset (RDD) and DataFrames;
- Spark application programming;
- Introduction to Spark libraries;
- Spark configuration, monitoring and tuning.
Course 2: Spark fundamentals II
- Introduction to notebooks;
- RDD architecture;
- Optimizing transformations and actions;
- Caching and serialization;
- Developing and testing.
Course 3: Spark MLlib
- Spark MLlib data types;
- Review of algorithms;
- Decision trees and random forests;
- Spark MLlib clustering.
Course 4: Exploring GraphX
- Introduction to Graph-Parallel;
- Exploring graph operators;
- Visualizing and modifying GraphX;
- Aggregation and caching.
Course 5: Big data in R using Spark
- Introduction to SparkR;
- Data manipulation in SparkR;
- Machine learning in SparkR.
Spark fundamentals I (course certificate)
Spark – Level 1 (certification badge)
Spark fundamentals II (course certificate)
Spark MLlib (course certificate)
Exploring GraphX (course certificate)
Big data in R using Spark (course certificate)
Spark – Level 2 (certification badge)