François Chung, Ph.D.
Spark fundamentals

Cognitive Class training, MOOC (2020). This learning path covers the fundamentals of Apache Spark, an open-source engine for large-scale data processing that is widely used across the analytics and big data world. Spark is built around speed, ease of use and analytics. The training is an opportunity to learn about it from industry leaders, with hands-on exercises and projects to build confidence with the Spark toolset.

Course 1: Spark fundamentals I

Main topics:

  • Introduction to Spark;
  • Resilient Distributed Dataset (RDD) and DataFrames;
  • Spark application programming;
  • Introduction to Spark libraries;
  • Spark configuration, monitoring and tuning.

Course 2: Spark fundamentals II

Main topics:

  • Introduction to notebooks;
  • RDD architecture;
  • Optimizing transformations and actions;
  • Caching and serialization;
  • Developing and testing.

Course 3: Spark MLlib

Main topics:

  • Spark MLlib data types;
  • Review of algorithms;
  • Decision trees and random forests;
  • Spark MLlib clustering.

Course 4: Exploring GraphX

Main topics:

  • Introduction to Graph-Parallel;
  • Exploring graph operators;
  • Visualizing and modifying GraphX;
  • Aggregation and caching.

Course 5: Big data in R using Spark

Main topics:

  • Introduction to SparkR;
  • Data manipulation in SparkR;
  • Machine learning in SparkR.

References

Training

Spark fundamentals I (course certificate)
Spark – Level 1 (certification badge)
Spark fundamentals II (course certificate)
Spark MLlib (course certificate)
Exploring GraphX (course certificate)
Big data in R using Spark (course certificate)
Spark – Level 2 (certification badge)

Related articles

Hadoop fundamentals (Cognitive Class training)
Data science specialization (Coursera training)
