François Chung, Ph.D.

Hadoop fundamentals

Cognitive Class training, MOOC (2020). This learning path presents Hadoop, an open-source framework for distributed storage and processing of big data. The training explains Hadoop's conceptual design, introduces MapReduce, YARN (Yet Another Resource Negotiator) and Hive, then shows how to use Hadoop and manipulate data without complex coding.

Course 1: Hadoop 101

Main topics:

  • Introduction to Hadoop;
  • Hadoop architecture and HDFS;
  • Hadoop administration;
  • Hadoop components.

Course 2: MapReduce and YARN

Main topics:

  • Introduction to MapReduce and YARN;
  • Limitations of Hadoop v1 and MapReduce v1;
  • YARN architecture.
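
As an illustration of the MapReduce model covered in this course, here is a minimal single-process sketch of a word count in Python. It is not taken from the course material and does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions chosen for this example. On a real cluster, the map and reduce steps would run in parallel across nodes and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

Keeping the three phases as separate functions mirrors how the model splits work: map and reduce operate on independent pieces of data, which is what allows them to be distributed.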

Course 3: Moving data into Hadoop

Main topics:

  • Load scenarios;
  • Using Sqoop;
  • Flume overview;
  • Using Data Click.

Course 4: Accessing Hadoop data using Hive

Main topics:

  • Introduction to Hive;
  • Hive DDL - Data Definition Language;
  • Hive DML - Data Manipulation Language;
  • Hive operators and functions.

References

Training

Hadoop 101 (course certificate)
Hadoop Foundations – Level 1 (certification badge)
MapReduce and YARN (course certificate)
Hadoop Programming – Level 1 (certification badge)
Moving data into Hadoop (course certificate)
Hadoop Administration – Level 1 (certification badge)
Accessing Hadoop data using Hive (course certificate)
Hadoop Data Access – Level 1 (certification badge)
Hadoop Foundations – Level 2 (certification badge)

Related articles

Spark fundamentals (Cognitive Class training)
Data science specialization (Coursera training)
