François Chung, Ph.D.
Data science specialization

Coursera training, MOOC (2020). This specialization covers the concepts and tools needed throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. Topics include using R to clean, analyze and visualize data; working through each stage of the pipeline, from data acquisition to publication; using GitHub to manage data science projects; and performing regression analysis, including least squares and inference with regression models.

Course 1: Data scientist’s toolbox

Main topics:

  • Data science fundamentals;
  • R and RStudio;
  • Version control and GitHub;
  • R Markdown, scientific thinking and big data.

Course 2: R programming

Main topics:

  • Background and getting started;
  • Programming with R;
  • Loop functions and debugging;
  • Simulation and code profiling.

Course 3: Getting and cleaning data

Main topics:

  • Finding data and reading different file types;
  • Data storage systems;
  • Organizing, merging and managing data;
  • Text and data manipulation in R.

Course 4: Exploratory data analysis

Main topics:

  • Analytic graphics and base plotting in R;
  • Lattice and ggplot2;
  • Data dimension reduction;
  • Cluster analysis techniques.

Course 5: Reproducible research

Main topics:

  • Concepts, ideas and structure;
  • Markdown and knitr;
  • Reproducible research checklist;
  • Evidence-based data analysis.

Course 6: Statistical inference

Main topics:

  • Probability and expected values;
  • Variability, distribution and asymptotics;
  • Intervals, testing and p-value;
  • Power, bootstrapping and permutation tests.

Course 7: Regression models

Main topics:

  • Least squares and linear regression;
  • Linear and multivariable regression;
  • Residuals and diagnostics;
  • Logistic and Poisson regression.

Course 8: Practical machine learning

Main topics:

  • Prediction, errors and cross validation;
  • Caret package;
  • Decision trees and random forests;
  • Regularized regression and combining predictors.

Course 9: Developing data products

Main topics:

  • Shiny, googleVis and Plotly;
  • R Markdown and Leaflet;
  • R packages and swirl.
