François Chung, Ph.D.

Tag: statistical model

Data science specialization

Coursera training, MOOC (2020). This specialization covers the concepts and tools needed across the entire data science pipeline, from asking the right kinds of questions and acquiring data to making inferences and publishing results. Topics include using R to clean, analyze and visualize data, using GitHub to manage data science projects, and performing regression analysis, including least squares and inference with regression models.

Course 1: Data scientist’s toolbox

Main topics:

  • Data science fundamentals;
  • R and RStudio;
  • Version control and GitHub;
  • R Markdown, scientific thinking and big data.

Course 2: R programming

Main topics:

  • Background and getting started;
  • Programming with R;
  • Loop functions and debugging;
  • Simulation and code profiling.

Course 3: Getting and cleaning data

Main topics:

  • Finding data and reading different file types;
  • Data storage systems;
  • Organizing, merging and managing data;
  • Text and data manipulation in R.

Course 4: Exploratory data analysis

Main topics:

  • Analytic graphics and base plotting in R;
  • Lattice and ggplot2;
  • Data dimension reduction;
  • Cluster analysis techniques.

Course 5: Reproducible research

Main topics:

  • Concepts, ideas and structure;
  • Markdown and knitr;
  • Reproducible research checklist;
  • Evidence-based data analysis.

Course 6: Statistical inference

Main topics:

  • Probability and expected values;
  • Variability, distribution and asymptotics;
  • Intervals, testing and p-value;
  • Power, bootstrapping and permutation tests.
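
As an illustration of the bootstrapping topic above, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is written in Python for brevity, whereas the course itself works in R; the function name `bootstrap_ci`, its parameters and the sample values are hypothetical and not taken from the course material.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements.
sample = [4.1, 5.0, 5.6, 4.8, 6.2, 5.1, 4.7, 5.9, 5.3, 4.9]
low, high = bootstrap_ci(sample)
print(f"95% bootstrap interval for the mean: [{low:.2f}, {high:.2f}]")
```

Resampling with replacement approximates the sampling distribution of the statistic without assuming a particular distribution for the data.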

Course 7: Regression models

Main topics:

  • Least squares and linear regression;
  • Linear and multivariable regression;
  • Residuals and diagnostics;
  • Logistic and Poisson regression.
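
The least squares topic above can be sketched with the closed-form estimates for simple linear regression. This is a Python illustration under stated assumptions (the course uses R, and the data points are hypothetical): the slope is the ratio of the x-y cross-deviation sum to the x deviation sum of squares, and the intercept follows from the means.

```python
# Hypothetical data: predictor x and response y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least squares estimates for the model y = intercept + slope * x.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Residuals: observed minus fitted values.
# When the model includes an intercept, they sum to zero (up to rounding).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Checking that the residuals sum to approximately zero is a quick sanity test before moving on to fuller diagnostics.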

Course 8: Practical machine learning

Main topics:

  • Prediction, errors and cross validation;
  • caret package;
  • Decision trees and random forests;
  • Regularized regression and combining predictors.

Course 9: Developing data products

Main topics:

  • Shiny, googleVis and Plotly;
  • R Markdown and Leaflet;
  • R packages and swirl.

Introduction to operations management

Coursera training, MOOC (2015). Offered online by The Wharton School of the University of Pennsylvania (US), this training introduces the management skills required to run operations, for example in a restaurant or a hospital. Specifically, it explains how to improve productivity, increase responsiveness, provide more choice to the customer and deliver higher quality standards. The aim is to provide strategic analyses and solutions to improve business processes.

Module 1: Process analysis

Main topics:

  • Find a bottleneck;
  • Compute throughput;
  • Apply Little’s Law;
  • Compute inventory turns;
  • Deal with multiple flow units.
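
The quantities listed above can be illustrated with a short Python sketch. The three-step process, its capacities, the demand rate, the flow time and the cost figures are all hypothetical values chosen for illustration and are not taken from the training.

```python
# Hypothetical three-step process: capacity of each step in units per hour.
capacities = {"prepare": 12.0, "assemble": 8.0, "pack": 10.0}
demand = 9.0  # units per hour requested by customers (assumed)

# The bottleneck is the step with the lowest capacity.
bottleneck = min(capacities, key=capacities.get)
process_capacity = capacities[bottleneck]

# Throughput (flow rate) is limited by both demand and process capacity.
flow_rate = min(demand, process_capacity)

# Little's Law: average inventory = flow rate x average flow time.
flow_time = 1.5  # hours a unit spends in the process (assumed)
inventory = flow_rate * flow_time

# Inventory turns = cost of goods sold / average inventory value.
cogs = 1_200_000.0           # per year (assumed)
inventory_value = 200_000.0  # average value held (assumed)
inventory_turns = cogs / inventory_value

print(bottleneck, flow_rate, inventory, inventory_turns)
```

Here demand exceeds the capacity of the slowest step, so the process is capacity-constrained and the flow rate equals the bottleneck capacity.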

Module 2: Productivity

Main topics:

  • Understand the sources of waste;
  • Balance a line and compute takt time;
  • Overall Equipment Effectiveness (OEE) analysis;
  • Key Performance Indicator (KPI) tree.
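
Two of the measures above lend themselves to a small worked example. The sketch below assumes the common definitions: takt time as available time divided by demand, and OEE as the product of availability, performance and quality rates. The training may present OEE in a different form, and the function names and figures are hypothetical.

```python
def takt_time(available_seconds: float, demand_units: float) -> float:
    """Time allowed per unit so that output just matches demand."""
    return available_seconds / demand_units


def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness as a product of three rates in [0, 1]."""
    return availability * performance * quality


# Hypothetical shift: 8 hours of available time, 480 units demanded.
shift_seconds = 8 * 60 * 60
seconds_per_unit = takt_time(shift_seconds, 480)

# Hypothetical machine: 90% uptime, 80% of ideal speed, 95% good units.
effectiveness = oee(0.90, 0.80, 0.95)

print(f"takt time: {seconds_per_unit:.0f} s/unit, OEE: {effectiveness:.1%}")
```

Because OEE multiplies the three rates, moderate losses in each category compound into a much lower overall figure.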

Module 3: Variety

Main topics:

  • Determine the impact of set-ups on capacity;
  • Analyze set-ups;
  • Single-Minute Exchange of Die (SMED);
  • Strategies to deal with variety;
  • Limitations to variety.
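
The impact of set-ups on capacity can be sketched with a simple batch model: each batch incurs one set-up and then processes its units one after another. The function name and the numbers below are hypothetical, and the formula is a common textbook form rather than a quotation from the training.

```python
def capacity_with_setup(batch_size: float, setup_time: float, time_per_unit: float) -> float:
    """Units produced per unit of time when every batch needs one set-up."""
    return batch_size / (setup_time + batch_size * time_per_unit)


# Hypothetical resource: 30-minute set-up, 1 minute of processing per unit.
small_batch = capacity_with_setup(batch_size=10, setup_time=30, time_per_unit=1)
large_batch = capacity_with_setup(batch_size=100, setup_time=30, time_per_unit=1)

print(f"batch of 10:  {small_batch:.3f} units/min")
print(f"batch of 100: {large_batch:.3f} units/min")
```

Larger batches spread the set-up over more units, so capacity rises towards the limit of one unit per processing time, at the cost of more inventory. Reducing the set-up time itself, as SMED aims to do, raises capacity without requiring larger batches.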

Module 4: Responsiveness

Main topics:

  • Waiting time analysis;
  • Map out the customer journey;
  • Predict customer loss rates.
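
As a hedged illustration of waiting time analysis and customer loss, the sketch below uses two standard queueing results: a single-server waiting time approximation based on utilization and variability, and the Erlang loss formula for a system with no waiting room. Whether the training uses exactly these forms is an assumption, and the function names and inputs are hypothetical.

```python
def waiting_time_single_server(interarrival_time, activity_time,
                               cv_arrival=1.0, cv_service=1.0):
    """Approximate average time in queue for one server (same time units as inputs)."""
    utilization = activity_time / interarrival_time
    if utilization >= 1.0:
        raise ValueError("utilization must be below 1 for a stable queue")
    # Variability factor from the coefficients of variation of arrivals and service.
    variability = (cv_arrival ** 2 + cv_service ** 2) / 2
    return activity_time * utilization / (1 - utilization) * variability


def erlang_loss_probability(offered_load, servers):
    """Probability that an arriving customer finds all servers busy and is lost."""
    blocking = 1.0  # with zero servers every customer is lost
    for k in range(1, servers + 1):
        # Standard recursion for the Erlang loss formula.
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


# Hypothetical service desk: a customer every 5 minutes, 4 minutes of service.
wait = waiting_time_single_server(interarrival_time=5, activity_time=4)
# Hypothetical loss system: offered load of 1.0 shared by 2 servers.
loss = erlang_loss_probability(offered_load=1.0, servers=2)

print(f"average wait: {wait:.1f} min, loss probability: {loss:.2f}")
```

The waiting time grows sharply as utilization approaches 1, which is why a small amount of spare capacity can shorten queues considerably.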

Module 5: Quality

Main topics:

  • Analyze processes with yield losses;
  • Toyota production system;
  • Six Sigma;
  • Statistical Process Control (SPC).
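
Yield losses and control limits can be illustrated with a few lines of Python. The step yields, the baseline measurements and the three-sigma limits on individual measurements are assumptions made for this sketch; the training may use other chart types or limit conventions.

```python
import math
import statistics

# Hypothetical yields of three sequential steps.
step_yields = [0.95, 0.90, 0.98]
process_yield = math.prod(step_yields)

# Units to start in order to obtain 100 good units at the end.
units_to_start = math.ceil(100 / process_yield)

# Hypothetical baseline measurements from a process assumed to be in control.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
center = statistics.mean(baseline)
sigma = statistics.stdev(baseline)  # sample standard deviation

# Three-sigma control limits around the center line.
upper_limit = center + 3 * sigma
lower_limit = center - 3 * sigma

# Flag new measurements that fall outside the control limits.
new_measurements = [10.05, 10.6, 9.95]
out_of_control = [m for m in new_measurements
                  if m > upper_limit or m < lower_limit]

print(f"process yield: {process_yield:.4f}, units to start: {units_to_start}")
print(f"control limits: [{lower_limit:.3f}, {upper_limit:.3f}]")
print("out of control:", out_of_control)
```

A point outside the limits signals that the process should be investigated for an assignable cause; it does not by itself show that the unit is defective.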
