Coursera training, MOOC (2020). This specialization covers the concepts and tools used across the data science pipeline, from asking the right kinds of questions to making inferences and publishing results. Topics include using R to clean, analyze, and visualize data; working through each stage from data acquisition to publication; managing data science projects with GitHub; and performing regression analysis, including least squares and inference with regression models.
Course 1: Data scientist’s toolbox
- Data science fundamentals;
- R and RStudio;
- Version control and GitHub;
- R Markdown, scientific thinking and big data.
Course 2: R programming
- Background and getting started;
- Programming with R;
- Loop functions and debugging;
- Simulation and code profiling.
Course 3: Getting and cleaning data
- Finding data and reading different file types;
- Data storage systems;
- Organizing, merging and managing data;
- Text and data manipulation in R.
Course 4: Exploratory data analysis
- Analytic graphics and base plotting in R;
- Lattice and ggplot2;
- Data dimension reduction;
- Cluster analysis techniques.
Course 5: Reproducible research
- Concepts, ideas and structure;
- Markdown and knitr;
- Reproducible research checklist;
- Evidence-based data analysis.
Course 6: Statistical inference
- Probability and expected values;
- Variability, distributions and asymptotics;
- Intervals, testing and p-values;
- Power, bootstrapping and permutation tests.
Course 7: Regression models
- Least squares and linear regression;
- Linear and multivariable regression;
- Residuals and diagnostics;
- Logistic and Poisson regression.
Course 8: Practical machine learning
- Prediction, errors and cross validation;
- The caret package;
- Decision trees and random forests;
- Regularized regression and combining predictors.
Course 9: Developing data products
- Shiny, googleVis and Plotly;
- R Markdown and Leaflet;
- R packages and swirl.