Data science specialization
Coursera training, MOOC (2020). This specialization covers the concepts and tools needed throughout the data science pipeline, from asking the right kinds of questions to making inferences and publishing results. Topics include using R to clean, analyze, and visualize data; managing data science projects with GitHub; and performing regression analysis, including least squares and inference with regression models.
Course 1: Data scientist’s toolbox
Main topics:
- Data science fundamentals;
- R and RStudio;
- Version control and GitHub;
- R Markdown, scientific thinking and big data.
Course 2: R programming
Main topics:
- Background and getting started;
- Programming with R;
- Loop functions and debugging;
- Simulation and code profiling.
Course 3: Getting and cleaning data
Main topics:
- Finding data and reading different file types;
- Data storage systems;
- Organizing, merging and managing data;
- Text and data manipulation in R.
Course 4: Exploratory data analysis
Main topics:
- Analytic graphics and base plotting in R;
- Lattice and ggplot2;
- Data dimension reduction;
- Cluster analysis techniques.
Course 5: Reproducible research
Main topics:
- Concepts, ideas and structure;
- Markdown and knitr;
- Reproducible research checklist;
- Evidence-based data analysis.
Course 6: Statistical inference
Main topics:
- Probability and expected values;
- Variability, distributions and asymptotics;
- Intervals, testing and p-values;
- Power, bootstrapping and permutation tests.
Course 7: Regression models
Main topics:
- Least squares and linear regression;
- Linear and multivariable regression;
- Residuals and diagnostics;
- Logistic and Poisson regression.
Course 8: Practical machine learning
Main topics:
- Prediction, errors and cross validation;
- The caret package;
- Decision trees and random forests;
- Regularized regression and combining predictors.
Course 9: Developing data products
Main topics:
- Shiny, googleVis and Plotly;
- R Markdown and Leaflet;
- R packages and swirl.