KnowML-Begin your ML journey

About Machine Learning:


Machine learning is a field of computer science that applies statistical methods to data so that a system can learn patterns from it. A firm, for example, uses the data it acquires to learn more about its customers. The searches we make, the surveys we fill in, and the reviews we write are the primary source data. Machine learning algorithms turn this data into statistical summaries and predictions that can be used to determine the rating of a product, the type of content a particular user might like to be notified about, or similar sites and services that match the user's choices. Machine learning thus has its roots in mathematical statistics and shares many concepts with it.



Common terms and their equivalent names in ML and Statistics:


Machine Learning          Statistics
Learning                  Estimation / Fitting
Hypothesis Testing        Confirmatory Data Analysis
Example/Instance          Data Point
Weights                   Parameters
Supervised Learning       Regression / Classification
Unsupervised Learning     Clustering
Feature                   Covariate
Label                     Response


Types of ML tasks and what they do:


Regression:

Predicts a continuous value or outcome from the given independent variables.
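
As a minimal sketch of regression, the snippet below fits a straight line to one independent variable with ordinary least squares and uses it to predict a new outcome. The data (hours studied and exam scores) and the helper name `fit_line` are made up for illustration; they are not part of the course material.

```python
# Hypothetical data: hours studied (independent variable) and exam score (outcome).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Ordinary least squares for one variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, scores)
# Predict the score for a student who studies 6 hours.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

In practice a library such as scikit-learn would usually do the fitting; the hand-written version is only meant to show what "predicting from independent variables" amounts to.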

Dimensionality Reduction:

Uses an algorithm to reduce the number of features (dimensions) in a data set while keeping as much of the relevant information as possible.
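
One common way to do this is principal component analysis (PCA), which is an assumption here since the text does not name a specific method. The sketch below reduces a hypothetical two-feature data set to a single feature by projecting each point onto the direction of greatest variance. The closed-form angle only works for two features; the function name `pca_2d_to_1d` is illustrative.

```python
import math

# Hypothetical data set with two strongly correlated features.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]


def pca_2d_to_1d(data):
    """Project 2-D points onto their first principal component."""
    n = len(data)
    mean_x = sum(p[0] for p in data) / n
    mean_y = sum(p[1] for p in data) / n
    centered = [(x - mean_x, y - mean_y) for x, y in data]
    # Entries of the 2x2 covariance matrix.
    var_x = sum(x * x for x, _ in centered) / n
    var_y = sum(y * y for _, y in centered) / n
    cov_xy = sum(x * y for x, y in centered) / n
    # Angle of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    # Each point is now described by one number instead of two.
    return [x * direction[0] + y * direction[1] for x, y in centered]


reduced = pca_2d_to_1d(points)
print(reduced)
```

Because the two features move together, one coordinate along the diagonal carries all of the variation in this example.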

Classification:

Predicts the label or category of an item from its features, using a model trained on examples whose labels are already known.
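
A small illustration of classification, using the k-nearest-neighbours idea that the videos listed further down introduce. The training examples, the labels, and the function name `knn_predict` are hypothetical; this is a sketch under those assumptions rather than a reference implementation.

```python
import math
from collections import Counter

# Hypothetical labeled examples: (height_cm, weight_kg) -> label.
training = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((160.0, 58.0), "small"),
    ((180.0, 80.0), "large"),
    ((185.0, 90.0), "large"),
    ((190.0, 95.0), "large"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


label = knn_predict(training, (158.0, 54.0))
print(label)
```

The features here are on similar scales; with very different scales, distance-based methods usually need feature scaling first.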

Clustering:

Groups unlabeled data points into clusters of similar items, which can reveal structure in the data that is not visible from individual points.
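
To make the idea concrete, here is a minimal k-means sketch, one of several clustering methods and the one covered in the videos listed below. The points, the choice of starting centroids, and the function name `k_means` are assumptions made for illustration. Real implementations usually pick starting centroids more carefully and stop once assignments no longer change.

```python
import math

# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def k_means(data, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in data
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids


# Start from one point in each apparent group (a deliberate simplification).
assignments, centroids = k_means(points, centroids=[points[0], points[3]])
print(assignments, centroids)
```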



Videos to start with:


Introduction to predictive modeling


Types of predictive modeling


Stages of predictive modeling


Understanding hypothesis generation


Data extraction.


Understanding Data Exploration


Functions to read data in Python (Jupyter notebook)


Variable Identification


Univariate analysis for continuous variables


Understanding Univariate Analysis for categorical variables


Understanding Bivariate Analysis


Understanding and treating missing values


Understanding Outlier Treatment


Understanding Variable Transformation


Basics of Model Building


Introduction to Problem Statement


Building first predictive model


Preparing the Dataset


Benchmark regression final


Classification Benchmark


Introduction to Evaluation Metrics


Confusion Matrix


Accuracy


Alternatives of Accuracy


Precision and Recall


Thresholding


AUC ROC


Log loss


Evaluation Metrics for regression Final


R2 and Adjusted R2


Introduction to kNN


Building a kNN model


Determining right value of k


How to calculate distance


Issue with distance based algorithms


Introduction to Overfitting and Underfitting Models.


What is Validation


Understanding Hold-Out Validation


Understanding k-fold cross validation.


Bias Variance Tradeoff.


Introduction to Linear Model


Understanding Gradient descent.


Gradient Descent in Linear Regression


Convexity of cost function


Assumptions of Linear Regression


Introduction to Logistic Regression


Odds ratio.


Multiclass using Logistic Regression


Introduction to Decision Tree


Purity in Decision Trees


Terminologies Related to Decision Trees.


How to Select the Best Split Point in Decision Trees.


Chi-Square


Information Gain


Reduction in Variance


Optimizing Performance of Decision Trees.


Introduction to Feature Engineering.


Feature engineering


Feature Transformation.


Feature Scaling.


Combining Sparse Classes.


Feature Generation by binning.


Feature Interaction.


Feature Engineering on Date Time Features


Automated Feature Engineering


Introduction to Ensemble Models


Basic Ensemble techniques.


Why Ensemble Models Work Well


Bootstrap Sampling


Introduction to Random Forest.


Hyperparameters of Random Forest.


Introduction to Clustering.


Applications of Clustering


Evaluation Metrics for Clustering


Understanding K-Means


Challenges with K-Means


How to Choose Right k-Value