KnowML: Begin your ML journey

About Machine Learning:

Machine learning is a field of computer science, closely tied to statistics, in which algorithms learn patterns from data instead of following explicitly programmed rules. Companies, for example, collect data about their customers: the searches they make, the surveys they answer, and the reviews they write. Machine learning algorithms use this data to estimate the rating of a product, to decide which content a particular user is likely to want notifications about, and to suggest similar sites or services that match the user's preferences. Because machine learning has its roots in mathematical statistics, the two fields share many ideas, often under different names.

Common terms in machine learning and their equivalents in statistics:

Machine Learning	Statistics
Learning	Estimation/Fitting
Hypothesis Testing	Confirmatory Data Analysis
Example/Instance	Data Point
Weights	Parameters
Supervised Learning	Regression/Classification
Unsupervised Learning	Clustering
Feature	Covariate
Label	Response

Types of machine learning tasks:

Regression:

Predicts a numeric (continuous) value or outcome, such as a price or a temperature, from one or more independent variables.
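
A minimal sketch of regression, assuming a single independent variable and a straight-line model fitted by ordinary least squares (one of many possible regression methods). The function names and the toy data are hypothetical, chosen only for illustration. In the vocabulary of the glossary table, fitting the slope and intercept is the "learning" (fitting) step, and those two numbers are the model's weights (parameters).

```python
# Simple linear regression with one feature, fitted by ordinary least squares.
# Function names and data are hypothetical, for illustration only.


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimise the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict the dependent variable for a new value of the independent variable."""
    return slope * x + intercept


# Hypothetical data that follows y = 2x + 1 exactly
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)                # 2.0 1.0
print(predict(slope, intercept, 6.0))  # 13.0
```

Real projects usually rely on a library such as scikit-learn and use many features, but the idea is the same: learn parameters from labelled examples, then reuse them to predict new values.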

Dimensionality Reduction:

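Reduces the number of input variables (features) while keeping as much of the important information in the data as possible. This makes data easier to visualize and store, and can help other models train faster or overfit less.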
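
To make dimensionality reduction concrete, here is a sketch of principal component analysis (PCA), one widely used technique, restricted to two features so the eigenvector can be computed in closed form without external libraries. It assumes at least two distinct data points; the function name pca_2d_to_1d and the data are hypothetical. Each 2-D point is replaced by a single coordinate along the direction of greatest variance, and the function also reports what share of the total variance that one coordinate keeps.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component (hypothetical helper).

    Assumes at least two distinct points. Returns (new_coordinates, variance_kept),
    where variance_kept is the share of total variance the single new feature retains.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the two features
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Its eigenvector is the first principal component; if b == 0 the axes already are
    if b != 0:
        vx, vy = b, lam - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    dx, dy = vx / norm, vy / norm
    # Replace each point by its signed distance from the mean along that direction
    coords = [(x - mx) * dx + (y - my) * dy for x, y in points]
    return coords, lam / (a + c)


# Hypothetical data: two strongly correlated features
points = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 4.0)]
coords, kept = pca_2d_to_1d(points)
print([round(v, 2) for v in coords])
print(round(kept, 3))  # 0.9, so one feature retains about 90% of the variance
```

With more features, the same idea is normally applied through a general linear-algebra routine (an eigendecomposition or SVD of the data) rather than the 2 x 2 closed-form formula used here.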

Classification:

Predicts the discrete label or category of an item (for example, "spam" or "not spam") from its features, using a model trained on data points whose labels are already known.
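
One simple way to classify is the k-nearest-neighbours (kNN) algorithm that appears later in the video list: a new item gets the label that is most common among the k labelled examples closest to it. The sketch below assumes numeric features and Euclidean distance; the function name knn_classify and the dog-size data are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    `train` is a list of (feature_vector, label) pairs whose labels are known.
    Hypothetical helper for illustration; ties in the vote go to the label
    that appears first among the nearest neighbours.
    """
    # Sort the labelled examples by Euclidean distance to the query point
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest examples and return the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: (height_cm, weight_kg) -> dog size category
train = [
    ((30.0, 5.0), "small"),
    ((35.0, 7.0), "small"),
    ((25.0, 4.0), "small"),
    ((60.0, 30.0), "large"),
    ((65.0, 35.0), "large"),
    ((70.0, 40.0), "large"),
]

print(knn_classify(train, (32.0, 6.0)))   # small
print(knn_classify(train, (62.0, 32.0)))  # large
```

Because kNN relies on raw distances, a feature measured on a larger scale dominates the vote, which is why feature scaling matters for distance-based algorithms.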

Clustering:

Groups unlabeled data points into clusters of similar items, which can reveal useful structure in the data, such as groups of customers with similar behavior.
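
A minimal sketch of clustering with k-means (Lloyd's algorithm), which the video list covers near the end. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the centroids stop moving. It assumes numeric features, Euclidean distance, and k data points chosen at random as starting centroids; the names k_means and nearest_centroid and the data are hypothetical. No labels are given: the algorithm finds the groups on its own.

```python
import math
import random


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def k_means(points, k, iterations=100, seed=0):
    """Cluster `points` into k groups with Lloyd's k-means algorithm (hypothetical helper).

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Start from k data points picked at random as the initial centroids
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    # Final labels are computed against the final centroids
    return centroids, [nearest_centroid(p, centroids) for p in points]


# Hypothetical 2-D data with two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, labels = k_means(points, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
print(labels)
```

On less clearly separated data, different starting centroids can lead to different clusters, which is one of the practical challenges with k-means.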

Videos to start with:

Introduction to predictive modeling

Types of predictive modeling

Stages of predictive modeling

Understanding hypothesis generation

Data extraction

Understanding Data Exploration

Functions to read data in Python (Jupyter Notebook)

Variable Identification

Univariate analysis for continuous variables

Understanding Univariate Analysis for categorical variables

Understanding Bivariate Analysis

Understanding and treating missing values

Understanding Outlier Treatment

Understanding Variable Transformation

Basics of Model Building

Introduction to Problem Statement

Building the first predictive model

Preparing the Dataset

Regression Benchmark

Classification Benchmark

Introduction to Evaluation Metrics

Confusion Matrix

Accuracy

Alternatives to Accuracy

Precision and Recall

Thresholding

AUC ROC

Log loss

Evaluation Metrics for Regression

R2 and Adjusted R2

Introduction to kNN

Building a kNN model

Determining the right value of k

How to calculate distance

Issues with distance-based algorithms

Introduction to Overfitting and Underfitting Models

What is Validation

Understanding Hold-Out Validation

Understanding k-fold cross-validation

Bias-Variance Tradeoff

Introduction to Linear Models

Understanding Gradient Descent

Gradient Descent in Linear Regression

Convexity of the cost function

Assumptions of Linear Regression

Introduction to Logistic Regression

Odds Ratio

Multiclass using Logistic Regression

Introduction to Decision Tree

Purity in Decision Trees

Terminology Related to Decision Trees

How to Select the Best Split Point in Decision Trees

Chi-Square

Information Gain

Reduction in Variance

Optimizing Performance of Decision Trees

Introduction to Feature Engineering

Feature Engineering

Feature Transformation

Feature Scaling

Combining Sparse Classes

Feature Generation by Binning

Feature Interaction

Feature Engineering on Date-Time Features

Automated Feature Engineering

Introduction to Ensemble Models

Basic Ensemble Techniques

Why Ensemble Models Work Well

Bootstrap Sampling

Introduction to Random Forest

Hyperparameters of Random Forest

Introduction to Clustering

Applications of Clustering

Evaluation Metrics for Clustering

Understanding K-Means

Challenges with K-Means

How to Choose the Right Value of k