Data Cleansing Master Class in Python
Video description
A complete step-by-step guide to becoming a machine learning engineer
About This Video
Learn how to apply real-world data cleansing techniques to your data
Learn advanced data cleansing techniques
Learn how to prepare data in a way that avoids data leakage, and in turn, incorrect model evaluation
In Detail
Data preparation may be the most important part of a machine learning project. It is often the most time-consuming part, yet it is the least discussed. Data preparation, sometimes referred to as data preprocessing, is the act of transforming raw data into a form that is appropriate for modeling.
Machine learning algorithms require input data to be numeric, and most algorithm implementations maintain this expectation. Therefore, if your data contains data types and values that are not numbers, such as labels, you will need to convert them into numbers. Further, specific machine learning algorithms have expectations regarding the data types, scale, probability distribution, and relationships between input variables, and you may need to change the data to meet these expectations.
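As an illustration of converting non-numeric values into numbers, here is a minimal sketch using scikit-learn. The toy `colors` and `labels` arrays are hypothetical and not taken from the course, and the choice of `OneHotEncoder` for the input and `LabelEncoder` for the target is one common option rather than the course's prescribed method.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one categorical input column and a string class label.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
labels = np.array(["yes", "no", "yes", "no"])

# One-hot encode the nominal input: one binary column per category.
# Categories are sorted, so the columns are blue, green, red.
onehot = OneHotEncoder(handle_unknown="ignore")
X = onehot.fit_transform(colors).toarray()

# Integer-encode the target labels (classes are sorted alphabetically,
# so "no" maps to 0 and "yes" maps to 1).
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

print(onehot.categories_[0])
print(X)
print(y)
```

One-hot encoding suits nominal values with no natural order. For values that do have an order, an ordinal encoding may be the better fit.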
In this course, you will learn data imputation and advanced data cleansing techniques, and how to apply real-world data cleansing techniques to your own data. You will also learn how to prepare data in a way that avoids data leakage and, in turn, incorrect model evaluation.
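To make the idea of leakage-free preparation concrete, the sketch below wraps imputation and scaling in a scikit-learn `Pipeline` and evaluates it with k-fold cross-validation. The synthetic dataset, the roughly 10% missing-value rate, and the choice of `LogisticRegression` are assumptions made for illustration and do not come from the course material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not a dataset from the course).
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Knock out roughly 10% of the values to simulate missing data.
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.1] = np.nan

# Imputation and scaling live inside the pipeline, so each cross-validation
# fold fits them on its training rows only and merely applies them to the
# held-out rows. Fitting them on the full dataset first would leak
# test-set statistics into training.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The naive alternative would call `fit_transform` on the whole dataset before splitting, which tends to produce an optimistic accuracy estimate because the held-out rows have already influenced the imputed means and scaling statistics.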
By the end of this course, you will be able to perform data preprocessing and will have mastered data cleaning skills.
Who this course is for
This course is for you if you are serious about becoming a machine learning engineer in the real world. You will need a solid foundation in Python and should understand the basics of machine learning. Also, you should have some expertise with machine learning libraries.
Common Data Preparation Tasks - Feature Engineering
Common Data Preparation Tasks - Dimensionality Reduction
Data Leakage
Problem with Naïve Data Preparation
Case Study: Data Leakage: Train / Test / Split Naïve Approach
Case Study: Data Leakage: Train / Test / Split Correct Approach
Case Study: Data Leakage: K-Fold Naïve Approach
Case Study: Data Leakage: K-Fold Correct Approach
Chapter 3 : Data Cleansing
Data Cleansing Overview
Identify Columns That Contain a Single Value
Identify Columns with Few Values
Remove Columns with Low Variance
Identify and Remove Rows That Contain Duplicate Data
Defining Outliers
Remove Outliers - The Standard Deviation Approach
Remove Outliers - The IQR Approach
Automatic Outlier Detection
Mark Missing Values
Remove Rows with Missing Values
Statistical Imputation
Mean Value Imputation
Simple Imputer with Model Evaluation
Compare Different Statistical Imputation Strategies
K-Nearest Neighbors Imputation
KNNImputer and Model Evaluation
Iterative Imputation
IterativeImputer and Model Evaluation
IterativeImputer and Different Imputation Order
Chapter 4 : Feature Selection
Feature Selection Introduction
Feature Selection Defined
Statistics for Feature Selection
Loading a Categorical Dataset
Encode the Dataset for Modelling
Chi-Squared
Mutual Information
Modeling with Selected Categorical Features
Feature Selection with ANOVA on Numerical Input
Feature Selection with Mutual Information
Modeling with Selected Numerical Features
Tuning the Number of Selected Features
Select Features for Numerical Output
Linear Correlation with Correlation Statistics
Linear Correlation with Mutual Information
Baseline and Model Built Using Correlation
Model Built Using Mutual Information Features
Tuning Number of Selected Features
Recursive Feature Elimination
RFE for Classification
RFE for Regression
RFE Hyperparameters
Feature Ranking for RFE
Feature Importance Scores Defined
Feature Importance Scores: Linear Regression
Feature Importance Scores: Logistic Regression and CART
Feature Importance Scores: Random Forests
Permutation Feature Importance
Feature Selection with Importance
Chapter 5 : Data Transforms
Scale Numerical Data
Diabetes Dataset for Scaling
MinMaxScaler Transform
StandardScaler Transform
Robust Scaling Data
Robust Scaler Applied to Dataset
Explore Robust Scaler Range
Nominal and Ordinal Variables
Ordinal Encoding
One-Hot Encoding Defined
One-Hot Encoding
Dummy Variable Encoding
Ordinal Encoder Transform on Breast Cancer Dataset
Make Distributions More Gaussian
Power Transform on Contrived Dataset
Power Transform on Sonar Dataset
Box-Cox on Sonar Dataset
Yeo-Johnson on Sonar Dataset
Polynomial Features
Effect of Polynomial Degrees
Chapter 6 : Advanced Transforms
Transforming Different Data Types
The ColumnTransformer
The ColumnTransformer on Abalone Dataset
Manually Transform Target Variable
Automatically Transform Target Variable
Challenge of Preparing New Data for a Model
Save Model and Data Scaler
Load and Apply Saved Scalers
Chapter 7 : Dimensionality Reduction
Curse of Dimensionality
Techniques for Dimensionality Reduction
Linear Discriminant Analysis
Linear Discriminant Analysis Demonstrated
Principal Component Analysis