Navid Malekghaini's Personal Blog

My personal weblog for sharing and storing some of my activities related to computer science over the internet

Navid Malekghaini

Senior Software Developer @ Arctic Wolf
Prev. Senior SWE and Researcher @ Huawei Canada, University of Waterloo x Orange R&D

University of Waterloo
Department of Computer Science
200 University Ave W, Waterloo, ON N2L 3G1, Canada

contact me
navidmalekedu (AT) gmail (DOT) com [ Primary Email ]
nmalekgh (AT) uwaterloo (DOT) ca

1 post is tagged with the keyword "data cleaning"

What is this repo about?

There is a pressing need for effective methods to model and analyze data, extract useful knowledge from it, and know how to act on it. In this series of notebooks you will learn the fundamental tools for assessing, preparing, and analyzing data. You will learn to design a data analysis pipeline that moves from raw data to a task solution, and to implement a variety of analytical and machine learning algorithms, including supervised, unsupervised, and other learning approaches.

Download From GitHub With Explanations

Part 1


  • Load and work with two famous datasets: "Iris" and "Heart Disease"
  • Data cleaning approaches: filling missing values, noise reduction, normalization, and visualization
  • Visualization for understanding data: pair plots, scatter plots, correlation and data distribution analysis
  • Statistical analysis on data: correlation coefficient, statistical variables
  • KNN classifier with scikit-learn: parameter tuning with cross-validation, metrics analysis, plot analysis, AUC method analysis
  • Further tuning KNN classifier: weighted KNN approaches, algorithm selection, speed, etc.
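As a rough illustration of the KNN topics above, here is a minimal from-scratch sketch of distance-weighted KNN with k-fold cross-validation used to choose k. It is not the notebooks' code (those use scikit-learn). The helper names `knn_predict` and `cross_val_accuracy`, the interleaved fold scheme, and the synthetic two-blob data are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def knn_predict(train_X, train_y, x, k, weighted=True):
    """Predict the label of x from its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    neighbors = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        # Weighted KNN: closer neighbors get larger votes; otherwise one vote each.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


def cross_val_accuracy(X, y, k, n_folds=5, weighted=True):
    """Mean accuracy over n_folds folds; fold i holds out every n_folds-th sample."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k, weighted) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic toy data (NOT Iris or Heart Disease): two well-separated Gaussian blobs.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)]
y = [0] * 50 + [1] * 50

# Shuffle so that every fold contains both classes.
order = list(range(len(X)))
rng.shuffle(order)
X = [X[i] for i in order]
y = [y[i] for i in order]

# Score each candidate k with cross-validation and keep the best one.
cv_scores = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5, 7, 9)}
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best k:", best_k)
```

Passing `weighted=False` gives plain majority voting, which makes it easy to compare the two voting schemes on the same folds.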

Part 2


  • Two datasets: Johns Hopkins University CSSE COVID-19 (covid_19_data) and the US 2020 Census
  • Preprocessing data: data cleaning, outlier handling, normalization, missing values, etc.
  • Representation learning: PCA, LDA, scree plots and statistical analysis, visualization insights, comparing the algorithms
  • Data analysis for classification: original, hybrid, or LDA/PCA constructed data
  • Tree-based algorithms for classification with extensive analysis: decision trees, random forests, parameter tuning, group k-fold cross-validation, gradient tree boosting
  • Naive Bayes classifier (NB): var_smoothing analysis
  • Comparing the performance of NB with the decision tree approaches
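To make the variance-smoothing item above concrete, here is a small from-scratch sketch of a Gaussian Naive Bayes classifier. It follows the convention of scikit-learn's `var_smoothing` parameter, where a fraction of the largest feature variance is added to every per-class variance for numerical stability. The class name `TinyGaussianNB` and the six-row toy dataset are invented for the example and do not come from the notebooks.

```python
import math
from statistics import fmean, pvariance


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes with variance smoothing (illustrative only)."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        # epsilon is a fraction of the largest feature variance over all samples.
        epsilon = self.var_smoothing * max(pvariance(col) for col in zip(*X))
        self.stats = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            prior = len(rows) / len(X)
            means = [fmean(c) for c in cols]
            # Smoothing keeps variances strictly positive even for constant features.
            variances = [pvariance(c) + epsilon for c in cols]
            self.stats[label] = (prior, means, variances)
        return self

    def predict_one(self, x):
        def log_posterior(label):
            prior, means, variances = self.stats[label]
            total = math.log(prior)
            for v, m, s2 in zip(x, means, variances):
                # Log of the Gaussian density N(v; m, s2).
                total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            return total

        return max(self.stats, key=log_posterior)


# Toy data: feature 0 is constant (1.0) within class 0, so its class variance is zero.
X = [(1.0, 5.0), (1.0, 6.0), (1.0, 7.0), (3.0, 5.5), (4.0, 6.5), (5.0, 6.0)]
y = [0, 0, 0, 1, 1, 1]

sharp = TinyGaussianNB(var_smoothing=1e-9).fit(X, y)
smooth = TinyGaussianNB(var_smoothing=1.0).fit(X, y)

query = (1.4, 6.0)
print("small smoothing:", sharp.predict_one(query))
print("large smoothing:", smooth.predict_one(query))
```

With very small smoothing, class 0 has an almost zero-width density on feature 0, so a point only slightly off 1.0 is pushed to class 1. Heavier smoothing widens the densities and the same point goes to class 0. With no smoothing at all, the zero variance makes the log-density undefined.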

Part 3

  • Preprocessing data (outlier removal, feature selection, normalization, train-test split, creating 3 different training sets for the 3 targets, etc.)
  • Deep neural network: MLP, model and architecture analysis, tuning the hyperparameters, class weights
  • LSTM networks: model optimization, L2 regularization, activation functions, dropout, batch normalization
  • Deep MLP vs LSTM: thorough analysis (time, accuracy, number of parameters, etc.)
  • Convolutional neural network: parameter and architecture tuning, padding, activation, classification layer
  • ResNet CNN model: thorough analysis and comparison to the previous CNN model (time, depth, number of parameters)
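One of the comparison axes above is the number of parameters. The sketch below counts trainable parameters for a fully connected MLP and for a single LSTM layer followed by a dense output layer. It assumes the common convention of one bias vector per LSTM gate (as in Keras; some frameworks keep two bias vectors and report a slightly larger count). The layer sizes are hypothetical and not taken from the notebooks.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes):
    """Total parameters of an MLP given its layer widths, input first."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


def lstm_params(n_in, n_hidden):
    """Parameters of one LSTM layer, assuming a single bias vector per gate."""
    # Four gates (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and a bias.
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)


# Hypothetical sizes: 30 input features, 64 hidden units, 3 output units.
mlp_total = mlp_params([30, 64, 64, 3])
lstm_total = lstm_params(30, 64) + dense_params(64, 3)

print("MLP parameters: ", mlp_total)   # 6339
print("LSTM parameters:", lstm_total)  # 24515
```

Even with the same hidden width, the LSTM layer carries roughly four times the parameters of a comparable dense layer, which is one reason training time and model size are worth reporting side by side with accuracy.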

This code was written by Navid Malekghaini and Soheil Johari.

29 December 21, 19:36