In machine learning, we often encounter imbalanced datasets. Fraud detection, claim prediction, churn prediction, anomaly detection, and outlier detection are examples of classification problems that often involve an imbalanced dataset.
In this article, I discuss a simple approach to handling an imbalanced dataset using imblearn, a Python library designed specifically for imbalanced datasets. The dataset I am using here is taken from the Machinehack Detecting Anomalies in Wafer Manufacturing hackathon and has binary classes.
In this article, we discuss building a simple convolutional neural network (CNN) with PyTorch to classify images into different classes. By the end of this article, you will be familiar with PyTorch, CNNs, padding, stride, and max pooling, and you will be able to build your own CNN model for image classification. The dataset we are going to use is the Intel Image Classification dataset, available on Kaggle.
So let’s begin. Here is an outline of what this article is going to cover:
This article is based on week 2 of the Sequence Models course on Coursera. In this article, I try to summarise and explain the concepts of word representation and word embedding.
In natural language processing, we generally represent a word through a vocabulary in which every word is a one-hot encoded vector. Suppose we have a vocabulary (V) of 10,000 words.
V = [a, aaron, …, zulu, <UNK>]
Suppose the word ‘Man’ is at position 5391 in the vocabulary; then it can be represented by a one-hot encoded vector (O5391). …
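To make the one-hot representation concrete, here is a minimal sketch in plain Python. The helper name `one_hot` is hypothetical, and I am assuming that positions in the vocabulary are counted from 1, so position 5391 maps to list index 5390. The vector has a single 1 at the word's position and 0 everywhere else.

```python
def one_hot(position, vocab_size):
    """Return a one-hot list with a single 1 at the given position.

    Assumption: `position` is 1-based (the first word is at position 1).
    """
    if not 1 <= position <= vocab_size:
        raise ValueError("position must be between 1 and vocab_size")
    vector = [0] * vocab_size      # start with all zeros
    vector[position - 1] = 1       # convert 1-based position to 0-based index
    return vector


VOCAB_SIZE = 10_000    # size of the vocabulary V
MAN_POSITION = 5391    # position of the word 'Man' in V

o_5391 = one_hot(MAN_POSITION, VOCAB_SIZE)

print(len(o_5391))          # length equals the vocabulary size
print(sum(o_5391))          # exactly one entry is set
print(o_5391.index(1) + 1)  # 1-based position of that entry
```

Every word gets a vector as long as the whole vocabulary, with only one non-zero entry, so these vectors are very sparse.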