Introduction to Machine Learning ML-1

Greetings! This blog contains the information you need to get started with machine learning. The previous blog covered the basics of Artificial Intelligence. Machine learning is a sub-domain of Artificial Intelligence. As previously discussed, Artificial Intelligence is our main goal, and machine learning and deep learning are the methods and algorithms by which we are going to achieve artificially intelligent systems. This blog focuses on:

What is Machine learning?
Types of machine learning
Names of machine learning algorithms

"Machine learning is a sub-domain of Artificial Intelligence that is responsible for creating predictive models". When an AI system is supposed to predict some value, we can say that it is a machine learning problem. Machine learning models generally expect a tabular dataset, and files such as CSV and TSV are commonly used for ML model training.

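To make the idea of a tabular dataset concrete, here is a minimal sketch of reading CSV data with Python's built-in csv module. The column names and values are made up for illustration, and the CSV text is kept in a string so the example runs without any file on disk.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
CSV_TEXT = """bedrooms,kitchens,price
2,1,150000
3,1,200000
4,2,275000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row, with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]

rows = load_rows(CSV_TEXT)
print(len(rows), "rows with columns", list(rows[0].keys()))
```

Each row is one example, and each column is one piece of information about that example.
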
Machine Learning Classification

Fig 1. Machine Learning Classification

Machine learning is classified into 3 types:

Supervised Learning
Unsupervised Learning
Reinforcement Learning

1. Supervised Learning:

In simple terms, supervised learning is a type of learning in which the machine is given both the questions and the answers to train on. In technical terms, the machine is given features (X) and a target feature, i.e. labels (y), to train on. Let's discuss some examples of supervised learning.

Fig 2. Dog Cat Classification

If you provide images of dogs and cats together with labels like "dog" and "cat", the problem is considered a supervised learning problem. It is supervised because you are giving images to the system and also specifying whether each image shows a dog or a cat. After training, when the AI model gets a new image, it will try to predict whether the image is of a dog or a cat.

Fig 3. House Price Prediction

Another example is house price prediction. Here, you provide features of the house, like the number of bedrooms, kitchens, etc., together with the price of the house as the label. We expect the AI model to predict the price of a house when we give it these feature values.

Supervised learning is classified into 2 types:

Regression
Classification

1.1 Regression

As discussed in the supervised learning section, the machine is trained with labels. In a regression problem, the label we have to predict is a continuous (numeric) value. Hence, regression simply means predicting continuous outcomes. The main hint that a problem is a regression problem is that you are asked to predict a number. An example of a regression problem is house price prediction (refer to the supervised learning example).

In house price prediction, the main goal of the AI model is to predict the price of the house, which is a continuous, i.e. numeric, value. We also have a label to predict, i.e. the house price. Hence, we state this problem as a supervised learning, regression problem.
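
The house price example can be sketched in a few lines of Python. This is only an illustration under some assumptions: the data is made up, a single feature (number of bedrooms) is used, and the model is a plain least-squares straight line, which is the simplest form of linear regression.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: number of bedrooms (feature X) and price (label y).
bedrooms = [1, 2, 3, 4]
prices = [100000, 150000, 200000, 250000]

slope, intercept = fit_line(bedrooms, prices)

# Predict the price of a house the model has never seen (5 bedrooms).
predicted_price = slope * 5 + intercept
print(predicted_price)  # prints 300000.0
```

The output is a number on a continuous scale, which is what makes this a regression problem.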

1.2 Classification

As discussed in the supervised learning section, the machine is trained with labels. In a classification problem, the label we have to predict is a categorical value. Classification means identifying which class an input belongs to. In the dog and cat example mentioned in the supervised learning section, dog and cat are the two classes, and the AI model has to predict whether the given image shows a dog or a cat. There can be any number of classes. Dog and cat classification is also called binary classification because there are exactly two classes.

Another example of classification is weather prediction. Weather comes in multiple types, like rainy, sunny, cloudy, and windy, hence this is multi-class classification. It is supervised learning because the weather is given as the label to predict, and it is classification because this label contains categorical values. Hence, we state this problem as a supervised learning, classification problem.
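
As a rough sketch of binary classification, the code below trains a tiny logistic regression model with gradient descent. Everything about the data is an assumption made for illustration: instead of images, each animal is described by a single made-up "size score", and the labels are 0 for cat and 1 for dog. The learning rate and number of epochs are also arbitrary choices.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class 1 (dog) if the predicted probability is at least 0.5, else 0 (cat)."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Made-up data: a size score per animal, with labels 0 = cat and 1 = dog.
sizes = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(sizes, labels)
print(predict(w, b, 1.0), predict(w, b, 6.0))
```

Unlike regression, the prediction here is one of a fixed set of classes, not a number on a continuous scale.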

2. Unsupervised Learning:

Unsupervised learning is just the opposite of supervised learning. In supervised learning, you must provide the label that you want the AI model to predict. In unsupervised learning, you only need to give the data; no labels are required. A common unsupervised task is grouping similar data into clusters.

Unsupervised learning is classified into 2 types:

Clustering
Dimensionality reduction

2.1 Clustering:

The goal of clustering is to form groups of similar data points. With algorithms such as K-means, the user just needs to specify how many groups are to be formed. The AI model will assign a group number to each data point, and that group number is the prediction, i.e. the output. A problem in which you need to form groups is an unsupervised learning problem. The important hint is that the data does not have any label to predict. Technical details will be discussed in further blogs.

Fig 4. Clustering

In Fig 4, each axis represents a data column and the plotted points represent the data. The output expected from clustering is a group number for each data point. An example of clustering is forming groups in a dataset of students. In this case, no label to predict is given, hence it is unsupervised learning. The AI model is supposed to form groups, hence it is a clustering problem. This problem can therefore be stated as an unsupervised learning, clustering problem.
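
Here is a minimal sketch of clustering using the K-means idea, written in plain Python. It rests on a few assumptions: the student data (hours studied, test score out of 10) is made up, the number of groups is fixed at 2, and the starting centroids are simply the first k points, which is a simplification of how real implementations pick them.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    # Simplification: use the first k points as the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the number of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels, centroids

# Made-up student data: (hours studied, test score out of 10). No labels are given.
students = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]

group_numbers, centers = k_means(students, k=2)
print(group_numbers)
```

The only output is a group number per student; nothing in the data told the model what those groups should mean.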

2.2 Dimensionality Reduction:

In a dataset, the columns are referred to as dimensions because each column takes one axis in a graph. Hence, if the number of columns in a dataset is high, we may need to reduce the number of columns, i.e. reduce the dimensions. Reducing dimensions also loses some information, so we have to make sure that we reduce the dimensions of the dataset while keeping the loss of information as small as possible. If we use data with very high dimensions, chances are that the AI model gets confused or memorizes each and every detail of the training data (technical details in further blogs). Hence, when creating an ML model, data scientists have to make sure that the number of dimensions stays within a reasonable range.
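
The trade-off between fewer dimensions and lost information can be sketched with principal component analysis (PCA). The code below is a simplified illustration that only handles the special case of reducing 2 columns to 1, using made-up data; it also reports what fraction of the original variance the single new column keeps.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-column data onto its main direction; returns (projected, kept_fraction)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of that matrix (closed form for the 2x2 case).
    half_trace = (var_x + var_y) / 2
    largest = half_trace + math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)

    # Eigenvector belonging to the largest eigenvalue: the main direction of the data.
    if cov_xy != 0:
        direction = (cov_xy, largest - var_x)
    elif var_x >= var_y:
        direction = (1.0, 0.0)
    else:
        direction = (0.0, 1.0)
    length = math.hypot(direction[0], direction[1])
    direction = (direction[0] / length, direction[1] / length)

    # One new column replaces the two original columns.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    kept_fraction = largest / (var_x + var_y)
    return projected, kept_fraction

# Made-up data in which the second column is roughly twice the first.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]

one_column, kept = pca_2d_to_1d(data)
print(len(one_column), round(kept, 3))
```

Because the two made-up columns move together almost perfectly, one column is enough to keep nearly all of the information. With less related columns, the kept fraction would be lower.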

3. Reinforcement Learning:

In reinforcement learning, the machine makes a certain set of moves. If the result is positive, that set of moves is stored; otherwise, the machine tries another set of moves. It is commonly used in gaming.

Fig 5. Reinforcement learning

Suppose you are playing a game of chess against a computer. In that game, the computer will try a certain set of moves. If the computer wins, it will store that set of moves. If the computer loses, it will try a different strategy. Hence, at first the computer will lose. After some trial and error, and by observing the way the opponent plays, the machine learns patterns of moves and finally wins. After the machine wins the game, it stores those patterns of moves for the next game. Hence, as the machine plays, it gains experience and the ability to win.
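
Chess is far too large for a short example, so the sketch below uses a deliberately simplified stand-in: the machine repeatedly picks one of three hypothetical strategies, each with a hidden chance of winning, and learns from wins and losses which one works best. This setup is known as a multi-armed bandit with epsilon-greedy exploration. The win rates, the number of rounds, and the exploration rate are all made-up values.

```python
import random

def play_bandit(win_rates, rounds=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which strategy wins most often."""
    rng = random.Random(seed)
    estimates = [0.0] * len(win_rates)  # learned win rate of each strategy
    counts = [0] * len(win_rates)       # how often each strategy was tried
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random strategy.
            action = rng.randrange(len(win_rates))
        else:
            # Exploit: reuse the strategy that has worked best so far.
            action = max(range(len(win_rates)), key=lambda a: estimates[a])
        # The game result: reward 1 for a win, 0 for a loss.
        reward = 1.0 if rng.random() < win_rates[action] else 0.0
        counts[action] += 1
        # Update the running average of rewards for the chosen strategy.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hidden win rates of three hypothetical strategies; the machine never sees these.
estimates, counts = play_bandit([0.2, 0.5, 0.8])
best_strategy = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_strategy, counts)
```

The machine is never told which strategy is best. It only sees wins and losses, and its stored experience gradually steers it toward the strategy that wins most often.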

Algorithms for Machine Learning :

Supervised learning :

Linear regression (for regression) and logistic regression (for classification)
Decision Tree for regression and classification
Support vector machine for regression and classification

Ensemble Techniques :

Bagging: algorithms are Bagging and Random Forest
Boosting: algorithms are AdaBoost, Gradient Boosting, and XGBoost
Stacking

Unsupervised Learning :

K-means for clustering
K-means++ for clustering
DBSCAN for clustering
Principal component analysis (PCA) for dimensionality reduction
Linear discriminant analysis (LDA) for dimensionality reduction (note that LDA uses class labels, so strictly speaking it is a supervised technique)

Summary:

"Machine learning is a sub-domain of Artificial Intelligence that is responsible for creating predictive models".
In Supervised learning, labels are given for prediction.
Regression falls under supervised learning and is used to predict continuous values.
Classification falls under supervised learning and is used to predict categorical values.
In Unsupervised learning, labels are not given for prediction.
In clustering, the AI model is supposed to form groups in the dataset.
In dimensionality reduction, the number of columns in the dataset is reduced while keeping the loss of information as small as possible.

-Santosh Saxena