Commonly used Machine Learning Algorithms (with Python & R Codes)

Introduction

We live in an era of constant technological progress, much of it backed by advances in the field of machine learning. Machine learning algorithms are highly automated and self-modifying, and they continue to improve over time with minimal human intervention.

Machine learning algorithms can be thought of as a black box that trains on input data and can then predict the expected outputs for unseen inputs fed into it. In other words, machine learning allows a machine to learn from examples and experience without being explicitly programmed.

Which machine learning algorithm best solves your problem depends on the business problem, the nature of the dataset and the resources available.

Types of Machine Learning Algorithms

All machine learning algorithms can be categorised into four distinct types. Before we look at the algorithms that are currently popular in the industry, let us quickly go through these types.

Machine Learning Algorithm Type #1: Supervised Learning

These algorithms train on labelled training data to learn a function that can then be used to predict labels for never-before-seen test data.

Machine Learning Algorithm Type #2: Unsupervised Learning

These algorithms train on unlabelled training data and group similar data points together.

Machine Learning Algorithm Type #3: Semi-Supervised Learning

These algorithms train on a combination of labelled and unlabelled data. The unlabelled data is first grouped into smaller subsets of similar examples, which are then fed to supervised techniques to generate appropriate labels.

Machine Learning Algorithm Type #4: Reinforcement Learning

These algorithms improve their performance and accuracy incrementally with the help of reward feedback. The experience gathered over time is added to the training data, helping the system become more sophisticated and better trained.

Popular Machine Learning Algorithms

Each machine learning problem is unique and brings its own constraints. Since there is no single machine learning model that can solve every problem, as a beginner you should know the popular machine learning algorithms, their advantages and when to use them.

We have listed some of the most popular algorithms in use today. These are:

Popular Machine Learning Algorithm #1: Naive Bayes Algorithm

Naive Bayes is a classification (supervised learning) algorithm based on Bayes’ Theorem. It assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Even if the features are related, the classifier considers each property independently when calculating the probability of an outcome.

It is easy to build and can outperform even sophisticated classification methods on massive datasets.

The intuition behind the Naive Bayes algorithm, Bayes’ Theorem, provides a way of calculating the posterior probability from the likelihood, the class prior probability and the predictor prior probability with the help of the equation below:

P(C|X) = (P(X|C)P(C)) / P(X), where

P(C|X) -> Posterior Probability of class (target) given predictor (attribute)
P(C) -> Prior Probability of class
P(X|C) -> Likelihood of the predictor given class
P(X) -> Prior Probability of predictor
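
For example, if the likelihood P(X|C) = 0.8, the class prior P(C) = 0.3 and the predictor prior P(X) = 0.4 (made-up values purely for illustration), then P(C|X) = (0.8 * 0.3) / 0.4 = 0.6.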

When To Use Naive Bayes Algorithm?

  • When the training data set is moderate or large, instances have several attributes and the input variables are categorical.
  • When we want to make multi-class predictions.

Some Common Applications of Naive Bayes Algorithm:

  • Classification
Code in R:
library(e1071)
x <- cbind(x_train, y_train)
# Fitting model
fit <- naiveBayes(y_train ~ ., data = x)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)
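
Since the article’s title also promises Python, here is a minimal Python sketch of the same workflow using scikit-learn’s GaussianNB; the small arrays below are only placeholders for your own numeric features and class labels.

Code in Python:
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Placeholder data; replace with your own numeric features and class labels
x_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
x_test = np.array([[1.2, 1.9], [5.5, 8.5]])

# Fitting model on the labelled training data
model = GaussianNB()
model.fit(x_train, y_train)

# Predict Output for the unseen cases
predicted = model.predict(x_test)
print(predicted)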

Popular Machine Learning Algorithm #2: Regression Algorithms (Linear and Logistic)

Regression algorithms model the relationship between variables, showing how a change in one variable (the independent variable) impacts another (the dependent variable). The basic goal of regression is to fit the line that minimises the error between the fitted values and the observed data points, so that the value of the dependent variable can be predicted accurately for never-seen-before inputs.

There are two common versions of regression algorithms. These are:

Regression Algorithm #1: Linear Regression

The goal of linear regression is to fit the best straight line through the data points. The regression line can be represented by the linear equation

Y = A*X + B, where

Y -> Dependent Variable
A -> Slope
X -> Independent Variable
B -> Intercept

The coefficients A and B are derived by minimising the sum of squared distances between the data points and the regression line.
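
For a single predictor, minimising this sum of squared distances gives the familiar closed-form estimates:

A = sum((Xi - mean(X)) * (Yi - mean(Y))) / sum((Xi - mean(X))^2)
B = mean(Y) - A * mean(X)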

Regression Algorithm #2: Logistic Regression

Also known as logit regression, the goal of logistic regression is to predict the probability of an event by fitting the data to a logit function. It helps model a binary outcome (0/1) with one or more explanatory variables.

Hence, it is actually a classification machine learning model. The equation of log odds can be given as follows:

O = P / (1-P), where

O -> Odds
P -> Probability of event occurrence
(1-P) -> Probability of event not occurring
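
Taking the natural logarithm of the odds gives the logit, which logistic regression models as a linear function of the predictors (reusing the A and B notation from linear regression above):

ln(P / (1-P)) = A*X + B

Applying the inverse of this transformation (the sigmoid function) to the fitted linear combination recovers the predicted probability P.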

When To Use Regression?

  • When the dependent variable is continuous and is assumed to depend linearly on the independent variables, use linear regression.
  • When the need is to classify the elements into two categories based on the explanatory variable, use logistic regression.

Some Common Applications of Regression:

  • Estimating sales or risk assessment (linear regression)
  • Classification (logistic regression)

Code in R (Linear Regression):

# Load Train and Test datasets
# Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
# Predict Output
predicted <- predict(linear, x_test)
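
A corresponding minimal Python sketch using scikit-learn’s LinearRegression, assuming x_train, y_train and x_test are prepared as numeric arrays just like the placeholders in the R snippet:

Code in Python (Linear Regression):
from sklearn.linear_model import LinearRegression

# Train the model using the training sets and check score
linear = LinearRegression()
linear.fit(x_train, y_train)
print(linear.coef_, linear.intercept_)   # fitted slope(s) A and intercept B
print(linear.score(x_train, y_train))    # R-squared on the training data

# Predict Output
predicted = linear.predict(x_test)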

Code in R (Logistic Regression):

x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
logistic <- glm(y_train ~ ., data = x, family = 'binomial')
summary(logistic)
# Predict Output (type = 'response' returns probabilities rather than log-odds)
predicted <- predict(logistic, x_test, type = 'response')
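
A corresponding minimal Python sketch using scikit-learn’s LogisticRegression, again assuming x_train, y_train (binary 0/1 labels) and x_test are prepared as in the R snippet:

Code in Python (Logistic Regression):
from sklearn.linear_model import LogisticRegression

# Train the model using the training sets
logistic = LogisticRegression()
logistic.fit(x_train, y_train)

# Predict Output: class labels and the underlying probabilities
predicted = logistic.predict(x_test)
probabilities = logistic.predict_proba(x_test)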

Popular Machine Learning Algorithm #3: Decision Trees Algorithm

Decision Trees are supervised machine learning algorithms used mainly for classification problems. The population is split into two or more homogeneous sets based on the most significant attributes, and the approach works well for both categorical and continuous dependent variables.

Due to its simple structure, it is easy to assess the minimum number of questions one has to ask to arrive at a logical conclusion in a structured and systematic way.

A sample decision tree can be represented as follows:

[Image: a sample decision tree. Source: KDnuggets]

When To Use Decision Trees Algorithm?

  • When a visual, easy-to-interpret representation of the decision process is needed and the target has discrete output values.
  • When you want to estimate how different decisions affect the outcome.

Some Common Applications of Decision Trees Algorithm:

  • Pattern recognition
  • Classification

Code in R:

library(rpart)
x <- cbind(x_train, y_train)
# Grow tree
fit <- rpart(y_train ~ ., data = x, method = "class")
summary(fit)
# Predict Output (type = "class" returns class labels rather than probabilities)
predicted <- predict(fit, x_test, type = "class")
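
A corresponding minimal Python sketch using scikit-learn’s DecisionTreeClassifier, assuming the same x_train, y_train and x_test placeholders:

Code in Python:
from sklearn.tree import DecisionTreeClassifier

# Grow tree; "gini" is the default split criterion, "entropy" is another option
fit = DecisionTreeClassifier(criterion="gini")
fit.fit(x_train, y_train)

# Predict Output
predicted = fit.predict(x_test)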

Popular Machine Learning Algorithm #4: K-Nearest Neighbours Algorithm (KNN)

The K-Nearest Neighbours algorithm can be applied to classification problems. It stores all available cases and classifies a new case by taking a majority vote of its k nearest neighbours: the new case is assigned to the class that is most common among those neighbours.

The distance between the new case and its neighbours is calculated using the Euclidean, Manhattan, Minkowski or Hamming distance. Since the algorithm is computationally expensive at prediction time, the data should be pre-processed by normalising the variables.

When To Use K-Nearest Neighbours Algorithm?

  • When a new case has to be classified accurately on the basis of its similarity to its neighbours.
  • When classification of non-linear data has to be done.

Some Common Applications of K-Nearest Neighbours Algorithm:

  • Classification

Code in R:

library(class)
# class::knn() has no separate training step: it classifies the test set
# directly from the labelled training set by majority vote of the k neighbours
predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
summary(predicted)
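
A corresponding minimal Python sketch using scikit-learn’s KNeighborsClassifier, assuming the same placeholders and that the features have been normalised beforehand:

Code in Python:
from sklearn.neighbors import KNeighborsClassifier

# Fitting model with k = 5 neighbours
fit = KNeighborsClassifier(n_neighbors=5)
fit.fit(x_train, y_train)

# Predict Output by majority vote of the 5 nearest neighbours
predicted = fit.predict(x_test)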

Popular Machine Learning Algorithm #5: Support Vector Machines Algorithm (SVM)

Support Vector Machines are supervised machine learning algorithms used for classification. The data is separated into different classes by a hyperplane, and the decision boundary can be linear as well as non-linear.

The best hyperplane is chosen to maximise the distance between the classes and the separating boundary, a process called margin maximisation, since the model tends to give better classifications when there is more distance between the classes.

When To Use Support Vector Machines Algorithm?

  • When classification needs to be performed with high accuracy and good performance.
  • When no strong assumptions have to be made about the data and the model should be resistant to overfitting.

Some Common Applications of Support Vector Machines Algorithm:

  • Classification

Code in R:

library(e1071)
x <- cbind(x_train, y_train)
# Fitting model
fit <- svm(y_train ~ ., data = x)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)
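
A corresponding minimal Python sketch using scikit-learn’s SVC, assuming the same placeholders; kernel="linear" gives a linear hyperplane, while "rbf" allows a non-linear boundary:

Code in Python:
from sklearn.svm import SVC

# Fitting model with a linear kernel
fit = SVC(kernel="linear")
fit.fit(x_train, y_train)

# Predict Output
predicted = fit.predict(x_test)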

Frequently Asked Questions

Which is the best machine learning algorithm?

There is no single machine learning algorithm that solves every use case. Each problem brings its own constraints, so the best algorithm for one problem might not be very useful for another.

What are the five popular algorithms of machine learning?

Some algorithms form the basis for many machine learning algorithms and hence are more popular than others. These are:

1. Naive Bayes Algorithm
2. Regression Algorithms (Linear and Logistic)
3. Decision Trees Algorithm
4. K-Nearest Neighbours Algorithm (KNN)
5. Support Vector Machines Algorithm (SVM)

What is the basic machine learning algorithm?

Most machine learning algorithms are built on probabilistic and statistical foundations, so probability can be thought of as the basis of machine learning algorithms.

How many algorithms are there in machine learning?

There is no fixed number of machine learning algorithms, and new ones keep appearing. However, all machine learning algorithms can be categorised into four types:

1. Supervised Machine Learning Algorithms
2. Semi-Supervised Machine Learning Algorithms
3. Unsupervised Machine Learning Algorithms
4. Reinforcement Machine Learning Algorithms

Key Takeaways

The field of machine learning is growing at a rapid pace. The sooner you understand the various algorithms, the sooner you will be able to get your hands dirty solving some of the most pressing problems of today. We hope this article helped you understand the most popular machine learning algorithms in use today, and that the code snippets will help you implement them and get started on your machine learning journey right away!

By Saarthak Jain