Mathematics for Machine Learning

Math helps in developing logical reasoning and attention to detail. It enhances your ability to think under pressure and increases your mental endurance.

Mathematical concepts provide concrete solutions to hypothetical or abstract problems. Mathematics is about structure: developing principles that remain true even if you alter the components. The main branches of mathematics that underpin a thriving career in Machine Learning are listed below:

  • Linear Algebra
  • Multivariate Calculus
  • Multivariate Statistics
  • Probability

“Mathematics is the language with which God has written the universe.”

Galileo Galilei

Machine Learning is not magic; it’s just mathematics. The thinking and prediction done by a machine learning algorithm are driven entirely by mathematical concepts. So, if you want to build a career in machine learning, you need a working understanding of some of these concepts. That said, machine learning frameworks provide some relief for people who don’t want to go deep into the mathematics behind Machine Learning.

Some of the frameworks and libraries you should consider are explained in detail here. The primary goal in Machine Learning is to create a model for prediction or classification, and these models are built on mathematical concepts working behind the scenes. Linear Algebra, Calculus, Game Theory, Probability, Statistics, advanced logistic regression and Gradient Descent are all major underpinnings of data science.


Linear Algebra: Machine Learning uses linear algebra in almost all of its aspects, and this will become clearer as we go deeper into this blog. The concept of vectorisation in Python makes excellent use of linear algebra: vectorisation reduces time complexity compared with a basic for-loop structure (a short sketch follows below). Linear algebra is at the core of deep learning, and it helps in generating new ideas. Machine learning algorithms mainly use the concepts of scalars, vectors, tensors, matrices, sets and sequences, topology, game theory, graph theory, functions, linear transformations, and eigenvalues and eigenvectors.
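
As a minimal sketch of why vectorisation matters (the array size here is an arbitrary assumption), the dot product below is computed once with a plain Python for-loop and once with NumPy’s vectorised np.dot, which runs in optimised compiled code rather than looping in Python:

# Vectorisation vs. a basic for-loop (illustrative sketch)
import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Basic for-loop: one multiply-add per iteration, executed in Python
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]

# Vectorised: the same dot product in a single optimised call
vec_total = np.dot(a, b)

print(total, vec_total)  # both print (approximately) the same value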

Let’s look at some of the core concepts that everyone in machine learning should be aware of:

  • Scalar: Any raw quantity that has a value can be called a scalar. Scalars are quantities fully described by a magnitude (or numerical value) alone; a scalar is just a single real number. The scalar in physics is the same as the scalar in mathematics.
  • Vectors: They represent scalar data in a vector space with a specific direction. In mathematics and physics, a vector is an element of a vector space. Historically, vectors were introduced in geometry and physics before the formalisation of the concept of a vector space. Because of differences in features and values, vectors come in many types, some of which are mentioned below:
    • Column Vector
    • Row Vector
    • Coordinate Vector
    • Displacement Vector
    • Position Vector
    • Velocity Vector
    • Pseudovector
    • Tangent Vector
    • Normal Vector
    • Gradient
  • Vector Operations: Different vectors have different meanings and are useful on different occasions. Several operators act on two vectors, and most vector operations build mainly on addition and on the dot or cross products. Vector addition combines two vectors, and it is not just simple arithmetic: the sum represents the net displacement achieved by applying both vectors in turn. The dot product, by contrast, yields a scalar measuring how strongly the two vectors point in the same direction, while the cross product yields a vector perpendicular to both operands.
    • Scalar Multiplication: During scalar multiplication a vector grows or shrinks in the dimensional space. Here V is the vector and K is the scalar quantity: KV scales every component of V by K, so for V = (v1, v2), KV = (K·v1, K·v2).
  • Projection of a Vector onto Another: One vector can also be projected onto another: the projection of u onto v is given by proj_v(u) = ((u · v) / (v · v)) v. A short NumPy sketch of these vector operations follows below.
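
Here is a minimal sketch of these operations in NumPy; the vectors u and v and the scalar k are arbitrary example values:

# Dot product, scalar multiplication and projection (illustrative sketch)
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
k = 2.5

print(np.dot(u, v))   # dot product: 3*1 + 4*0 = 3.0
print(k * u)          # scalar multiplication: [ 7.5 10. ]

# Projection of u onto v: ((u . v) / (v . v)) * v
proj = (np.dot(u, v) / np.dot(v, v)) * v
print(proj)           # [3. 0.]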

Properties of vector operations:

Let u, v and w be vectors in the plane, let c and d be scalars.

  • u + v = v + u (Commutative Property)
  • (u + v) + w = u + (v + w) (Associative Property)
  • u + 0 = u (Additive Identity Property)
  • u + (-u) = 0 (Additive Inverse Property)
  • c(du) = (cd)u (Associative Property of scalar multiplication)
  • (c + d)u = cu + du (Distributive Property)
  • c(u + v) = cu + cv (Distributive Property)
  • 1(u) = u, 0(u) = 0 (Multiplicative Identity and Zero Properties)
  • Matrix: It is a rectangular arrangement of numbers into rows and columns. For example, a matrix A with rows [1, 2, 3] and [4, 5, 6] has two rows and three columns.

      There are also different types of matrices; some are named below:

  • Row Matrix
  • Column Matrix
  • Square Matrix
  • Diagonal Matrix
  • Identity Matrix
  • Sparse Matrix
  • Dense Matrix
  • Matrix Operations

There are various operations that can be performed on a matrix; a short NumPy sketch follows the list below.

  • Matrix Addition
  • Matrix Multiplication
  • Transpose of a Matrix
  • Determinant of a Matrix
  • Inverse of a Matrix
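
A minimal NumPy sketch of these operations (the 2×2 matrices here are arbitrary example values):

# Basic matrix operations (illustrative sketch)
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(A + B)             # matrix addition
print(A @ B)             # matrix multiplication
print(A.T)               # transpose
print(np.linalg.det(A))  # determinant: 1*4 - 2*3 = -2.0
print(np.linalg.inv(A))  # inverse (exists because det(A) != 0)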
  • Vector as Matrix: Vectors have either 2 (for 2D) or 3 (for 3D) components along with some angle from the origin, and this translates easily into matrices. Because matrices support diverse operations such as scaling, shearing and rotation, and many algorithms exist for them that make data processing easy, pixels can also be represented as matrices. Matrices are also widely used in deep learning to build the layers of a neural network, as they make data processing easy and optimised with the help of Python’s vectorisation feature.

A simple example of transforming a system of linear equations into a matrix is given below:

Equations: 6x + 4y = Z1, x + y = Z2
Coefficient matrix: [[6, 4], [1, 1]]

After this, we can apply the matrix operations and solve the equations thus formed using two of the most important methods (a NumPy sketch of the inverse method follows the list):

  • Inverse Method
  • Row Echelon Method
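
A minimal sketch of the inverse method in NumPy, assuming example right-hand-side values Z1 = 10 and Z2 = 2:

# Solving 6x + 4y = Z1 and x + y = Z2 (illustrative sketch)
# Z1 = 10 and Z2 = 2 are assumed values for this example
import numpy as np

A = np.array([[6.0, 4.0], [1.0, 1.0]])  # coefficient matrix
z = np.array([10.0, 2.0])

# Inverse method: x = A^(-1) z
print(np.linalg.inv(A) @ z)   # [1. 1.]  ->  x = 1, y = 1

# np.linalg.solve gives the same answer more stably
print(np.linalg.solve(A, z))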
  • Eigenvectors: They are a special set of vectors associated with a linear system of equations. Each eigenvector is paired with a corresponding so-called eigenvalue, λ, also known as a characteristic value, proper value, or latent value. Decomposing a square matrix into its eigenvalues and eigenvectors is known as eigendecomposition.

Av = λv

where A ∈ R^(m×m) is a square matrix, the eigenvector v ∈ R^(m×1) is a column vector, and the eigenvalue λ is a scalar. Stacking all eigenpairs gives the eigendecomposition A = VΛV^(-1), where Λ ∈ R^(m×m) is a diagonal matrix holding the eigenvalues.
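
A short NumPy sketch of computing eigenvalues and eigenvectors; the matrix A below is an arbitrary example:

# Eigenvalues and eigenvectors of a square matrix (illustrative sketch)
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                 # [2. 3.]
print(eigenvectors)                # columns are the eigenvectors

# Verify A v = lambda v for the first eigenpair
v = eigenvectors[:, 0]
print(A @ v, eigenvalues[0] * v)   # both print [2. 0.]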

  • Tensor: Tensors are multi-dimensional arrays with a uniform type. They are algebraic objects that can be used to describe physical properties, generalising scalars and vectors. Objects that tensors may map between include vectors, scalars, and even other tensors. Tensors have shapes, and they can take different forms such as:
  • Scalars
  • Vectors
  • Dual Vectors
  • Multilinear maps between vector spaces

Some basic terms for Tensor:

  • Shape: The length (number of elements) of each of the dimensions of a tensor.
  • Rank: Number of tensor dimensions. A scalar has rank 0, a vector has rank 1, a matrix is rank 2.
  • Axis or Dimension: A particular dimension of a tensor.
  • Size: The total number of items in the tensor, i.e. the product of the elements of the shape vector.

Tensors also have their own algebra for their operations; you can refer to Tensor Algebra for some advanced topics. A tensor can be represented in Python using a NumPy array (ndarray). A basic implementation of a tensor in Python looks like this:

# tensor: a rank-3 (3x3x3) tensor represented as a NumPy ndarray
from numpy import array

T = array([
    [[6, 5, 4], [8, 56, 85], [8, 53, 48]],
    [[15, 49, 84], [15, 51, 21], [485, 484, 115]],
    [[4, 54, 55], [87, 8, 54], [5, 41, 56]],
])
print(T.shape)  # prints (3, 3, 3)
print(T)

  • Application of Linear Algebra in Machine Learning: Linear algebra is used widely across Machine Learning; the areas below rely on its concepts most heavily:
  • Deep learning
  • Singular-Value Decomposition
  • Regularisation
  • Linear Regression (uses python vectorisation)
  • Image processing (pixels converted to matrices)
  • One Hot Encoding (converts categorical values into binary vectors)
  • Datasets and Data Files (they are parsed in code using matrix operation)

Multivariate Calculus: Multivariate means multiple variables, so multivariate calculus is the field of calculus involving multiple variables. Multivariate calculus, which centres on partial differentiation, is used for the mathematical optimisation of a given function (mostly convex, as a convex function tends to have a single minimum). Using multivariate calculus, we can optimise a convex function down to its minimum, the lowest point; the process can get stuck at a local minimum, but there are methods to avoid that as well.

To learn more about multivariate calculus you can refer to this video for a better visual explanation. As an example of multivariate calculus at work in linear regression: our sole aim in linear regression is to fit a curve (for the linear case, a line) as closely as possible to the data. We use a loss function such as mean squared error or root mean squared error to quantify the difference between the actual and predicted values. We repeat the exercise until the loss function converges and we reach its minimum value. A gradient descent algorithm (which uses multivariate partial derivatives) drives the error function toward the minimum; this is known as convergence.
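
The sketch below applies gradient descent to a simple one-variable linear regression with MSE loss; the data, learning rate and iteration count are assumed values for illustration:

# Gradient descent for simple linear regression with MSE loss
# (illustrative sketch; data, learning rate and step count are assumed)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0          # true underlying line: y = 2x + 1

w, b = 0.0, 0.0            # parameters of the fitted line y_hat = w*x + b
lr = 0.05                  # learning rate

for _ in range(2000):
    y_hat = w * x + b
    error = y_hat - y
    # Partial derivatives of the MSE with respect to w and b
    dw = 2.0 * np.mean(error * x)
    db = 2.0 * np.mean(error)
    w -= lr * dw
    b -= lr * db

print(w, b)  # converges close to (2.0, 1.0)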

In mathematics, calculus is the branch that deals with the properties of integrals and derivatives of functions; it is the study of continuous change of a function, or the rate of change of a function. Much of physics depends on calculus. It has two major branches:

  • Differential calculus
  • Integral Calculus

Application of Multivariate Calculus in Machine Learning:

  • In the support vector machine algorithm (to find the maximal margin).
  • In the EM algorithm (to find the maxima).
  • Optimisation problems in general rely on multivariate calculus.
  • In gradient descent (to find local and global minima).

Multivariate Statistics

The term “multivariate statistics” refers to statistics in which more than two variables are analysed simultaneously. The non-multivariate case of regression, the analysis between two variables, is called bivariate regression (it contains one independent and one dependent variable). Statistically, one could consider the one-way ANOVA (Analysis of Variance) as either a bivariate curvilinear regression or as a multiple regression with the K-level categorical independent variable dummy-coded into K-1 dichotomous variables.

For example, a pie chart of sales by territory involves only one variable and is referred to as univariate analysis. If the analysis attempts to understand the difference between two variables at a time, as in a scatterplot, then it is referred to as bivariate analysis. For example, analysing sales volume and spending together can be considered an example of bivariate analysis.

Different models depend on different techniques, which fall into two groups:

  1. Independent techniques: models under this technique are listed below:
  • Multidimensional Scaling
  • Correspondence Analysis
  • Cluster Analysis
  • Factor Analysis

  2. Dependent techniques: models under this technique are listed below:

  • Multiple Discriminant Analysis
  • Conjoint Analysis
  • Multiple Regression
  • Structural Equation Modeling
  • Canonical Correlation Analysis
  • Multivariate Analysis of Variance and Covariance
  • Linear Probability Models

Based on the nature of the data, different techniques can be selected, so there are:

  • Metric Data Variables
  • Non-metric Data Variables

Probability: It is a measurement of the likelihood that a particular event will occur. It ranges between 0 and 1, where 0 means the event is impossible and 1 means the event is certain to occur. Probability theory offers tools to deal with uncertainty. The concepts of probability are used to analyse how frequently an event happens, since probability is defined as the chance of occurrence of an event.

P(E) = m/n

Where,

m = number of favorable cases
n = number of exhaustive cases

e.g. Suppose you toss a coin; the possible outcomes are Head or Tail. So, in this case, there are 2 exhaustive cases, namely {Head, Tail}. Now, you can calculate the probability of getting a head as

P(Head) = favorable cases / exhaustive cases = 1/2 = 0.5

Note that there is only one favorable case for getting a head.
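
A quick simulation (an illustrative sketch; the number of tosses is arbitrary) confirms the estimate:

# Estimating P(Head) for a fair coin by simulation (illustrative sketch)
import random

tosses = 100000
heads = sum(1 for _ in range(tosses) if random.random() < 0.5)
print(heads / tosses)  # close to the theoretical probability 0.5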

Discrete random variables, continuous random variables, Bayes’ formula and normalisation are some concepts of probability that are used in robotics navigation and locomotion, along with other concepts of linear algebra.

  • Measure Theory: The entire point of probability is to measure something. Unlike length and weight, we have very specific values we care about, namely the interval [0, 1]. The most basic point of probability is that you are measuring the likelihood of events on a scale from 0 to 1; this measurement of events is the probability measure, which is studied formally in measure theory.
  • Information Theory: Information theory is a subfield of mathematics concerned with transmitting data across a noisy channel. A cornerstone of information theory is the idea of quantifying how much information there is in a message. More generally, this can be used to quantify the information in an event or a random variable, a quantity called entropy, which is calculated using probability. You can also get a very good glimpse of this theory from Stanford’s book on information theory. Calculating information and entropy is used in feature selection, building decision trees, and, more generally, fitting classification models. One must have a strong understanding of and intuition for information and entropy, as their applications are wide and they form the basis of algorithms such as decision trees. The basic formula to know is the entropy of a discrete random variable X: H(X) = -Σ p(x) log2 p(x).
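
As a minimal sketch, the entropy formula above can be computed directly in Python (the probability distribution below is an assumed example):

# Shannon entropy H(X) = -sum(p * log2(p)) (illustrative sketch)
import math

probabilities = [0.5, 0.25, 0.25]  # assumed example distribution
entropy = -sum(p * math.log2(p) for p in probabilities)
print(entropy)  # 1.5 bits
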
Summary: So, we have now gone through a proper structuring of the topics that are important for anyone who wants to do well in Machine Learning. These concepts should give you a brief but better understanding of the mathematics behind Machine Learning. The topics are covered only briefly here and will take time to work through, so you can refer to the links on the topics themselves for more information.

By Gaurav Verma