What Is EM Algorithm In Machine Learning?

What Is EM Algorithm In Machine Learning?
What Is EM Algorithm In Machine Learning?

Chicken and egg problems are major headaches for several entrepreneurs. Many machine learning (ML) problems affect an identical dilemma. If we all know A, we will determine B. Or, if we all know B, we will determine A. Either way solves the ML problem. Unfortunately, we don’t know A or B. But in ML, it is often solved by one powerful algorithm called Expectation-Maximisation Algorithm (EM).

Image Source: Google

In stats, an Expectation–Maximisation (EM) algorithm is used as an iterative method to know out (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration is used as an alternate between the performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the present estimate for the parameters It is important for the maximisation (M) step, which help in computing parameters maximising the expected log-likelihood found on the E step. These parameter-estimates are then wont to determine the distribution of the latent variables within the next E step.

On the opposite hand, Expectation-Maximisation algorithm is often used for the latent variables (variables that aren’t directly observable and are literally inferred from the values of the opposite observed variables) too so as to predict their values with the condition that the overall sort of probability distribution governing those latent variables is understood to us.

This algorithm is really at the bottom of the many unsupervised clustering algorithms within the field of machine learning. It was explained, proposed and given its name during a paper published in 1977 by Arthur Dempster, Nan Laird and Donald Rubin. it’s wont to find the local maximum likelihood parameters of a statistical model within the cases where latent variables.

Algorithm:

  • Given a group of incomplete data, consider a group of starting parameters.
  • Expectation step also called as  (E – step): It is based on using data which is available in the dataset, estimate (guess) is made of the values of the missing data.
  • Maximisation step (M – step): Complete data generated after the expectation (E) step is employed so as to update the parameters.
  • Repeat step 2 and step 3 until convergence.

The essence of the Expectation-Maximisation algorithm is to use the available observed data of the dataset to estimate the missing data then using that data to update the values of the parameters. allow us to understand the EM algorithm intimately.

  • Initially, a group of initial values of the parameters are considered. a group of incomplete observed data is given to the system with the idea that the observed data comes from a selected model.
  • The next step is understood as “Expectation” – step or E-step. during this step, we use the observed data so as to estimate or guess the values of the missing or incomplete data. it’s basically wont to update the variables.
  • The next step is understood as “Maximisation”-step or M-step. during this step, we use the entire data generated within the preceding “Expectation” – a step so as to update the values of the parameters. it’s basically wont to update the hypothesis.
  • Now, within the fourth step, it’s checked thoroughly that whether the values are converging or not, if yes, then stop otherwise repeat step-2 and step-3 i.e. Expectation step is done and Maximization – step is done until the convergence occurs.

Flow chart for EM algorithm:

Usage of EM Algorithm:

  • It is often wont to fill the missing data during a sample.
  • It is often used because of the basis of unsupervised learning of clusters.
  • It is often used for the aim of estimating the parameters of the Hidden Markov Model (HMM).
  • It is often used for locating the values of latent variables.

Usage of EM Algorithm:

  • It is often wont to fill the missing data during a sample.
  • It is often used because of the basis of unsupervised learning of clusters.
  • It is often used for the aim of estimating the parameters of the Hidden Markov Model (HMM).
  • It is often used for locating the values of latent variables.

Disadvantages of EM Algorithm:

  • It has slow convergence.
  • It makes convergence to the local optima only.
  • It requires both the possibilities, forward and backward (numerical optimisation requires only forward probability).

How Does It Work in Detailed Explanation?

The basic idea behind the EM algorithm is to use the observed data to estimate the missing data then updating those values of the parameters. keeping the flowchart in mind, allow us to understand how the EM algorithm works.

  • Within the drawing board, a group of initial parameters is taken into account. A group of unobserved and incomplete data is given to the system with an assumption that the observed data is coming from a selected model.
  • The subsequent step is that the Expectation Step or E-STEP. during this step, we use the observed data to estimate missing or incomplete data. it’s basically wont to update the variables.
  • The Maximisation step or M-STEP is employed to finish the info generated within the E-STEP. This step basically updates the hypothesis.
  • Within the last step, it’s checked whether the values are converging or not. If the values match, then we do nothing, else we’ll continue with step 2 and three until the convergence is met.
  • The EM algorithm is additionally known for clustering aside from density estimation. So, allow us to attempt to understand the EM algorithm with the assistance of the Gaussian Mixture Model.

Gaussian Mixture Model and therefore the EM Algorithm:

A mixture model may be a model comprised of an unspecified combination of multiple probability distribution functions. A statistical method or learning algorithm is employed to estimate the parameters of the probability distributions to best fit the density of a given training dataset.

The Gaussian Mixture Model, or GMM for a brief , maybe a mixture model that uses a mixture of Gaussian (Normal) probability distributions and requires the estimation of the mean and variance parameters for every. There are many techniques for estimating the parameters for a GMM, although a maximum likelihood estimate is probably the foremost common.

Consider the case where a dataset is comprised of the many points that happen to be generated by two different processes. The points for every process have a Gaussian probability distribution, but the info is combined and therefore the distributions are similar enough that it’s not obvious to which distribution a given point may belong.

The processes wont to generate the info point represents a latent variable, e.g. process 0 and process 1. It influences the info but isn’t observable. As such, the EM algorithm is an appropriate approach to use to estimate the parameters of the distributions.

In the EM algorithm, the estimation-step would estimate a worth for the method latent variable for every datum and therefore the maximization step would optimise the parameters of the probability distributions in an effort to best capture the density of the info. the method is repeated until an honest set of latent values and maximum chances are achieved that matches the info.

  • E-Step: Estimate the arithmetic mean for every latent variable.
  • M-Step: Optimise the parameters of the distribution using maximum likelihood.

We can imagine how this optimization procedure might be constrained to only the distribution means, or generalised to a mix of the many different Gaussian distributions.

Expectation-Maximisation (EM) is an important algorithm for the maximum likelihood estimation which is used in models with hidden variables (usually missing data or latent variables). It involves iterative computing expectations of terms within the log-likelihood function under the present posterior, then solving for the utmost likelihood parameters. There are some common applications of the EM  which include fitting mixture models, learning Bayes net parameters with latent data and learning hidden Markov models.

To read more about Machine Learning, click here.

By Madhav Sabharwal