Unsupervised Learning

Arun Nawani
Last Updated: May 13, 2022

Introduction

Have you ever wondered how Netflix knows which show will keep you hooked to your laptop screen? Or have you noticed that when you hit the gym and put on your favourite Spotify track, Spotify keeps playing tracks on its own and you still love every one of them? Well, it’s not a fluke. The recommendation engines of these apps are carefully designed to offer you products that match your taste. They constantly learn more about you by tracking your interactions within the app and identifying recurring patterns in them. Suppose you recently acquired a taste for rock and roll, and your last four Spotify tracks have all been rock and roll. The app is likely to recommend similar tracks, at least until it detects a shift in your listening activity. Unsupervised machine learning is one of the key techniques behind these intelligent recommendation engines.

Unsupervised learning is one of the three broad categories of machine learning techniques, alongside supervised and reinforcement learning. Its objective is to infer patterns from data that has no target variable. In supervised learning the objective is well defined, i.e., we predict a dependent, or target, variable. In unsupervised learning there is no such predefined objective. Instead, we extract the relevant observations and results by critically studying the data points themselves.

Commonly known supervised learning algorithms cannot be applied directly to unsupervised learning problems, because there is no target variable to train them against.

Some Popular Unsupervised Learning Techniques

Unsupervised learning models are employed for three major tasks:

  1. Clustering
  2. Association
  3. Dimensionality Reduction

Clustering

Grouping similar data points within an unlabelled dataset is called clustering, and it is one of the most popular techniques in data science. Ideally, each data point is closely related to the other data points in its own cluster and shares no similarities with data points in other clusters. This ideal is unlikely to occur in real-world applications, where data points in different clusters often still share some similarities, but the aim should be to minimise those similarities as much as possible. Clustering comes in several types: exclusive, overlapping, hierarchical, and probabilistic.

Exclusive Clustering

Exclusive clustering prescribes that a data point can’t be shared among different clusters, i.e., no data point can be part of more than one cluster. This technique is also referred to as “hard clustering”. The K-means algorithm works on the principle of exclusive clustering.

In K-means clustering, data points are divided into K groups, where K is the number of clusters. Each point is assigned to the cluster whose centroid is nearest to it, each centroid is then recomputed as the mean of the points assigned to it, and these two steps repeat until the assignments stop changing. More groups (a higher value of K) produce smaller, more specific clusters, while fewer groups produce broader ones. It’s important to choose K carefully to avoid overfitting or underfitting the data.

 

[Figure: K-means clustering of iris flowers by petal length and petal width]

Above is the result of K-means clustering applied to iris flowers, which are grouped by their petal length and width. The dataset used is the Iris dataset, which is freely available online. We recommend having a look at the dataset for a better understanding.
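The assign-and-update loop described above can be sketched in a few lines of Python with NumPy. This is a minimal sketch of Lloyd's algorithm that assumes NumPy is installed. `kmeans` and `assign_to_nearest` are hypothetical helpers written for illustration, and the six 2-D points are made-up data rather than the Iris dataset. Initialising the centroids by sampling K distinct data points is one simple choice among several.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Index of the nearest centroid for every point (the assignment step)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means (Lloyd's algorithm). Hypothetical helper for illustration."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_to_nearest(X, centroids)
        # Update step: move each centroid to the mean of the points assigned to it
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, assign_to_nearest(X, centroids)


# Made-up data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 1.4],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
centroids, labels = kmeans(X, k=2)
print("labels:", labels)
print("centroids:", centroids.round(2))
```

For real work, a library implementation such as scikit-learn's `KMeans` is usually the better choice. It adds smarter initialisation and multiple restarts.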

Overlapping Clustering

Sometimes it might not be possible to put a data point into a single cluster. This is where soft clustering comes in. Overlapping clustering allows data points to be shared between clusters, with a degree of membership in each. A well-known example of overlapping clustering is soft k-means clustering.

[Figure: soft k-means versus hard k-means clustering. Source: www.TowardsDataScience.com]

The graphs above show how soft k-means differs from conventional (hard) k-means clustering.
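Soft k-means replaces the hard nearest-centroid assignment with a degree of membership. The sketch below assumes one common formulation: memberships are a softmax of the negative squared distances, scaled by a "stiffness" parameter `beta`, and each centroid is the membership-weighted mean of all points. The function names `soft_assign` and `soft_kmeans`, the value of `beta`, and the data are all illustrative assumptions.

```python
import numpy as np


def soft_assign(X, centroids, beta):
    """Degree of membership of each point in each cluster (each row sums to 1)."""
    # Squared Euclidean distance from every point to every centroid.
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Softmax over -beta * distance; subtracting the row minimum keeps exp() stable.
    weights = np.exp(-beta * (sq_dists - sq_dists.min(axis=1, keepdims=True)))
    return weights / weights.sum(axis=1, keepdims=True)


def soft_kmeans(X, k, beta=1.0, n_iter=100, seed=0):
    """Minimal soft k-means (hypothetical helper): returns (centroids, memberships)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        memberships = soft_assign(X, centroids, beta)
        # Each centroid becomes the membership-weighted mean of *all* points.
        centroids = (memberships.T @ X) / memberships.sum(axis=0)[:, None]
    return centroids, soft_assign(X, centroids, beta)


# Made-up data: two groups of 2-D points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 1.4],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
centroids, memberships = soft_kmeans(X, k=2)
print(memberships.round(3))

# A point exactly halfway between the two centroids belongs equally to both.
midpoint = centroids.mean(axis=0, keepdims=True)
print(soft_assign(midpoint, centroids, beta=1.0))
```

Larger values of `beta` make the memberships sharper. As `beta` grows very large, each point effectively belongs only to its nearest centroid, which recovers hard k-means.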

Hierarchical Clustering

Hierarchical clustering comes in two forms:

  • Agglomerative Clustering: It’s a bottom-up approach to clustering the data points. Have a look at the figure below for a better understanding.

In agglomerative clustering, we start with each data point as its own cluster and iteratively merge the two most similar clusters. This builds a hierarchy that grows until all the data points in the dataset are part of a single cluster.

             The tree-like structure depicting the hierarchy is known as a dendrogram.

These merges are based on the similarity between clusters. Various linkage methods are used to measure this similarity, such as Ward’s linkage, average linkage, complete linkage, and single linkage.

 

  • Divisive Clustering: Divisive clustering is a top-down approach, the polar opposite of agglomerative clustering. Here, we start from a single cluster containing all the data points and repeatedly divide it into smaller clusters to reveal finer-grained patterns. Divisive clustering is used less commonly than agglomerative clustering.
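The bottom-up agglomerative procedure can be illustrated with a short, unoptimised Python sketch. It assumes 1-D data, uses absolute difference as the distance, and implements only single, complete, and average linkage (Ward's linkage is omitted). `agglomerative` is a hypothetical helper, and divisive clustering is not shown. In practice, SciPy's `scipy.cluster.hierarchy.linkage` computes this kind of merge history efficiently.

```python
from itertools import combinations


def agglomerative(points, linkage="single"):
    """Bottom-up clustering of 1-D points; returns the list of merges.

    Each merge is (cluster_a, cluster_b, distance). Read in order, the list
    carries the same information as a dendrogram. Hypothetical helper.
    """
    def cluster_distance(a, b):
        # All pairwise distances between members of the two clusters.
        pair_dists = [abs(x - y) for x in a for y in b]
        if linkage == "single":      # distance between the closest pair
            return min(pair_dists)
        if linkage == "complete":    # distance between the farthest pair
            return max(pair_dists)
        if linkage == "average":     # mean distance over all pairs
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unsupported linkage: {linkage}")

    # Start with every data point in its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters that are closest under the chosen linkage ...
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j],
                       cluster_distance(clusters[i], clusters[j])))
        # ... and replace them with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


merges = agglomerative([1.0, 2.0, 10.0, 11.0, 25.0], linkage="single")
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

Each printed line is one merge in the dendrogram. The two tight pairs join first, and the isolated point 25.0 joins last, at the largest distance.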

 

Probabilistic Clustering

Probabilistic clustering assigns each data point a probability of belonging to each of several distributions. It differs from hard clustering in that it doesn’t assign every data point to a single cluster with absolute certainty, which makes it more flexible. The Gaussian Mixture Model (GMM) is the most popular technique for implementing probabilistic clustering. A GMM models the data as a mixture of several Gaussian distributions and estimates, for each data point, the probability that it belongs to each distribution.
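To show how a GMM assigns probabilities rather than hard labels, here is a minimal one-dimensional sketch fitted with the expectation-maximisation (EM) algorithm, the standard way of fitting GMMs. It assumes NumPy, two components, and synthetic data drawn from two Gaussians. `fit_gmm_1d` and its quantile-based initialisation are illustrative choices, not a library API. For real use, scikit-learn's `GaussianMixture` class provides a full implementation.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)


def fit_gmm_1d(x, n_components=2, n_iter=200):
    """Fit a 1-D Gaussian Mixture Model with EM (hypothetical helper).

    Returns (weights, means, variances, resp), where resp[i, k] is the
    probability that point i came from component k (from the final E-step).
    """
    # Assumed initialisation: means at spread-out quantiles of the data,
    # variances at the overall variance, and equal mixture weights.
    means = np.quantile(x, np.linspace(0.25, 0.75, n_components))
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: probability of each component given each point (Bayes' rule).
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the soft assignments.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances, resp


rng = np.random.default_rng(0)
# Synthetic data drawn from two Gaussians centred at 0 and 5.
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
weights, means, variances, resp = fit_gmm_1d(x)
print("means:", means.round(2), "weights:", weights.round(2))
print(resp[:3].round(3))  # soft memberships of the first three points
```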

Association Rules

Association rule learning finds relationships between items in a given dataset. Companies use this unsupervised machine learning technique extensively in market research, for example to find relationships between the different products a customer buys. Spotify’s music recommendation engine works on the same principle.

The most widely used algorithm for generating association rules is the Apriori algorithm.

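The Apriori algorithm finds frequent itemsets (groups of products that are often bought together) level by level. It prunes unpromising candidates using the property that every subset of a frequent itemset must itself be frequent, and it typically uses a hash tree to count candidate itemsets efficiently. From these itemsets it derives rules that estimate how likely an item is to be bought, given that certain other products were bought.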
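The level-wise search and the rule-confidence calculation can be sketched in plain Python. This is a simplified illustration under stated assumptions. Support is counted by scanning every transaction instead of with a hash tree, `apriori` and `rules` are hypothetical helper names, and the shopping baskets are made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / len(baskets)

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    size = 2
    while current:
        # Join step: combine frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step (the Apriori property): drop any candidate that has an
        # infrequent subset, since it cannot be frequent itself.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, size - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        size += 1
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules A -> B."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
freq = apriori(transactions, min_support=0.4)
for a, b, conf in rules(freq, min_confidence=0.7):
    print(sorted(a), "->", sorted(b), round(conf, 2))
```

With these baskets, `{butter} -> {bread}` has confidence 1.0, because every basket that contains butter also contains bread.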

Dimensionality Reduction

It is often assumed that more data yields a better machine learning model. However, this is not always the case. A dataset with a very large number of features (dimensions) can make a model more prone to overfitting and more expensive to train. Dimensionality reduction is a data processing technique that reduces the number of features to a more manageable size while preserving as much of the useful information as possible. There are various methods to implement dimensionality reduction, such as:

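  • Principal Component Analysis: Used to generate a new set of dimensions, known as principal components. The first component is chosen to capture as much of the variance in the dataset as possible. Each subsequent component captures as much of the remaining variance as possible while being orthogonal to the previous ones. Keeping only the first few components reduces the number of dimensions.
  • Singular Value Decomposition: Closely related to principal component analysis, SVD is also used for noise reduction. It factorises a matrix into three matrices, and keeping only the largest singular values yields a low-rank approximation of the original matrix.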

For example, singular value decomposition can take a rectangular matrix of gene expression data (defined as A, where A is an n × p matrix) in which the n rows represent the genes and the p columns represent the experimental conditions. The SVD theorem states:

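$$A_{n \times p} = U_{n \times n} \, S_{n \times p} \, V^{T}_{p \times p}$$

where

$$U^{T} U = I_{n \times n}, \qquad V^{T} V = I_{p \times p}$$

(i.e., $U$ and $V$ are orthogonal).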

Here, the columns of $U$ are the left singular vectors (gene coefficient vectors). $S$, which has the same dimensions as $A$, is diagonal and holds the singular values (mode amplitudes). The rows of $V^{T}$ are the right singular vectors (expression level vectors). When the columns of $A$ are mean-centred, the SVD represents an expansion of the original data in a coordinate system where the covariance matrix is diagonal.
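The factorisation and its link to PCA can be checked numerically with NumPy's `np.linalg.svd`, which returns U, the singular values, and V^T. In this sketch the 6 × 3 matrix is random stand-in data, not real gene expression measurements, and keeping r = 2 singular values is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data matrix A: n = 6 rows (e.g. genes) x p = 3 columns (conditions).
A = rng.normal(size=(6, 3))

# Full SVD: U is n x n, Vt is p x p, and s holds the min(n, p) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros(A.shape)              # S has the same dimensions as A ...
S[:len(s), :len(s)] = np.diag(s)   # ... with the singular values on its diagonal
print(np.allclose(A, U @ S @ Vt))  # A = U S V^T
print(np.allclose(U.T @ U, np.eye(6)), np.allclose(Vt @ Vt.T, np.eye(3)))

# Truncated SVD: keep only the r largest singular values (low-rank approximation).
r = 2
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# PCA via SVD: centre each column; the rows of Vtc are the principal components.
Ac = A - A.mean(axis=0)
_, sc, Vtc = np.linalg.svd(Ac, full_matrices=False)
scores = Ac @ Vtc.T                     # data expressed in the new coordinate system
cov = np.cov(scores, rowvar=False)      # covariance matrix of the projected data
print(np.allclose(cov, np.diag(np.diag(cov))))  # off-diagonal entries vanish
explained = sc**2 / np.sum(sc**2)       # share of the variance per component
print(explained.round(3))
```

Keeping the r largest singular values gives the best rank-r approximation of A in the least-squares sense. This is why truncated SVD works as a noise-reduction step.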

  • Autoencoders: Autoencoders are neural networks used to compress data. They add a smaller “hidden layer” between the input and the output layer and are trained to reproduce their input at the output. Because this hidden layer has fewer units than the input, it acts as a bottleneck that forces the network to keep the significant information and discard insignificant details or noise. The stage in which the input is compressed into the hidden layer is called “encoding”. The stage in which the hidden layer’s representation is reconstructed at the output layer is called “decoding”.
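The encode/decode idea can be sketched with the simplest possible autoencoder: a linear network with one 2-unit hidden layer, trained by plain gradient descent in NumPy. This is an illustrative assumption, not a production design. Practical autoencoders usually use nonlinear activations and several layers, and they are built with a deep learning framework. The data, learning rate, and number of steps are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 200 points in 3-D lying close to a 2-D plane, plus small noise.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, -0.3], [0.2, -1.0, 0.8]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 3))
X = X - X.mean(axis=0)  # centre the data, so no bias terms are needed below

# Linear autoencoder: 3 inputs -> 2 hidden units (the bottleneck) -> 3 outputs.
W_enc = 0.1 * rng.normal(size=(3, 2))
W_dec = 0.1 * rng.normal(size=(2, 3))
lr = 0.05


def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)


initial_loss = reconstruction_loss(X, W_enc, W_dec)
for _ in range(2000):
    H = X @ W_enc        # encoding: compress each 3-D point into 2 numbers
    X_hat = H @ W_dec    # decoding: reconstruct the 3-D point from the code
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    grad_dec = 2 * H.T @ err / err.size
    grad_enc = 2 * X.T @ (err @ W_dec.T) / err.size
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_loss = reconstruction_loss(X, W_enc, W_dec)
print(round(initial_loss, 4), "->", round(final_loss, 4))
```

With linear layers and a squared-error loss, the learned bottleneck spans the same subspace that PCA would find. Nonlinear activations are what let autoencoders capture more complex structure.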

Applications

Companies widely use unsupervised machine learning techniques to improve user experience and subtly push users to consume more of their products.

Some of the real-world applications are listed below:

  • Recommendation Engines: Netflix’s recommendation engine tracks and identifies its users’ consumption patterns to recommend shows they are likely to enjoy.
  • Computer Vision: Unsupervised machine learning techniques are employed for tasks such as identifying objects in images.
  • Anomaly Detection: Unsupervised machine learning techniques are used to identify anomalies in a dataset, such as records produced by unintended errors.
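One simple way to detect anomalies without labels is to learn the mean and covariance of the data and flag points that lie unusually far from the mean in Mahalanobis distance. The sketch below uses this approach. The readings, the two injected errors, and the threshold of 5.0 are all assumptions chosen for the example. Clustering-based or model-based detectors are common alternatives.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up sensor readings: 500 normal 2-D points plus two injected errors.
normal = rng.normal(loc=[20.0, 50.0], scale=[1.0, 5.0], size=(500, 2))
errors = np.array([[35.0, 50.0], [20.0, 120.0]])
X = np.vstack([normal, errors])

# Learn what "normal" data looks like without labels: its mean and covariance.
mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

# Mahalanobis distance: how far each point is from the mean, accounting for
# the different spread of each feature.
diff = X - mean
dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

threshold = 5.0  # assumed cut-off; in practice it is tuned to the data
anomalies = np.where(dist > threshold)[0]
print("anomalous rows:", anomalies)
```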

Pros and Cons of Unsupervised Machine Learning

Every technique has its pros and cons. And so does Unsupervised Machine Learning.

Pros:

  • It can take a large number of features into account when grouping data, which makes it far more reliable than manual clustering.
  • It can reveal underlying patterns that are essentially invisible before clustering, which makes it an instrumental tool for enterprises to understand their target audience.
  • It works with unlabelled data, whereas supervised machine learning requires labelled examples.

Cons:

  • It can be computationally expensive, and interpreting the results often requires human intervention to identify the underlying patterns.
  • It’s very difficult to measure the accuracy of the results, since there are no expected outputs to validate them against.
  • The basis on which the data has been clustered can sometimes be unclear.

Key Takeaways

This blog provides an overview of unsupervised machine learning techniques for beginners. The popular algorithms covered here and their real-world use cases should be your primary takeaways from this article. We highly recommend studying these algorithms in more depth if you intend to implement any of them on your own.

Happy learning!!
