Recommendation System - Amazon - Application of ML

Mayank Goyal
Last Updated: May 13, 2022


There are millions of products available online, and choosing among them is difficult. Earlier, we asked our acquaintances for recommendations; now we get them from the online store itself. So how do these online stores decide what to recommend to us?


These e-commerce companies have developed recommendation systems, i.e., systems that recommend products to users based on different factors. These recommender systems predict the products that a consumer is most likely to purchase. Companies like Amazon, Flipkart, and Netflix use these systems to provide suggestions to their consumers.


There is a huge amount of information available, and these systems filter out the essential information based on consumers' interests and other factors. A recommendation system measures the similarity between users and products and uses that similarity to decide what to recommend.


Both users and service providers have benefited from these systems, which have improved the quality of decisions as well as the decision-making process itself.


What Can Recommendation Systems Solve?

  • They help the consumer find the best product.
  • They help websites increase user engagement.
  • They make the content more personalized.
  • They help websites find the most relevant product for the consumer.
  • They help item providers deliver their items to the right user.



Types Of Recommendation Systems

Popularity-Based Systems

These systems recommend the products that most people have viewed or purchased, or the products that are currently trending.

The advantage of these systems is that they do not need any user data, so they can recommend products from day one of a business. The major disadvantage is that they are not personalized: every user is shown the same products.

Examples include Google News and YouTube's trending videos.
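As a rough illustration of the popularity-based idea, the sketch below counts purchases per product and returns the most frequent ones. The toy purchase log, its column names, and the helper most_popular are assumptions made for this example only.

```python
import pandas as pd

# Toy purchase log; the data and column names are illustrative assumptions.
purchases = pd.DataFrame({
    "userId": ["u1", "u2", "u3", "u1", "u2", "u4"],
    "productId": ["p1", "p1", "p1", "p2", "p2", "p3"],
})


def most_popular(log, n=2):
    """Return the n products with the most purchases, most popular first."""
    counts = log["productId"].value_counts()  # sorted in descending order
    return counts.head(n).index.tolist()


top_products = most_popular(purchases, n=2)
print(top_products)
```

Every user would receive the same list, which is exactly the lack of personalization described above.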

Classification Model Based

These systems use features of both the user and the product and then apply a classification model to decide whether the user will buy the product. One limitation is that they require a tremendous amount of information about different users and products, which is laborious to collect. A second is that they are not very flexible.

Content-Based Recommendations

These systems work from information about the content of the products rather than from users' opinions. The main idea is that if a user likes an item, the user may also like other, similar items. We can use various product features to compute this similarity.

The system calculates the distance between products to check their similarity. Different metrics suit different kinds of data: for numeric data we can use Euclidean distance, for textual data cosine similarity, and for categorical data Jaccard similarity.
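The three metrics mentioned above can be sketched in a few lines of NumPy. The function names and the toy inputs are assumptions for illustration, and the Jaccard helper assumes at least one of the two sets is non-empty.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between two numeric feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def cosine_sim(a, b):
    """Cosine of the angle between two vectors, e.g. word-count vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def jaccard_sim(a, b):
    """Shared categories divided by all categories seen in either set."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


d = euclidean_distance([0, 0], [3, 4])
c = cosine_sim([1, 0, 1], [1, 0, 1])
j = jaccard_sim({"red", "cotton", "large"}, {"red", "cotton", "small"})
print(d, c, j)
```

Note that Euclidean distance is smaller for more similar items, while cosine and Jaccard similarity are larger.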

The advantages of these systems are that they need little data about users and that they do not suffer from the cold-start problem. The drawback is that product features must be available to compute the similarity.

Collaborative Filtering

It is an intelligent recommender approach that works on the similarity between different users and items, and it is widely used on e-commerce websites and other online platforms. It looks at the tastes of similar users and makes recommendations accordingly. The system becomes more effective as we provide a larger volume of information about users and items.

There are mainly two types of collaborative filtering. First, user-based filtering identifies clusters of similar users and uses the interactions of one user to predict the interactions of other, similar users. For example, users X and Y, who both purchased items A and B, are treated as having similar tastes. Second, item-based filtering looks for items similar to the ones the user has already interacted with. Here the similarity is computed between items rather than between users.
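The difference between the two variants comes down to which axis of the user-item matrix is compared. The following is a minimal sketch on a made-up 3x3 ratings matrix, using cosine similarity as one possible choice of metric; cosine_matrix is a helper written for this example.

```python
import numpy as np

# Rows are users X, Y, Z; columns are items A, B, C; 0 means "not rated".
# The numbers are made up for illustration.
ratings = np.array([
    [5, 4, 0],  # user X
    [4, 5, 0],  # user Y
    [0, 0, 5],  # user Z
], dtype=float)


def cosine_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = m / norms
    return unit @ unit.T


user_sim = cosine_matrix(ratings)    # user-based: compare rows (users)
item_sim = cosine_matrix(ratings.T)  # item-based: compare columns (items)

print(np.round(user_sim, 3))
print(np.round(item_sim, 3))
```

Users X and Y come out as highly similar because they rated the same items, and items A and B come out as highly similar because the same users rated them.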


Amazon's Recommendation System

Amazon's recommendation system is widely regarded as one of the best. Many e-commerce sites have adopted its approach. So what does Amazon do differently?


Well, to answer that, let us go back two decades. At that time, most of the field was focused on user-based collaborative filtering (analyzing history at the customer level). Amazon instead paid attention to item-to-item collaborative filtering (analyzing the history of recent purchases, i.e., at the item level). The Amazon personalization team found that working at the item level gave much better results than working at the customer level. To improve the algorithm further, they examined how related two customers' purchase histories were. Once they could estimate that likelihood, recommendation quality increased significantly.


At present, Amazon uses item-to-item collaborative filtering: it takes the items a user has purchased and pairs each of them with similar items to build the recommendation list.


Their recommendation algorithm effectively creates a personalized shopping experience for each customer, which helps Amazon increase the average order value and the amount of revenue generated from each customer.


Currently, Amazon is trying to improve the recommendation quality further by training neural network models.


We will be using the Amazon dataset for the coding part. Let us first import all the required packages.

import pandas as pd 
import numpy as np 
import math
import os
import json
import time
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
import scipy.sparse
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
import warnings; warnings.simplefilter('ignore')
%matplotlib inline


Now, loading the dataset.



We added column names, as the original dataset did not have any.
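The loading code is not shown here, so the following is only a sketch of what it might look like. The column names are taken from the code later in this article; the in-memory sample stands in for the real ratings file, whose name is not given, and the variable name df is an assumption.

```python
import io

import pandas as pd

# Stand-in for the real ratings file (its name is not given in the article).
# Like the original data, it has no header row.
sample_csv = io.StringIO(
    "A1,P100,5.0,1365811200\n"
    "A2,P100,4.0,1341100800\n"
    "A1,P200,1.0,1367193600\n"
)

# Passing names= tells pandas the file has no header and supplies our own.
columns = ["userId", "productId", "ratings", "timestamp"]
df = pd.read_csv(sample_csv, names=columns)

print(df.head())
```

With a real file, the io.StringIO object would simply be replaced by the path to the CSV.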


Displaying Data




Dropping the timestamp column



Statistics about ratings column




From the above output, the ratings range from 1 to 5, with 1 being the lowest.
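The two steps above (dropping the timestamp column and summarizing the ratings) could be written roughly as follows. The toy DataFrame is an assumption that mimics the column layout used in this article.

```python
import pandas as pd

# Toy data with the same columns as the article's dataset (values are made up).
df = pd.DataFrame({
    "userId": ["A1", "A2", "A1", "A3"],
    "productId": ["P100", "P100", "P200", "P300"],
    "ratings": [5.0, 4.0, 1.0, 5.0],
    "timestamp": [1365811200, 1341100800, 1367193600, 1370000000],
})

# The timestamp is not used by any of the models below, so drop it.
df = df.drop(columns=["timestamp"])

# describe() reports count, mean, std, min, quartiles and max.
summary = df["ratings"].describe()
print(summary)
```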


We take a subset of the dataset, as the full data is too large to work with comfortably.




Handling Missing Values




As shown above, there are no missing values.
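Taking a subset and checking it for missing values might look like the sketch below. The name df1 matches the code that follows in this article; the subset size and the choice of simply keeping the first rows are assumptions, scaled down here to fit the toy data.

```python
import pandas as pd

# Toy data standing in for the full ratings table (values are made up).
df = pd.DataFrame({
    "userId": ["A1", "A2", "A1", "A3", "A4"],
    "productId": ["P100", "P100", "P200", "P300", "P100"],
    "ratings": [5.0, 4.0, 1.0, 5.0, 3.0],
})

# Keep only the first rows. The real subset appears to hold 1,000,000 rows;
# 3 is used here only because the toy data is tiny.
subset_size = 3
df1 = df.iloc[:subset_size]

# Count missing values in each column; all zeros means nothing is missing.
missing_per_column = df1.isnull().sum()
print(missing_per_column)
```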


Distribution of Ratings

with sns.axes_style('dark'):
    # catplot is the current name for factorplot, which was deprecated and later removed
    g = sns.catplot(x="ratings", data=df1, aspect=1.0, kind='count')
    g.set_ylabels("Total number of ratings")



We can see most of the users have given a rating of 5.


Unique Users and Products

print("\nTotal no of ratings :",df1.shape[0])
print("Total No of Users   :", len(np.unique(df1.userId)))
print("Total No of products  :", len(np.unique(df1.productId)))



Analyzing the Ratings Given By User

rating_byuser = df1.groupby(by='userId')['ratings'].count().sort_values(ascending=False)[:10]
print('Top 10 users based on ratings: \n',rating_byuser)



As the dataset is too large, we will only consider products that have received at least 50 ratings.


new_df=df1.groupby("productId").filter(lambda x:x['ratings'].count() >=50)





(642628, 3)

As we can see, the dataset size decreased from 1000000 to 642628.


Ratings Per Product

no_of_ratings_per_product = new_df.groupby(by='productId')['ratings'].count().sort_values(ascending=False)
ax = plt.gca()



Creating a Pivot Table


ratings_matrix = new_df1.pivot_table(values='ratings', index='userId', columns='productId', fill_value=0)



As shown in the table, many cells are filled with zeroes. So it is a sparse matrix.

print('Shape of the pivot table: ', ratings_matrix.shape)


Shape of the pivot table:  (9832, 76)
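To put a number on how sparse such a pivot table is, we can compute the share of non-zero cells. This is a small sketch on made-up data with the same column names; density is a name introduced here.

```python
import pandas as pd

# Toy ratings with the same column names as the article (values are made up).
toy = pd.DataFrame({
    "userId": ["A1", "A2", "A1", "A3"],
    "productId": ["P100", "P100", "P200", "P300"],
    "ratings": [5, 4, 1, 5],
})

# One row per user, one column per product, 0 where there is no rating.
matrix = toy.pivot_table(values="ratings", index="userId",
                         columns="productId", fill_value=0)

# Share of cells that actually hold a rating.
density = (matrix.values != 0).sum() / matrix.size
print(matrix)
print("Density:", round(density, 3))
```

Here only 4 of the 9 cells are filled. On real rating data the share is usually far smaller.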


X = ratings_matrix


Now let us create a model.


Collaborative filtering(Item-Item)


Importing the required Packages

from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
import os
from surprise.model_selection import train_test_split



Surprise is a Python library for building recommendation systems. To load data from a pandas DataFrame, we will use the load_from_df() method. We will also need a Reader object with the rating_scale parameter specified.


reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(new_df,reader)


We can use many algorithms for a recommendation system, such as BaselineOnly, KNNWithMeans, KNNBaseline, SVD, and CoClustering.


We will be using the KNNWithMeans algorithm for our prediction.


trainset, testset = train_test_split(data, test_size=0.3, random_state=10)
# user_based=False makes the similarity item-item rather than user-user
algo = KNNWithMeans(k=5, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo.fit(trainset)
test_pred = algo.test(testset)


We use the fit() method to train our algorithm on the training set and the test() method to return the predictions made for the test set.


We will be using the RMSE (root mean squared error) metric to measure the accuracy of the model; lower values are better.


print("Item-based Model : Test Set")
accuracy.rmse(test_pred, verbose=True)


Item-based Model : Test Set

RMSE: 1.3432
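For intuition about what this number means, RMSE is the square root of the average squared difference between actual and predicted ratings. The hand-rolled version below is only an illustration on made-up ratings, not the Surprise implementation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings: the errors are 1, 0 and 2, so the squared errors are 1, 0 and 4.
score = rmse([5, 3, 4], [4, 3, 2])
print(round(score, 4))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.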





Collaborative filtering (model-based)

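from sklearn.decomposition import TruncatedSVD
# Transpose so that each row is a product; the correlations below are then product-to-product
X = ratings_matrix.T
SVD = TruncatedSVD(n_components=10)
decomposed_matrix = SVD.fit_transform(X)


Here X is the transpose of the ratings_matrix we created before, so each row represents a product. To avoid memory overflow and to reduce the dimensionality, we use SVD.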

correlation_matrix = np.corrcoef(decomposed_matrix)


We compute the correlation between every pair of products.

correlation_product_ID = correlation_matrix[75]


Now that we have a correlation matrix, let us check the recommendations for the product at index 75.



Recommend = list(X.index[correlation_product_ID > 0.5])


It will return all the products whose correlation with the chosen product is greater than 0.5.


That's the end of the implementation, a basic model of an Amazon-style recommendation system. For further reference, do check out the Surprise documentation.


Frequently Asked Questions

  1. How does Amazon make use of the recommendation system?
    Amazon currently uses item-item-based filtering, which produces high-quality recommendations in real-time. This algorithm makes each user's homepage unique, based on their interests and previous purchase history.
  2. How does Amazon personalize using a recommendation system?
    Amazon uses Amazon Personalize, a machine learning service, to share its personalization and recommendation technology with other companies. Its primary purpose is to handle the hard parts of building custom recommendations, such as delivering high-quality recommendations, adapting to users' preferences and behavior, and handling users with no data.
  3. What part of the Amazon sale is due to the recommendation system?
    According to reports, 35% of Amazon's sales are due to its recommendations. This shows how vital recommendations have become to sales at Amazon and at other e-commerce sites.


Key Takeaways

To recap the article: first we saw what a recommendation system is, and then we saw the different types of recommendation algorithms. We then saw how Amazon uses this technique to achieve high sales, and lastly we walked through a basic implementation on an Amazon dataset.


Thus, I want to conclude this article by stating that recommendation systems have changed online shopping by helping users find products that match their choices and interests.

They make the content more personalized, and Amazon has reached great heights just by making proper use of this technology.


Stay tuned for more awesome eyecatching articles.


Happy Learning Ninjas!
