
Contents

What is Netflix?

Problem Statement

Architecture and Components

Onboarding Content

Microservices

Databases

Searching and Data Processing

More Interview Questions

Additional Quips

Further Readings

Designing Twitter

Contents

Twitter and its Features

Problem Statement

Low Level Design

Architecture and Components

User Timeline

Home Timeline

Trending Hashtags

Searching Inside Twitter

Databases

More Interview Problems

Introduction and Features

Problem Statement

Architecture and Components

Low-Level Design

Different Approaches to Store URL

Databases

More Interview Problems

Contents

Problem Statement

Introduction and Features

Architecture and Components

Uber Challenges

Demystification of Uber System Design

Uber Map and ETA Calculation

More Interview Problems

Contents

Problem Statement

Introduction and Features

Requirements and Approximations

Low-Level Design

Architecture and Components

News Feed Generation

Databases

More Interview Problems

Contents

What is BookMyShow?

Problem Statement

Low Level Design

Architecture and Components

How does our System talk with Theatres?

How to Sync Ticket Availability with Theatres?

Database

Interview Questions

Contents

Introduction

Requirements

Architecture and Components

Services

Database Schema

Working Flow

Contents

What is Dropbox?

Problem Statement

Low Level Design

Architecture and Components

Contents

What is Pastebin?

Low Level Design

Architecture and Components

Contents

Introduction

Problem Statement

Object Oriented Design

System Design
More Interview Problems

 

More Important Interview Questions


 


Q 1. As you have a table for users, tweets, user-follower mapping, and other things, which sharding techniques will you use to manage this gigantic data? (Facebook)

 

Ans. A huge number of new tweets are generated every day, so the load is extremely high and a single database can't handle it. We need to distribute the data across multiple machines by sharding it. The data can be sharded in the following ways:

 

1. Sharding based on UserID: We hash the UserID and map each user to a server that stores all of that user's tweets, favorites, follows, etc. This approach does not work well for trending (hot) users: their servers receive disproportionate traffic, and as those users accumulate more data, the shards become unevenly loaded.

 

2. Sharding based on TweetID: We hash the TweetID and map each tweet to a server that stores the tweet's information. To search for tweets, we have to query all servers, and each server returns a set of tweets. This approach solves the problem of trending users but increases read latency.

 

3. Sharding based on TweetID and creation time: We generate TweetIDs that embed the creation time and shard the database by TweetID. This is similar to the second approach, and we still have to query all servers to search for tweets. However, latency improves because we no longer need a separate timestamp index: sorting tweets by TweetID already yields chronological order.
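The third approach can be sketched as follows. This is a hypothetical illustration in the spirit of Twitter's Snowflake IDs; the epoch, bit widths, and shard count are assumptions for this example, not Twitter's actual values.

```python
import time
import itertools

EPOCH_MS = 1288834974657   # assumed custom epoch, in milliseconds
SEQUENCE_BITS = 12         # per-millisecond sequence to avoid collisions
NUM_SHARDS = 16            # assumed number of database servers

_sequence = itertools.count()

def make_tweet_id() -> int:
    """High bits encode creation time, low bits a sequence number,
    so sorting by TweetID is sorting by creation time."""
    millis = int(time.time() * 1000) - EPOCH_MS
    seq = next(_sequence) & ((1 << SEQUENCE_BITS) - 1)
    return (millis << SEQUENCE_BITS) | seq

def shard_for(tweet_id: int) -> int:
    """Map a TweetID to one of NUM_SHARDS database servers."""
    return tweet_id % NUM_SHARDS

tid = make_tweet_id()
assert 0 <= shard_for(tid) < NUM_SHARDS
```

Because the timestamp occupies the high bits, two IDs generated one after another compare in creation order without consulting any timestamp column.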

 

 

Q 2. How are you able to do synchronous database queries for managing tweets? (Oracle)
 

Ans. Synchronous database queries are the first thing to consider when designing a big network like Twitter, as they shape the high-level architecture. We can design a solution around two things:

 

Data modeling: We can use a relational database like MySQL with two tables: a user table (id, username) and a tweet table (id, content, user_id referencing the user table's primary key). User information is stored in the user table, and whenever a user tweets, the tweet is stored in the tweet table. Two relations are necessary here: users can follow each other, and each tweet must have an owning user, so there is a one-to-many relationship between the user and tweet tables.

 

Serve feeds: We need to fetch all the tweets from every account a user follows and arrange them in chronological order.
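The data model and feed query above can be sketched with a relational database. This example uses SQLite for illustration (a production system would use MySQL or similar); the table and column names are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user   (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE tweet  (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                     user_id INTEGER NOT NULL REFERENCES user(id),
                     created_at TEXT DEFAULT CURRENT_TIMESTAMP);
-- the "user follows user" relation
CREATE TABLE follow (follower_id INTEGER REFERENCES user(id),
                     followee_id INTEGER REFERENCES user(id),
                     PRIMARY KEY (follower_id, followee_id));
""")
conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO follow VALUES (1, 2)")  # alice follows bob
conn.execute("INSERT INTO tweet (id, content, user_id) VALUES (1, 'hi', 2)")

# Serve alice's feed: all tweets from accounts she follows, newest first.
feed = conn.execute("""
    SELECT t.content FROM tweet t
    JOIN follow f ON f.followee_id = t.user_id
    WHERE f.follower_id = ?
    ORDER BY t.created_at DESC, t.id DESC
""", (1,)).fetchall()
print(feed)  # [('hi',)]
```

The join over the follow table is exactly the "fetch feeds from all followed accounts" step, and the ORDER BY gives the chronological arrangement.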

 

 

Q 3. How is the user able to see notifications for activities related to their news feed? (LinkedIn)

 

Ans. There are different options for displaying new posts to users.

 

  1. Pull model: Data is pulled on a regular basis, or manually whenever the client wants. The problem with this approach is the delay in updates: new information is not shown until the client issues a pull request. Most pull requests return an empty response because there is no new data, which wastes resources.
  2. Push model: In this model, whenever a user creates a new tweet, the server sends a push notification to all the followers. A possible problem with this approach is that when an account with millions of followers creates a new tweet, the server has to push updates to a huge number of people at the same time.
  3. Hybrid: This combines the pull and push models. The system pushes data only for users who have hundreds (or thousands) of followers; for users with millions of followers, we let the followers pull the updates.
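The hybrid fan-out decision can be sketched as below. The threshold, follower counts, and data structures are illustrative assumptions, not figures from the text.

```python
# Assumed cut-off between "push to everyone" and "let followers pull".
PUSH_FANOUT_LIMIT = 10_000

followers = {
    "alice": ["bob", "carol"],
    "celebrity": [f"fan{i}" for i in range(20_000)],
}
inboxes: dict = {}  # per-follower pushed timelines

def on_new_tweet(author: str, tweet: str) -> str:
    """Push to followers of ordinary accounts; celebrities' tweets
    are left for followers to pull on demand."""
    if len(followers[author]) <= PUSH_FANOUT_LIMIT:
        for f in followers[author]:           # push model
            inboxes.setdefault(f, []).append(tweet)
        return "pushed"
    return "pull"                             # pull model for huge fan-out

print(on_new_tweet("alice", "hello"))      # pushed
print(on_new_tweet("celebrity", "hi all")) # pull
```

This keeps the push path cheap (bounded fan-out) while avoiding millions of simultaneous pushes for celebrity accounts.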


 

Q 4. Suggest any way to improve the Timeline update process. (Twitch)
 

Ans. If the system treats all users the same, the interval between timeline regenerations for a user will be long, and new posts will appear in their timeline with a large delay. One way to improve this is to prioritize users who have new updates: new tweets are added to a message queue, and timeline-generator services pick up messages from the queue and regenerate the timelines of all affected followers.
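The queue-driven flow can be sketched in a few lines. This is a single-process stand-in: a real system would use a distributed message queue (e.g. Kafka) and separate worker services; the names here are illustrative.

```python
from queue import Queue

new_tweet_queue: Queue = Queue()            # stands in for a message broker
followers = {"bob": ["alice", "carol"]}     # who follows whom (assumed data)
timelines: dict = {}                        # regenerated per-follower timelines

def publish(author: str, tweet: str) -> None:
    """A new tweet enters the message queue instead of triggering
    immediate regeneration for every user."""
    new_tweet_queue.put((author, tweet))

def timeline_generator_worker() -> None:
    """Drain the queue, regenerating only the timelines that actually
    have new updates (the prioritization described above)."""
    while not new_tweet_queue.empty():
        author, tweet = new_tweet_queue.get()
        for follower in followers[author]:
            timelines.setdefault(follower, []).insert(0, tweet)

publish("bob", "first!")
timeline_generator_worker()
print(timelines["alice"])  # ['first!']
```

Users with no pending messages cost the worker nothing, so regeneration effort goes exactly where new updates exist.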

 

 

Q 5. Can you tell how home timeline generation takes place for a user if he/she is following any celebrities (users with a very large number of followers)?
 

Ans. The timeline should contain the most recent posts from all the accounts the user follows. Generating it on demand for such users would be very slow, as the system has to query, merge, and rank a huge number of tweets. Hence, the system should pre-generate the user's timeline instead of generating it when the user loads the page.

There should be dedicated servers that continuously generate users' timelines and store them in memory. Whenever a user loads the application, the system simply serves the pre-generated timeline from the cache. With this scheme, a user's timeline is not compiled on page load; it is rebuilt on a regular basis and returned whenever the user requests it.
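The pre-generation scheme can be sketched as follows. The in-memory cache, the per-user tweet lists, and the time-ordered IDs are assumptions for illustration; a real deployment would use a distributed cache such as Redis.

```python
import heapq

cache: dict = {}  # stands in for the in-memory timeline store

# (tweet_id, content) per followed account, newest first;
# IDs are assumed time-ordered, so larger id = more recent.
tweets_by_user = {
    "bob":   [(5, "b2"), (2, "b1")],
    "carol": [(4, "c2"), (1, "c1")],
}

def regenerate_timeline(user: str, follows: list, limit: int = 3) -> None:
    """Background job: merge followed users' tweets (each list already
    sorted newest-first) and store the top entries in the cache."""
    merged = heapq.merge(*(tweets_by_user[f] for f in follows), reverse=True)
    cache[user] = [content for _, content in list(merged)[:limit]]

def load_home_timeline(user: str) -> list:
    """Page load: no querying/merging/ranking, just a cache read."""
    return cache.get(user, [])

regenerate_timeline("alice", ["bob", "carol"])
print(load_home_timeline("alice"))  # ['b2', 'c2', 'b1']
```

The expensive merge runs in the background job; the request path is reduced to a single cache lookup.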