Data is one of an organisation’s most important assets because it can be used strategically to keep the organisation relevant to the trends and needs of its users. The primary purpose of organising data is to make it easy to analyse and quick to get results from.
Many of the decisions that an organisation takes are driven by analysis of its data. For example, whenever a company plans to release a new product, it carries out significant data-based research to judge whether the product is likely to succeed.
Storing a huge amount of data gets really complicated for a tech giant like Google. The concept of data warehousing was introduced to address this problem. A data warehouse is a central repository of information that can be analyzed to make more informed decisions. Data flows into a data warehouse from different sources and is stored there for further use.
But data is of little use if it is messy and unorganised, because it can’t be analysed to get concrete results. That’s where the two most popular data warehousing models, Star and Snowflake, come to the rescue.
In this article, we will discuss the Star vs Snowflake schema and list the key differences between the two. Let’s get started.
Star schema is one of the simplest data schemas for data warehousing. As the name suggests, its structure resembles a star. In a star schema, a fact table is placed in the center and references multiple dimension tables, so the arrangement looks like a star when drawn as a diagram.
To understand the above definition clearly, we need to know about the following terms.
- Fact table: the central table in both the star and snowflake schemas. It stores the measurements being analysed, along with foreign keys that reference the associated dimension tables.
- Dimension table: a table that describes the measurements recorded in the fact table. It contains the descriptive records needed to perform analysis.
The star schema is also known as the star join schema and is highly optimised for querying large data sets.
Image Source: datawarehouseinfo.com
In the above example, we have a fact table, fact_sales. Its primary key is composite, made up of the foreign keys from all five dimension tables. The arrangement of these tables in the diagram resembles a star shape.
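To make the composite key concrete, here is a minimal sketch using Python's built-in sqlite3 module. The five dimension names and the two measure columns (units_sold, revenue) are assumptions chosen for illustration, since the diagram's actual table names are not given in the text.

```python
import sqlite3

# Hypothetical dimension names: the text only says there are five dimension tables.
DIMENSIONS = ["date", "product", "store", "customer", "employee"]


def build_star_schema(conn):
    """Create one table per dimension plus the central fact_sales table."""
    for name in DIMENSIONS:
        conn.execute(
            f"CREATE TABLE dim_{name} ({name}_id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
        )
    key_columns = [f"{name}_id" for name in DIMENSIONS]
    # Every key column in the fact table is a foreign key to one dimension.
    column_defs = ", ".join(
        f"{name}_id INTEGER NOT NULL REFERENCES dim_{name}({name}_id)"
        for name in DIMENSIONS
    )
    conn.execute(
        f"CREATE TABLE fact_sales ({column_defs}, "
        "units_sold INTEGER NOT NULL, revenue REAL NOT NULL, "  # assumed measures
        f"PRIMARY KEY ({', '.join(key_columns)}))"
    )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
build_star_schema(conn)

# PRAGMA table_info reports a non-zero pk position for primary-key columns.
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(fact_sales)") if row[5] > 0]
# PRAGMA foreign_key_list reports the referenced table in its third column.
fk_targets = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(fact_sales)"))
print(pk_columns)
print(fk_targets)
```

The two PRAGMA queries simply read the schema back, confirming that every primary-key column of fact_sales is also a foreign key to one of the dimension tables.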
Features of Star schema
- Star schema provides high-speed aggregation and fast reads, because queries need only a few joins.
- Star schema has a very simple design.
- The design of the star schema is very flexible and can be changed easily as per requirements.
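As a rough illustration of the aggregation point, the sketch below totals units sold per category with a single join between the fact table and one dimension table. The table contents and the category and units_sold columns are made-up sample data, not taken from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category TEXT NOT NULL          -- stored directly on the dimension (denormalized)
);
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    units_sold INTEGER NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'),
                               (2, 'Phone', 'Electronics'),
                               (3, 'Desk', 'Furniture');
INSERT INTO fact_sales VALUES (1, 2), (2, 5), (3, 1), (1, 3);
""")

# One join is enough: the category already sits on the dimension row.
units_by_category = conn.execute("""
    SELECT p.category, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(units_by_category)
```

This shows only the shape of a typical star-schema query. It says nothing about actual timings, which depend on the database engine and the data volume.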
The snowflake schema is similar to the star schema, but its dimension tables are normalized into multiple related tables. A complex shape like a snowflake emerges when the schema is arranged in a diagram. Normalization only affects the dimension tables, not the fact table.
Image Source: datawarehouseinfo.com
The above tables are the same ones that we discussed in the star schema, but notice how the dimension tables have been normalized, which creates sub-dimension tables. The arrangement looks like a snowflake, hence the name.
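The normalization step can be sketched as follows, again with Python's sqlite3 module. The dim_product and dim_category tables and their rows are hypothetical stand-ins for one dimension and its sub-dimension.

```python
import sqlite3

# Denormalized star-style rows: (product_id, product_name, category). Sample data is made up.
star_products = [
    (1, "Laptop", "Electronics"),
    (2, "Phone", "Electronics"),
    (3, "Desk", "Furniture"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES dim_category(category_id)
);
""")

for product_id, product_name, category in star_products:
    # Each distinct category is stored once in the sub-dimension table.
    conn.execute("INSERT OR IGNORE INTO dim_category (category_name) VALUES (?)", (category,))
    (category_id,) = conn.execute(
        "SELECT category_id FROM dim_category WHERE category_name = ?", (category,)
    ).fetchone()
    # The product row keeps only a key that points at the sub-dimension.
    conn.execute(
        "INSERT INTO dim_product VALUES (?, ?, ?)", (product_id, product_name, category_id)
    )

category_count = conn.execute("SELECT COUNT(*) FROM dim_category").fetchone()[0]
product_count = conn.execute("SELECT COUNT(*) FROM dim_product").fetchone()[0]
print(category_count, product_count)
```

Three product rows end up sharing two category rows, which is the sub-dimension pattern described above.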
Features of Snowflake schema
- Disk space usage is low in the snowflake schema.
- There is very little redundancy in the snowflake schema.
- The data is highly structured in a snowflake schema, which reduces data integrity problems.
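One way to see the redundancy and integrity points is to rename a category in both layouts and count the rows that have to change. This is a sketch on made-up data, and the star_ and snow_ table prefixes exist only to keep the two layouts side by side in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension: the category name is repeated on every product row.
CREATE TABLE star_dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
INSERT INTO star_dim_product VALUES (1, 'Laptop', 'Electronics'),
                                    (2, 'Phone', 'Electronics'),
                                    (3, 'Tablet', 'Electronics');

-- Snowflake-style: the name lives once in a sub-dimension table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                               category_id INTEGER REFERENCES dim_category(category_id));
INSERT INTO dim_category VALUES (10, 'Electronics');
INSERT INTO snow_dim_product VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Tablet', 10);
""")

# Renaming the category touches every product row in the star layout...
star_rows_touched = conn.execute(
    "UPDATE star_dim_product SET category = 'Consumer Electronics' WHERE category = 'Electronics'"
).rowcount
# ...but only a single row in the snowflake layout.
snowflake_rows_touched = conn.execute(
    "UPDATE dim_category SET category_name = 'Consumer Electronics' WHERE category_id = 10"
).rowcount
print(star_rows_touched, snowflake_rows_touched)
```

Because the name is stored in one place, there is no chance of some product rows keeping the old spelling while others get the new one.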
Difference between Star Vs Snowflake Schema
Now that we have covered both the star and snowflake schemas, let’s summarise the differences between them in a table.
| Star Schema | Snowflake Schema |
|---|---|
| Has only fact and dimension tables. | Has fact, dimension, and sub-dimension tables. |
| Uses more space than the snowflake schema. | Requires less space than the star schema. |
| The design is very simple and easy. | The design is complex. |
| Queries take less time to execute. | Queries are a bit slower to execute. |
| Dimension tables are not normalized. | Dimension tables are normalized. |
| Has high data redundancy. | Has very low data redundancy. |
| Has fewer foreign keys. | Has more foreign keys. |
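The difference in query shape can be sketched by running the same question against both layouts. All table names, columns, and rows below are illustrative assumptions. The sketch compares only results and join counts, not execution time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_id INTEGER, units_sold INTEGER);
INSERT INTO fact_sales VALUES (1, 2), (2, 5), (3, 1);

-- Star layout: category stored on the dimension itself.
CREATE TABLE star_dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
INSERT INTO star_dim_product VALUES (1, 'Electronics'), (2, 'Electronics'), (3, 'Furniture');

-- Snowflake layout: category moved to a sub-dimension table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_dim_product (product_id INTEGER PRIMARY KEY, category_id INTEGER);
INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
INSERT INTO snow_dim_product VALUES (1, 10), (2, 10), (3, 20);
""")

STAR_QUERY = """
    SELECT p.category, SUM(f.units_sold)
    FROM fact_sales f
    JOIN star_dim_product p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
"""
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN snow_dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name ORDER BY c.category_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result == snowflake_result)
print(STAR_QUERY.count("JOIN"), SNOWFLAKE_QUERY.count("JOIN"))
```

Both queries return the same totals, but the snowflake version needs one extra join for each level of normalization it has to walk through.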
Frequently Asked Questions
Star schema is one of the simplest data schemas for data warehousing. Its structure resembles a star: a fact table is placed in the centre and references multiple dimension tables, so the arrangement looks like a star when drawn as a diagram.
The snowflake schema is similar to the star schema, but its dimension tables are normalised into multiple related tables. A complex shape like a snowflake emerges when the schema is drawn out in full.
The most notable disadvantage of the snowflake schema is that its queries need more joins. The extra joins make queries more complex, which results in decreased performance.
Advantages of the snowflake schema are as follows:
1. Disk space usage is very low in the snowflake schema
2. Snowflake schema has very little redundant data
3. Maintenance is simple due to less data redundancy
The snowflake schema is used in OLAP (online analytical processing), where it supports complex and aggregated queries for analytical purposes.
A snowflake schema is a data model for data warehousing. It consists of a fact table linked to many dimension tables. The dimension tables are further normalised into related sub-dimension tables.
This article compared the star and snowflake schemas. It started with a formal definition of each schema, listed each schema’s features, and concluded with a table summarising the differences between the two.
By Pranchal Agrahari