Dimensional Data Modeling

Amisha Purswani
Last Updated: May 13, 2022

Introduction

Data in a warehouse is usually multidimensional, with measure attributes and dimension attributes. Dimensional data modeling is a set of rules for designing the structure of database tables so that data can be retrieved more easily and quickly. It guides how data is stored in tables, keeping the database understandable and straightforward. The model is designed to support end-user queries in the data warehouse.

The basic idea of designing a dimensional data model is based on two factors.  

  • Simplicity
    • The dimensional model should be easy to understand.
    • We design separate models for each business process.
  • Performance
    • The dimensional model enhances query performance.
    • Dimensional models are scalable, and their structure is flexible for easy expansion.

Steps to create a Dimensional Data Model

  1. Identifying the business process
  2. Business process summarization 
  3. Determining the granularity of the fact table
  4. Identifying the dimensions and hierarchies 
  5. Establishing referential relationships
  6. Fragmentation (Storage distribution strategies)

Steps to review a Dimensional Data Model

  1. Choose the business process
  2. Declare the grain
  3. Identify the dimensions
  4. Identify the facts
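
The four review steps above can be written down as a small design record before any table is created. This is a minimal sketch, assuming a hypothetical retail sales process; the class and field names (`DimensionalDesign`, `grain`, and so on) are illustrative only and do not come from any library.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionalDesign:
    """Design record for one business process (all names are hypothetical)."""
    business_process: str                                 # step 1: choose the business process
    grain: str                                            # step 2: declare the grain
    dimensions: list[str] = field(default_factory=list)  # step 3: identify the dimensions
    facts: list[str] = field(default_factory=list)       # step 4: identify the facts

    def is_complete(self) -> bool:
        """A design is ready for review once all four steps are filled in."""
        return all([self.business_process, self.grain, self.dimensions, self.facts])


# Assumed example: retail sales, one fact row per product, per store, per day.
sales_design = DimensionalDesign(
    business_process="retail sales",
    grain="one row per product, per store, per day",
    dimensions=["date", "product", "store"],
    facts=["units_sold", "revenue"],
)

print(sales_design.is_complete())
```

Declaring the grain before listing dimensions and facts is useful because the grain decides which dimensions can attach to a fact row at all.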

Fact Table and Dimension Table

A fact table contains the business process's measurements, metrics, or facts. It is located at the center of a star schema or a snowflake schema, whereas the dimension table stores all attributes or dimensions that describe the objects in a fact table.
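
As a rough illustration of the two table types, the sketch below builds one dimension table and one fact table in an in-memory SQLite database. The table and column names (`dim_product`, `fact_sales`, `units_sold`, `revenue`) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Dimension table: descriptive attributes of the thing being measured.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);
-- Fact table: numeric measurements plus a key into each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    units_sold  INTEGER NOT NULL,
    revenue     REAL NOT NULL
);
""")

conn.execute("INSERT INTO dim_product VALUES (1, 'Espresso beans', 'Coffee')")
conn.execute("INSERT INTO fact_sales VALUES (1, 3, 29.97)")

# A fact row that points at a missing dimension row is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 1, 9.99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(orphan_rejected, fact_rows)
```

The foreign key is what ties every measurement to the dimension row that describes it.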

Schemas in Dimensional Data Modeling  

A database is made up of one or more tables, and the database schema is the set of relationships between all of the tables in the database. Dimensional data modeling can be done at both the logical and the physical level. It adds another layer to your data models and is compatible with a wide range of database management software.

We can get the following benefits from dimensional data modeling:

  1. We can design queries that answer business-related questions. A query, in most cases, calculates some measure of performance across several business dimensions.
  2. We can write these queries in SQL, which most RDBMS vendors support.

In a dimensional schema, the measurements that quantify the business are physically separated from the descriptive components that categorize it.
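
A query that calculates a measure across several business dimensions might look like the following sketch. The tables, columns, and sample rows are invented for illustration; the measure (`revenue`) lives in the fact table, while the descriptive attributes (`region`, `category`) live in the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (store_key INTEGER, product_key INTEGER, revenue REAL);

INSERT INTO dim_store   VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_product VALUES (10, 'Coffee'), (20, 'Tea');
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 20, 40.0),
                               (2, 10, 60.0),  (2, 10, 25.0);
""")

# Sum the measure, grouped by one attribute from each dimension.
revenue_by_region_and_category = conn.execute("""
    SELECT s.region, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_store   AS s ON s.store_key   = f.store_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

for row in revenue_by_region_and_category:
    print(row)
```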

A dimensional schema can be either physical or logical.

A physical dimensional schema is commonly shown as a star or snowflake schema, with the objects in the star or snowflake schema representing database tables.

The dimensional schema can even be represented as a single table or view, with all facts and dimensions included inside separate columns of that table or view.
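
One way to picture the single-table form is a view that joins the fact table to its dimensions, so every fact and dimension attribute appears as its own column. The sketch below assumes hypothetical tables and a view named `sales_flat`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, units_sold INTEGER);

INSERT INTO dim_date    VALUES (20220501, 2022, 5);
INSERT INTO dim_product VALUES (1, 'Espresso beans');
INSERT INTO fact_sales  VALUES (20220501, 1, 3);

-- One wide view: dimension attributes and the fact side by side.
CREATE VIEW sales_flat AS
SELECT d.year AS year, d.month AS month, p.name AS product_name, f.units_sold AS units_sold
FROM fact_sales AS f
JOIN dim_date    AS d ON d.date_key    = f.date_key
JOIN dim_product AS p ON p.product_key = f.product_key;
""")

cursor = conn.execute("SELECT * FROM sales_flat")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```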

In a logical dimensional schema, the facts, measurements, and dimensions are represented as entities and attributes. Because this representation is independent of the database vendor, it can be transformed into a physical dimensional schema for any database vendor.

There are three types of commonly used schemas in Dimensional Data Modeling.

  1. Star Schema

A star schema is a relational database schema with a single, central fact table surrounded by dimension tables.

  2. Snowflake Schema

The snowflake schema comprises a single fact table linked to several dimension tables, each of which can be linked to another dimension table via a many-to-one connection.

  3. Fact Constellation Schema

A fact constellation combines many fact tables that share dimension tables and can be viewed as a constellation of stars.

Star Schema (Star Join Schema)

The star schema is a multidimensional data representation format for relational databases. In a star schema, one or more fact tables link to any number of dimension tables. The star schema is a special case of the snowflake schema in which no dimension is split into further tables, and it is more efficient when dealing with simple queries.



 

The center of the star is the fact table, and the points of the star are the dimension tables. It is called a star schema because the diagram resembles a star, with points radiating from a center.
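
The star shape can be checked from the foreign keys themselves: the fact table references every dimension, and the dimensions reference nothing further. The sketch below uses SQLite's `PRAGMA foreign_key_list`; the table names and the helper `referenced_tables` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    revenue     REAL
);
""")


def referenced_tables(table: str) -> set[str]:
    """Return the tables that `table` points to through foreign keys."""
    # PRAGMA foreign_key_list yields one row per key column; index 2 is the parent table.
    return {row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}


dimensions = ["dim_date", "dim_product", "dim_store"]
points_of_star = referenced_tables("fact_sales")
# In a star, dimension tables do not reference any further tables.
dimensions_are_leaves = all(not referenced_tables(d) for d in dimensions)

print(sorted(points_of_star), dimensions_are_leaves)
```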

Advantages of star schema

  • It is the simplest schema to understand and design.
  • It optimizes the navigation through the database.
  • It is suitable for query processing on large data sets.

Snowflake Schema

The snowflake schema is an extension of the star schema. Each point of the star branches into additional points: the fact table links to a dimension table, and that dimension table links in turn to another dimension table. Star and snowflake schemas are most common in dimensional data warehouses and data marts, where retrieval speed matters more than data manipulation efficiency. As a result, the tables in these schemas are not heavily normalized and are usually designed at a normalization level below third normal form.

 

Advantages of Snowflake Schema

  • Fewer redundancies due to normalized dimension tables.
  • Normalized dimension tables save storage space and are easier to update and maintain.

Disadvantages of Snowflake Schema

  • It is a complex schema: queries need more joins to reach the attributes of a normalized dimension.
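
The trade-off can be seen in a small sketch where the product dimension is snowflaked into a separate category table. All table and column names here are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category attributes live in their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)  -- many-to-one
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue     REAL
);

INSERT INTO dim_category VALUES (1, 'Coffee'), (2, 'Tea');
INSERT INTO dim_product  VALUES (10, 'Espresso beans', 1), (11, 'Filter blend', 1),
                                (20, 'Green tea', 2);
INSERT INTO fact_sales   VALUES (10, 50.0), (11, 30.0), (20, 20.0);
""")

# Reaching the category name takes two joins; a star would need only one.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(revenue_by_category)
```

Each category name is stored once, which is where the storage and maintenance savings come from, at the cost of the extra join.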

Fact Constellation Schema

It is also known as a galaxy schema. In this schema, multiple fact tables share dimension tables.

It is a widely used schema that is more complex than the star and snowflake schemas.
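
A minimal sketch of a fact constellation, assuming two hypothetical business processes (sales and shipments) that share one product dimension:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
-- Two fact tables for two business processes, both keyed by the same dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER
);
CREATE TABLE fact_shipments (
    product_key   INTEGER REFERENCES dim_product(product_key),
    units_shipped INTEGER
);

INSERT INTO dim_product    VALUES (1, 'Espresso beans'), (2, 'Green tea');
INSERT INTO fact_sales     VALUES (1, 5), (1, 2), (2, 4);
INSERT INTO fact_shipments VALUES (1, 10), (2, 3), (2, 3);
""")

# Aggregate each fact table separately, then combine on the shared dimension.
# Joining the two fact tables to each other directly would multiply rows.
sold_vs_shipped = conn.execute("""
    SELECT p.name, s.total_sold, h.total_shipped
    FROM dim_product AS p
    JOIN (SELECT product_key, SUM(units_sold) AS total_sold
          FROM fact_sales GROUP BY product_key) AS s ON s.product_key = p.product_key
    JOIN (SELECT product_key, SUM(units_shipped) AS total_shipped
          FROM fact_shipments GROUP BY product_key) AS h ON h.product_key = p.product_key
    ORDER BY p.name
""").fetchall()

print(sold_vs_shipped)
```

The shared dimension is what lets measures from different fact tables be compared side by side.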




Advantages of Fact Constellation Schema

  • It's a user-friendly schema that encourages people to use it.
  • Tables are divided into fact tables and dimension tables, which makes the relationships between them easier to understand.

Disadvantages of Fact Constellation Schema

  • It is a complex schema due to multiple fact tables.
  • It is difficult to manage.
  • Dimension tables can be very large.

Difference between Star and Snowflake Schema

Star schema

  • Star schema is highly denormalized.
  • It has a single dimension table for each category.
  • Star schema has more data dependency and redundancy.
  • There is no need for complicated joins in star schemas.

Snowflake schema

  • Dimension tables in the snowflake schema are highly normalized.
  • In the snowflake schema, dimension tables are further split into additional tables.
  • Snowflake schema has less data dependency and redundancy.
  • In the snowflake schema, complicated joins are required.
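
The redundancy difference can be made concrete with a small sketch in plain Python. It takes a denormalized, star-style product dimension and normalizes it the way a snowflake schema would; the rows and key names are invented for illustration.

```python
# Denormalized (star-style) product dimension: the category text repeats per product.
star_dim_product = [
    {"product_key": 10, "name": "Espresso beans", "category": "Coffee"},
    {"product_key": 11, "name": "Filter blend",   "category": "Coffee"},
    {"product_key": 12, "name": "Decaf blend",    "category": "Coffee"},
    {"product_key": 20, "name": "Green tea",      "category": "Tea"},
]

# Normalize it (snowflake-style): move each distinct category into its own table.
category_keys = {}
for row in star_dim_product:
    category_keys.setdefault(row["category"], len(category_keys) + 1)

snowflake_dim_category = [
    {"category_key": key, "category_name": name} for name, key in category_keys.items()
]
snowflake_dim_product = [
    {"product_key": r["product_key"], "name": r["name"],
     "category_key": category_keys[r["category"]]}
    for r in star_dim_product
]

category_strings_in_star = len(star_dim_product)             # one copy per product row
category_strings_in_snowflake = len(snowflake_dim_category)  # one copy per category

print(category_strings_in_star, category_strings_in_snowflake)
```

The star version stores the category text once per product and needs no extra join; the snowflake version stores it once per category and needs a join through `category_key` to read it.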

 

Star Schema Vs. Snowflake Schema Vs. Fact Constellation Schema

Star schema

  • It has a single large central fact table and one table for each dimension.
  • Every fact points to one tuple in each dimension and has additional attributes.
  • Star schema does not capture hierarchies directly.

Snowflake Schema

  • It is a variant of the star schema model.
  • It has a single, large, central fact table and one or more tables for each dimension.
  • Dimension tables are normalized. Data is split into different dimension tables.

Fact Constellation Schema

  • In a constellation schema, multiple fact tables share dimension tables.
  • This schema is viewed as a collection of stars, hence the name galaxy schema or fact constellation.
  • Sophisticated applications require such a schema.

 

FAQs

  1. Why do we need database schemas?
    Database schemas are very important because they help us visualize how a database should be structured in the most optimized way.
     
  2. What is a fact table?
    A fact table contains the business process's measurements, metrics, or facts. It is located at the center of a star schema or a snowflake schema.
     
  3. What is normalization?
    Normalization is the process of organizing data in a database. This involves creating tables and linking them according to principles that attempt to protect the data while also making the database more flexible by reducing redundancy and conflicting dependencies.
     
  4. What is dimensional data modeling?
    Dimensional data modeling is a database design approach to support end-user queries in a data warehouse.
     
  5. What is a dimension?
    Dimensions are logically related attributes that function as an axis for data modeling. A dimension table is the table associated with a dimension; it holds the attributes that describe that dimension.

Key Takeaways

We have learned about "dimensional data modeling" and "different kinds of schemas" in this blog.

The dimensional model is a database design technique that supports end-user queries in a data warehouse. We then looked at schemas: a database schema is like a skeleton structure that represents the logical view of the entire database.

There are three types of schemas: star, snowflake, and fact constellation.


 
