Fast Changing Dimension

Lesson Notes

Fast Changing Dimension Lesson Notes:

  1. Presentation Mode
  2. Static Mode

Video

Fast Changing Dimension (Mini Dimension)

  • When we have a dimension with one or more of its attributes changing very fast.

  • It causes a performance issue if we tried to handle this case similar SCD Type 2 because of the rapidly changing in this dimension and the table will includes a lot of rows for this dimension

  • We solve this case by separation the attributes into one or more dimensions. This technique also called mini-dimensions.

Fast Changing Dimension Handling

  • How to handle FCD?

    1. Identify the fast changing columns in dimension.
    2. Split the fast changing columns to a separate junk dimension.
    3. Map the junk dimension with the main dimension using mini-dimension
  • In the following example, we have Weight & B_pressure columns are FCD.

Patient_idNameGenderBirthDateWeightB_PressureUpdateDt
123AnnaF1968-01-1250110.02019-01-01
123AnnaF1968-01-1255130.02019-01-07
123AnnaF1968-01-1259115.02019-01-14
123AnnaF1968-01-1265120.02019-01-21
  • Split FCD columns from the other column.

    • Static columns
    Patient_idNameGenderBirthDate
    123AnnaF1968-01-12
    • FCD
    Patient_KeyWeightB_Pressure
    150110.0
    255130.0
    359115.0
    465120.0
    • Patient Mini Dimension*
    Patient_idPatient_KeyStart_DateEnd_Date
    12312019-01-012019-01-07
    12322019-01-072019-01-14
    12332019-01-142019-01-21
    12342019-01-21null

Fast Changing Dimensions Example

Fast Changing Dimension Design Discussion

  • Fast changing dimension design came from the need to handle the performance issue when we have a fact table that needs to join with a dimension that contains static and fast changing columns together.

  • The main reason to introduce this design was to help to enhance the performance for retrieving the information between the Fact and the dimension with minimum data redundancy.

  • If we check the first table in our example, we could think about the frequency of the change and the application which we need to handle. In this case, we have three scenarios as following:

    • First use case, assume we have a fact named clm_svc_dtl (claim service details) table that needs to get some of the static information from the patient dimension for example: patient gender, age, or just the reference for the patient_id. In this case, we will join directly with the patient dim on Patient_id.
      This case will be very fast as it will be one to one join with one record for each patient.

      • clm_svc_dtl
      clm_idclm_svc_dtpatient_idclm_typeclm_amount
      1232020-01-011231110.0
      2452020-01-011231130.0
      3672020-01-011231115.0
    • Second use case, assume we have a fact table membr_vst_dtl (member visit detail) that has the transactional data for the patient, and it needs to get the latest Weight and Blood Pressure. In this case, the fact will include the junk dimension surrogate key and join with the junk dimension to get the values.

      • membr_vst_dtl
      clm_idvst_dtPatient_Key
      1232019-01-011
      2452019-01-072
      3672019-01-143
      3672019-01-214
    • Third use case, assume we have a fact table membr_rsk_score (member risk_score) that calculates risk score based on Weight and Blood Pressure. This risk score needs to check historical data every day and compare the results with the previous for simple example: assume for each day we get the log of (the difference between current blood pressure minus the previous one plus the same for the Weight). In this case, we will join fact with junk with the mini dimension. Note: This case is not the normal case to check the historical data every day for all facts. So, the requirements to access the start and end date for this fact is not frequent request case.

      • membr_rsk_score
      rsk_scorecalc_dtPatient_Key
      0.52019-01-011
      0.62019-01-072
      0.32019-01-143
      0.672019-01-214
  • Let’s analyze the implementation strategies which can handle the above three use cases.

    • Keep the patient dimension as is and join with fact directly. The drawback of this design is that the size of the patient dimension will be huge, with the redundancy of data and performance issues when joining between Fact and dimension.

    • Split the patient dimension into two-dimension static and FCD. This solution has an example below, and it will work, but it has some problems. So, Let’s analyze why we can’t remove the junk table and use only the mini-dimension table?

      • The new dimension which contains the fast changing columns will have a bigger size (for more detail about how to choose the number of junk dimensions per table, please refer to the junk dimension lecture here). Now we will have a performance issue when joining between the new dimension and fact table.
      • The new dimension will include some columns which not required for the daily query results (start_data and end_date). These columns will cause to make the new table to be the bigger size and affect the performance.
      • One more issue here if we have a bigger number of the fast changing column the combination (as it is the cartesian product of the junk columns values) will make the table records to be bigger.
        Patient_idPatient_KeyWeightB_PressureStart_DateEnd_Date
        123150110.02019-01-012019-01-07
        123255130.02019-01-072019-01-14
        123359115.02019-01-142019-01-21
        123465120.02019-01-21null
  • The recommended solution as following

    • Split the patient dimension into three tables:

      • Patient Dimension.
      • Junk Dimension(s).
      • Mini Dimension(s).
    • This solution has some perspectives which we need to clarify it.

      • In case we have multi fast changing columns with lots of combinations, we could split it into one or more junk and handle this with one or mini dimension tables.

      • When we need to get the junk dimension information, we join with a smaller table, which will increase the performance as it is fewer table columns and size.

      • In case we need to get the historical data with the date columns, we can still get this information by joining between the fact, junk, and mini-dimension.

      • Patient Dimension (static columns)

      Patient_idNameGenderBirthDate
      123AnnaF1968-01-12
      • Junk Dimension
      Patient_KeyWeightB_Pressure
      150110.0
      255130.0
      359115.0
      465120.0
      • Mini Dimension
      Patient_idPatient_KeyStart_DateEnd_Date
      12312019-01-012019-01-07
      12322019-01-072019-01-14
      12332019-01-142019-01-21
      12342019-01-21null

Further Reading

Dimensional Modeling: In a Business Intelligence Environment. The book is free, and you can download it from this link. You can read the following:

  • Chapter 5.4.7 Slowly Changing Dimensions page 162.
  • Chapter 6.3.6 Fast Changing dimensions page 269.