Data Join and Blend in Tableau


Joining and Blending Data in Tableau



There are 2 main ways to combine data:
  • Joins
    • Combine tables by adding more columns of data across similar row structures.
    • This can cause data loss or duplication if tables are at different levels of detail
  • Blends
    • Unlike joins, never truly combine the data
    •  Blends query each data source independently, the results are aggregated to the appropriate level, then the results are presented visually together in the view.
    •  Because of this, blends can handle different levels of detail and working with published data sources.
    • Blends are also established individually on every sheet and can never be published, because there is no true “blended data source”, simply blended results from multiple data sources in a visualization.
  • Differences between joins and data blending
    • Data blending simulates a traditional left join. (The Primary Data left join Secondary Data)
    • The main difference between the two is when the aggregation is performed:
      • A join combines the data and then aggregates.
      • A blend aggregates and then combines the data.
  • Resource: Tableau Forum
  • Example:
    •  Data Source:
      • Region Table:

      • Sales Transaction Table:

    • Join
      • Data table outcome:

        • We join data by dragging the data table into the data source area

        • Note that our data got duplicated due to different level of detail of each data set. The One end of One-to-Many relationship got duplicated for each match with the ID.
        • We can create a calculated field using FIXED formula to remove the duplication for the measure on the right-hand side of the join.
        • Note that, we can no longer use the original measure field (Sales Goals) anymore in the dashboard, we must use the new calculated field create by FIXED formula.
      • Bring it into a visual

    • Blend
      • Data table outcome:

        • Note: the data source are both separate. We added new data by adding new data source (CTRL + D)

        • After that, we need to setup relationship for the data source. The “Many” side will be the “Primary data source” and the “One” side will be “Secondary data source” with the custom relationship on the common columns from the two tables.

        • We start building the visual by dragging elements from the Primary data source first then Secondary data source.
        • Make sure the connection icon on the Secondary data source is active to enforce the relationship.

      • Bring it into a visual

  • Final thought
  • In my opinion, when building dashboard using combination of data sources with different level of details, we should use blending method to avoid confusion and unnecessary need for Level of Detail (FIXED) calculated fields. It requires user to be more aware with the relationship but it definitely more effective and cleaner.
  • Sample data and tableau workbook: https://github.com/Anhsnotes/Tableau_join_and_Blend






Comments

Popular posts from this blog

R and Shiny App, a path from data engineering to web app deployment in one language

PostgreSQL Date table

PTO Tracker