Sunday, August 22, 2021

5 Ways to do YoY and YTD

YoY and YTD calculations are fundamental business metric - afterall who doesn't want to know how they compared to last year, last month or how they are tracking on a year to date basis.  It seems that users at all experience levels have questions on how to complete period over period and period to date type calculations or have different expectations for the outcome. (See the recording  VizConnect Tableau Community Data Doctor Session August 2021 - YouTube )

Here we will look at 5 different approaches starting with the most straightforward - table calculations to the advanced duplicating the dataset and joining it to itself.

I will be using subsets of Superstore data in the calculations. (Note:there is a link to downloadable workbook)

1 Table Calculations

The easiest way to do a period over period calculation is with a standard table calculation  Percent Difference:




Easy enough, but most users want to eliminate the empty column for the initial year in the series.  That can not be done with a table calculation

2 LOD and a fixed starting point

The second method uses 2 LODs and requires a fixed starting point.  That starting point be from a parameter, the current date (Today())  or the latest date in the data set {max(date)}  The example uses Today() -

The current YTD value can be determined using the LOD:

{ FIXED [Category]: sum(  if DATETRUNC("year",[Order Date])= DATETRUNC("year",TODAY())and DATETRUNC("day",[Order Date])<= DATETRUNC("day",TODAY()) then [Sales] end )}

Where the year of the date dimension is checked against the year of Today() and all the days in the year prior to and including Today() and included in the total. 

Similarly the prior YTD value calculation is:

{ FIXED [Category]: sum(  if DATETRUNC("year",[Order Date])= DATETRUNC("year",dateadd("year",-1,TODAY())) and DATETRUNC("day",[Order Date])<= DATETRUNC("day",dateadd("year",-1,TODAY())) then [Sales] end )}

and compares the year of the date to 1 year prior to Today()  using a Dateadd function.

Then the YoY percent difference is simply calculated using:

(sum([Current YTD Sales (today)])-sum([Prior YTD Sales (today) ]))/

sum([Prior YTD Sales (today) ])




The LOD approach is the one I recommend most often - the LODs for the Current and Prior YTD are portable from sheet to sheet and can be used in any other calculation.  They are also easily converted to use with a parameter for the end date or the max date in the data set by substituting [Parameter Date] or {Max(date)}  for Today() in the forumlas.

3 Variable period  POP

Often users want to compare the YTD versus PYTD or MTD versus PMTD (or more)  based on a parameter selection by the end user.  First a Parameter is need to select the basis for comparison




then 2 Case statements for the Current or Prior period to date totals - the case statements use LODs similar to those used in the previous example. (Note:The formulas can be downloaded in the workbook).  




Selecting the Year returns the YTD (July 9, 2020) versus Prior YTD (July 9, 2019) 

And selecting Month will compare MTD (July 9, 2020) to Prior MTD (June 9, 2020)


  


More levels can be added to the parameter and the case statements to look at different data part levels.

4 Dynamic YoY comparison

Less frequently, a user will want to compare make a YoY comparison over a range of dates and that presents a problem with the first 3 approaches.  Table calculations do a comparison by position in a data table, not by the actual date and the LOD solutions depend on fixing an end date.

A dynamic or running YoY comparison can be done but it will take duplicating the data set and joining it to itself offsetting the date by a year.  But it can only be done at levels where the data set is solid - i.e. no voids in the data set

The first example uses a very simple data subset of the Superstore data that only includes the order date, the category, subcategory and the sales value.  Step 1 is to join a copy of the data with itself at the subcategory, month and year level but offsetting the year - (Note I have used the physical layer to explicitly join the files together )



When you open any worksheet there will be 2 data sets in the data frame - one is the original data set



and the other is the copy that was joined to create a Prior year data set



The comparison can be made directly by placing the current and prior year sales on in viz and the current years on columns 


So much of the basics, now let's look at expanding the model at creating a rolling YoY using the approach. To keep the visual simple I have expanded the date to quarterly level data and compressed the categories so we are looking at totals only
For the current and prior year rolling totals I used a table calculation

Current year:

window_sum(sum([Sales]),-3,0)

and Prior year

window_sum(sum([Sales Prior Year]),-3,0)  

They result in this:


 
5 Using a scaffold to fill the nulls

But there is a problem when the data is sparse - where there are nulls or voids in the data. 

I expanded the dataset to the Month and day level and that introduced voids in the data



The prior year data will no longer total properly due to the nulls in the data 



To correct the problem requires first creating the missing cells. That is done by scaffolding the data - I used Prep to create a ladder at the date/category level and joining it back to the original data


resolving the nulls and the prior year now totals correctly



I hope this helps the next time you need to do a YoY analysis and helps select the best method to use in your situation

The workbook use here can be downloaded at :Link to workbook


Jim

Wednesday, May 26, 2021

Zero to Zen and 5 Stops Along The Way

 Zero to Zen covers the 5 foundational concepts I needed to learn on the journey from spreadsheets to Tableau

Like many, I struggled when first introduced to Tableau 5 years ago. I don't come from a data scientist background. - Actually I'm and engineer, very analytic and expert level spread sheet skills but I just didn't get it at first.

Looking back there were 5 concepts that I needed to understand which were not at all like spreadsheets before I was comfortable working in Tableau.  Over a 5 week series I shared those lessons learned - Below are brief summaries of each and links to the recordings and the workbook used in each session.  

Hope you enjoy

Jim

Session 1 - Data Structure and Connections 

Three types of connections, Joins, Unions and Blending and the limitations of each were covered.  Also reviewed were several how to use cases include why it is sometimes necessary to use scaffolding to restructure the data set 

Session recording can be found at:  Session 1 Recording

The session workbook can be downloaded at: Session 1 Workbook  


Session 2 - Order of Operation 


Misuse of the Order of Operation is the usual cause when users get unexpected results in their analysis.  It is the sequence that Tableau goes through to first create the data table for each worksheet ( That's right - each worksheet has its own individual data table that is a subset of the full data set uploaded into Tableau). Once the structure of the data table is established, values are loaded, calculations performed, and the visual presentation created.  Frequent areas of error are misapplication of context filter, trying to filter across blended data sources, and attempting to "fix" table calculations for use on other sheets 


Session recording can be found at:  Session 2 Recording

The session workbook can be downloaded at: Session 2 Workbook  


Session 3 - Simple Calculations and Date Functions


This first of 2 sessions on calculations covers the use of the Calculation Wizard, using aggregations to do simple math calculations, If - Then conditional statement and date functions. Dates are a special data type in Tableau which self sequence, sort and allow the use of the 4 types Date Functions - those which return dates, refer to dates, are used in date math, and the 2 special Today() and Now().  Use case show how dates are treated as discrete or continuous or can still be continuous yet presented as discrete

Session recording can be found at:  Session 3 Recording

The session workbook can be downloaded at: Session 3 Workbook 


 Session 4 - LOD Expressions and Table Calculations


My favorite, LOD expressions provide versatility by allowing the user to create layers in the data set and then use the layer like any other measure in calculations, nested in other LOD's or in table calculations.  Table calculations are last in the order of operation and are executed on the data table for the specific worksheet making the unique the sheet where they are applied.  Use cases reviewed nesting LOD's in table calculations, nesting table calculations, totally Top N and Other and the use of Show Hide filters


Session recording can be found at:  Session 4 Recording

The session workbook can be downloaded at: Session 4 Workbook


 Session 5 - Filters, Sets and Parameters 


Filters, sets and parameters are the user interface to the viz, allowing users to probe and drill into the data and gain meaning and understanding.  Filters remove data  and change the structure of the data table for the worksheet, sets retain all the data in the data table but categorized it into the In and the Out sets and finally parameters are the other way users can input a single, static value into Tableau.  Over the series a number of seemingly fundamental concept were introduced or review yet in combination those basic idea can be used to create any more complex visualization -


Session recording can be found at:  Session 5 Recording

The session workbook can be downloaded at: Session 5 Workbook


Feel free to download and use the recordings and workbook or any other material on my blog or Public Profile - that's why it is there


Jim









Friday, February 19, 2021

FAQ Series - When and How to Scaffold Data

 Data structure has a huge affect on the results you get from your analysis. Like most of you, I trained using well structured and full data sets - where there was a value in every record (cell) - like a tightly fit puzzle

  



With real world data that is often not the case. Data can come from ERP systems, stand alone third party or public sources and contains voids or is sparse or otherwise badly structured resulting in gaps in the data.




Scaffolding is one way to re-structure the data and fill the gaps.  


We are going to look at 3 different frequently encountered use case where data needs to be scaffolded to solve the problem.  The cases are adapted from actual questions that were posted on the Forum - data and names changed

  






1 - The 2 Date Problem

 

The '2 Date Problem' is a classic case - each record in the data has a start date and end date and you need to look at all the periods in between.  It comes up when you need to count the active programs, total cash flow over a series of mortgages or annuities, or, as in this example count head count over time.

The question came from an HR manager who had a number of open positions. Each had an "open date", some had been filled and others were still open.  The manager wanted to count all the open positions and separately those filled by month.  

The data looked like this:


 

Each requisition had start and end dates as independent values but there was no calendar common to every record.  The solution was to build a calendar using a scaffold. In this case, the manager was only interested in monthly totals so a monthly date ladder would suffice: 


The scaffold and the base data were joined so that every record in the base data was connected to each record in the scaffold - a "Cartesian join"  - with a 1=1 join clause



Using the scaffold date resulted in a calendar that included each job requisition


All that was need were 2 LOD expressions to count the filled or open jobs by month

the filled LOD is 


{ FIXED [Job Ref ID1] , [Scaffold Date]:  countd(  if  not ISNULL([Job open date]) 

and DATETRUNC('month',[Job filled date])=DATETRUNC('month',[Scaffold Date])

then [Job Ref ID1]  end )}


and the open LOD is 

{ FIXED [Job Ref ID1] , [Scaffold Date]:  countd(    

if DATETRUNC('month',[Job open date])<= DATETRUNC('month',[Scaffold Date]) 

and ( DATETRUNC('month',[Job filled date])>DATETRUNC('month',[Scaffold Date])

or ISNULL([Job filled date]))then [Job Ref ID1]  end )}


and the result looked like this



By using the date scaffold to create a calendar, the solution was reduced to just checking each month to see which positions were still open and which had been filled.


2 When drilling down breaks the analysis

Data that appear to be solid a high levels of aggregation can be sparse or have voids when drilling down a dimension hierarchy or to lower date levels.  The problem is common with data that comes from ERP systems (like Oracle or SAP).  ERP systems are transaction driven and only create a record when a transaction takes place e.g. if there are no sales for a particular SKU on a date there is no record in the data set.  

To illustrate the issue we will look at an inventory analysis where a production planner wanted to calculate a very common metric used as an indicator that inventory levels were either too high or low.  

For those not familiar with EPR systems and the forecasting to production process, a quick review. The ERP system records sales directly from the sales invoice. That data is precise - we know the customer, the date, the SKU (stock keeping unit), the unit quantity and the dollar value.  The data can then be aggregate along several different hierarchies - here along the product and is still very accurate although somewhat less useful 




Forecasters and product managers will use the sales history to create a rolling forecast that is used by the production planning and purchasing departments to  develop the production schedule. The process starts at the top on the pyramid and works downs, each level being a lower level of aggregation that is more useful to the planner but less accurate than the preceding 

Forecast are usually adjusted monthly along a 3 or 4 month forecasting horizon.  The production schedulers take the lowest level -Units by SKU by week and schedule production, purchase components, and plan inventory levels. While the forecasting and planning process are numbers in the system and the results become less accurate at lower levels in the pyramid, actual production and inventory levels are precise.  You know exactly how many units were produced for each SKU and the inventory levels.  

With the uncertainties in the process, planners track a metric as an early warning when the forecast is too high or the inventory too low so they can adjust the production schedule. The metric is Days on Hand (or days Sales, or just Days). The calculation is just 

Sum(Inventory level) / Average daily sales

That is where the user question fits in.  He was an production planner manager and was planning a single item, had 4 customers to consider and was pulling sales, inventory and production data from their ERP system. The data looked something like this with a separate file for each customer:




He was getting different Days on Hand results that could not be reconciled for individual customers or across the group as a whole.

After filtering the results to show only the last date and they looked something like this across all 4 companies:


But some results look odd


So lets see what is going on:

First the formula for the Seven Day Avg Sales is  window_avg(sum([Sales]),-6,0)
and that is a clue to the first problem - Window average is a table calculation that looks at the last 7 records, not days but 
this is what the actual data looked like when all the records were visible 



Many Nulls - very sparse data  and it is clear that for Customer D there was only 1 sale in the final 7 days and that was for 1000 units so the average did not take into account the nulls. The nulls in the data were "no record null" see Null Types and a scaffold is needed to be used to force a record into the data for each combination of customer name and date.

Tableau Prep was used to force the record into the data - the first step joins the customer list to the date list (Cartesian)


The second joins the original data to the ladder 




after scaffolding the last seven data data looks like this ( note zn() functions and previous applied ) - and the data now runs through the end of February


and the last day summary across the 4 companies is


The actual Days on Hand when the missing data has been include is very different than the original results. This is a case where the planner (the Subject Matter Expert) would not know what the data structure looks like nor should he - that is the role of the dashboard designer and data scientist - It is also a by product of the data source - the ERP system that could affect your next analysis


3 Voids in the data

Voids, single or small groups of no records nulls in the data, will return a null (blank screen) if the unsuspecting user filters to that combination of dimensions.  
The user had created a viz using small multiples - a great way to see the changes across all values of a dimension in a single view  - here using Superstore data (note there are 17 images one for each sub-category in the data set)



When the date is filtered to on February 2016, 4 images disappeared (returned a null) 



The user had 2 questions - Why did it happen and then how to correct the problem

When the user filtered to February there are no records in 4 subcategories - the voids returned Nulls causes the 4 image to "disappear"


 Here using a scaffold to force a record into each void will solve the problem.  

I used Tableau Prep to create the scaffold. First Cartesian join the date and subcategory fields



Then connect that to the Superstore dataset




But scaffolding exploded the data set to 28,000 records - at the same time it did solve the users problem




Sometimes it is necessary to think beyond calculations and functions to create the viz you want to present and then it is often that the data need to be re-structured. When the data contains voids or is missing records, scaffolding is an option to make the data set solid.


Hope this helps you better understand when and how to use different forms of scaffolding.  The workbook used here can be downloaded at Link to Examples

Enjoy
Jim

also see the video presentation on VizConnect at Video at

Post splash image

5 Ways to do YoY and YTD