Holiday Shift Algorithm
About:
To identify the best day of the week threshold for holidays across years to shift/not shift by -1 week by revenue contribution
Data:
- Historical sales data (2021 - current) - @ wholesaler, product pack, brand, week level
- State - wholesaler mapping
- 2021 - current national holidays data extracted from the holidays package (user input to select from the holidays list)
- 2021 - current state holidays data (user input)
Process:
- Create a consolidated list of all national and state holidays along with revenue at the week level
- For each holiday
- Get the day of the week (Sun:1, Mon:2, …., Sat:7) the holiday occurred across years and sort the day of the week in the ascending order
- Record the revenue by not shifting any year
- For each day of the week(sored), record the revenue by shifting the years with the day of the week <= the sorted day of the week and repeat this process for all sorted day of the weeks
- Get the best shift combination (including no shift) by the maximum revenue
- Shift the holidays by the best combination
- Post shifting the holidays, create lag(-1) and lead(+1) holidays for each holiday - year. For example, if Christmas of year Y1 is shifted to W49, then lag and lead weeks for Christmas in Y1 would be weeks W48 and W50 respectively and the holidays are named as Christmas_-1, Christmas_0, Christmas_1 for lag, exact and lead holidays respectively
- Broadcast the shifted holidays along with the window to wholesalers and generate the holidays data for the model
Illustration - Shift algorithm:
For example, - If Christmas occurred on Saturday, Monday, and Sunday on years Y1, Y2 and Y3, then the day of the week across the years Y1, Y2 and Y3 are 7, 2 and 1 respectively - Sorted day of the week list is [1, 2, 7] - Record the total revenue by not shifting any of the years by -1 week - Iterate through each sorted day of the week - For 1: Shift all years with day of the week <= 1. In this case, only Y3 would be shifted by -1 week and the total revenue is recorded - For 2: Shift all years with day of the week <= 2. In this case, only Y2 and Y3 would be shifted by -1 week and the total revenue is recorded - For 7: Shift all years with day of the week <= 7. In this case, Y1, Y2 and Y3 would be shifted by -1 week and the total revenue is recorded
- If the maximum revenue is identified when years with day of the week <= 2 are shifted, then 2 is selected as the best day of the week threshold for the holiday and the data across years Y1, Y2 and Y3 is shifted accordingly
Unit Test:
If lead/lag holidays are present in the holidays output data - For each exact holiday (Christmas_0, Superbowl_0, Labour_day_0, etc.) - Get the weeks where the holiday happened across years and compare the weeks with the original holidays data. The weeks in the output file should lie either lie in the same week or same week -1 from the original holidays - For each lag holiday (Christmas_-1, Superbowl_-1, Labour_day_-1, etc.) - Get the weeks where the lag holiday happened across years and compare the weeks with the original holidays data. The weeks in the output file should lie either lie in the same week -1 or same week -2 from the original holidays - For each lead holiday (Christmas_1, Superbowl_1, Labour_day_1, etc.) - Get the weeks where the lead holiday happened across years and compare the weeks with the original holidays data. The weeks in the output file should lie either lie in the same week or same week +1 from the original holidays
If lead/lag holidays are absent in the holidays output data - For each exact holiday (Christmas, Superbowl, Labour_day, etc.) - Get the weeks where the holiday happened across years and compare the weeks with the original holidays data. The weeks in the output file should lie either lie in the same week or same week -1 from the original holidays Refer the unit test logic table below