Science Side data transformation and data fixes

We interpolate certain signals from annual, quarterly, and monthly frequency to weekly. We also apply fixes for data bugs in other signals that could adversely affect model training.

Data transformations

1) FRED Data:

For all state- and national-level economic data, we interpolate annual, quarterly, and monthly series to weekly by back-filling and forward-filling the values to all week dates.
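The fill to weekly dates can be sketched roughly as follows. This is a minimal illustration using pandas, not the pipeline's actual code: the function name `to_weekly`, the Sunday week anchor, and the order of the two fills (forward-fill first, then back-fill for any leading weeks) are all assumptions, since the text does not specify them.

```python
import pandas as pd

def to_weekly(series: pd.Series, week_dates: pd.DatetimeIndex) -> pd.Series:
    """Spread a lower-frequency series onto weekly dates (hypothetical helper)."""
    # Add the week dates to the index while keeping the original observations.
    combined = series.reindex(series.index.union(week_dates))
    # Assumed order: carry the last observation forward, then back-fill
    # any weeks that fall before the first observation.
    filled = combined.ffill().bfill()
    # Keep only the weekly dates.
    return filled.reindex(week_dates)

# Illustrative quarterly values, not real FRED data.
quarterly = pd.Series(
    [100.0, 110.0],
    index=pd.to_datetime(["2023-01-01", "2023-04-01"]),
)
weeks = pd.date_range("2022-12-25", "2023-04-09", freq="W-SUN")
weekly = to_weekly(quarterly, weeks)
```

With forward-fill applied first, each week carries the most recent published value; reversing the order would instead assign each week the next published value, so the choice matters for how early a new reading becomes visible to the model.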

Data Hacks

1) Distribution Data Hacks:

1. We interpolate values where a distribution signal deviates from its rolling mean by more than 3 standard deviations. Interpolation is performed only if the distribution is non-zero.
2. We forward-fill the simple distribution wherever the signal value exceeds 1 and, for all distribution signals, replace the beginning and end values with the values from their nearest dates.
3. We impute the simple and multiple distribution signals for weeks where sales data is observed but distribution signal values are missing.
4. We impute missing or outlier values in distribution signals for specific brands using mean imputation, forward fill, backward fill, rolling mean, or linear interpolation.
5. We clip the simple distribution signal to the range 0 to 1 and the multiple distribution signal to be at least 1.

2) Investment Data Hacks:

1. We fill the investment data for wholesaler code “1408” from previous years, because its investment data is missing from 16th December 2023 to the latest date even though sales and volume exist for this period.
2. We fill the investment data for vehicles that show a big spike on 31st December 2022, based on December 2022 sales.
3. We fill the investment data for vehicles that show a big spike around the Super Bowl, based on sales in that period.
4. We take Kona Bwave Ale’s (KGA) sales as three times the original sales. KGA is invested in as a massive growth brand, which the business believes it can handle efficiently. We do not have the data behind this business belief, so the model's recommendations would otherwise always differ from business expectations. This hack brings the model's recommendations closer to business expectations.
5. We drop the investments of certain vehicles from all wholesaler-brands.
6. We drop the vehicle “vehicle_Commerce_E_Commerce” and add its investment data to the vehicle “vehicle_E_Commerce”.

3) Price Data Hacks:

1. We impute spiky price data for wholesaler-brands by clipping, back-filling, or forward-filling.
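As an illustration of the first and fifth distribution hacks above (outlier interpolation and clipping), here is a minimal pandas sketch. The function name, the window length, the use of a trailing window that excludes the current week, and the sample values are all assumptions made for the example; the actual pipeline may compute the rolling statistics differently.

```python
import pandas as pd

def interpolate_outliers(dist: pd.Series, window: int = 4, n_std: float = 3.0) -> pd.Series:
    """Replace non-zero values far from the rolling mean by linear interpolation."""
    # Assumption: statistics come from the preceding `window` weeks only,
    # so a spike does not inflate the mean and std it is compared against.
    prior = dist.shift(1).rolling(window, min_periods=3)
    mean, std = prior.mean(), prior.std()
    # Flag only non-zero values, as the rule applies to non-zero distribution.
    is_outlier = ((dist - mean).abs() > n_std * std) & (dist != 0)
    # Blank out flagged values and fill them linearly from their neighbours.
    return dist.mask(is_outlier).interpolate(method="linear", limit_direction="both")

# Illustrative weekly simple-distribution values with one spike.
simple = pd.Series([0.50, 0.52, 0.48, 0.51, 0.49, 0.95, 0.50, 0.51])
cleaned = interpolate_outliers(simple)

# Clipping: simple distribution to [0, 1], multiple distribution to at least 1.
simple_clipped = pd.Series([-0.1, 0.4, 1.3]).clip(lower=0.0, upper=1.0)
multiple_clipped = pd.Series([0.8, 1.0, 2.5]).clip(lower=1.0)
```

In this sample only the spike at position 5 is flagged; it is replaced by the midpoint of its two neighbours, and all other values are left unchanged.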

4) Sponsorship Data Hacks:

1. For Michelob Ultra, the percentage spread of sponsorship spend across calendar year 2021 is set to match that of 2023.
2. For certain brand-league event pairs, we remove the investment from certain months and spread it across the remaining months in proportion to those months' share of spend.
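The second sponsorship hack above amounts to a proportional reallocation, which might look roughly like the sketch below. The function name `redistribute_spend`, the month labels, and the spend figures are hypothetical and serve only to show the arithmetic.

```python
import pandas as pd

def redistribute_spend(monthly: pd.Series, months_to_clear: list) -> pd.Series:
    """Move spend out of selected months into the rest, proportionally (hypothetical helper)."""
    removed = monthly.loc[months_to_clear].sum()
    rest = monthly.drop(index=months_to_clear)
    if rest.sum() == 0:
        raise ValueError("remaining months have no spend to define a spread")
    # Each remaining month's share of the remaining spend.
    shares = rest / rest.sum()
    result = monthly.copy()
    result.loc[months_to_clear] = 0.0
    # Each remaining month receives its share of the removed spend.
    result.loc[rest.index] = rest + removed * shares
    return result

# Illustrative spend for one brand-league event pair.
spend = pd.Series([100.0, 300.0, 600.0, 200.0], index=["Jan", "Feb", "Mar", "Apr"])
adjusted = redistribute_spend(spend, ["Mar"])
```

Here the 600 removed from March is split 1/6, 1/2, and 1/3 across January, February, and April, so the total spend and the relative spread of the remaining months are both preserved.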