IOptimization

Bases: ABC

The IOptimization interface serves as a base for all optimization classes to inherit from. It encompasses the following principles:

  1. It provides fundamental functionalities shared by all optimizations.
  2. Non-abstract methods and properties are generally not meant to be overridden.
  3. Methods and properties marked as 'final' should not be overridden.
  4. Strict adherence to data types is expected.
  5. The batch input and output dataclass types are primarily for typing purposes and can accommodate various data types. However, it's essential to have the specified required arguments available.
Source code in wt_ml/optimizer/base/optimizer_base.py
class IOptimization(ABC, metaclass=IOptimizationMetaClass):
    """
    The `IOptimization` interface serves as a base for all optimization classes to inherit from.
    It encompasses the following principles:

    1. It provides fundamental functionalities shared by all optimizations.
    2. Non-abstract methods and properties are generally not meant to be overridden.
    3. Methods and properties marked as 'final' should not be overridden.
    4. Strict adherence to data types is expected.
    5. The batch input and output dataclass types are primarily for typing purposes and can accommodate various
       data types. However, it's essential to have the specified required arguments available.
    """

    AXIS_MAPPING: dict[AxisType, int] = {
        "wholesaler": 0,
        "brand": 1,
        "time": 2,
        "vehicle": 3,
    }
    _losses: dict[str, CalculatedMetric] = {}
    _metrics: dict[str, CalculatedMetric] = {}

    @final
    @property
    def _default_metrics(self) -> dict[str, CalculatedMetric]:
        """Default metrics that all optimizers will use."""
        return {
            "ROI": lambda batch: np.sum(self.simulate(batch).vehicle_impacts) / (np.sum(self.vehicle_spends) + EPSILON),
            "Profit": lambda batch: np.sum(self.simulate(batch).vehicle_impacts) - np.sum(self.vehicle_spends),
            "constraint_loss": lambda _: np.mean(np.minimum(self.get_constraints(), 0)),
            "Impact": lambda batch: np.sum(self.simulate(batch).vehicle_impacts),
            "Spend": lambda _: np.sum(self.vehicle_spends),
        }

    def __init__(self, name: str, hyperparameters: HyperparameterConfig):
        self.name = name
        self._hyperparameters = hyperparameters

    def _post_init(self):
        for metric_name, metric in self._default_metrics.items():
            IOptimization.add_metric(self, metric_name, metric)

    # TODO(legendof-selda): get constraints in a user intuitive format. similar to old OptimizationIngestion.
    @abstractproperty
    def constraints(self) -> Constraints:
        """Returns list of constraints used for optimization."""

    def hyperparameters(self) -> HyperparameterConfig:
        """Hyperparameters used."""
        return self._hyperparameters

    @abstractproperty
    def vehicle_spends(self) -> VehicleSpendType:
        """
        The vehicle investments variable which is being optimized.
        Investment amounts for each batch (location*product), time and vehicle.
        """

    @abstractmethod
    def simulate(self, batch: Type[OptimizationInput]) -> Type[OptimizationOutput]:  # noqa: U100
        """For the given `batch` input simulate the impacts received.

        Args:
            batch (OptimizationInput): Optimization input that contains `vehicle_spends: VehicleSpendType` & others.

        Returns:
            Type[OptimizationOutput]: The impacts for the given investment amounts.
        """

    def __call__(self, batch: OptimizationInput) -> Type[OptimizationOutput]:
        return self.simulate(batch)

    @abstractmethod
    def optimize(self, dataset_factory: DatasetFactory, epochs: int, **_kwargs):  # noqa: U100
        """Optimize for the given number of epochs.

        Args:
            dataset_factory (DatasetFactory): A generator that returns `OptimizationInput` in batches.
            epochs (int): Number of epochs to optimize.
        """

    def get_constraints(self) -> np.ndarray:
        """Function to apply and gather all the constraints.

        Returns:
            np.ndarray: Stacked constraints applied on vehicle_spends.
        """
        if len(self.constraints) == 0:
            return np.array([0])
        gathers = []
        for constraint in self.constraints:
            gathered = self.vehicle_spends
            for axis_name, indices in constraint.gathers:
                axis = self.AXIS_MAPPING[axis_name]
                gathered = np.take(gathered, indices, axis=axis)
            constrained = np.sum(gathered) - constraint.max_value
            if not constraint.negate:
                constrained = -constrained
            gathers.append(constrained)
        return np.stack(gathers, axis=0)

    @final
    def create_result(
        self,
        location_type: Literal[LOCATION_TYPES],
        dataset_factory: DatasetFactory,
        encodings: dict[str, Any],
        return_dataframe: bool = True,
    ) -> dict[str, dict[tuple[str, ...], dict[str, float]]] | pd.DataFrame:
        """
        Returns results of current optimized state.

        Args:
            location_type (str): The location type ("wholesaler", "state", "region") the results should be in.
            dataset_factory (DatasetFactory): A generator that returns `OptimizationInput` in batches.
            encodings (dict[str, Any]): Encodings to decode the values in dataset.
            return_dataframe (bool, optional): Return a dataframe instead of dict. Defaults to True.

        Returns:
            dict[str, dict[tuple[str], dict[str, float]]] | pd.DataFrame: The results of the current state of optimizer.
                If `return_dataframe is True`, a DataFrame is returned.
        """
        if location_type == "wholesaler":
            loc_encoding = encodings["wholesaler"]
        else:
            loc_encoding = {
                loc: i
                for i, loc in enumerate(dict.fromkeys(encodings[f"wholesaler_{location_type}_lookup"].values()).keys())
            }
        loc_lookup = np.array(get_lookups(loc_encoding))
        brand_lookup = np.array(get_lookups(encodings, "brand"))
        vehicle_lookup = np.array(get_lookups(encodings, "vehicle"))
        time_lookup = np.array(get_lookups(encodings, "date"))
        veh_dfs = []
        baseline_dfs = []
        df_axes = [location_type, "brand", "date", "signal"]
        batch: Type[OptimizationInput]
        for batch in dataset_factory():
            brands = brand_lookup[np.array(batch.brand_index)]
            locations = loc_lookup[np.array(batch.location_index)]
            vehicles = vehicle_lookup[np.array(batch.vehicle_index)]
            dates = time_lookup[np.array(batch.date_index)]

            output = self.simulate(batch)

            index = tuple(zip(locations, brands))
            index = [(*lb, t) for lb, t in product(index, dates)]
            veh_index = [(*lbt, v) for lbt, v in product(index, vehicles)]

            veh_impacts_df = get_other_rev_components(
                output.vehicle_impacts,
                batch,
                encodings["normalization_factor"],
                pd.MultiIndex.from_tuples(veh_index, names=df_axes),
            )
            total_impacts_df = get_other_rev_components(
                output.yhat,
                batch,
                encodings["normalization_factor"],
                pd.MultiIndex.from_tuples(index, names=df_axes[:-1]),
            )
            baseline_impacts_df = total_impacts_df - veh_impacts_df.groupby(df_axes[:-1], axis=0).sum()
            veh_impacts_df["amount"] = np.reshape(output.vehicle_spends, -1) * encodings["normalization_factor"]
            veh_impacts_df["cores"] = veh_impacts_df["maco"] - veh_impacts_df["amount"]

            veh_dfs.append(veh_impacts_df)
            baseline_dfs.append(baseline_impacts_df)

        vehicle_results_df = pd.concat(veh_dfs, axis=0)
        baseline_results_df = pd.concat(baseline_dfs, axis=0)
        add_col_level(baseline_results_df, "baseline", axis=0, levelname="signal")
        # we agg as location could be repeated since some optimizers may work in lower granularity
        # and location are directly mapped. agg can avoid this issue.
        if vehicle_results_df.index.duplicated().any():
            vehicle_results_df = vehicle_results_df.groupby(vehicle_results_df.index.names).sum()
            baseline_results_df = baseline_results_df.groupby(baseline_results_df.index.names).sum()

        if return_dataframe:
            return pd.concat([baseline_results_df, vehicle_results_df], axis=0).sort_index(axis=0)
        else:
            return {
                "vehicle_results": vehicle_results_df.T.to_dict(),
                "baseline_results": baseline_results_df.T.to_dict(),
            }

    def add_loss(self, name: str, loss: CalculatedMetric):
        """Add the following loss function for tracking.

        Args:
            name (str): Name of the loss function.
            loss (CalculatedMetric): The loss function that will be evaluated.
        """
        self._losses[name] = loss

    def add_metric(self, name: str, metric: CalculatedMetric):
        """Add the following metric function for tracking.

        Args:
            name (str): Name of the metric function.
            metric (CalculatedMetric): The metric function that will be evaluated.
        """
        self._metrics[name] = metric

    def all_losses(self, batch: Type[OptimizationInput]) -> dict[str, float]:
        """Returns a dict of all losses. This method can be overridden if you track a custom dict of losses.

        Args:
            batch (OptimizationInput): Optimization input that contains `vehicle_spends: VehicleSpendType` & others.

        Returns:
            dict[str, float]: Dict of computed losses.
        """
        return {loss: value(batch) for loss, value in self._losses.items()}

    def all_metrics(self, batch: Type[OptimizationInput]) -> dict[str, float]:
        """Returns a dict of all metrics. This method can be overridden if you track a custom dict of metrics.

        Args:
            batch (OptimizationInput): Optimization input that contains `vehicle_spends: VehicleSpendType` & others.

        Returns:
            dict[str, float]: Dict of computed metrics.
        """
        return {metric: value(batch) for metric, value in self._metrics.items()}
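As a rough illustration of the formulas in _default_metrics, the sketch below evaluates them on small hand-made arrays. The array values, the constraint values, and the value of EPSILON are assumptions made for this example only; the real EPSILON constant is defined elsewhere in wt_ml and is not shown on this page.

```python
import numpy as np

EPSILON = 1e-8  # assumed value, used only to avoid division by zero

# Hand-made stand-ins for simulate(batch).vehicle_impacts and vehicle_spends.
vehicle_impacts = np.array([[120.0, 80.0], [50.0, 50.0]])
vehicle_spends = np.array([[60.0, 40.0], [30.0, 20.0]])
# Hand-made stand-in for get_constraints(): positive = satisfied, negative = violated.
constraint_values = np.array([6.0, -10.0])

metrics = {
    "ROI": np.sum(vehicle_impacts) / (np.sum(vehicle_spends) + EPSILON),
    "Profit": np.sum(vehicle_impacts) - np.sum(vehicle_spends),
    # Only violated constraints (negative values) contribute to the loss.
    "constraint_loss": np.mean(np.minimum(constraint_values, 0)),
    "Impact": np.sum(vehicle_impacts),
    "Spend": np.sum(vehicle_spends),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these inputs, Impact is 300 and Spend is 150, so Profit is 150 and ROI is approximately 2. The satisfied constraint is clipped to 0, which leaves a constraint_loss of -5.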

add_loss(name, loss)

Add the following loss function for tracking.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | Name of the loss function. | required |
| loss | CalculatedMetric | The loss function that will be evaluated. | required |

Source code in wt_ml/optimizer/base/optimizer_base.py
def add_loss(self, name: str, loss: CalculatedMetric):
    """Add the following loss function for tracking.

    Args:
        name (str): Name of the loss function.
        loss (CalculatedMetric): The loss function that will be evaluated.
    """
    self._losses[name] = loss

add_metric(name, metric)

Add the following metric function for tracking.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | Name of the metric function. | required |
| metric | CalculatedMetric | The metric function that will be evaluated. | required |

Source code in wt_ml/optimizer/base/optimizer_base.py
def add_metric(self, name: str, metric: CalculatedMetric):
    """Add the following metric function for tracking.

    Args:
        name (str): Name of the metric function.
        metric (CalculatedMetric): The metric function that will be evaluated.
    """
    self._metrics[name] = metric

all_losses(batch)

Returns a dict of all losses. This method can be overridden if you track a custom dict of losses.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch | OptimizationInput | Optimization input that contains vehicle_spends: VehicleSpendType & others. | required |

Returns:

| Type | Description |
| --- | --- |
| dict[str, float] | Dict of computed losses, keyed by loss name. |

Source code in wt_ml/optimizer/base/optimizer_base.py
def all_losses(self, batch: Type[OptimizationInput]) -> dict[str, float]:
    """Returns a dict of all losses. This method can be overridden if you track a custom dict of losses.

    Args:
        batch (OptimizationInput): Optimization input that contains `vehicle_spends: VehicleSpendType` & others.

    Returns:
        dict[str, float]: Dict of computed losses.
    """
    return {loss: value(batch) for loss, value in self._losses.items()}

all_metrics(batch)

Returns a dict of all metrics. This method can be overridden if you track a custom dict of metrics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch | OptimizationInput | Optimization input that contains vehicle_spends: VehicleSpendType & others. | required |

Returns:

| Type | Description |
| --- | --- |
| dict[str, float] | Dict of computed metrics, keyed by metric name. |

Source code in wt_ml/optimizer/base/optimizer_base.py
def all_metrics(self, batch: Type[OptimizationInput]) -> dict[str, float]:
    """Returns a dict of all metrics. This method can be overridden if you track a custom dict of metrics.

    Args:
        batch (OptimizationInput): Optimization input that contains `vehicle_spends: VehicleSpendType` & others.

    Returns:
        dict[str, float]: Dict of computed metrics.
    """
    return {metric: value(batch) for metric, value in self._metrics.items()}
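The add_metric / all_metrics pair amounts to a small name-to-callable registry that is evaluated once per batch. The following is a minimal sketch of that pattern, not the wt_ml implementation: MetricTracker, the CalculatedMetric alias, and the dict-shaped batch are all hypothetical stand-ins. It keeps the registry on the instance; in the source shown on this page, _losses and _metrics are declared as class attributes, so unless the metaclass or a subclass rebinds them, registered entries would be shared between instances.

```python
from typing import Any, Callable

# Assumed alias: a metric is any callable that maps a batch to a number.
CalculatedMetric = Callable[[Any], float]


class MetricTracker:
    """Hypothetical stand-in that mirrors add_metric / all_metrics."""

    def __init__(self) -> None:
        # Per-instance registry, so two trackers never share entries.
        self._metrics: dict[str, CalculatedMetric] = {}

    def add_metric(self, name: str, metric: CalculatedMetric) -> None:
        self._metrics[name] = metric

    def all_metrics(self, batch: Any) -> dict[str, float]:
        # Each registered callable is evaluated lazily on the given batch.
        return {name: fn(batch) for name, fn in self._metrics.items()}


tracker = MetricTracker()
tracker.add_metric("total_spend", lambda batch: float(sum(batch["vehicle_spends"])))
tracker.add_metric("max_spend", lambda batch: float(max(batch["vehicle_spends"])))

values = tracker.all_metrics({"vehicle_spends": [1.0, 2.5, 4.0]})
print(values)
```

Registering a metric under an existing name replaces the earlier callable, which matches the plain dict assignment used by add_loss and add_metric.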

constraints()

Returns list of constraints used for optimization.

Source code in wt_ml/optimizer/base/optimizer_base.py
@abstractproperty
def constraints(self) -> Constraints:
    """Returns list of constraints used for optimization."""

create_result(location_type, dataset_factory, encodings, return_dataframe=True)

Returns results of current optimized state.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| location_type | str | The location type ("wholesaler", "state", "region") the results should be in. | required |
| dataset_factory | DatasetFactory | A generator that returns OptimizationInput in batches. | required |
| encodings | dict[str, Any] | Encodings to decode the values in dataset. | required |
| return_dataframe | bool | Return a dataframe instead of dict. Defaults to True. | True |

Returns:

| Type | Description |
| --- | --- |
| dict[str, dict[tuple[str, ...], dict[str, float]]] \| DataFrame | The results of the current state of the optimizer. If return_dataframe is True, a DataFrame is returned. |

Source code in wt_ml/optimizer/base/optimizer_base.py
@final
def create_result(
    self,
    location_type: Literal[LOCATION_TYPES],
    dataset_factory: DatasetFactory,
    encodings: dict[str, Any],
    return_dataframe: bool = True,
) -> dict[str, dict[tuple[str, ...], dict[str, float]]] | pd.DataFrame:
    """
    Returns results of current optimized state.

    Args:
        location_type (str): The location type ("wholesaler", "state", "region") the results should be in.
        dataset_factory (DatasetFactory): A generator that returns `OptimizationInput` in batches.
        encodings (dict[str, Any]): Encodings to decode the values in dataset.
        return_dataframe (bool, optional): Return a dataframe instead of dict. Defaults to True.

    Returns:
        dict[str, dict[tuple[str], dict[str, float]]] | pd.DataFrame: The results of the current state of optimizer.
            If `return_dataframe is True`, a DataFrame is returned.
    """
    if location_type == "wholesaler":
        loc_encoding = encodings["wholesaler"]
    else:
        loc_encoding = {
            loc: i
            for i, loc in enumerate(dict.fromkeys(encodings[f"wholesaler_{location_type}_lookup"].values()).keys())
        }
    loc_lookup = np.array(get_lookups(loc_encoding))
    brand_lookup = np.array(get_lookups(encodings, "brand"))
    vehicle_lookup = np.array(get_lookups(encodings, "vehicle"))
    time_lookup = np.array(get_lookups(encodings, "date"))
    veh_dfs = []
    baseline_dfs = []
    df_axes = [location_type, "brand", "date", "signal"]
    batch: Type[OptimizationInput]
    for batch in dataset_factory():
        brands = brand_lookup[np.array(batch.brand_index)]
        locations = loc_lookup[np.array(batch.location_index)]
        vehicles = vehicle_lookup[np.array(batch.vehicle_index)]
        dates = time_lookup[np.array(batch.date_index)]

        output = self.simulate(batch)

        index = tuple(zip(locations, brands))
        index = [(*lb, t) for lb, t in product(index, dates)]
        veh_index = [(*lbt, v) for lbt, v in product(index, vehicles)]

        veh_impacts_df = get_other_rev_components(
            output.vehicle_impacts,
            batch,
            encodings["normalization_factor"],
            pd.MultiIndex.from_tuples(veh_index, names=df_axes),
        )
        total_impacts_df = get_other_rev_components(
            output.yhat,
            batch,
            encodings["normalization_factor"],
            pd.MultiIndex.from_tuples(index, names=df_axes[:-1]),
        )
        baseline_impacts_df = total_impacts_df - veh_impacts_df.groupby(df_axes[:-1], axis=0).sum()
        veh_impacts_df["amount"] = np.reshape(output.vehicle_spends, -1) * encodings["normalization_factor"]
        veh_impacts_df["cores"] = veh_impacts_df["maco"] - veh_impacts_df["amount"]

        veh_dfs.append(veh_impacts_df)
        baseline_dfs.append(baseline_impacts_df)

    vehicle_results_df = pd.concat(veh_dfs, axis=0)
    baseline_results_df = pd.concat(baseline_dfs, axis=0)
    add_col_level(baseline_results_df, "baseline", axis=0, levelname="signal")
    # we agg as location could be repeated since some optimizers may work in lower granularity
    # and location are directly mapped. agg can avoid this issue.
    if vehicle_results_df.index.duplicated().any():
        vehicle_results_df = vehicle_results_df.groupby(vehicle_results_df.index.names).sum()
        baseline_results_df = baseline_results_df.groupby(baseline_results_df.index.names).sum()

    if return_dataframe:
        return pd.concat([baseline_results_df, vehicle_results_df], axis=0).sort_index(axis=0)
    else:
        return {
            "vehicle_results": vehicle_results_df.T.to_dict(),
            "baseline_results": baseline_results_df.T.to_dict(),
        }
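Two steps in create_result are easier to follow on toy data: the order-preserving de-duplication that builds loc_encoding for non-wholesaler location types, and the nested product calls that build the row index. The sketch below reproduces only those two steps with the standard library. All labels and the lookup dict are made up for illustration; the pandas MultiIndex and the wt_ml helpers (get_lookups, get_other_rev_components) are deliberately left out.

```python
from itertools import product

# Hypothetical wholesaler -> state lookup, standing in for
# encodings["wholesaler_state_lookup"].
wholesaler_state_lookup = {"W1": "TX", "W2": "CA", "W3": "TX", "W4": "NY"}

# dict.fromkeys keeps the first occurrence of each value in insertion order,
# so every distinct state gets a stable integer code.
loc_encoding = {
    loc: i for i, loc in enumerate(dict.fromkeys(wholesaler_state_lookup.values()))
}

# Decoded labels for one batch (made-up values). Locations and brands are
# paired element-wise: each batch row is one (location, brand) combination.
locations = ["TX", "CA"]
brands = ["brand_a", "brand_b"]
dates = ["2023-01", "2023-02"]
vehicles = ["tv", "radio", "digital"]

pairs = tuple(zip(locations, brands))
# Cross every (location, brand) pair with every date ...
index = [(*lb, t) for lb, t in product(pairs, dates)]
# ... and then every (location, brand, date) row with every vehicle.
veh_index = [(*lbt, v) for lbt, v in product(index, vehicles)]

print(loc_encoding)
print(index)
print(len(veh_index))
```

Because product varies its last argument fastest, the resulting order matches a row-major flattening of an array laid out as (batch, time, vehicle).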

get_constraints()

Function to apply and gather all the constraints.

Returns:

| Type | Description |
| --- | --- |
| ndarray | Stacked constraint values computed from vehicle_spends, one entry per constraint. |

Source code in wt_ml/optimizer/base/optimizer_base.py
def get_constraints(self) -> np.ndarray:
    """Function to apply and gather all the constraints.

    Returns:
        np.ndarray: Stacked constraints applied on vehicle_spends.
    """
    if len(self.constraints) == 0:
        return np.array([0])
    gathers = []
    for constraint in self.constraints:
        gathered = self.vehicle_spends
        for axis_name, indices in constraint.gathers:
            axis = self.AXIS_MAPPING[axis_name]
            gathered = np.take(gathered, indices, axis=axis)
        constrained = np.sum(gathered) - constraint.max_value
        if not constraint.negate:
            constrained = -constrained
        gathers.append(constrained)
    return np.stack(gathers, axis=0)
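To make the sign convention concrete, here is a minimal standalone sketch of the same gather-and-sum logic. The Constraint tuple is a hypothetical stand-in that models only the three attributes read above (gathers, max_value, negate), and the 4-D spend array ordered as (wholesaler, brand, time, vehicle) is an assumption taken from AXIS_MAPPING.

```python
from typing import NamedTuple

import numpy as np

AXIS_MAPPING = {"wholesaler": 0, "brand": 1, "time": 2, "vehicle": 3}


class Constraint(NamedTuple):
    """Hypothetical stand-in for the real constraint type."""

    gathers: list[tuple[str, list[int]]]  # (axis name, indices to keep)
    max_value: float
    negate: bool = False


def gather_constraints(vehicle_spends: np.ndarray, constraints: list[Constraint]) -> np.ndarray:
    if len(constraints) == 0:
        return np.array([0])
    values = []
    for constraint in constraints:
        gathered = vehicle_spends
        # Narrow the spend array one axis at a time.
        for axis_name, indices in constraint.gathers:
            gathered = np.take(gathered, indices, axis=AXIS_MAPPING[axis_name])
        value = np.sum(gathered) - constraint.max_value
        if not constraint.negate:
            # Upper bound: positive while the selected spend is below max_value.
            value = -value
        values.append(value)
    return np.stack(values, axis=0)


spends = np.ones((2, 3, 4, 5))  # wholesaler, brand, time, vehicle
constraints = [
    # Vehicle 0 over everything: 2 * 3 * 4 = 24, limit 30 -> satisfied.
    Constraint(gathers=[("vehicle", [0])], max_value=30.0),
    # Brands 0 and 1 at time 0: 2 * 2 * 1 * 5 = 20, limit 10 -> violated.
    Constraint(gathers=[("brand", [0, 1]), ("time", [0])], max_value=10.0),
]
result = gather_constraints(spends, constraints)
print(result)
```

In this sketch a non-negated constraint behaves as an upper bound on the selected spend, and setting negate flips it into a lower bound. Negative entries mark violations, which is why the default constraint_loss metric averages np.minimum(values, 0).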

hyperparameters()

Hyperparameters used.

Source code in wt_ml/optimizer/base/optimizer_base.py
def hyperparameters(self) -> HyperparameterConfig:
    """Hyperparameters used."""
    return self._hyperparameters

optimize(dataset_factory, epochs, **_kwargs) abstractmethod

Optimize for the given number of epochs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset_factory | DatasetFactory | A generator that returns OptimizationInput in batches. | required |
| epochs | int | Number of epochs to optimize. | required |

Source code in wt_ml/optimizer/base/optimizer_base.py
@abstractmethod
def optimize(self, dataset_factory: DatasetFactory, epochs: int, **_kwargs):  # noqa: U100
    """Optimize for the given number of epochs.

    Args:
        dataset_factory (DatasetFactory): A generator that returns `OptimizationInput` in batches.
        epochs (int): Number of epochs to optimize.
    """

simulate(batch) abstractmethod

For the given batch input simulate the impacts received.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch | OptimizationInput | Optimization input that contains vehicle_spends: VehicleSpendType & others. | required |

Returns:

| Type | Description |
| --- | --- |
| Type[OptimizationOutput] | The impacts for the given investment amounts. |

Source code in wt_ml/optimizer/base/optimizer_base.py
@abstractmethod
def simulate(self, batch: Type[OptimizationInput]) -> Type[OptimizationOutput]:  # noqa: U100
    """For the given `batch` input simulate the impacts received.

    Args:
        batch (OptimizationInput): Optimization input that contains `vehicle_spends: VehicleSpendType` & others.

    Returns:
        Type[OptimizationOutput]: The impacts for the given investment amounts.
    """

vehicle_spends()

The vehicle investments variable which is being optimized. Investment amounts for each batch (location*product), time and vehicle.

Source code in wt_ml/optimizer/base/optimizer_base.py
@abstractproperty
def vehicle_spends(self) -> VehicleSpendType:
    """
    The vehicle investments variable which is being optimized.
    Investment amounts for each batch (location*product), time and vehicle.
    """

OptimizationInput dataclass

Bases: ABC

This is an abstract OptimizationInput class that is mainly used for typing. Your input does not have to be a dataclass: it can be a NamedTuple or any other class that resembles a dataclass, provided the attributes below exist.

Source code in wt_ml/optimizer/base/optimizer_base.py
@dataclass
class OptimizationInput(ABC):
    """
    This is an abstract OptimizationInput class that is mainly used for typing.
    It is not mandatory that your Input must be a dataclass. It can be a NamedTuple or any other class that resembles a
    dataclass.
    Only ensure that the attributes below exist.
    """

    vehicle_spends: VehicleSpendType
    price: Annotated[TensorLike, TensorMeta((Batch, Time), np.float32)]
    price_normalization: Annotated[TensorLike, TensorMeta((Batch,), np.float32)]
    maco_cost: Annotated[TensorLike, TensorMeta((Batch, Time), np.float32)]
    vehicle_index: Annotated[TensorLike, TensorMeta((Vehicle,), np.int32)]
    brand_index: Annotated[TensorLike, TensorMeta((Batch,), np.int32)]
    location_index: Annotated[TensorLike, TensorMeta((Batch,), np.int32)]
    date_index: Annotated[TensorLike, TensorMeta((Time,), np.int32)]
    ...
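Since the class is used mainly for typing, an input only needs to expose the attributes listed above. The sketch below shows one way to do that with a NamedTuple and to check for missing attributes. MyInput, REQUIRED_INPUT_ATTRIBUTES, and missing_attributes are hypothetical names introduced for this example, and the (Batch, Time, Vehicle) shape used for vehicle_spends is an assumption, because VehicleSpendType is not defined on this page.

```python
from typing import NamedTuple

import numpy as np


class MyInput(NamedTuple):
    """Hypothetical input type that duck-types OptimizationInput."""

    vehicle_spends: np.ndarray       # assumed shape (Batch, Time, Vehicle)
    price: np.ndarray                # (Batch, Time), float32
    price_normalization: np.ndarray  # (Batch,), float32
    maco_cost: np.ndarray            # (Batch, Time), float32
    vehicle_index: np.ndarray        # (Vehicle,), int32
    brand_index: np.ndarray          # (Batch,), int32
    location_index: np.ndarray       # (Batch,), int32
    date_index: np.ndarray           # (Time,), int32


REQUIRED_INPUT_ATTRIBUTES = MyInput._fields


def missing_attributes(obj: object) -> list[str]:
    """Return the required attribute names that obj does not provide."""
    return [name for name in REQUIRED_INPUT_ATTRIBUTES if not hasattr(obj, name)]


n_batch, n_time, n_vehicle = 2, 3, 4
example = MyInput(
    vehicle_spends=np.zeros((n_batch, n_time, n_vehicle), dtype=np.float32),
    price=np.ones((n_batch, n_time), dtype=np.float32),
    price_normalization=np.ones((n_batch,), dtype=np.float32),
    maco_cost=np.zeros((n_batch, n_time), dtype=np.float32),
    vehicle_index=np.arange(n_vehicle, dtype=np.int32),
    brand_index=np.arange(n_batch, dtype=np.int32),
    location_index=np.arange(n_batch, dtype=np.int32),
    date_index=np.arange(n_time, dtype=np.int32),
)
print(missing_attributes(example))
```

A check like this can be run once on the first batch of a dataset to catch a misnamed field before optimization starts.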

OptimizationOutput dataclass

Bases: ABC

This is an abstract OptimizationOutput class that is mainly used for typing. Your output does not have to be a dataclass: it can be a NamedTuple or any other class that resembles a dataclass, provided the attributes below exist.

Source code in wt_ml/optimizer/base/optimizer_base.py
@dataclass
class OptimizationOutput(ABC):
    """
    This is an abstract OptimizationOutput class that is mainly used for typing.
    It is not mandatory that your Output must be a dataclass. It can be a NamedTuple or any other class that resembles a
    dataclass.
    Only ensure that the attributes below exist.
    """

    vehicle_spends: VehicleSpendType
    vehicle_impacts: ImpactsType
    yhat: YHatType
    ...
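Putting the pieces together, a concrete optimizer has to supply constraints, vehicle_spends, simulate, and optimize, and simulate has to return an object with the three output attributes above. The following toy sketch illustrates that contract without importing wt_ml. OptimizationBase and Output are hypothetical stand-ins for IOptimization and OptimizationOutput, the square-root response curve and the gradient step are invented for the example, and the abstract properties use @property with @abstractmethod, which is the current spelling of the deprecated abstractproperty.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple

import numpy as np


class Output(NamedTuple):
    """Hypothetical stand-in for OptimizationOutput."""

    vehicle_spends: np.ndarray
    vehicle_impacts: np.ndarray
    yhat: np.ndarray


class OptimizationBase(ABC):
    """Hypothetical stand-in that exposes the same abstract members."""

    @property
    @abstractmethod
    def constraints(self): ...

    @property
    @abstractmethod
    def vehicle_spends(self): ...

    @abstractmethod
    def simulate(self, batch): ...

    @abstractmethod
    def optimize(self, dataset_factory, epochs, **_kwargs): ...

    def __call__(self, batch):
        return self.simulate(batch)


class SqrtResponseOptimizer(OptimizationBase):
    """Toy optimizer: each vehicle returns sqrt(spend) in impact."""

    def __init__(self, spends):
        self._spends = np.asarray(spends, dtype=np.float32)

    @property
    def constraints(self):
        return []

    @property
    def vehicle_spends(self):
        return self._spends

    def simulate(self, batch):
        impacts = np.sqrt(self._spends)
        # yhat here is just the impact summed over the vehicle axis.
        return Output(self._spends, impacts, impacts.sum(axis=-1))

    def optimize(self, dataset_factory, epochs, **_kwargs):
        for _ in range(epochs):
            for _batch in dataset_factory():
                # Gradient ascent on profit = sqrt(s) - s.
                grad = 0.5 / np.sqrt(self._spends) - 1.0
                self._spends = np.clip(self._spends + 0.1 * grad, 1e-6, None)


opt = SqrtResponseOptimizer(np.ones((1, 2, 3)))
opt.optimize(lambda: [None], epochs=200)
out = opt(None)
print(out.vehicle_spends.round(3))
```

For this toy response the profit sqrt(s) - s is maximized at s = 0.25, so the spends settle near that value. A real subclass would read the batch inside simulate instead of ignoring it.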