Welcome to the developer guide for the data validator module.

  1. The data validator has 3 important elements:
     - data loader:
       - The load_data function is the central point of data_loader.
       - It takes parquet, json and hdf5 directories as input and generates the inputs needed by the checks.
       - The return type of the load_data function should always be DataLoader.
       - It builds all the 2D dataframes, renames their columns and sorts them for data consistency.
       - It also returns the encodings, the date index, and the parquet, json and hdf5 paths, to be used in case a check does not rely on dataframes.
     - checks:
       - All the checks go inside the checks directory as scripts.
       - Further details on how to add a check can be found in the __init__.py of the checks directory.
     - export:
       - This defines the supported export formats for the reports.
       - Currently, 3 formats are supported for exporting: html, markdown and excel.

  2. To run the data validator, use the following command in the terminal:
     python -m wt_ml.dataset.data_validator.data_validator <mode> <export_type>

Here,
    <mode> can be either 'dbg' or 'full'.
          'full' is the default if nothing is passed.
    <export_type> can be any one of 'html', 'markdown', 'excel' or 'all'.
                 'excel' is the default if nothing is passed.
  3. All the reports are published inside the results directory of the wt_ml repository.
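
The command-line behavior described above (two optional positional arguments with defaults) can be sketched with argparse. This is only an illustration and not the module's actual implementation: the names parse_args and resolve_export_formats are hypothetical, and the expansion of 'all' into the three supported formats is an assumption based on the export section.

```python
import argparse

# Values taken from the guide; the helper names below are hypothetical.
MODES = ("dbg", "full")
EXPORT_TYPES = ("html", "markdown", "excel", "all")


def parse_args(argv=None):
    """Parse <mode> and <export_type>, applying the documented defaults."""
    parser = argparse.ArgumentParser(prog="data_validator")
    # nargs="?" makes each positional optional, so the default applies when it is omitted.
    parser.add_argument("mode", nargs="?", choices=MODES, default="full")
    parser.add_argument("export_type", nargs="?", choices=EXPORT_TYPES, default="excel")
    return parser.parse_args(argv)


def resolve_export_formats(export_type):
    """Expand 'all' into every concrete format (assumed behavior)."""
    if export_type == "all":
        return [fmt for fmt in EXPORT_TYPES if fmt != "all"]
    return [export_type]


if __name__ == "__main__":
    args = parse_args()
    print(args.mode, resolve_export_formats(args.export_type))
```

With positional arguments parsed this way, an export type can only be given after a mode, so selecting 'html' would require passing the mode explicitly (for example 'full html').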