limen.data

Fetch raw market data, form optional bars, and hand experiment-ready splits to the rest of Limen.

Canonical docs

What this package owns

limen.data owns raw data access, optional threshold-bar formation, train/validation/test splitting, and the helpers that turn raw frames into the data_dict schema used by model code. It does not own indicators, higher-level features, manifests, or model training.

Key entry points

| Entry point | Use it when | Notes |
|---|---|---|
| HistoricalData | You need file-backed BTCUSDT spot klines or raw file ingestion | The main public class exported by limen.data |
| compute_data_bars() | You want to aggregate kline rows into threshold bars before feature engineering | Used by manifests through set_bar_formation() |
| split_sequential() | You need ordered train/validation/test windows | Used by manifest-driven prep |
| split_data_to_prep_output() | You need the standard data_dict structure | Converts split frames into model-ready keys like x_train and y_test |
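
To make the last two rows concrete, here is a minimal sketch of an ordered split followed by conversion into a data_dict-like mapping. It is not Limen's implementation: the function names, the ratios argument, the rounding rule, and the "val" key suffix are all assumptions made for illustration, and plain lists of dicts stand in for the frames the real helpers operate on. Only the x_train and y_test key names come from the table above.

```python
from typing import Sequence


def sequential_split_sketch(rows: Sequence[dict], ratios=(0.7, 0.15, 0.15)) -> list:
    """Hypothetical stand-in: cut rows into consecutive windows, no shuffling."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    n = len(rows)
    splits, start, cumulative = [], 0, 0.0
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        # The last window ends at n so every row lands in exactly one split.
        end = n if i == len(ratios) - 1 else round(n * cumulative)
        splits.append(list(rows[start:end]))
        start = end
    return splits


def to_data_dict_sketch(splits: list, target: str) -> dict:
    """Hypothetical stand-in: separate features from the target per split."""
    data_dict = {}
    # The "val" suffix is an assumption; the source only names x_train and y_test.
    for name, part in zip(("train", "val", "test"), splits):
        data_dict[f"x_{name}"] = [
            {key: value for key, value in row.items() if key != target}
            for row in part
        ]
        data_dict[f"y_{name}"] = [row[target] for row in part]
    return data_dict


# Twenty time-ordered rows with one feature and one label column.
rows = [{"close": 100.0 + i, "label": i % 2} for i in range(20)]
train, val, test = sequential_split_sketch(rows)
data_dict = to_data_dict_sketch([train, val, test], target="label")
```

Because the windows are consecutive, every training row precedes every validation row, which in turn precedes every test row. That ordering is what keeps future information out of the training window for time-series data.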

Adjacent modules

  • limen.experiment consumes this package through manifests and the Universal Experiment Loop.
  • limen.indicators and limen.features run after data retrieval and optional bar formation.
  • limen.utils.data_dict_to_numpy is commonly used one layer downstream inside model functions.

Quick orientation

data/
├── historical_data.py              # HistoricalData class
├── _internal/
│   └── binance_file_to_polars.py   # Binance archive download and parsing
├── bars/
│   └── standard_bars.py            # Threshold bar implementation
└── utils/
    ├── compute_data_bars.py        # Public bar-formation entry point
    ├── splits.py                   # Train/validation/test split helpers
    └── random_slice.py             # Random window slicing helper
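
For readers new to threshold bars, the idea behind the bars/ module can be sketched as follows. This is an illustration only, not the code in standard_bars.py: the function name is hypothetical, and it assumes the threshold applies to cumulative traded volume, which this page does not specify. Each bar closes as soon as the running volume reaches the threshold, so bars cover variable amounts of time.

```python
def volume_bars_sketch(klines: list, threshold: float) -> list:
    """Hypothetical helper: aggregate OHLCV rows into volume-threshold bars."""
    bars, current = [], None
    for k in klines:
        if current is None:
            # Start a new bar from the first kline after the previous bar closed.
            current = {"open": k["open"], "high": k["high"],
                       "low": k["low"], "close": k["close"], "volume": 0.0}
        current["high"] = max(current["high"], k["high"])
        current["low"] = min(current["low"], k["low"])
        current["close"] = k["close"]
        current["volume"] += k["volume"]
        if current["volume"] >= threshold:
            bars.append(current)
            current = None
    # A trailing partial bar that never reaches the threshold is dropped here.
    return bars


klines = [
    {"open": 100.0, "high": 101.0, "low": 99.0, "close": 100.5, "volume": 4.0},
    {"open": 100.5, "high": 102.0, "low": 100.0, "close": 101.5, "volume": 7.0},
    {"open": 101.5, "high": 103.0, "low": 101.0, "close": 102.0, "volume": 12.0},
    {"open": 102.0, "high": 102.5, "low": 101.5, "close": 102.2, "volume": 3.0},
]
bars = volume_bars_sketch(klines, threshold=10.0)
```

With a threshold of 10, the first two klines merge into one bar, the third forms a bar on its own, and the fourth is left over as an incomplete bar.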

Things to know

  • HistoricalData is stateful. Each get_* call mutates self.data and self.data_columns, and also returns the resulting pl.DataFrame.
  • get_spot_klines() reads the Hugging Face BTCUSDT 1-minute dataset by default.
  • get_binance_file() normalizes millisecond timestamps automatically when the source file stores them as large integers.
  • get_any_file() is the generic loader for local paths and URLs.
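
The stateful behavior in the first bullet is easy to trip over, so here is a toy sketch of the pattern. StatefulLoaderSketch and its two get_* methods are hypothetical and are not part of limen.data; lists of dicts stand in for the pl.DataFrame that HistoricalData returns. The point is only that each call both returns its result and replaces what the instance holds.

```python
class StatefulLoaderSketch:
    """Toy model of a loader whose get_* calls overwrite instance state."""

    def __init__(self):
        self.data = None
        self.data_columns = None

    def _store(self, rows: list) -> list:
        # Every get_* call funnels through here: mutate state, then return.
        self.data = rows
        self.data_columns = list(rows[0].keys()) if rows else []
        return self.data

    def get_klines(self) -> list:  # hypothetical method name
        return self._store([{"datetime": "2024-01-01T00:00", "close": 42000.0}])

    def get_trades(self) -> list:  # hypothetical method name
        return self._store([{"trade_id": 1, "price": 42001.0, "quantity": 0.5}])


loader = StatefulLoaderSketch()
klines = loader.get_klines()   # loader.data now holds the kline rows
trades = loader.get_trades()   # loader.data is replaced by the trade rows

# The earlier result survives only because we kept our own reference to it.
columns_after_second_call = loader.data_columns
```

If a workflow needs two datasets at once, keeping the returned frames in separate variables (or using separate loader instances) avoids reading stale or overwritten state from self.data.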