Data bars

Limen currently supports threshold-based bar formation over existing kline data. This optional manifest preprocessing step starts from regular time-based klines, then aggregates consecutive rows until a volume, trade-count, or liquidity threshold is reached.

Use bars when time-based candles do not match the studied behavior. Skip bars when fixed-interval klines already match the research rhythm.

Prerequisites

a kline-like Polars frame with the columns required by the chosen bar type
the vaquum-limen[data] extra installed, when the source frame comes from HistoricalData
a declared research reason for replacing fixed-time rows with activity-threshold rows

Current scope

The implemented bar surface today is:

volume_bars
trade_bars
liquidity_bars

Limen does not currently expose imbalance bars, run bars, or tick bars as a supported public surface.

When bars help

Bars help when each row should represent a comparable amount of market activity instead of a fixed amount of clock time.

Suitable fits:

high-volatility periods where fixed-time candles contain very uneven activity
strategies that depend more on activity intensity than wall-clock spacing
research where volume or liquidity concentration matters more than elapsed time

Unsuitable fits:

experiments where explicit time-of-day structure matters
workflows that depend on regular calendar spacing
cases where the added aggregation step makes the experiment harder to interpret than the base klines

Shared output schema

All supported bar functions return a pl.DataFrame with this shared schema:

Column	Meaning
`datetime`	start time of the aggregated bar
`open`, `high`, `low`, `close`	OHLC values of the aggregated bar
`volume`	cumulative volume inside the bar
`no_of_trades`	cumulative trade count inside the bar
`liquidity_sum`	cumulative liquidity inside the bar
`maker_ratio`	trade-count-weighted maker ratio
`maker_volume`	cumulative maker volume
`maker_liquidity`	cumulative maker liquidity
`mean`	trade-count-weighted mean price
`bar_count`	number of source klines merged into the bar
`base_interval`	source kline interval in seconds

The source dataframe must already contain the columns needed to compute the chosen bar type. Kline-style input must include fields such as volume, no_of_trades, and liquidity_sum.
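As a rough illustration of the schema above, the sketch below merges two klines into one bar using plain Python dictionaries. It is not Limen's implementation: the helper name merge_klines, the presence of a mean field on the input rows, the integer epoch-second datetimes, and the exact weighting arithmetic are assumptions made for this example. The maker_* fields are omitted for brevity.

```python
# Illustrative only: collapse consecutive kline rows into one bar.
# Field names follow the schema table; the arithmetic is an assumption.

def merge_klines(rows, base_interval):
    """Merge a non-empty list of kline dicts into a single bar dict."""
    trades = sum(r['no_of_trades'] for r in rows)
    # Trade-count-weighted mean price; undefined when no trades occurred.
    weighted = sum(r['mean'] * r['no_of_trades'] for r in rows)
    return {
        'datetime': rows[0]['datetime'],          # start of the first kline
        'open': rows[0]['open'],                  # first open
        'high': max(r['high'] for r in rows),     # highest high
        'low': min(r['low'] for r in rows),       # lowest low
        'close': rows[-1]['close'],               # last close
        'volume': sum(r['volume'] for r in rows),
        'no_of_trades': trades,
        'liquidity_sum': sum(r['liquidity_sum'] for r in rows),
        'mean': weighted / trades if trades else None,
        'bar_count': len(rows),
        'base_interval': base_interval,
    }


klines = [
    {'datetime': 0, 'open': 100.0, 'high': 102.0, 'low': 99.0, 'close': 101.0,
     'volume': 10.0, 'no_of_trades': 30, 'liquidity_sum': 1000.0, 'mean': 100.0},
    {'datetime': 3600, 'open': 101.0, 'high': 105.0, 'low': 100.0, 'close': 104.0,
     'volume': 20.0, 'no_of_trades': 10, 'liquidity_sum': 2000.0, 'mean': 104.0},
]
bar = merge_klines(klines, base_interval=3600)
```

In this example the merged bar has a mean of 101.0, because the first kline carries three times as many trades as the second.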

Supported functions

`volume_bars(data, volume_threshold)`

Aggregate rows until cumulative volume reaches volume_threshold.

`trade_bars(data, trade_threshold)`

Aggregate rows until cumulative trade count reaches trade_threshold.

`liquidity_bars(data, liquidity_threshold)`

Aggregate rows until cumulative liquidity reaches liquidity_threshold.
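The three functions apply one aggregation rule to different accumulated columns. A minimal sketch of that rule follows, using a hypothetical helper threshold_groups that is not a Limen function. It assumes a bar closes on the row whose running total first reaches the threshold, and that an unfinished trailing group is discarded; this page confirms neither detail.

```python
# Illustrative only: group consecutive rows until a running sum hits a threshold.

def threshold_groups(rows, key, threshold):
    """Return one summary dict per completed bar."""
    bars = []
    running = 0
    count = 0
    for row in rows:
        running += row[key]
        count += 1
        if running >= threshold:
            # Assumption: the row that reaches the threshold closes the bar.
            bars.append({key: running, 'bar_count': count})
            running, count = 0, 0
    # Assumption: rows left over after the last completed bar are dropped.
    return bars


rows = [{'volume': v} for v in (40, 30, 50, 20, 90, 10)]
volume_result = threshold_groups(rows, key='volume', threshold=100)
# volume_result == [{'volume': 120, 'bar_count': 3}, {'volume': 110, 'bar_count': 2}]
```

Passing key='no_of_trades' or key='liquidity_sum' gives the analogous behavior for trade and liquidity bars. Bars can overshoot the threshold because whole klines are merged, never split.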

Manifest usage

Bar formation is configured through Manifest.set_bar_formation() and is applied separately inside each split. That matters because Limen's manifest pipeline is split-first by design.

from limen.data import HistoricalData
from limen.data.utils import compute_data_bars
from limen.experiment import MLManifest

def params():
    return {
        'bar_type': ['base', 'volume', 'trade'],
        'volume_threshold': [50_000, 100_000],
        'trade_threshold': [2_000, 5_000],
    }

def manifest():
    return (
        MLManifest()
        .set_data_source(
            method=HistoricalData.get_spot_klines,
            params={'kline_size': 3600, 'start_date_limit': '2025-01-01'},
        )
        .set_test_data_source(
            method=HistoricalData.get_spot_klines,
            params={'kline_size': 7200, 'row_count_limit': 5000},
        )
        .set_bar_formation(
            compute_data_bars,
            bar_type='bar_type',
            volume_threshold='volume_threshold',
            trade_threshold='trade_threshold',
        )
        .set_required_bar_columns([
            'datetime',
            'open',
            'high',
            'low',
            'close',
            'volume',
            'no_of_trades',
            'liquidity_sum',
        ])
    )

Two important details:

bar_type must be present in round_params when the bar step switches between bar modes.
set_required_bar_columns() is an assertion layer. It verifies that the bar step still leaves the downstream columns required by the experiment.
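The assertion idea behind set_required_bar_columns() can be sketched as a simple column check run after the bar step. The helper below is hypothetical: its name, its error type, and its message are assumptions, and Limen's own check may behave differently.

```python
# Illustrative only: verify that a bar step left the columns an experiment needs.

def assert_required_columns(columns, required):
    """Raise ValueError if any required column is absent from `columns`."""
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(f'bar step dropped required columns: {missing}')


# Column names taken from the shared output schema.
bar_columns = [
    'datetime', 'open', 'high', 'low', 'close', 'volume', 'no_of_trades',
    'liquidity_sum', 'maker_ratio', 'maker_volume', 'maker_liquidity',
    'mean', 'bar_count', 'base_interval',
]

# Passes silently: every required column is present.
assert_required_columns(bar_columns, ['datetime', 'close', 'volume'])
```

A check like this fails early when a custom bar function omits a column that later indicator or feature steps expect.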

`compute_data_bars()`

limen.data.utils.compute_data_bars() is Limen's convenience router for manifest-driven bar selection.

It currently supports these bar_type values:

base
trade
volume
liquidity

base returns the input data unchanged. The other values dispatch to the corresponding threshold-bar function and require the matching threshold parameter. Unknown bar_type values raise ValueError; they are not silently treated as base.
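The routing behavior described above can be sketched with a small dispatch table. This is not the source of compute_data_bars(): route_bars and the stub functions are hypothetical stand-ins, and raising ValueError for a missing threshold is an assumption (the page only states that unknown bar_type values raise ValueError).

```python
# Illustrative only: dispatch on bar_type, with stubs in place of real bar functions.

def _stub_volume_bars(data, threshold):
    return ('volume', threshold)

def _stub_trade_bars(data, threshold):
    return ('trade', threshold)

def _stub_liquidity_bars(data, threshold):
    return ('liquidity', threshold)

# bar_type -> (bar function, name of the threshold parameter it needs)
_HANDLERS = {
    'volume': (_stub_volume_bars, 'volume_threshold'),
    'trade': (_stub_trade_bars, 'trade_threshold'),
    'liquidity': (_stub_liquidity_bars, 'liquidity_threshold'),
}

def route_bars(data, bar_type='base', **thresholds):
    """Return `data` unchanged for 'base', otherwise call the matching bar function."""
    if bar_type == 'base':
        return data
    if bar_type not in _HANDLERS:
        # Unknown values are rejected rather than treated as 'base'.
        raise ValueError(f'unknown bar_type: {bar_type!r}')
    func, param = _HANDLERS[bar_type]
    if thresholds.get(param) is None:
        # Assumption: a missing threshold is also reported as ValueError.
        raise ValueError(f'bar_type {bar_type!r} requires {param}')
    return func(data, thresholds[param])


data = [{'volume': 1}]
unchanged = route_bars(data, bar_type='base')
routed = route_bars(data, bar_type='volume', volume_threshold=50_000, trade_threshold=2_000)
```

Thresholds for bar types that are not selected are simply ignored here, which is why a parameter sweep can carry volume_threshold and trade_threshold side by side.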

How bars fit into the manifest pipeline

Inside a manifest-driven experiment, the order is:

fetch raw input data
optionally apply a pre-split selector
split into train, validation, and test
apply bar formation inside each split
run indicators, features, targets, and scaling on the resulting bars

This keeps train-only fitting and test-only evaluation aligned with the actual post-bar data seen by each split.
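The split-first ordering can be sketched in a few lines of plain Python. The helpers split_rows and volume_groups are hypothetical, the split sizes are arbitrary, and the closing rule for bars (close on the row that reaches the threshold, drop an unfinished tail) is an assumption rather than documented Limen behavior.

```python
# Illustrative only: split chronologically first, then form bars inside each split.

def split_rows(rows, train, val):
    """Cut rows into train / validation / test by position."""
    return rows[:train], rows[train:train + val], rows[train + val:]

def volume_groups(rows, threshold):
    """Form volume bars from one split only."""
    bars, start, running, count = [], None, 0, 0
    for row in rows:
        if count == 0:
            start = row['t']          # first source row of the new bar
        running += row['volume']
        count += 1
        if running >= threshold:
            bars.append({'start': start, 'bar_count': count})
            running, count = 0, 0
    return bars


rows = [{'t': i, 'volume': 50} for i in range(10)]
train, val, test = split_rows(rows, train=6, val=2)

# Bar formation runs once per split, so no bar mixes rows from two splits.
bars = {
    'train': volume_groups(train, threshold=100),
    'val': volume_groups(val, threshold=100),
    'test': volume_groups(test, threshold=100),
}
```

Here the first validation bar starts at t = 6, the first row of the validation split, regardless of how the training rows were grouped.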
