Log

Log is Limen's post-run analysis layer. It sits on top of a finished experiment and turns raw round results into round-level prediction tables, benchmark-style summaries, backtest summaries, and parameter-correlation views.

For YAML CLI runs, start from the generated result directory and results.csv. For direct Python UEL runs, pass post_processing=True when you need uel._log, confusion metrics, and backtest summaries on the live object.

Prerequisites

a completed results.csv, or a successful direct UEL run
post_processing=True when analysis needs retained predictions, scalers, alignment, or the live uel._log
compatible price columns for price-derived confusion and backtest analysis

Two ways to use `Log`

UEL-backed `Log`

This is the direct Python analysis path.

```python
import limen
from limen.data import HistoricalData

historical = HistoricalData()
data = historical.get_spot_klines(kline_size=7200, row_count_limit=2000)

uel = limen.UniversalExperimentLoop(data=data, sfd=limen.sfd.logreg_binary)
uel.run(
    experiment_name='logreg-first',
    n_permutations=4,
    prep_each_round=True,
    post_processing=True,
)

log = uel._log
```

This mode has access to:

the experiment dataframe
the round parameters
stored predictions
alignment metadata
prep logic needed to reconstruct per-round test windows

That is why the full post-run surface works from a UEL-backed Log.

File-backed `Log`

A CSV log can also be loaded from disk:

```python
import limen

log = limen.Log(file_path='my_experiment.csv')
```

This path supports the cleaned experiment log itself and experiment-log-only analysis such as parameter correlation.

Important limitation:

file-backed Log does not have data, prep, preds, or _alignment
methods that reconstruct per-round predictions or test windows require a UEL-backed Log

The post-run workflow

The standard sequence is:

inspect one round's prediction table
compare rounds with benchmark summaries
compare rounds with backtest summaries
inspect which parameters move with the target metric

```python
log = uel._log

round0 = log.permutation_prediction_performance(round_id=0)
benchmark = log.experiment_confusion_metrics('price_change')
backtest = log.experiment_backtest_results()
correlation = log.experiment_parameter_correlation('auc', min_n=10)
```

`permutation_prediction_performance(round_id)`

This method reconstructs a single round's test-period table and joins:

model predictions
actual outcomes
hit/miss flags
aligned price data

```python
perf = uel._log.permutation_prediction_performance(round_id=0)
```

The resulting table has these columns:

datetime when the reconstructed price frame contains it
predictions
actuals
hit
miss
open
close
price_change

The row count follows the reconstructed test window; it is not a fixed contract.

Use this table for round-level inspection before summary statistics. It is also the direct input to Limen's snapshot backtest.
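To make the column contract concrete, here is a minimal pandas sketch that assembles a table with the same column names from toy inputs. It is not Limen's reconstruction code: the helper name build_prediction_table is hypothetical, and it assumes that a hit is an exact prediction/actual match and that price_change is the same-row close minus open.

```python
import pandas as pd


def build_prediction_table(preds, actuals, prices):
    """Illustrative per-round prediction table; not Limen's implementation."""
    table = pd.DataFrame({"predictions": list(preds), "actuals": list(actuals)})
    # Assumption: a hit is an exact match between prediction and actual label.
    table["hit"] = (table["predictions"] == table["actuals"]).astype(int)
    table["miss"] = 1 - table["hit"]
    prices = prices.reset_index(drop=True)
    # datetime is carried over only when the price frame has it.
    if "datetime" in prices.columns:
        table.insert(0, "datetime", prices["datetime"])
    table["open"] = prices["open"]
    table["close"] = prices["close"]
    # Assumption: price_change is the same-row close minus open.
    table["price_change"] = table["close"] - table["open"]
    return table


# Toy 2-hour bars, invented for illustration.
prices = pd.DataFrame({
    "datetime": pd.date_range("2024-01-01", periods=4, freq="120min"),
    "open": [100.0, 102.0, 101.0, 103.0],
    "close": [102.0, 101.0, 100.0, 105.0],
})
perf = build_prediction_table([1, 0, 1, 1], [1, 0, 0, 1], prices)
print(perf.columns.tolist())
```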

Benchmark surfaces

The benchmark layer measures directional signal quality before translation into trading results.

`experiment_confusion_metrics(x)`

Produces one row per round.

```python
bench = uel._log.experiment_confusion_metrics('price_change')
```

This table combines:

positive-rate diagnostics
precision and recall
TP and FP counts
mean and median of x within TP and FP
TP-versus-FP separation through Cohen's d and KS

The same summary is exposed directly on UEL as:

```python
uel.experiment_confusion_metrics
```

because UEL computes:

```python
uel._log.experiment_confusion_metrics('price_change')
```

automatically at the end of a run with post_processing=True.
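The statistics this table combines can be illustrated with a short NumPy sketch for binary 0/1 predictions. It is an approximation, not Limen's code: the column names precision_pct, recall_pct, tp_x_mean, tp_x_median, fp_x_mean, fp_x_median, and tp_fp_cohen_d come from this page, while pred_pos_rate_pct, actual_pos_rate_pct, tp_count, fp_count, and tp_fp_ks are hypothetical names. Cohen's d uses a pooled ddof=1 standard deviation, and KS is the two-sample Kolmogorov-Smirnov statistic (largest gap between the two empirical CDFs).

```python
import numpy as np


def cohen_d(a, b):
    """Cohen's d between two samples, using the pooled (ddof=1) standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def confusion_summary(preds, actuals, x):
    """One benchmark-style row for binary 0/1 predictions (no empty-group guards)."""
    preds, actuals, x = np.asarray(preds), np.asarray(actuals), np.asarray(x, dtype=float)
    tp = (preds == 1) & (actuals == 1)
    fp = (preds == 1) & (actuals == 0)
    fn = (preds == 0) & (actuals == 1)
    return {
        "pred_pos_rate_pct": 100 * float(preds.mean()),
        "actual_pos_rate_pct": 100 * float(actuals.mean()),
        "precision_pct": 100 * tp.sum() / (tp.sum() + fp.sum()),
        "recall_pct": 100 * tp.sum() / (tp.sum() + fn.sum()),
        "tp_count": int(tp.sum()),
        "fp_count": int(fp.sum()),
        "tp_x_mean": float(x[tp].mean()),
        "tp_x_median": float(np.median(x[tp])),
        "fp_x_mean": float(x[fp].mean()),
        "fp_x_median": float(np.median(x[fp])),
        "tp_fp_cohen_d": cohen_d(x[tp], x[fp]),
        "tp_fp_ks": ks_statistic(x[tp], x[fp]),
    }


row = confusion_summary(
    preds=[1, 1, 1, 1, 1, 0, 0, 0],
    actuals=[1, 1, 1, 0, 0, 1, 0, 0],
    x=[2.0, 3.0, 4.0, -1.0, 1.0, 0.5, -0.5, -2.0],
)
```

In this toy data every TP value sits above every FP value, so the KS statistic reaches its maximum of 1 while precision is only 60%.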

`permutation_confusion_metrics(x, round_id)`

Produces the same style of summary for one specific round.

```python
round0_conf = uel._log.permutation_confusion_metrics(
    x='price_change',
    round_id=0,
)
```

This view isolates benchmark behavior for one selected round.

Reading the benchmark table

When reviewing the benchmark table, check:

whether a high precision_pct comes from selectivity or merely from low activity
whether recall_pct captures the actual positives
whether tp_x_mean and tp_x_median exceed fp_x_mean and fp_x_median
whether tp_fp_cohen_d indicates real separation rather than noise

Benchmark and backtest stay separate because statistical signal quality and trading economics can diverge.

When the confusion table includes:

tp_mean_return_pct
fp_mean_return_pct
tn_mean_return_pct
fn_mean_return_pct

those four fields use the same immediate-next-execution-row contract as snapshot backtests for completed-bar pipelines. They are not same-row feature-bar returns.
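The difference between a next-row and a same-row return can be sketched with a pandas shift. This is a hypothetical illustration, not Limen's code: it assumes the contract means that row t is credited with the return realised on row t + 1, and the function name class_mean_next_returns is invented.

```python
import pandas as pd


def class_mean_next_returns(preds, actuals, returns_pct):
    """Mean next-row return per confusion class (hypothetical sketch)."""
    df = pd.DataFrame({"pred": preds, "actual": actuals, "ret": returns_pct})
    # Assumption: row t is credited with the return realised on row t + 1,
    # i.e. the trade executes on the row after the signal. The last row has
    # no next row, so its credited return is NaN and mean() skips it.
    df["next_ret"] = df["ret"].shift(-1)
    masks = {
        "tp": (df["pred"] == 1) & (df["actual"] == 1),
        "fp": (df["pred"] == 1) & (df["actual"] == 0),
        "tn": (df["pred"] == 0) & (df["actual"] == 0),
        "fn": (df["pred"] == 0) & (df["actual"] == 1),
    }
    return {f"{name}_mean_return_pct": df.loc[mask, "next_ret"].mean() for name, mask in masks.items()}


summary = class_mean_next_returns(
    preds=[1, 1, 0, 0, 1],
    actuals=[1, 0, 0, 1, 1],
    returns_pct=[0.5, 1.0, -0.2, 0.3, 0.8],
)
```

A same-row reading would credit the two TP rows with 0.5 and 0.8 (mean 0.65); the next-row reading gives 1.0, because the final TP row has no following row.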

Backtest surface

`experiment_backtest_results()`

Produces one snapshot backtest row per experiment round.

```python
bt = uel._log.experiment_backtest_results()
```

The same table is exposed directly on UEL as:

```python
uel.experiment_backtest_results
```

The current summary columns are the 20 bar-based backtest ledger fields; every column is computed per bar over all bars in the window.

Per-bar distributions (p5 / p50 / p95):

edge_bps_* — gross per-bar return
pnl_bps_* — net per-bar return
cost_bps_* — per-bar cost (gross minus net)
drawdown_bps_* — net equity against its running peak

Intensive scalars:

wins_per_bar, pnl_per_bar_bps, avg_win_bps, avg_loss_bps, cvar_95_pnl_bps, trades_per_bar, inventory_per_bar, cost_per_bar_bps

Use this table to compare trading economics after benchmark inspection.
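The shape of these 20 fields can be sketched for a long/flat position series. The formulas below are assumptions chosen to match the field descriptions on this page (gross edge, net pnl, cost as gross minus net, drawdown against the running peak of net equity), not Limen's ledger: the held position earns the bar's return, a fixed hypothetical cost_per_turn_bps is charged on every position change, and cvar_95_pnl_bps is the mean of the worst 5% of per-bar pnl.

```python
import numpy as np


def bar_ledger(positions, bar_returns_bps, cost_per_turn_bps):
    """Hypothetical per-bar ledger summary for a long/flat position series."""
    pos = np.asarray(positions, dtype=float)
    ret = np.asarray(bar_returns_bps, dtype=float)
    edge = pos * ret  # gross per-bar return of the held position
    turnover = np.abs(np.diff(pos, prepend=0.0))  # entries and exits
    cost = turnover * cost_per_turn_bps  # assumed flat cost per position change
    pnl = edge - cost  # net per-bar return
    equity = np.cumsum(pnl)
    drawdown = equity - np.maximum.accumulate(equity)  # net equity vs running peak
    worst = np.sort(pnl)[: max(1, int(np.ceil(0.05 * len(pnl))))]
    summary = {}
    for name, series in [("edge_bps", edge), ("pnl_bps", pnl), ("cost_bps", cost), ("drawdown_bps", drawdown)]:
        for q in (5, 50, 95):
            summary[f"{name}_p{q}"] = float(np.percentile(series, q))
    summary.update({
        "wins_per_bar": float(np.mean(pnl > 0)),
        "pnl_per_bar_bps": float(pnl.mean()),
        "avg_win_bps": float(pnl[pnl > 0].mean()) if (pnl > 0).any() else 0.0,
        "avg_loss_bps": float(pnl[pnl < 0].mean()) if (pnl < 0).any() else 0.0,
        "cvar_95_pnl_bps": float(worst.mean()),
        "trades_per_bar": float(np.mean(turnover > 0)),
        "inventory_per_bar": float(pos.mean()),
        "cost_per_bar_bps": float(cost.mean()),
    })
    return summary


ledger = bar_ledger([1, 1, 0, 1, 0], [10.0, -5.0, 20.0, 8.0, -3.0], cost_per_turn_bps=2.0)
```

Because every scalar is a per-bar average, rounds with different window lengths stay comparable, which is the point of keeping the fields intensive.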

Post-run snapshot backtests currently support:

binary 0/1 predictions directly
directional regression scores via sign (pred > 0 -> long, otherwise flat)

Logged multiclass outputs are not supported on this surface and raise explicitly instead of being silently collapsed.
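The mapping rules above can be written as a small helper. This is a sketch, not Limen's code: the function name predictions_to_positions and its explicit kind argument are hypothetical, but the rules follow the text: binary 0/1 passes through, regression scores become long when positive and flat otherwise, and anything else raises.

```python
import numpy as np


def predictions_to_positions(preds, kind):
    """Map logged predictions to long (1) / flat (0) positions (hypothetical helper)."""
    preds = np.asarray(preds, dtype=float)
    if kind == "binary":
        if not np.isin(preds, (0.0, 1.0)).all():
            raise ValueError("binary predictions must be 0/1")
        return preds.astype(int)
    if kind == "regression":
        # Directional reading: positive score -> long, anything else -> flat.
        return (preds > 0).astype(int)
    # Multiclass (or any other output) is rejected instead of silently collapsed.
    raise ValueError(f"unsupported prediction kind for snapshot backtest: {kind!r}")


long_flat = predictions_to_positions([0.3, -0.1, 0.0], kind="regression")
```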

Parameter correlation surface

`experiment_parameter_correlation(metric)`

This method looks for robust relationships between experiment parameters and a chosen metric across explicit cohorts.

```python
corr = uel._log.experiment_parameter_correlation(
    'auc',
    min_n=10,
)
```

The result is a dataframe indexed by:

cohort_pct
feature

with columns:

n_rows
corr
corr_med
ci_lo
ci_hi
sign_stability

The helper works over numeric columns that remain after cleaning. It drops constants and all-NaN columns. Framework bookkeeping columns (id, _id, _round_index, execution_time, _warnings) are excluded from feature correlation by default, so the review stays focused on experiment parameters, diagnostics, and metrics.

The default min_n=10 skips smaller cohorts. If every requested cohort is smaller than min_n, or cleaning leaves no usable numeric cohort, the method raises ValueError instead of returning an unstable table.
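To show how the output columns relate to each other, here is a hypothetical sketch of a cohort-wise correlation with bootstrap summaries. It is not Limen's algorithm. The bookkeeping exclusions, constant/all-NaN dropping, min_n skipping, ValueError, and output layout follow this page; everything else is assumed: a cohort is taken to be the top cohort_pct percent of rounds ranked by the metric, corr is a Spearman correlation computed as Pearson on ranks, corr_med and the 95% interval come from a row bootstrap, and sign_stability is the share of bootstrap correlations with the same sign as corr.

```python
import numpy as np
import pandas as pd

BOOKKEEPING = ("id", "_id", "_round_index", "execution_time", "_warnings")


def cohort_correlation(df, metric, cohorts=(100, 50), min_n=10, n_boot=200, seed=0):
    """Hypothetical cohort-wise Spearman correlation with bootstrap summaries."""
    df = df.drop(columns=[c for c in BOOKKEEPING if c in df.columns])
    # Coerce to numeric (strings -> NaN), then drop all-NaN and constant columns.
    numeric = df.apply(pd.to_numeric, errors="coerce").astype(float)
    numeric = numeric.dropna(axis=1, how="all")
    numeric = numeric.loc[:, numeric.nunique() > 1]
    if metric not in numeric.columns:
        raise ValueError(f"metric {metric!r} has no usable numeric values")
    rng = np.random.default_rng(seed)
    ranked = numeric.sort_values(metric, ascending=False)
    rows = []
    for pct in cohorts:
        # Assumption: a cohort is the top `pct` percent of rounds by the metric.
        cohort = ranked.head(int(np.ceil(len(ranked) * pct / 100)))
        for feature in cohort.columns.drop(metric):
            pair = cohort[[feature, metric]].dropna()
            if len(pair) < min_n or pair[feature].nunique() < 2:
                continue
            ranks = pair.rank()  # Spearman = Pearson correlation of ranks
            corr = ranks[feature].corr(ranks[metric])
            boot = []
            for _ in range(n_boot):
                sample = pair.iloc[rng.integers(0, len(pair), len(pair))].rank()
                boot.append(sample[feature].corr(sample[metric]))
            boot = np.asarray(boot)
            boot = boot[~np.isnan(boot)]
            rows.append({
                "cohort_pct": pct, "feature": feature, "n_rows": len(pair),
                "corr": corr, "corr_med": float(np.median(boot)),
                "ci_lo": float(np.percentile(boot, 2.5)),
                "ci_hi": float(np.percentile(boot, 97.5)),
                "sign_stability": float(np.mean(np.sign(boot) == np.sign(corr))),
            })
    if not rows:
        raise ValueError("no cohort has at least min_n usable rows")
    return pd.DataFrame(rows).set_index(["cohort_pct", "feature"])


# Toy experiment log with invented values; learning_rate is a hypothetical parameter.
demo_rng = np.random.default_rng(1)
lr = demo_rng.uniform(0.001, 0.1, 40)
toy_log = pd.DataFrame({
    "id": np.arange(40),
    "learning_rate": lr,
    "noise_param": demo_rng.normal(size=40),
    "const": np.ones(40),
    "scaler_type": ["robust"] * 40,
    "auc": 0.5 + 2 * lr + demo_rng.normal(0, 0.001, 40),
})
result = cohort_correlation(toy_log, "auc", cohorts=(100, 50, 10), min_n=10)
```

The 10% cohort holds only four rounds, so it is skipped rather than reported with an unstable correlation.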

Analyzing perturbation impact

Every key in a round's round_params is copied onto that round's result row, so perturbation drivers become columns in experiment_log alongside the metrics. A perturbation sweep therefore leaves its own audit trail: scaler_type, feature_groups, any use_* toggle, feature_drop_count, feature_drop_seed, and the recorded _dropped_features list all appear as columns.

Because experiment_log is a pandas DataFrame, the simplest analysis is direct filtering and grouping:

```python
robust_only = log.experiment_log[log.experiment_log['scaler_type'] == 'robust']
by_groups = log.experiment_log.groupby('feature_groups')['auc'].mean()
```

experiment_parameter_correlation coerces every column to numeric, so string and list columns (scaler_type, feature_groups, and the list-valued _dropped_features) become NaN and drop out. Only numeric perturbation columns survive; boolean use_* flags survive because they coerce to 0/1 and group naturally.
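The effect of numeric coercion on a perturbation log can be reproduced with plain pandas. This is an illustration rather than Limen's code: the toy rows, the use_volume flag, and the "standard" scaler value are invented, and booleans are cast to float explicitly here. The list-valued _dropped_features column is left out because pd.to_numeric is not guaranteed to accept list cells.

```python
import pandas as pd

# Toy experiment log; column names mirror the text, values are invented.
log_df = pd.DataFrame({
    "scaler_type": ["robust", "standard", "robust"],
    "use_volume": [True, False, True],
    "feature_drop_count": [0, 2, 4],
    "auc": [0.61, 0.58, 0.63],
})
# Coerce every column to numeric: unparseable strings become NaN.
coerced = log_df.apply(pd.to_numeric, errors="coerce").astype(float)
# Columns that are entirely NaN (here scaler_type) carry no signal and drop out.
usable = coerced.dropna(axis=1, how="all")
print(usable.columns.tolist())
```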

To bring a categorical perturbation into the correlation, one-hot it when constructing the Log:

```python
log = limen.Log(file_path='my_experiment.csv', cols_to_multilabel=['scaler_type'])
```

That replaces scaler_type with one 0/1 column per observed value (scaler_type_robust, …) and casts bool columns to int; those columns then reach experiment_parameter_correlation. It cannot expand the list-valued _dropped_features: deriving feature importance from ablation needs a per-feature boolean membership column, covered in Perturbation Strategies.
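For comparison, the same two encodings can be built with plain pandas. This sketch uses pd.get_dummies as an analogue of the one-hot step, not Limen's cols_to_multilabel code, and builds one membership flag per dropped feature. The dropped_<feature> column names and the feature names rsi and atr are hypothetical.

```python
import pandas as pd

log_df = pd.DataFrame({
    "scaler_type": ["robust", "standard", "robust"],
    "_dropped_features": [["rsi"], [], ["rsi", "atr"]],
    "auc": [0.61, 0.58, 0.63],
})

# One-hot the categorical perturbation into 0/1 columns.
encoded = pd.get_dummies(log_df, columns=["scaler_type"], dtype=int)

# Per-feature membership flags: 1 if the feature was dropped in that round.
all_features = sorted({f for dropped in log_df["_dropped_features"] for f in dropped})
for feature in all_features:
    encoded[f"dropped_{feature}"] = log_df["_dropped_features"].apply(lambda d, f=feature: int(f in d))
encoded = encoded.drop(columns=["_dropped_features"])
```

Every resulting column is numeric, so each membership flag can be correlated against auc like any other parameter.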

Persisting predictions and complex artifacts

Round-level reconstruction in Log depends on three things:

store test predictions as round_results['_preds']
keep prep deterministic with respect to round_params
use prep_each_round=True when the prep stage depends on round parameters

Large model or prep objects that should not be flattened into the experiment log belong under:

```python
round_results['extras'] = {'model': fitted_model}
```

UEL preserves those in uel.extras.
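A round function that follows these conventions might look like the sketch below. The function name threshold_round, its (data, round_params) signature, and the threshold "model" are hypothetical stand-ins; only the '_preds' and 'extras' keys come from this page.

```python
import numpy as np


def threshold_round(data, round_params):
    """Hypothetical round function showing where predictions and artifacts go."""
    x_test, y_test = data["x_test"], data["y_test"]
    # Stand-in "model": the cutoff comes straight from round_params.
    threshold = round_params["threshold"]
    preds = (x_test > threshold).astype(int)
    return {
        "accuracy": float(np.mean(preds == y_test)),  # flat metric for the log
        "_preds": preds,  # test predictions for per-round reconstruction
        "extras": {"model": {"threshold": threshold}},  # kept out of the flat log
    }


results = threshold_round(
    {"x_test": np.array([0.2, 0.7, 0.9]), "y_test": np.array([0, 1, 0])},
    {"threshold": 0.5},
)
```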

Determinism matters

Log reconstructs test data by replaying the relevant prep path. That means non-deterministic prep logic can break alignment between:

stored predictions
reconstructed actuals
reconstructed prices

For reliable post-run analysis:

prefer deterministic prep
if randomness is necessary, make it explicit in round_params
use prep_each_round=True when round parameters affect preparation
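The "make randomness explicit in round_params" rule can be sketched with a seeded generator. This prep function is hypothetical (Limen's prep signature may differ); it reuses the feature_drop_count and feature_drop_seed names mentioned earlier on this page. Because the seed comes from round_params, replaying prep with the same parameters drops the same columns and yields the same test window.

```python
import numpy as np
import pandas as pd


def prep(data, round_params):
    """Hypothetical prep that drops random features reproducibly."""
    # Seeding from round_params makes the "random" drop replayable.
    rng = np.random.default_rng(round_params["feature_drop_seed"])
    candidates = [c for c in data.columns if c != "target"]
    dropped = sorted(rng.choice(candidates, size=round_params["feature_drop_count"], replace=False))
    prepared = data.drop(columns=dropped)
    split = int(len(prepared) * 0.8)  # last 20% of rows is the test window
    return prepared.iloc[split:], dropped


frame = pd.DataFrame({f"f{i}": np.arange(10) * i for i in range(1, 5)})
frame["target"] = np.arange(10) % 2
params = {"feature_drop_seed": 7, "feature_drop_count": 2}
first_test, first_dropped = prep(frame, params)
replay_test, replay_dropped = prep(frame, params)
```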

`read_from_file(file_path)`

read_from_file() is the CSV-cleaning helper behind file-backed Log.

It:

removes duplicated header rows that may appear in streamed CSV logs
trims whitespace in object columns
returns a cleaned pandas dataframe

Use read_from_file() to recover or inspect an experiment log outside a live UEL object.
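The two cleaning steps can be approximated with pandas. This is a sketch, not the read_from_file() source: the helper name clean_streamed_csv is hypothetical, and the final step that restores numeric columns is an added assumption so the toy output is easy to check.

```python
import io

import pandas as pd


def clean_streamed_csv(text):
    """Sketch of CSV-log cleaning: drop repeated header rows, trim strings."""
    df = pd.read_csv(io.StringIO(text), dtype=str)
    # A streamed log can repeat its header; such rows equal the column names.
    is_header = (df == pd.Series(df.columns, index=df.columns)).all(axis=1)
    df = df.loc[~is_header].reset_index(drop=True)
    for col in df.columns:
        df[col] = df[col].str.strip()
        # Assumption: restore numeric dtype when every value parses as a number.
        converted = pd.to_numeric(df[col], errors="coerce")
        if converted.notna().sum() == df[col].notna().sum():
            df[col] = converted
    return df


raw = "id,scaler_type,auc\n0, robust ,0.61\nid,scaler_type,auc\n1,standard ,0.58\n"
cleaned = clean_streamed_csv(raw)
```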
