Log
Log is Limen's post-run analysis layer. It sits on top of a finished experiment and turns raw round results into round-level prediction tables, benchmark-style summaries, backtest summaries, and parameter-correlation views.
In most workflows you do not instantiate it yourself. UniversalExperimentLoop creates uel._log automatically at the end of a successful run and also exposes the most-used derived tables directly on the uel object.
Two Ways To Use Log
UEL-backed Log
This is the normal path.
import limen
uel = limen.UniversalExperimentLoop(...)
uel.run(...)
log = uel._log
This mode has access to:
- the experiment dataframe
- the round parameters
- stored predictions
- alignment metadata
- prep logic needed to reconstruct per-round test windows
That is why the full post-run surface works from a UEL-backed Log.
File-backed Log
You can also load a CSV log from disk:
import limen
log = limen.Log(file_path='my_experiment.csv')
This path is useful when you mainly want the cleaned experiment log itself, or experiment-log-only analysis such as parameter correlation.
Important limitation:
- file-backed Log does not have data, prep, preds, or _alignment
- methods that reconstruct per-round predictions or test windows require a UEL-backed Log
The Main Post-Run Workflow
The most common sequence is:
- inspect one round's prediction table
- compare rounds with benchmark summaries
- compare rounds with backtest summaries
- inspect which parameters move with your target metric
log = uel._log
round0 = log.permutation_prediction_performance(round_id=0)
benchmark = log.experiment_confusion_metrics('price_change')
backtest = log.experiment_backtest_results()
correlation = log.experiment_parameter_correlation('auc', min_n=10)
permutation_prediction_performance(round_id)
This is the most concrete place to start. It reconstructs a single round's test-period table and joins:
- model predictions
- actual outcomes
- hit/miss flags
- aligned price data
perf = uel._log.permutation_prediction_performance(round_id=0)
The resulting table has these columns:
- predictions
- actuals
- hit
- miss
- open
- close
- price_change
On a live local run in this repo, the table for one round contained 218 test rows with exactly that schema.
Use this table when you want to understand a round before jumping to summary statistics. It is also the direct input to Limen's snapshot backtest.
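To make the shape of that table concrete, here is a toy sketch in plain Python that joins predictions, outcomes, and prices row by row. It is not Limen's implementation. The helper name build_round_table is hypothetical, hit is assumed to mean "prediction equals actual", and price_change is assumed to be close minus open; Limen may define either differently.

```python
# Toy rows only; none of these numbers come from a real Limen run.
predictions = [1, 0, 1, 1]
actuals = [1, 0, 0, 1]
opens = [100.0, 101.0, 102.0, 101.0]
closes = [101.0, 100.5, 101.0, 103.0]


def build_round_table(predictions, actuals, opens, closes):
    """Join per-row predictions, outcomes, and prices into one list of rows."""
    if not (len(predictions) == len(actuals) == len(opens) == len(closes)):
        raise ValueError('all inputs must be aligned to the same test rows')
    rows = []
    for pred, actual, open_, close in zip(predictions, actuals, opens, closes):
        hit = int(pred == actual)  # ASSUMPTION: a hit is an exact match
        rows.append({
            'predictions': pred,
            'actuals': actual,
            'hit': hit,
            'miss': 1 - hit,
            'open': open_,
            'close': close,
            'price_change': close - open_,  # ASSUMPTION: absolute close - open
        })
    return rows


table = build_round_table(predictions, actuals, opens, closes)
```

The length check matters because a misaligned join silently produces a table that looks valid but pairs predictions with the wrong prices.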
Benchmark Surfaces
The benchmark layer answers: is the signal making useful directional calls before we translate those calls into trading results?
experiment_confusion_metrics(x, disable_progress_bar=False)
Produces one row per round.
bench = uel._log.experiment_confusion_metrics('price_change')
This table combines:
- positive-rate diagnostics
- precision and recall
- TP and FP counts
- mean and median of x within TP and FP
- TP-versus-FP separation through Cohen's d and KS
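The ingredients listed above can be sketched for a single round with the standard library alone. This is an illustration under assumptions, not Limen's code: the function names are hypothetical, Cohen's d uses a pooled sample standard deviation, and KS is the plain two-sample statistic. The keys pred_pos_rate_pct, tp_count, fp_count, and tp_fp_ks are made-up names; only the other keys appear in this document.

```python
from math import sqrt
from statistics import mean, median, stdev


def cohen_d(a, b):
    """Cohen's d using a pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    def ecdf(sample, point):
        return sum(v <= point for v in sample) / len(sample)
    return max(abs(ecdf(a, p) - ecdf(b, p)) for p in sorted(set(a) | set(b)))


def confusion_summary(predictions, actuals, x):
    """Summarise one round: rates, counts, and x within TP versus FP."""
    tp_x = [xi for p, a, xi in zip(predictions, actuals, x) if p == 1 and a == 1]
    fp_x = [xi for p, a, xi in zip(predictions, actuals, x) if p == 1 and a == 0]
    fn = sum(1 for p, a in zip(predictions, actuals) if p == 0 and a == 1)
    tp, fp = len(tp_x), len(fp_x)
    return {
        'pred_pos_rate_pct': 100 * (tp + fp) / len(predictions),  # hypothetical name
        'precision_pct': 100 * tp / (tp + fp),
        'recall_pct': 100 * tp / (tp + fn),
        'tp_count': tp,  # hypothetical name
        'fp_count': fp,  # hypothetical name
        'tp_x_mean': mean(tp_x),
        'tp_x_median': median(tp_x),
        'fp_x_mean': mean(fp_x),
        'fp_x_median': median(fp_x),
        'tp_fp_cohen_d': cohen_d(tp_x, fp_x),
        'tp_fp_ks': ks_statistic(tp_x, fp_x),  # hypothetical name
    }


# Toy data: x plays the role of a per-row quantity such as price_change.
predictions = [1, 1, 1, 1, 1, 0, 0, 0]
actuals = [1, 1, 1, 0, 0, 1, 0, 0]
x = [2.0, 3.0, 4.0, -1.0, -2.0, 1.0, 0.0, -0.5]
summary = confusion_summary(predictions, actuals, x)
```

The sketch needs at least two TP rows and two FP rows, because the sample standard deviation is undefined otherwise. A production version would have to decide what to report for rounds that fail that condition.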
The same summary is exposed directly on UEL as:
uel.experiment_confusion_metrics
because UEL computes:
uel._log.experiment_confusion_metrics('price_change')
automatically at the end of the run.
permutation_confusion_metrics(x, round_id, ...)
Produces the same style of summary for one specific round.
round0_conf = uel._log.permutation_confusion_metrics(
x='price_change',
round_id=0,
)
This is the right view when a round looks interesting and you want to inspect its benchmark behavior in isolation.
Reading the benchmark table
Good questions to ask:
- is precision_pct high because the signal is selective, or because it barely predicts positives?
- does recall_pct stay useful, or is the model missing most real positives?
- are tp_x_mean and tp_x_median materially better than fp_x_mean and fp_x_median?
- is tp_fp_cohen_d stable enough to suggest real separation rather than noise?
This is exactly why benchmark and backtest are separate in Limen: a round can look statistically interesting before it proves itself economically.
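The first two questions are easier to see on toy numbers. The sketch below compares a signal that fires once with one that fires often. The function name and the data are illustrative only and are not part of Limen.

```python
def precision_recall_rate(predictions, actuals):
    """Return (precision_pct, recall_pct, predicted_positive_rate_pct)."""
    tp = sum(1 for p, a in zip(predictions, actuals) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predictions, actuals) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predictions, actuals) if p == 0 and a == 1)
    fired = tp + fp
    precision = 100 * tp / fired if fired else float('nan')
    recall = 100 * tp / (tp + fn) if (tp + fn) else float('nan')
    return precision, recall, 100 * fired / len(predictions)


# 20 rows: the first 10 are real positives, the last 10 are real negatives.
actuals = [1] * 10 + [0] * 10

# Fires exactly once, and happens to be right.
timid = [1] + [0] * 19

# Fires 7 times: 6 real positives and 1 false alarm.
selective = [1] * 6 + [0] * 4 + [1] + [0] * 9

timid_stats = precision_recall_rate(timid, actuals)
selective_stats = precision_recall_rate(selective, actuals)
```

The timid signal scores 100% precision while catching only 10% of real positives, so its precision says little about usefulness. The second signal has lower precision but far higher recall, which is why the two columns need to be read together with the positive rate.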
When the confusion table includes:
- tp_mean_return_pct
- fp_mean_return_pct
- tn_mean_return_pct
- fn_mean_return_pct
those four fields use the same immediate-next-execution-row contract as snapshot backtests for completed-bar pipelines. They are not same-row feature-bar returns.
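The difference between a same-row return and a next-row return can be sketched as follows. This is one possible reading of the contract, not Limen's definition: it assumes a signal computed on completed bar i is executed on bar i + 1 and credited with that bar's open-to-close percentage move. Both function names are hypothetical.

```python
def same_row_returns_pct(opens, closes):
    """Open-to-close return of the same bar the features were computed on."""
    return [100.0 * (c - o) / o for o, c in zip(opens, closes)]


def next_row_returns_pct(opens, closes):
    """Return credited to row i is the open-to-close move of row i + 1.

    ASSUMPTION: this is only one interpretation of "immediate next
    execution row". The final row has no following bar, so it gets None.
    """
    credited = []
    for i in range(len(opens)):
        if i + 1 < len(opens):
            credited.append(100.0 * (closes[i + 1] - opens[i + 1]) / opens[i + 1])
        else:
            credited.append(None)
    return credited


opens = [100.0, 200.0, 50.0]
closes = [110.0, 190.0, 55.0]

same_row = same_row_returns_pct(opens, closes)
next_row = next_row_returns_pct(opens, closes)
```

On this toy data the first row gains 10% on its own bar but is credited with the following bar's 5% loss. That is the look-ahead distinction the contract is meant to enforce: a completed bar cannot be traded retroactively.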
Backtest Surface
experiment_backtest_results(disable_progress_bar=False)
Produces one snapshot backtest row per experiment round.
bt = uel._log.experiment_backtest_results()
The same table is exposed directly on UEL as:
uel.experiment_backtest_results
The current summary columns are the 22 decoder-level backtest ledger fields:
- edge_per_signal_bps_p5, edge_per_signal_bps_p50, edge_per_signal_bps_p95
- trade_pnl_net_bps_p5, trade_pnl_net_bps_p50, trade_pnl_net_bps_p95
- cost_drag_bps_p5, cost_drag_bps_p50, cost_drag_bps_p95
- rolling_return_net_bps_p5, rolling_return_net_bps_p50, rolling_return_net_bps_p95
- return_on_exposure_p5, return_on_exposure_p50, return_on_exposure_p95
- drawdown_depth_bps_p5, drawdown_depth_bps_p50, drawdown_depth_bps_p95
- drawdown_duration_days_p5, drawdown_duration_days_p50, drawdown_duration_days_p95
- cvar_95_return_bps
Use this table to compare the trading-economics side of rounds after you have already inspected the benchmark layer.
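The _p5, _p50, and _p95 suffixes read as percentiles, and cvar_95 as a tail-risk average. A minimal sketch of how such summaries could be computed from a series is shown below. The estimators are assumptions (linear-interpolation percentiles, and CVaR taken as the mean of the worst 5% of observations); Limen's exact definitions may differ, and the helper names are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile for q in [0, 100], linearly interpolated between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return ordered[lo]
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarise(values, name):
    """Build the three percentile fields for one metric family."""
    return {
        f'{name}_p5': percentile(values, 5),
        f'{name}_p50': percentile(values, 50),
        f'{name}_p95': percentile(values, 95),
    }


def cvar_95(returns):
    """Mean of the worst 5% of returns, using at least one observation."""
    ordered = sorted(returns)
    k = max(1, math.ceil(0.05 * len(ordered)))
    return sum(ordered[:k]) / k


# Toy per-trade net PnL in basis points: -20, -19, ..., 20.
trade_pnl_net_bps = [float(v) for v in range(-20, 21)]

row = summarise(trade_pnl_net_bps, 'trade_pnl_net_bps')
row['cvar_95_return_bps'] = cvar_95(trade_pnl_net_bps)
```

Reporting a low, middle, and high percentile instead of a single mean keeps one outlier trade from dominating the comparison between rounds.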
Post-run snapshot backtests currently support:
- binary 0/1 predictions directly
- directional regression scores via sign (pred > 0 -> long, otherwise flat)
Logged multiclass outputs are not supported on this surface and raise explicitly instead of being silently collapsed.
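The mapping rules above can be sketched as a small function. The name to_positions and the kind argument are hypothetical; how Limen detects the prediction type internally is not described here.

```python
def to_positions(preds, kind):
    """Map logged predictions to positions: 1 means long, 0 means flat.

    `kind` is a hypothetical switch used only in this sketch.
    """
    if kind == 'binary':
        if any(p not in (0, 1) for p in preds):
            raise ValueError('binary predictions must be 0 or 1')
        return [int(p) for p in preds]
    if kind == 'regression':
        # Sign rule: strictly positive score goes long, everything else is flat.
        return [1 if p > 0 else 0 for p in preds]
    # Fail loudly instead of collapsing classes into a binary signal.
    raise ValueError(f'unsupported prediction kind: {kind!r}')


binary_positions = to_positions([1, 0, 1], 'binary')
regression_positions = to_positions([0.3, -0.1, 0.0], 'regression')
```

Note that a score of exactly zero stays flat under the pred > 0 rule. Raising on unsupported kinds avoids a backtest that runs cleanly on a signal nobody intended.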
Parameter Correlation Surface
experiment_parameter_correlation(metric, ...)
This method looks for robust relationships between experiment parameters and a chosen metric across explicit cohorts.
corr = uel.experiment_parameter_correlation(
'auc',
min_n=10,
)
The result is a dataframe indexed by:
- cohort_pct
- feature
with columns:
- n_rows
- corr
- corr_med
- ci_lo
- ci_hi
- sign_stability
This is most useful after a run is large enough to support meaningful cohorts. On tiny runs, the output is still valid but usually too unstable to interpret with confidence.
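To give a feel for what the output columns could mean, here is a standard-library sketch for one parameter and one cohort. Every interpretive choice is an assumption and not Limen's method: the cohort is taken as the top cohort_pct percent of rounds ranked by the metric, corr is a Pearson correlation, corr_med, ci_lo, and ci_hi come from a seeded bootstrap, and sign_stability is the share of bootstrap correlations with the same sign as corr. The function names are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when either side has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def cohort_correlation(param, metric, cohort_pct, min_n=10, n_boot=200, seed=0):
    """Correlate one parameter with the metric inside one cohort."""
    # ASSUMPTION: the cohort is the top cohort_pct percent of rounds by metric.
    order = sorted(range(len(metric)), key=lambda i: metric[i], reverse=True)
    keep = order[:max(1, round(len(order) * cohort_pct / 100))]
    if len(keep) < min_n:
        return None  # too few rounds to say anything
    xs = [param[i] for i in keep]
    ys = [metric[i] for i in keep]
    corr = pearson(xs, ys)
    if corr is None:
        return None
    rng = random.Random(seed)  # seeded so the bootstrap is reproducible
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(keep)) for _ in keep]
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        if r is not None:
            boots.append(r)
    if not boots:
        return None
    boots.sort()
    return {
        'n_rows': len(keep),
        'corr': corr,
        'corr_med': boots[len(boots) // 2],
        'ci_lo': boots[int(0.025 * (len(boots) - 1))],
        'ci_hi': boots[int(0.975 * (len(boots) - 1))],
        'sign_stability': sum((r > 0) == (corr > 0) for r in boots) / len(boots),
    }


# Toy experiment: 40 rounds where the metric rises with the parameter.
param = [float(i) for i in range(40)]
metric = [0.5 + 0.01 * i + ((i * 7) % 5 - 2) * 0.005 for i in range(40)]

result = cohort_correlation(param, metric, cohort_pct=100)
small = cohort_correlation(param, metric, cohort_pct=10)
```

The 10% cohort holds only four rounds, so it falls below min_n and returns nothing. That is the small-run instability the paragraph above warns about, made explicit as a cutoff.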
Persisting Predictions And Complex Artifacts
If you want Log to have enough information for round-level reconstruction:
- store test predictions as round_results['_preds']
- keep prep deterministic with respect to round_params
- use prep_each_round=True when the prep stage depends on round parameters
If your model or prep returns large objects that should not be flattened into the experiment log, put them under:
round_results['extras'] = ...
UEL preserves those in uel.extras.
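A model function following these conventions might look roughly like the sketch below. Only the '_preds' and 'extras' keys come from this document. The function signature, the data keys x_test and y_test, the threshold parameter, and the accuracy field are hypothetical stand-ins for whatever your pipeline actually uses.

```python
def toy_model(data, round_params):
    """Hypothetical model step: a threshold rule standing in for a real model."""
    threshold = round_params['threshold']
    preds = [1 if value > threshold else 0 for value in data['x_test']]
    hits = sum(p == a for p, a in zip(preds, data['y_test']))
    round_results = {
        # Scalar metrics flatten cleanly into one experiment-log row.
        'accuracy': hits / len(preds),
        # Test-period predictions, kept for round-level reconstruction.
        '_preds': preds,
        # Large or nested objects that should not be flattened into the log.
        'extras': {'threshold_used': threshold, 'scores': list(data['x_test'])},
    }
    return round_results


data = {'x_test': [0.2, 0.7, 0.9, 0.4], 'y_test': [0, 1, 1, 1]}
results = toy_model(data, {'threshold': 0.5})
```

Keeping scalars at the top level and everything bulky under 'extras' keeps the experiment log tabular while still preserving the heavier artifacts.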
Determinism Matters
Log reconstructs test data by replaying the relevant prep path. That means non-deterministic prep logic can break alignment between:
- stored predictions
- reconstructed actuals
- reconstructed prices
For reliable post-run analysis:
- prefer deterministic prep
- if randomness is necessary, make it explicit in round_params
- use prep_each_round=True when round parameters affect preparation
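One way to make randomness explicit is to carry a seed in round_params and build a local generator from it, so replaying prep with the same parameters yields the same split. The sketch below illustrates the idea only. The prep signature and the seed and train_frac parameter names are hypothetical, and a shuffle-based split is used purely for demonstration.

```python
import random


def prep(rows, round_params):
    """Hypothetical prep step whose randomness depends only on round_params."""
    rng = random.Random(round_params['seed'])  # local generator, no global state
    shuffled = list(rows)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * round_params['train_frac'])
    return {'train': shuffled[:cut], 'test': shuffled[cut:]}


rows = list(range(20))
round_params = {'seed': 7, 'train_frac': 0.75}

first = prep(rows, round_params)
replay = prep(rows, round_params)  # what a post-run reconstruction would do
```

Because the generator is created inside the function from round_params, the replayed test window matches the original one regardless of what other code did to the global random state in between.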
read_from_file(file_path)
read_from_file() is the CSV-cleaning helper behind file-backed Log.
It:
- removes duplicated header rows that may appear in streamed CSV logs
- trims whitespace in object columns
- returns a cleaned pandas dataframe
Use it when you need to recover or inspect an experiment log outside a live UEL object.
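The three cleaning steps can be approximated with pandas as shown below. This is a sketch of the idea and not the body of read_from_file(): the function name clean_log_csv is hypothetical, and reading every column as text is a simplification chosen so that repeated header rows are easy to detect.

```python
import io

import pandas as pd


def clean_log_csv(source):
    """Read a CSV log, drop repeated header rows, and trim string cells."""
    # Read everything as text so a repeated header row stays comparable
    # to the column names. This is a simplification for the sketch.
    df = pd.read_csv(source, dtype=str)
    first = df.columns[0]
    # A repeated header shows up as a data row whose first cell is the column name.
    df = df[df[first].str.strip() != first].reset_index(drop=True)
    for col in df.columns:
        df[col] = df[col].str.strip()
    return df


# Toy streamed log: the header was written again partway through the file.
raw = (
    'round_id,model,auc\n'
    '0, logreg ,0.71\n'
    'round_id,model,auc\n'
    '1,xgb , 0.74\n'
)
cleaned = clean_log_csv(io.StringIO(raw))
```

In this sketch the numeric columns remain strings after cleaning. Converting them, for example with pd.to_numeric, would be a separate step.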