Reference Architecture
The reference architecture is Limen's class-based model layer. It sits underneath the foundational SFDs and underneath Trainer.
This is the page to read when you want to understand:
- what a ReferenceModel must implement
- how the built-in model classes behave
- what evaluate(..., inline_metrics=True) really adds
- how class-based models relate to the simpler function wrappers used in manifests
Public Surface
The current public reference-architecture exports are:
- ReferenceModel
- LogRegBinary
- RandomBinary
- XGBoostRegressor
- TabPFNBinary when tabpfn is installed
Each model module also exposes a function-style wrapper with the same behavioral surface used by foundational manifests.
ReferenceModel
ReferenceModel is the base contract. Every subclass must implement:
train(data, **params)
predict(data)
evaluate(data, inline_metrics=True)
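To make the three-method contract concrete, here is a minimal sketch of what such a base class and a toy subclass could look like. It is not Limen's implementation: ReferenceModelSketch and MajorityClassSketch are hypothetical names, and the only detail taken from this page is that train() returns an object you can call predict() and evaluate() on.

```python
from abc import ABC, abstractmethod


class ReferenceModelSketch(ABC):
    """Hypothetical stand-in for the ReferenceModel contract (not Limen's code)."""

    @abstractmethod
    def train(self, data: dict, **params) -> "ReferenceModelSketch":
        """Fit on the prepared data_dict and return self so calls can be chained."""

    @abstractmethod
    def predict(self, data: dict) -> dict:
        """Return the small inference surface, e.g. {'_preds': ...}."""

    @abstractmethod
    def evaluate(self, data: dict, inline_metrics: bool = True) -> dict:
        """Return a flat metrics dict for offline evaluation."""


class MajorityClassSketch(ReferenceModelSketch):
    """Toy subclass: always predicts the most common training label."""

    def train(self, data, **params):
        labels = list(data["y_train"])
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, data):
        return {"_preds": [self.majority_] * len(data["x_test"])}

    def evaluate(self, data, inline_metrics=True):
        preds = self.predict(data)["_preds"]
        hits = sum(int(p == y) for p, y in zip(preds, data["y_test"]))
        return {"accuracy": hits / len(preds)}


toy = {"x_train": [[0], [1], [2]], "y_train": [1, 1, 0],
       "x_test": [[3], [4]], "y_test": [1, 0]}
model = MajorityClassSketch().train(toy)
print(model.evaluate(toy))
```

Because the base class is abstract, a subclass that forgets one of the three methods fails at instantiation time rather than partway through a run.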
Data expectations
In a live local reference-architecture run in this repo, the prepared data_dict included:
- x_train, y_train
- x_val, y_val
- x_test, y_test
- price_data_for_backtest
- _feature_names
- _alignment
- _scaler
Not every model needs every key, but this is the standard Limen shape that the class-based models are designed around.
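When writing a custom model it can help to check which of the standard keys a given data_dict actually carries. The helper below is a hypothetical convenience, not part of Limen; the key names are the ones listed above.

```python
# Key names as listed on this page; the helper itself is an illustrative assumption.
EXPECTED_KEYS = (
    "x_train", "y_train",
    "x_val", "y_val",
    "x_test", "y_test",
    "price_data_for_backtest",
    "_feature_names", "_alignment", "_scaler",
)


def missing_keys(data_dict, required=EXPECTED_KEYS):
    """Return the required keys that are absent from data_dict, in listed order."""
    return [key for key in required if key not in data_dict]


partial = {"x_train": [], "y_train": [], "x_test": [], "y_test": []}
print(missing_keys(partial))
print(missing_keys(partial, required=("x_train", "y_train")))
```

Passing a shorter `required` tuple lets a model declare only the keys it depends on.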
The Built-In Model Classes
| Class | Task shape | Deterministic | Notes |
|---|---|---|---|
| LogRegBinary | binary classification | yes | sklearn logistic regression wrapper; manifest wrapper exposes constructor params |
| RandomBinary | binary baseline | no | intentionally stochastic |
| XGBoostRegressor | regression | no | requires xgboost |
| TabPFNBinary | binary classification | no | optional, requires tabpfn |
| RuleBasedStrategy | rule-based long/flat | yes | no training step; boolean predicate logic |
The deterministic flag matters because Trainer uses it to choose its validation tolerance.
Probability Support for Cohort
For Cohort, “probability” always means P(1): the probability that a row belongs to the positive class (label 1).
Architectures that expose valid P(1) may use Cohort's probability-weighted aggregation path. Architectures that do not expose valid P(1) use Cohort's majority-vote fallback path instead.
| Architecture | Returns probabilities P(1) | Cohort mode | Notes |
|---|---|---|---|
| LogRegBinary | yes | probability | predict() returns _probs = predict_proba(...)[:, 1], which is directly the class-1 probability P(1). |
| RandomBinary | yes | probability | predict() returns _probs, but they are synthetic confidence values (0.9 for predicted 1, 0.1 for predicted 0), not model-derived calibrated probabilities. Still usable as P(1)-shaped output if Cohort accepts implementation-defined probability-like outputs. |
| TabPFNBinary | yes | probability | predict() returns _probs as positive-class probability. When a CalibrationConfig is configured, probabilities are optionally recalibrated and the threshold optimised before _preds are produced. This is compatible with P(1). |
| XGBoostRegressor | no | fallback | predict() returns only _preds and does not expose _probs. Since this is a regressor, any Cohort use would have to fall back unless a separate binary-probability wrapper is introduced. |
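The two Cohort paths can be pictured with a small sketch. This is not Cohort's actual aggregation: aggregate_cohort is a hypothetical function, the probability path here is a plain unweighted mean of P(1) cut at 0.5, and ties in the majority vote resolve to 0. All of those choices are assumptions made for illustration.

```python
def aggregate_cohort(member_outputs, threshold=0.5):
    """Combine per-member predict() dicts into one binary decision per row.

    Assumption: use the mean of P(1) when every member exposes '_probs',
    otherwise fall back to a majority vote over '_preds'.
    """
    if all("_probs" in out for out in member_outputs):
        n_members = len(member_outputs)
        rows = zip(*(out["_probs"] for out in member_outputs))
        mean_p1 = [sum(row) / n_members for row in rows]
        return {
            "mode": "probability",
            "_probs": mean_p1,
            "_preds": [int(p >= threshold) for p in mean_p1],
        }
    rows = zip(*(out["_preds"] for out in member_outputs))
    # A row is 1 only when strictly more than half of the members vote 1.
    return {"mode": "fallback", "_preds": [int(2 * sum(r) > len(r)) for r in rows]}


with_probs = [
    {"_preds": [1, 0], "_probs": [0.8, 0.25]},
    {"_preds": [1, 1], "_probs": [0.75, 0.5]},
]
regressor_like = {"_preds": [1, 1]}  # no '_probs', as with XGBoostRegressor

print(aggregate_cohort(with_probs)["mode"])
print(aggregate_cohort(with_probs + [regressor_like])["mode"])
```

A single member without _probs is enough to push the whole cohort onto the fallback path in this sketch, which is one reasonable reading of the table above but not a confirmed rule.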
predict() Versus evaluate()
predict() is the small inference surface. For the built-in binary models it always returns:
- _preds
- _probs
When a CalibrationConfig is configured on the manifest and injected into the architecture, predict() additionally returns:
- optimal_threshold: the threshold chosen by the optimizer (or 0.5 when only probability calibration is configured)
- val_score: the metric score at that threshold; None when no threshold function is set
evaluate() passes these keys through into the results dict alongside the standard binary metrics.
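The relationship between optimal_threshold and val_score can be sketched as a simple grid search over validation probabilities. The function name, the grid, and the use of accuracy as the score are assumptions; the page does not say how Limen's optimizer searches. Only the two default values (0.5 and None) come from the description above.

```python
def choose_threshold(val_probs, val_labels, score_fn=None, grid=None):
    """Pick the threshold that maximizes score_fn on validation data.

    With no score_fn, mirror the documented defaults: 0.5 and None.
    """
    if score_fn is None:
        return {"optimal_threshold": 0.5, "val_score": None}
    grid = grid or [i / 100 for i in range(5, 100, 5)]  # 0.05 .. 0.95
    best_threshold, best_score = None, None
    for threshold in grid:
        preds = [int(p >= threshold) for p in val_probs]
        score = score_fn(val_labels, preds)
        # Strict comparison keeps the lowest threshold among equal scores.
        if best_score is None or score > best_score:
            best_threshold, best_score = threshold, score
    return {"optimal_threshold": best_threshold, "val_score": best_score}


def accuracy(labels, preds):
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


val_probs = [0.2, 0.4, 0.6, 0.9]
val_labels = [0, 0, 1, 1]
print(choose_threshold(val_probs, val_labels))
print(choose_threshold(val_probs, val_labels, score_fn=accuracy))
```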
evaluate() is the richer offline evaluation surface.
inline_metrics=False
With inline_metrics=False, evaluate() returns the task metrics only.
On a live local LogRegBinary evaluation in this repo, that plain result included:
- accuracy
- auc
- precision
- recall
- fpr
inline_metrics=True
With inline_metrics=True, evaluate() adds:
- confusion_* metrics
- backtest_* metrics when price_data_for_backtest is present
On that same live local run, LogRegBinary.evaluate(..., inline_metrics=True) added keys such as:
- backtest_edge_per_signal_bps_p50
- backtest_trade_pnl_net_bps_p50
- backtest_cvar_95_return_bps
- confusion_tp
- confusion_fp
- confusion_precision
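As a rough picture of what the inline_metrics switch does, the sketch below computes a plain task metric and then, only when the flag is set, merges in the three confusion_* keys named above. The function names are hypothetical, the zero-division convention is an assumption, and the backtest_* side is left out because it depends on Limen's backtest code.

```python
def confusion_metrics(y_true, y_pred):
    """Compute the confusion_* keys named on this page from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Assumption: report 0.0 precision when nothing was predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"confusion_tp": tp, "confusion_fp": fp, "confusion_precision": precision}


def evaluate_sketch(y_true, y_pred, inline_metrics=True):
    """Return task metrics, plus confusion_* keys when inline_metrics is True."""
    hits = sum(int(t == p) for t, p in zip(y_true, y_pred))
    results = {"accuracy": hits / len(y_true)}
    if inline_metrics:
        results.update(confusion_metrics(y_true, y_pred))
    return results


y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
print(sorted(evaluate_sketch(y_true, y_pred, inline_metrics=False)))
print(sorted(evaluate_sketch(y_true, y_pred, inline_metrics=True)))
```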
That is why the reference-architecture layer is the bridge between raw model output and the experiment-level analytics surfaces.
Example: Class-Based Usage
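from limen.sfd.reference_architecture import LogRegBinary
# data is the prepared data_dict described under "Data expectations"
model = LogRegBinary().train(
    data,
    solver='lbfgs',
    penalty='l2',
    C=0.1,
    class_weight=0.55,
    max_iter=60,
)
# predict() only needs the split it scores
pred = model.predict({'x_test': data['x_test']})
# evaluate() adds confusion_* keys, plus backtest_* keys when price data is present
results = model.evaluate(data, inline_metrics=True)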
This is the same contract that Trainer eventually relies on when it promotes finished experiment rounds into Sensor objects.
Function Wrappers Versus Classes
Most foundational manifests call the function wrapper:
.with_reference_architecture(logreg_binary)
That wrapper typically:
- instantiates the matching class
- trains it
- evaluates it with inline_metrics=True
The class is the canonical reusable architecture surface. The function wrapper is the convenient manifest-facing adapter.
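The three wrapper steps can be sketched with a stand-in class. ToyBinary and toy_binary are hypothetical and exist only to show the shape of the adapter; the real wrappers such as logreg_binary may take different arguments.

```python
class ToyBinary:
    """Stand-in for a class-based model; not a Limen class."""

    def train(self, data, cutoff=0.0):
        self.cutoff = cutoff  # nothing is learned; the parameter is just stored
        return self

    def predict(self, data):
        probs = [1.0 if row[0] > self.cutoff else 0.0 for row in data["x_test"]]
        return {"_preds": [int(p) for p in probs], "_probs": probs}

    def evaluate(self, data, inline_metrics=True):
        preds = self.predict(data)["_preds"]
        labels = data["y_test"]
        hits = sum(int(p == t) for p, t in zip(preds, labels))
        results = {"accuracy": hits / len(labels)}
        if inline_metrics:
            results["confusion_tp"] = sum(
                int(p == 1 and t == 1) for p, t in zip(preds, labels)
            )
        return results


def toy_binary(data, **params):
    """Function-style adapter: instantiate, train, evaluate with inline metrics."""
    model = ToyBinary().train(data, **params)
    return model.evaluate(data, inline_metrics=True)


toy_data = {"x_test": [[0.2], [0.8], [0.6]], "y_test": [0, 1, 0]}
print(toy_binary(toy_data, cutoff=0.5))
```

Keeping the adapter this thin means the class stays the single place where model behavior lives, which matches the split described above.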
Trainer Relationship
Trainer resolves the ReferenceModel subclass from the model module used by the original manifest.
That is why the class-based layer matters even if your day-to-day work mostly touches foundational SFDs:
- foundational SFDs package the experiment
- reference architecture owns the model contract
- Trainer promotes selected rounds back into trained class-based models
On a live local logreg trainer run in this repo:
- deterministic validation passed with no mismatches
- Sensor.predict() returned _preds and _probs
- the promoted sensor produced predictions for 884 test bars
On a live local random_binary trainer run, promotion raised ReconstructionError because the stochastic rerun did not reproduce the original logged metrics closely enough.
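The difference between the two runs above can be pictured as a metric comparison whose tolerance depends on the deterministic flag. Everything specific in this sketch is an assumption: the function name, the tolerance values, the direction (tighter for deterministic models), and the exception class, which only mimics the ReconstructionError mentioned above.

```python
class ReconstructionErrorSketch(Exception):
    """Hypothetical stand-in for the ReconstructionError named above."""


def validate_rerun(logged, rerun, deterministic, strict_tol=1e-9, loose_tol=0.05):
    """Compare rerun metrics with logged ones; raise when any drift exceeds tolerance.

    Assumption: deterministic models get the strict tolerance, stochastic
    models the loose one. The numeric defaults are illustrative only.
    """
    tol = strict_tol if deterministic else loose_tol
    mismatches = {
        key: (value, rerun.get(key))
        for key, value in logged.items()
        if key not in rerun or abs(rerun[key] - value) > tol
    }
    if mismatches:
        raise ReconstructionErrorSketch(f"metrics drifted beyond {tol}: {mismatches}")
    return tol


# A deterministic rerun reproduces the logged metric exactly and passes.
print(validate_rerun({"accuracy": 0.61}, {"accuracy": 0.61}, deterministic=True))

# A stochastic rerun that lands too far away is rejected.
try:
    validate_rerun({"accuracy": 0.61}, {"accuracy": 0.40}, deterministic=False)
except ReconstructionErrorSketch as err:
    print("rejected:", err)
```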
RuleBasedStrategy
RuleBasedStrategy is the reference architecture for rule-based SFDs. It differs from the ML models in two important ways:
- train() is a no-op: rule-based strategies have no learnable parameters
- evaluate() runs across all three splits (train/val/test) and returns cross-split stability metrics in addition to per-split backtest metrics
It expects a data_dict produced by a manifest configured with with_strategy():
{
    'train': pl.DataFrame,  # with pre-computed boolean predicate columns
    'val': pl.DataFrame,
    'test': pl.DataFrame,
    'strategy': {'conditions': [...], 'entry': 'entry_id'},
}
The strategy walks the boolean logic tree defined in strategy['conditions'], resolves each leaf condition by reading its pre-computed boolean column from the split DataFrame, and folds compound conditions via AND/OR/NOT. Positions are 0 (flat) or 1 (long) per bar.
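The tree walk described above can be sketched in plain Python. The node schema is an assumption: each condition is a dict with an 'id' and either a 'column' (leaf) or an 'op' with 'args' listing child ids. A dict of lists stands in for the polars DataFrame so the example has no dependencies.

```python
def resolve(cond_id, conditions_by_id, frame):
    """Recursively evaluate one condition into a per-bar list of booleans."""
    node = conditions_by_id[cond_id]
    if "column" in node:
        # Leaf: read the pre-computed boolean column.
        return [bool(v) for v in frame[node["column"]]]
    children = [resolve(child, conditions_by_id, frame) for child in node["args"]]
    if node["op"] == "NOT":
        return [not v for v in children[0]]
    if node["op"] == "AND":
        return [all(vals) for vals in zip(*children)]
    if node["op"] == "OR":
        return [any(vals) for vals in zip(*children)]
    raise ValueError(f"unknown operator: {node['op']}")


def positions(frame, strategy):
    """Return 0 (flat) or 1 (long) per bar for the strategy's entry condition."""
    by_id = {cond["id"]: cond for cond in strategy["conditions"]}
    return [int(v) for v in resolve(strategy["entry"], by_id, frame)]


frame = {
    "above_ma": [True, True, False, True],
    "overbought": [False, True, False, False],
}
strategy = {
    "conditions": [
        {"id": "trend", "column": "above_ma"},
        {"id": "hot", "column": "overbought"},
        {"id": "not_hot", "op": "NOT", "args": ["hot"]},
        {"id": "entry_id", "op": "AND", "args": ["trend", "not_hot"]},
    ],
    "entry": "entry_id",
}
print(positions(frame, strategy))
```

Because every leaf is a pre-computed column, the walk itself never touches prices or indicators; it only combines booleans.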
Metrics produced
evaluate() returns a flat dict with three tiers:
- Tier 1 (position stats): num_trades_{split}, position_rate_{split}
- Tier 2 (per-split backtest): all backtest_snapshot output columns suffixed with _{split} (e.g. trade_pnl_net_bps_p50_train, drawdown_depth_bps_p5_test)
- Tier 3 (cross-split diagnostics): drawdown_std_bps, is_stable
is_stable is currently always False, and stays that way until a replacement stability rule is defined against the new decoder-level ledger.
NOTE: Tier 2 and Tier 3 metrics require open and close columns in the split DataFrames to run the backtest. When those columns are absent, per-split backtest results are empty and drawdown_std_bps is returned as None with is_stable falling back to False.
_preds is present; _probs is intentionally absent (not applicable to rule-based strategies).
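The flat, suffixed layout of the metrics dict can be illustrated as follows. This is a sketch under assumptions: flatten_rule_metrics is a hypothetical name, and computing drawdown_std_bps as the population standard deviation of drawdown_depth_bps_p5 across splits is a guess at the definition, not something this page confirms. The None and False fallbacks follow the note above.

```python
from statistics import pstdev

SPLITS = ("train", "val", "test")


def flatten_rule_metrics(per_split_backtest, drawdown_key="drawdown_depth_bps_p5"):
    """Suffix each per-split metric with _{split} and add cross-split diagnostics."""
    flat = {}
    for split in SPLITS:
        for name, value in per_split_backtest.get(split, {}).items():
            flat[f"{name}_{split}"] = value
    drawdowns = [
        per_split_backtest[split][drawdown_key]
        for split in SPLITS
        if drawdown_key in per_split_backtest.get(split, {})
    ]
    # Assumed definition: spread of the per-split drawdown figure, else None.
    flat["drawdown_std_bps"] = pstdev(drawdowns) if len(drawdowns) == len(SPLITS) else None
    flat["is_stable"] = False  # documented as False for now
    return flat


backtests = {
    "train": {"drawdown_depth_bps_p5": -100.0, "trade_pnl_net_bps_p50": 4.0},
    "val": {"drawdown_depth_bps_p5": -120.0, "trade_pnl_net_bps_p50": 3.0},
    "test": {"drawdown_depth_bps_p5": -110.0, "trade_pnl_net_bps_p50": 2.5},
}
print(sorted(flatten_rule_metrics(backtests)))
print(flatten_rule_metrics({}))
```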
Optional Dependencies
- xgboost_regressor requires xgboost
- tabpfn_binary requires tabpfn
In a live local smoke pass in this repo:
- logreg_binary, random_binary, and xgboost_regressor all ran
- tabpfn_binary was unavailable because tabpfn was not installed
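Before running a smoke pass like this one, you can check which optional backends are importable without actually importing them. The sketch uses the standard library's importlib.util.find_spec; the mapping and function name are illustrative and say nothing about how Limen itself gates these exports.

```python
import importlib.util

# Wrapper name -> package it needs, as listed above.
OPTIONAL_BACKENDS = {
    "xgboost_regressor": "xgboost",
    "tabpfn_binary": "tabpfn",
}


def available_wrappers(backends=OPTIONAL_BACKENDS):
    """Report, per wrapper, whether its backing package can be found."""
    return {
        wrapper: importlib.util.find_spec(package) is not None
        for wrapper, package in backends.items()
    }


for wrapper, ok in available_wrappers().items():
    print(f"{wrapper}: {'available' if ok else 'missing dependency'}")
```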
Read Next
- Continue to Built-In SFDs to see how the shipped foundational SFDs package these model surfaces.
- Continue to Trainer for the promotion workflow that reconstructs and retrains selected rounds.
- Continue to Standard Metrics Library for the low-level metric helpers used inside these model classes.