Standard Metrics Library

Limen's metrics layer provides the low-level evaluation helpers used inside SFD model functions and reference architectures.

These helpers are intentionally small. They compute the core task metrics and return plain dictionaries. Higher-level experiment analytics such as benchmark summaries and backtests live in Log, not here.

Public Surface

The current public metrics exports are:

binary_metrics
multiclass_metrics
continuous_metrics
balanced_metric
safe_ovr_auc

Import Pattern

The safest import style is to import the callable from its submodule:

from limen.metrics.binary_metrics import binary_metrics
from limen.metrics.multiclass_metrics import multiclass_metrics
from limen.metrics.continuous_metrics import continuous_metrics
from limen.metrics.balanced_metric import balanced_metric
from limen.metrics.safe_ovr_auc import safe_ovr_auc

Why this matters:

balanced_metric is re-exported as a callable
limen.metrics re-exports binary_metrics, multiclass_metrics, continuous_metrics, and safe_ovr_auc as modules

For those four names, the package-root import returns the module, not the function, so the imported name cannot be called directly:

from limen.metrics import binary_metrics

This low-level metrics layer also sits underneath Reference Architecture. The class-based models add confusion and backtest fields later; these helpers stay at the smaller task-metric level.

`binary_metrics(data, preds, probs)`

Computes the standard binary metrics used by Limen's binary reference models.

Expected inputs:

data['y_test']
predicted labels preds
positive-class probabilities probs

Returns a dictionary with:

recall
precision
fpr
auc
accuracy

Example:

results = binary_metrics(data, preds, probs)
results['_preds'] = preds

Edge cases

binary_metrics() handles degenerate binary folds without raising. It returns NaN for auc when y_test contains one class and NaN for fpr when there are no negative examples. Precision and recall use zero_division=0.
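The edge cases above can be illustrated with a small standard-library sketch. It is not Limen's implementation: sketch_binary_metrics is a hypothetical stand-in that takes y_true directly rather than a data dictionary, and the pairwise AUC is just one simple way to compute the score. It only shows which fields become NaN and which fall back to 0.

```python
import math


def sketch_binary_metrics(y_true, preds, probs):
    """Hypothetical stand-in for illustration; not Limen's implementation."""
    pairs = list(zip(y_true, preds))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # zero_division=0 behaviour: an empty denominator yields 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # FPR is undefined when the fold has no negative examples.
    fpr = fp / (fp + tn) if (fp + tn) else math.nan

    # AUC needs both classes; with one class it is undefined.
    pos = [s for t, s in zip(y_true, probs) if t == 1]
    neg = [s for t, s in zip(y_true, probs) if t == 0]
    if pos and neg:
        # Share of positive/negative pairs ranked correctly, ties count half.
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = math.nan

    accuracy = (tp + tn) / len(y_true) if y_true else math.nan
    return {
        "recall": recall,
        "precision": precision,
        "fpr": fpr,
        "auc": auc,
        "accuracy": accuracy,
    }


# A fold that contains only positives: auc and fpr are NaN, nothing raises.
degenerate = sketch_binary_metrics([1, 1, 1], [1, 0, 1], [0.9, 0.4, 0.8])
```

Because NaN propagates through most aggregations, downstream code that averages auc or fpr across folds may need a NaN-aware mean.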

`multiclass_metrics(data, preds, probs, average='macro')`

Computes multiclass classification metrics from:

data['y_test']
predicted labels
class probabilities

Returns:

precision
recall
auc
accuracy

This helper uses safe_ovr_auc() instead of calling raw multiclass AUC directly.
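For readers unfamiliar with the average='macro' default, the sketch below shows what macro averaging means for precision and recall: each class is scored on its own and the per-class scores are averaged with equal weight. The function name sketch_macro_precision_recall is hypothetical, and the zero-denominator fallback of 0.0 is an assumption borrowed from the binary helper; Limen's actual code path may differ.

```python
def sketch_macro_precision_recall(y_true, preds):
    """Hypothetical illustration of macro averaging; not Limen's implementation."""
    classes = sorted(set(y_true) | set(preds))
    if not classes:
        return float("nan"), float("nan")

    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, preds) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, preds) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, preds) if t == c and p != c)
        # Assumed: an empty denominator scores 0.0, mirroring zero_division=0.
        precisions.append(tp / (tp + fp) if (tp + fp) else 0.0)
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)

    # Macro averaging: every class counts equally, regardless of its support.
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n


# Per-class precision is 1.0, 0.5, 1.0 and recall is 0.5, 1.0, 1.0.
macro_p, macro_r = sketch_macro_precision_recall([0, 0, 1, 2], [0, 1, 1, 2])
```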

`continuous_metrics(data, preds)`

Computes the current regression metrics from:

data['y_test']
continuous predictions preds

Returns:

bias
mae
rmse
r2
mape

mape is reported in percent units.

MAPE divides by the target value. Rows with zero or near-zero financial returns can dominate or invalidate the percentage interpretation; use MAE/RMSE/bias for return series where the denominator can be zero or economically negligible.
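The MAPE caveat is easier to see with numbers. The sketch below uses textbook formulas for the five metrics; it is not Limen's implementation, the name sketch_continuous_metrics is hypothetical, and the sign of bias (prediction minus target) is an assumption. Two series with the same absolute errors get the same MAE but very different MAPE once one target is close to zero.

```python
import math


def sketch_continuous_metrics(y_true, preds):
    """Textbook formulas for illustration; Limen's conventions may differ."""
    n = len(y_true)
    # Assumed sign convention: prediction minus target.
    errors = [p - t for t, p in zip(y_true, preds)]

    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else math.nan

    # MAPE in percent units; each row is divided by its own target value.
    # An exact zero target would raise ZeroDivisionError in this sketch.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"bias": bias, "mae": mae, "rmse": rmse, "r2": r2, "mape": mape}


# Same absolute errors (0.005 per row), but one target is a near-zero return.
ordinary = sketch_continuous_metrics([0.02, -0.01, 0.03], [0.025, -0.015, 0.035])
near_zero = sketch_continuous_metrics([0.02, 0.00001, 0.03], [0.025, 0.00501, 0.035])
```

In this example the single near-zero row contributes almost all of the second series' MAPE, which is why MAE, RMSE, and bias are the safer readings for return series.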

`balanced_metric(y_true, y_pred)`

balanced_metric() is Limen's compact binary score for cases where class balance matters.

Current formula:

precision * sqrt(trade_rate)

This rewards accurate positive calls while penalizing degenerate behavior such as never trading.

If there are no positive predictions, it returns 0.0.

Example:

score = balanced_metric(y_true, y_pred)
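The formula can be made concrete with a short sketch. It assumes trade_rate is the share of rows with a positive prediction, which the text above does not state explicitly, and sketch_balanced_metric is a hypothetical name rather than Limen's function.

```python
import math


def sketch_balanced_metric(y_true, y_pred):
    """Hypothetical illustration of precision * sqrt(trade_rate)."""
    n_pred_pos = sum(1 for p in y_pred if p == 1)
    if n_pred_pos == 0:
        return 0.0  # never trading scores zero

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    precision = tp / n_pred_pos
    # Assumed definition: fraction of rows with a positive prediction.
    trade_rate = n_pred_pos / len(y_pred)
    return precision * math.sqrt(trade_rate)


y_true = [1, 0, 1, 0]
selective = sketch_balanced_metric(y_true, [1, 0, 0, 0])  # precision 1.0, trade_rate 0.25
active = sketch_balanced_metric(y_true, [1, 1, 1, 0])     # precision 2/3, trade_rate 0.75
idle = sketch_balanced_metric(y_true, [0, 0, 0, 0])       # no positive predictions
```

Under this assumption the square root softens the trade-rate penalty: the selective model scores 0.5 despite perfect precision, while the more active model with lower precision scores slightly higher.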

`safe_ovr_auc(y_true, probs)`

safe_ovr_auc() computes one-vs-rest AUC more defensively than a direct multiclass AUC call.

Its purpose is to make multiclass evaluation more stable when not every class is present in every fold.

If no valid class-vs-rest comparisons can be made, it returns NaN.

Boundary: when probs contains only the classes present in y_true, columns are read in the sorted-present-class order. When probs still contains full integer-label columns, the class label itself is used as the probability-column index. If columns cannot be aligned safely, the helper returns NaN instead of reading the wrong column.
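The alignment rules can be sketched as follows. This is an illustration under stated assumptions, not Limen's implementation: sketch_safe_ovr_auc and _binary_auc are hypothetical names, labels are plain Python ints, probs is a list of rows, "full integer-label columns" is taken to mean that every present label is a valid column index, and the per-class scores are combined with an unweighted mean.

```python
import math


def _binary_auc(labels, scores):
    """Pairwise AUC; returns None when either side is empty."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sketch_safe_ovr_auc(y_true, probs):
    """Hypothetical illustration of defensive column alignment."""
    present = sorted(set(y_true))
    n_cols = len(probs[0]) if probs else 0

    if n_cols == len(present):
        # probs holds only the present classes: sorted-present-class order.
        column_of = {c: i for i, c in enumerate(present)}
    elif all(isinstance(c, int) and 0 <= c < n_cols for c in present):
        # probs holds the full label set: the label itself is the column index.
        column_of = {c: c for c in present}
    else:
        return math.nan  # columns cannot be aligned safely

    aucs = []
    for c in present:
        scores = [row[column_of[c]] for row in probs]
        auc = _binary_auc([y == c for y in y_true], scores)
        if auc is not None:
            aucs.append(auc)
    # Assumed aggregation: unweighted mean over the valid comparisons.
    return sum(aucs) / len(aucs) if aucs else math.nan


y_true = [0, 2, 0, 2]  # class 1 is absent from this fold
full = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.2, 0.1, 0.7]]
compact = [[row[0], row[2]] for row in full]  # only the present classes

auc_full = sketch_safe_ovr_auc(y_true, full)
auc_compact = sketch_safe_ovr_auc(y_true, compact)
```

Both layouts read the same probability columns here, so the two calls agree. A single-class fold, or labels that do not fit the column count, falls through to NaN.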

Where These Helpers Fit

A reference-architecture model function has this shape:

import numpy as np

from limen.metrics.binary_metrics import binary_metrics

def model(data):
    # Placeholder outputs; a real model would fit and predict here.
    preds = np.zeros(len(data['y_test']), dtype=int)
    probs = np.zeros(len(data['y_test']), dtype=float)

    # Compute the task metrics, then attach the raw predictions.
    results = binary_metrics(data, preds, probs)
    results['_preds'] = preds
    return results

That is the level of abstraction these helpers are meant for.
