
Transforms

Transforms in Limen are lightweight helpers used either during data preparation or immediately after model scoring. They are not the same thing as train-fitted scalers.

Use this page when you need to shape a target, clip or normalize a frame in a stateless way, or calibrate a classifier after fitting. For train-only fitted preprocessing, see Scalers.

Two Transform Families

| Family | What it does | Typical place in the pipeline |
|---|---|---|
| DataFrame transforms | Modify columns or filter rows in a pl.DataFrame | Target building, small preprocessing steps, or split-local shaping |
| Post-model helpers | Work on fitted classifiers or probability arrays | Calibration and threshold selection after model training |

DataFrame Transforms

These helpers operate on the frame passed into them. They do not carry learned state across splits. If you call them separately on train, validation, and test, each call uses the statistics of the frame it received.
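
A minimal stdlib-only sketch of that difference, using a single column stored as a plain list. It is not Limen's implementation: zscore_per_call is a hypothetical stand-in for a stateless z-score such as zscore_transform, fit_zscore is a hypothetical stand-in for a train-fitted scaler, and the population standard deviation is an assumption made for simplicity.

```python
from statistics import mean, pstdev


def zscore_per_call(values):
    """Stateless: standardize with the mean and std of the list it receives."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def fit_zscore(train_values):
    """Train-fitted: learn mean and std once, then reuse them on any list."""
    mu, sigma = mean(train_values), pstdev(train_values)
    return lambda values: [(v - mu) / sigma for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [10.0, 11.0, 12.0]

# Stateless: the test frame is re-centered on its own mean (11.0).
per_call = zscore_per_call(test)

# Fitted: the test frame is expressed in train units, so its shift stays visible.
apply_train_stats = fit_zscore(train)
fitted = apply_train_stats(test)

print(per_call)
print(fitted)
```

Because the stateless version re-centers every frame it receives, a test frame that sits well above the training range still comes out centered on zero; the fitted version preserves that shift.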

| Function | Behavior | Notes |
|---|---|---|
| mad_transform(df, time_col='datetime') | Rescales numeric columns by their median absolute deviation | Leaves the time column untouched. |
| winsorize_transform(df, time_col='datetime') | Clips numeric columns to fixed 1% and 99% quantiles | Tames outliers without dropping rows. |
| quantile_trim_transform(df, time_col='datetime') | Removes rows outside fixed 0.5% and 99.5% quantile bounds across numeric columns | More aggressive than winsorization because rows can disappear. |
| zscore_transform(df, time_col='datetime') | Standardizes numeric columns to zero mean and unit variance | Uses the statistics of the frame passed in, unlike a train-fitted scaler. |
| shift_column_transform(data, shift, column) | Shifts one column and keeps it under the same name | Common in target construction. A negative shift moves values forward in time, so each row receives a later value. |
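
The practical difference between winsorizing and trimming is the row count. The sketch below works on a single column stored as a plain list and computes quantiles by linear interpolation; both choices are assumptions for illustration, and quantile, winsorize, and quantile_trim are hypothetical helpers rather than Limen functions. The quantile levels match the table above.

```python
def quantile(values, q):
    """Quantile with linear interpolation between sorted neighbors (q in [0, 1])."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def winsorize(values, lower=0.01, upper=0.99):
    """Clip to the [lower, upper] quantiles; the row count is unchanged."""
    lo, hi = quantile(values, lower), quantile(values, upper)
    return [min(max(v, lo), hi) for v in values]


def quantile_trim(values, lower=0.005, upper=0.995):
    """Drop values outside the [lower, upper] quantiles; rows can disappear."""
    lo, hi = quantile(values, lower), quantile(values, upper)
    return [v for v in values if lo <= v <= hi]


# 100 rows: 1.0 .. 99.0 plus one large outlier.
values = [float(v) for v in range(1, 100)] + [1000.0]

clipped = winsorize(values)
trimmed = quantile_trim(values)

print(len(values), len(clipped), len(trimmed))
print(max(clipped), max(trimmed))
```

Clipping keeps all 100 rows and pulls the outlier down to the 99% quantile, while trimming drops the two rows (1.0 and 1000.0) that fall outside its bounds.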

Post-Model Helpers

These helpers are used after a model has already been fitted.

| Function | Returns | Notes |
|---|---|---|
| calibrate_classifier(clf, x_val, y_val, x_sets, method='isotonic') | A tuple of calibrated positive-class probability arrays, one per entry in x_sets | Uses CalibratedClassifierCV with cv='prefit', so clf must already be fitted. method is usually 'isotonic' or 'sigmoid'. |
| optimize_binary_threshold(y_val, y_val_proba, ...) | (best_threshold, best_score) | Sweeps thresholds over validation probabilities and optimizes one of balanced, f1, precision, or accuracy. |
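
For intuition about method='isotonic': isotonic calibration fits a non-decreasing mapping from raw scores to observed positive rates. The sketch below is a simplified, stdlib-only pool-adjacent-violators fit that returns a step function. It is not what CalibratedClassifierCV runs internally, and isotonic_fit and isotonic_predict are hypothetical names.

```python
def _block_mean(block):
    """Mean label of a block stored as [sum_of_labels, count, upper_score]."""
    return block[0] / block[1]


def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: fit a non-decreasing step function score -> P(y=1)."""
    blocks = []
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and _block_mean(blocks[-2]) > _block_mean(blocks[-1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    # One (upper score edge, calibrated probability) pair per block.
    return [(upper, total / count) for total, count, upper in blocks]


def isotonic_predict(steps, score):
    """Return the probability of the first block whose upper edge covers the score."""
    for upper, prob in steps:
        if score <= upper:
            return prob
    return steps[-1][1]


val_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
val_labels = [0, 0, 1, 0, 1, 0, 1, 1]

steps = isotonic_fit(val_scores, val_labels)
print(steps)
print([isotonic_predict(steps, s) for s in (0.05, 0.35, 0.95)])
```

Adjacent scores whose labels violate the ordering (0.3 labeled 1, then 0.4 labeled 0) are pooled into one block with probability 0.5, which is what keeps the mapping monotone.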

Target-Building Example

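```python
from limen.features import compute_quantile_cutoff, quantile_flag
from limen.transforms import shift_column_transform

# manifest is an experiment manifest created earlier (see Experiment Manifest).
(
    manifest.with_target('quantile_flag')
    # quantile_flag needs a parameter that the builder fits.
    .add_fitted_transform(quantile_flag)
    # Learn _quantile_cutoff with compute_quantile_cutoff.
    .fit_param('_quantile_cutoff', compute_quantile_cutoff, col='roc_{roc_period}', q='q')
    # Pass the fitted cutoff into quantile_flag as its cutoff argument.
    .with_params(col='roc_{roc_period}', cutoff='_quantile_cutoff')
    # Shift the resulting target column.
    .add_transform(shift_column_transform, shift='shift', column='target_column')
    .done()
)
```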

The important detail is that _quantile_cutoff is learned by the manifest builder through fit_param and then handed to quantile_flag as cutoff. quantile_flag itself stores no fitted state.
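
A plain-Python sketch of the same fit-then-apply pattern, assuming (as the split-first design suggests) that the builder fits the cutoff on the training split and reuses it unchanged on other splits. compute_cutoff, flag_above, and shift are hypothetical stand-ins for compute_quantile_cutoff, quantile_flag, and shift_column_transform, and the shift assumes polars-style semantics where a negative value pulls later rows onto earlier ones.

```python
def compute_cutoff(train_values, q):
    """Fit step: learn the q-quantile on the training split only (nearest-lower rank)."""
    s = sorted(train_values)
    return s[int(q * (len(s) - 1))]


def flag_above(values, cutoff):
    """Stateless apply step: 1 where the value exceeds the cutoff it is given."""
    return [1 if v > cutoff else 0 for v in values]


def shift(values, periods):
    """Negative periods pull later values onto earlier rows; vacated rows become None."""
    k = min(abs(periods), len(values))
    if periods < 0:
        return values[k:] + [None] * k
    return [None] * k + values[: len(values) - k]


train_roc = [0.1, 0.5, 0.2, 0.9, 0.3]
test_roc = [0.4, 0.8, 0.05]

cutoff = compute_cutoff(train_roc, q=0.75)   # fitted once, on train
train_flag = flag_above(train_roc, cutoff)   # same cutoff on every split
test_flag = flag_above(test_roc, cutoff)
target = shift(train_flag, -1)               # each row gets the next row's flag

print(cutoff, train_flag, test_flag, target)
```

The cutoff learned from train (0.5) flags the test value 0.8 without ever looking at test data, and shifting by -1 aligns each row with the next row's flag, leaving the last row empty.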

Calibration Example

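```python
from limen.transforms import calibrate_classifier, optimize_binary_threshold

# clf is an already-fitted binary classifier; x_val, y_val, and x_test come from
# the validation and test splits.
val_proba_cal, test_proba_cal = calibrate_classifier(
    clf,
    x_val=x_val,
    y_val=y_val,
    x_sets=[x_val, x_test],  # one calibrated array is returned per entry
    method='isotonic',
)

# Choose the decision threshold on calibrated validation probabilities only.
best_threshold, best_score = optimize_binary_threshold(
    y_val=y_val,
    y_val_proba=val_proba_cal,
    metric='balanced',
)
```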
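
To illustrate what a threshold sweep with metric='balanced' does, here is a stdlib-only sketch that assumes 'balanced' means balanced accuracy (the mean of the true-positive and true-negative rates). The candidate grid, the >= comparison, and keeping the first best threshold on ties are assumptions, and sweep_threshold is a hypothetical name, not Limen's signature.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return (tpr + tnr) / 2


def sweep_threshold(y_val, y_val_proba, thresholds):
    """Return (best_threshold, best_score); the first best threshold wins ties."""
    best_threshold, best_score = None, float("-inf")
    for thr in thresholds:
        y_pred = [1 if p >= thr else 0 for p in y_val_proba]
        score = balanced_accuracy(y_val, y_pred)
        if score > best_score:
            best_threshold, best_score = thr, score
    return best_threshold, best_score


y_val = [0, 0, 0, 0, 1, 1]
y_val_proba = [0.10, 0.20, 0.30, 0.45, 0.40, 0.80]
grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best_threshold, best_score = sweep_threshold(y_val, y_val_proba, grid)
print(best_threshold, best_score)
```

On this toy data no threshold separates the classes perfectly, because one negative (0.45) outscores one positive (0.40); the sweep settles on 0.35 with a balanced accuracy of 0.875.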

Boundaries

  • Use a transform when the operation is lightweight and local to the frame or prediction arrays you already have.
  • Use a scaler when the operation must be fitted on train and then reused unchanged on validation and test.
  • If you need split-safe learned parameters inside a target, compute them through the manifest target builder rather than hiding the fitting inside the transform itself.

Related Pages

  • Scalers for train-fitted preprocessing
  • Features for target and regime helpers that often pair with transforms
  • Experiment Manifest for where transforms live in the split-first execution order