mlshell.pipeline

The mlshell.pipeline module contains pipeline-related utilities.

Classes

Steps(estimator[, estimator_type, th_step])

Unified pipeline steps.

class mlshell.pipeline.Steps(estimator, estimator_type=None, th_step=False)

Bases: object

Unified pipeline steps.

Parameters

estimator (sklearn estimator) – Estimator to use in the last step. It is wrapped depending on the task:

If estimator_type=regressor:
    sklearn.compose.TransformedTargetRegressor(regressor=`estimator`)

If estimator_type=classifier and th_step=True:
    sklearn.pipeline.Pipeline(steps=[
        ('predict_proba',
            mlshell.model_selection.PredictionTransformer(estimator)),
        ('apply_threshold',
            mlshell.model_selection.ThresholdClassifier(threshold=0.5,
                                                        kwargs='auto')),
    ])

If estimator_type=classifier and th_step=False:
    sklearn.pipeline.Pipeline(steps=[('classifier', `estimator`)])
estimator_type (str {'classifier', 'regressor'}, optional (default=None)) – Whether the task is classification or regression. If None, the type is inferred by calling sklearn.base.is_classifier() on estimator.
th_step (bool) – If True and estimator_type=classifier, an mlshell.model_selection.ThresholdClassifier sub-step is added; otherwise the flag is ignored.
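The wrapping rules above can be sketched with scikit-learn alone. This is a minimal illustration, not the mlshell implementation: sketch_last_step is a hypothetical helper, and the th_step=True branch is left out because it depends on the mlshell.model_selection classes.

```python
from sklearn.base import is_classifier
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import Pipeline


def sketch_last_step(estimator, estimator_type=None):
    """Hypothetical helper mirroring the documented rules (th_step=False only)."""
    if estimator_type is None:
        # Documented fallback: infer the task from the estimator itself.
        estimator_type = 'classifier' if is_classifier(estimator) else 'regressor'
    if estimator_type == 'regressor':
        return TransformedTargetRegressor(regressor=estimator)
    if estimator_type == 'classifier':
        return Pipeline(steps=[('classifier', estimator)])
    raise ValueError(f"Unknown estimator_type: {estimator_type!r}")


reg_step = sketch_last_step(LinearRegression())
clf_step = sketch_last_step(LogisticRegression())
```

Here reg_step is a TransformedTargetRegressor and clf_step is a one-step Pipeline whose only step is named 'classifier'.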

Notes

Steps are assembled in the class for convenience. Use the steps property to access them after initialization. Only the OneHot encoder and imputer steps are initially activated. By default, 4 parameters await resolution (‘auto’):

'process_parallel__pipeline_categoric__select_columns__kw_args'
'process_parallel__pipeline_numeric__select_columns__kw_args'
'estimate__apply_threshold__threshold'
'estimate__apply_threshold__params'

Set the corresponding parameters with set_params() to overwrite the defaults in the created pipeline, or use mlshell.model_selection.Resolver.
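The double-underscore names follow scikit-learn's nested parameter convention (step name, then sub-step name, then parameter). The sketch below shows how such a name is resolved by set_params() on a plain scikit-learn pipeline. The two-level pipeline is an assumed stand-in built only for illustration; it is not the pipeline that Steps produces.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Assumed stand-in: an 'estimate' step that is itself a pipeline,
# similar in shape to the documented classifier wrapping.
inner = Pipeline(steps=[('classifier', LogisticRegression())])
pipeline = Pipeline(steps=[('estimate', inner)])

# 'estimate__classifier__C' reads as: step 'estimate' -> step 'classifier' -> parameter C.
pipeline.set_params(estimate__classifier__C=0.1)
value = pipeline.get_params()['estimate__classifier__C']
```

The same mechanism would apply to names such as 'estimate__apply_threshold__threshold' once the corresponding steps exist in the pipeline.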

The ‘pass_custom’ step allows brute-forcing arbitrary parameters in the same style as pipeline hyper-parameters (as if scoring contained additional nested loops). The step name is hard-coded and cannot be changed.

The ‘apply_threshold’ step allows grid-searching classification thresholds as a pipeline hyper-parameter.
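To make the idea concrete, the sketch below writes a threshold sweep out by hand using scikit-learn only. It is an assumption-laden illustration of what treating the threshold as a hyper-parameter amounts to, not how ThresholdClassifier is implemented; the synthetic dataset, candidate thresholds, and accuracy metric are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (labels are 0 and 1), chosen only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]
scores = {}
for th in thresholds:
    # Predict the positive class when its probability reaches the threshold.
    y_pred = (proba >= th).astype(int)
    scores[th] = accuracy_score(y_test, y_pred)

best_threshold = max(scores, key=scores.get)
best_score = scores[best_threshold]
```

Exposing the threshold as a step parameter lets a standard grid search run this loop alongside the other hyper-parameters instead of as a separate post-processing pass.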

The ‘estimate’ step should be the last one.

Attributes

steps (list) – Steps to pass into sklearn.pipeline.Pipeline.

Methods

bining_mask(x)	Get indices of features that need binning.
last_step(estimator, estimator_type, th_step)	Prepare estimator step.
scorer_kwargs(x, **kw_args)	Mock function for setting custom kwargs.
subcolumns(x, **kw_args)	Get sub-columns from x.
subrows(x)	Get rows from x.

last_step(estimator, estimator_type, th_step)

Prepare estimator step.

property steps

list : Steps to pass into sklearn.pipeline.Pipeline.

scorer_kwargs(x, **kw_args)

Mock function for setting custom kwargs.

Parameters

x (numpy.ndarray or pandas.DataFrame) – Features of shape [n_samples, n_features].
**kw_args (dict) – Step parameters. Can be extracted from the pipeline inside a scorer if needed.

Returns

result – Unchanged x.

Return type

numpy.ndarray or pandas.DataFrame
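A pass-through function of this kind can be sketched as below. The assumption, suggested by the kw_args naming but not confirmed by this page, is that the function is wrapped in sklearn.preprocessing.FunctionTransformer; passthrough and the 'fold' key are hypothetical names used only for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def passthrough(x, **kw_args):
    """Hypothetical stand-in: ignore kw_args and return x unchanged."""
    return x


x = np.array([[1.0, 2.0], [3.0, 4.0]])

# kw_args is stored on the step, so it can be set like any other pipeline
# parameter and read back later (for example from inside a scorer).
step = FunctionTransformer(passthrough, kw_args={'fold': 3})
out = step.fit_transform(x)
params = step.get_params()['kw_args']
```

The data is untouched; the step exists only to carry parameters through the pipeline.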

subcolumns(x, **kw_args)

Get sub-columns from x.

Parameters

x (numpy.ndarray or pandas.DataFrame) – Features of shape [n_samples, n_features].
**kw_args (dict) – Column indices to extract: {'indices': array-like}.

Returns

result – Extracted sub-columns of x.

Return type

numpy.ndarray or pandas.DataFrame
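The documented behavior can be sketched as follows. subcolumns_sketch is a hypothetical re-implementation written from the description above; positional indexing for DataFrames is an assumption, and the actual method may differ.

```python
import numpy as np


def subcolumns_sketch(x, **kw_args):
    """Hypothetical helper: select columns of x by positional indices."""
    indices = kw_args['indices']
    if hasattr(x, 'iloc'):
        # pandas.DataFrame: positional column selection.
        return x.iloc[:, indices]
    # numpy.ndarray: index along the feature axis.
    return x[:, indices]


x = np.arange(12).reshape(3, 4)
result = subcolumns_sketch(x, indices=[0, 2])
# result holds columns 0 and 2: [[0, 2], [4, 6], [8, 10]]
```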

subrows(x)

Get rows from x.

bining_mask(x)

Get indices of features that need binning.