mlshell.PipelineProducer

class mlshell.PipelineProducer(objects, oid, path_id='path__default', logger_id='logger__default')

Bases: pycnfg.producer.Producer

Factory to produce pipeline.

Interface: set, make, load, info.

Parameters

objects (dict) – Dictionary with objects from previously executed producers: {'section_id__config__id': object}.
oid (str) – Unique identifier of the produced object.
path_id (str, optional (default='path__default')) – Project path identifier in objects.
logger_id (str, optional (default='logger__default')) – Logger identifier in objects.

objects

Dictionary with objects from previously executed producers: {'section_id__config__id': object}.

Type: dict

oid

Unique identifier of the produced object.

Type: str

logger

Logger.

Type: logging.Logger

project_path

Absolute path to project dir.

Type: str

Methods

`dict_api`(obj[, method])	Forwarding api for dictionary object.
`dump_cache`(obj[, prefix, cachedir, pkg])	Dump intermediate object state to IO.
`info`(pipeline, **kwargs)	Log pipeline info.
`load`(pipeline, filepath, **kwargs)	Load fitted model from disk.
`load_cache`(obj[, prefix, cachedir, pkg])	Load intermediate object state from IO.
`make`(pipeline[, steps, estimator, memory])	Create pipeline from steps.
`run`(init, steps)	Execute configuration steps.
`set`(pipeline, estimator)	Set estimator as pipeline.
`update`(obj, items)	Update key(s) for dictionary object.

__init__(objects, oid, path_id='path__default', logger_id='logger__default')

Initialize self. See help(type(self)) for accurate signature.

set(pipeline, estimator)

Set estimator as pipeline.

Parameters

pipeline (mlshell.Pipeline) – Pipeline object, will be updated.
estimator (sklearn estimator) – Estimator to set in mlshell.Pipeline interface.

Returns

pipeline – Resulting pipeline.

Return type

mlshell.Pipeline

make(pipeline, steps=None, estimator=None, memory=None, **kwargs)

Create pipeline from steps.

Parameters

pipeline (mlshell.Pipeline) – Pipeline object, will be updated.
steps (list, class, optional (default=None)) – Pipeline steps to pass to sklearn.pipeline.Pipeline. Can be a class with a steps attribute, which will be initialized as steps(estimator, **kwargs). If None, mlshell.pipeline.Steps is used. Set [] to use estimator directly as the pipeline.
estimator (sklearn estimator) – Estimator to use in the last step if steps is a class, or directly if steps=[].
memory (str, joblib.Memory interface, optional (default=None)) – memory argument passed to sklearn.pipeline.Pipeline. If 'auto', 'project_path/.temp/pipeline' is used.
**kwargs (dict) – Additional kwargs to initialize steps (if provided as class).

Returns

pipeline – Resulting pipeline.

Return type

mlshell.Pipeline
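
The way the steps and memory arguments are interpreted can be sketched as follows. This is a minimal illustration of the rules listed above, not mlshell's implementation: DefaultSteps, resolve_steps and resolve_memory are hypothetical names, DefaultSteps only stands in for a steps class such as mlshell.pipeline.Steps, and returning None to mean "use the estimator directly" is a convention of the sketch.

```python
import os


class DefaultSteps:
    """Hypothetical stand-in for a steps class (e.g. mlshell.pipeline.Steps)."""

    def __init__(self, estimator, **kwargs):
        # A real steps class would add preprocessing stages before the estimator.
        self.steps = [('estimate', estimator)]


def resolve_steps(steps, estimator, **kwargs):
    """Return a list of (name, step) tuples, or None to use estimator directly."""
    if steps is None:
        steps = DefaultSteps              # default steps class
    if isinstance(steps, type):
        # A class is initialized with the estimator and exposes a steps attribute.
        return steps(estimator, **kwargs).steps
    if not steps:
        return None                       # steps=[]: estimator is the pipeline
    return list(steps)


def resolve_memory(memory, project_path):
    """Map memory='auto' to the documented default cache location."""
    if memory == 'auto':
        return os.path.join(project_path, '.temp', 'pipeline')
    return memory


default_steps = resolve_steps(None, 'my_estimator')
direct = resolve_steps([], 'my_estimator')
cache_dir = resolve_memory('auto', 'project')
```

In this sketch a non-None result would then be handed to sklearn.pipeline.Pipeline together with the resolved memory value.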

load(pipeline, filepath, **kwargs)

Load fitted model from disk.

Parameters

pipeline (mlshell.Pipeline) – Pipeline object, will be updated.
filepath (str) – Absolute path to the file to load, or a path relative to 'project_path' starting with './'.
**kwargs (dict) – Additional parameters to pass to load().

Returns

pipeline – Resulting pipeline.

Return type

mlshell.Pipeline
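
The filepath rule (absolute, or relative to the project directory when it starts with './') can be illustrated with a small helper. resolve_path is a hypothetical name used only for this sketch, and the normalization step is an assumption rather than documented behaviour.

```python
import os


def resolve_path(filepath, project_path):
    """Resolve a './'-prefixed path against project_path; pass others through."""
    if filepath.startswith('./'):
        # Drop the './' marker and join with the project directory.
        return os.path.normpath(os.path.join(project_path, filepath[2:]))
    return filepath


project_path = os.path.join(os.sep, 'home', 'user', 'project')
relative = resolve_path('./models/model.pkl', project_path)
absolute = resolve_path(os.path.join(os.sep, 'data', 'model.pkl'), project_path)
```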

info(pipeline, **kwargs)

Log pipeline info.

Parameters

pipeline (mlshell.Pipeline) – Pipeline to explore (if the 'steps' attribute is available).
**kwargs (dict) – Additional parameters to pass to low-level functions.

Returns

pipeline – For compliance with producer logic.

Return type

mlshell.Pipeline

dict_api(obj, method='update', **kwargs)

Forwarding API for a dictionary object.

Useful for adding or removing keys via configuration steps. For example, to perform an update: ('dict_api', {'b': 7}).
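
A minimal sketch of what such forwarding might look like, assuming the named dict method is called with the step's kwargs and the same dictionary is returned. The function below is a standalone illustration written for this page, not the library's code.

```python
def dict_api(obj, method='update', **kwargs):
    """Call the dict method named by `method` with kwargs, then return obj."""
    # Assumption: kwargs are forwarded unchanged to the chosen dict method.
    getattr(obj, method)(**kwargs)
    return obj


config = {'a': 1}
# Equivalent to the configuration step ('dict_api', {'b': 7}).
updated = dict_api(config, **{'b': 7})
# Any dict method that takes no positional arguments can be named the same way.
emptied = dict_api({'x': 0}, method='clear')
```

Note that dict methods which only accept positional arguments (dict.pop, for instance) cannot be driven through keyword arguments in this sketch.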

dump_cache(obj, prefix=None, cachedir=None, pkg='pickle', **kwargs)

Dump intermediate object state to IO.

Parameters

obj (picklable) – Object to dump.
prefix (str, optional (default=None)) – File identifier, added to the filename. If None, 'self.oid' is used.
cachedir (str, optional (default=None)) – Absolute path to the dump dir, or a path relative to 'project_path' starting with './'. Created if it does not exist. If None, 'project_path/.temp/objects' is used.
pkg (str, optional (default='pickle')) – Import package and try pkg.dump(obj, file, **kwargs).
**kwargs (kwargs) – Additional parameters to pass to .dump().

Returns

obj – Unchanged input for compliance with producer logic.

Return type

picklable

load_cache(obj, prefix=None, cachedir=None, pkg='pickle', **kwargs)

Load intermediate object state from IO.

Parameters

obj (picklable) – Object template, for producer logic only (ignored).
prefix (str, optional (default=None)) – File identifier. If None, 'self.oid' is used.
pkg (str, optional (default='pickle')) – Import package and try obj = pkg.load(file, **kwargs).
cachedir (str, optional (default=None)) – Absolute path to the load dir, or a path relative to 'project_path' starting with './'. If None, 'project_path/.temp/objects' is used.
**kwargs (kwargs) – Additional parameters to pass to .load().

Returns

obj – Loaded cache.

Return type

picklable object
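
The dump/load pair can be sketched as a round trip through a package that exposes dump(obj, file) and load(file), as pickle does. The functions below are standalone illustrations, not the producer's methods; the '<prefix>.<pkg>' file name in particular is an assumption made for the sketch, since the actual naming scheme is not documented here.

```python
import importlib
import os
import tempfile


def dump_cache(obj, prefix, cachedir, pkg='pickle', **kwargs):
    """Dump obj with pkg.dump(obj, file, **kwargs) and return obj unchanged."""
    os.makedirs(cachedir, exist_ok=True)       # create the dir if missing
    module = importlib.import_module(pkg)      # import the package by name
    filepath = os.path.join(cachedir, f'{prefix}.{pkg}')  # hypothetical naming
    with open(filepath, 'wb') as file:
        module.dump(obj, file, **kwargs)
    return obj


def load_cache(obj, prefix, cachedir, pkg='pickle', **kwargs):
    """Load with pkg.load(file, **kwargs); obj is only a template and is ignored."""
    module = importlib.import_module(pkg)
    filepath = os.path.join(cachedir, f'{prefix}.{pkg}')
    with open(filepath, 'rb') as file:
        return module.load(file, **kwargs)


with tempfile.TemporaryDirectory() as project_path:
    cachedir = os.path.join(project_path, '.temp', 'objects')
    state = {'weights': [1, 2, 3]}
    returned = dump_cache(state, 'pipeline__demo', cachedir)
    restored = load_cache(None, 'pipeline__demo', cachedir)
```

Because dump_cache returns its input, it can sit between two other configuration steps without changing the object passed along.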

run(init, steps)

Execute configuration steps.

Consecutive call (with decorators):

init = getattr(self, 'method_id')(init, objects=objects, **kwargs)

Parameters

init (object) – Will be passed as arg in each step and get back as result.
steps (list of tuples) – List of self methods to run consecutively with kwargs: ('method_id', kwargs, decorators).

Returns

configs – List of configurations, prepared for execution: [(‘section_id__config__id’, config), …].

Return type

list of tuple

Notes

The object identifier oid is added automatically if the produced object has an oid attribute.
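
The consecutive call described above can be sketched with a toy producer. DemoProducer and its add and scale methods are hypothetical and exist only for illustration. Wrapping the method with each decorator in list order, treating kwargs and decorators as optional tuple elements, and returning the final init are all assumptions of this sketch rather than documented behaviour.

```python
class DemoProducer:
    """Hypothetical stand-in for a producer; not pycnfg.producer.Producer."""

    def __init__(self, objects, oid):
        self.objects = objects
        self.oid = oid

    def add(self, init, value=0, **kwargs):
        return init + value

    def scale(self, init, factor=1, **kwargs):
        return init * factor

    def run(self, init, steps):
        """Thread init through each ('method_id', kwargs, decorators) step."""
        for step in steps:
            method_id = step[0]
            kwargs = step[1] if len(step) > 1 else {}
            decorators = step[2] if len(step) > 2 else []
            method = getattr(self, method_id)
            for decorator in decorators:
                method = decorator(method)   # assumed wrapping order
            # Each step receives the previous result and returns the next one.
            init = method(init, objects=self.objects, **kwargs)
        return init


producer = DemoProducer(objects={}, oid='demo')
result = producer.run(1, [('add', {'value': 2}), ('scale', {'factor': 10})])
```

With PipelineProducer, the same steps list would name methods such as make, load or info, each taking the pipeline as its first argument and returning it.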

update(obj, items)

Update key(s) for dictionary object.

Parameters

obj (dict) – Object to update.
items (dict, list, optional (default=None)) – Either a dictionary or a list of items [(key, val), ...] used to update obj.

Returns

obj – Updated input.

Return type

dict