mlshell.Dataset

class mlshell.Dataset(*args, **kwargs)

Bases: dict

Unified data interface.

Implements an interface for accessing arbitrary data.

Interface: x, y, data, meta, subset, dump_pred, and the whole dict API.

Parameters

*args (list) – Passed to the parent class constructor.
**kwargs (dict) – Passed to the parent class constructor.

data

Underlying data.

Type: pandas.DataFrame

subsets

{‘subset_id’: array-like subset indices, ...}.

Type: dict
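The mapping from a subset identifier to row indices can be pictured with a short sketch. This is not mlshell code: `take_rows` is a hypothetical helper, the indices are assumed to be positional, and a plain dict of column lists stands in for pandas.DataFrame so the snippet runs with the standard library alone.

```python
def take_rows(columns, indices):
    """Return a new column dict holding only the rows at the given positions."""
    return {name: [values[i] for i in indices] for name, values in columns.items()}


# Stand-in for the underlying data: column label -> list of values.
data = {'age': [31, 45, 27, 52], 'label': [0, 1, 0, 1]}

# Same shape as the 'subsets' entry: subset_id -> array-like of row indices.
subsets = {'train': [0, 1, 2], 'test': [3]}

train = take_rows(data, subsets['train'])
test = take_rows(data, subsets['test'])
```

Each subset keeps every column and only the selected rows, which is the behaviour one would expect from accessing a subset by its identifier.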

meta

Auxiliary information extracted from data: {

‘index’: list: List of index column label(s).
‘features’: list: List of feature column label(s).
‘categoric_features’: list: List of categorical feature column label(s).
‘targets’: list: List of target column label(s).
‘indices’: list: List of row indices.
‘classes’: list of numpy.ndarray: List of sorted unique labels for each target (n_outputs, n_classes).
‘pos_labels’: list: List of “positive” label(s) for each target (n_outputs,).
‘pos_labels_ind’: list: List of the “positive” label index in the numpy.unique() output for each target (n_outputs,).
‘categoric_ind_name’: dict: Dictionary with categorical feature indices as keys and (‘feature_name’, categories) tuples as values: {‘column_index’: (‘feature_name’, [‘cat1’, ‘cat2’])}.
‘numeric_ind_name’: dict: Dictionary with numeric feature indices as keys and (‘feature_name’,) tuples as values: {‘column_index’: (‘feature_name’,)}.

}

Type: dict
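To make the relationship between ‘classes’, ‘pos_labels’ and ‘pos_labels_ind’ concrete, here is a minimal sketch. It is an illustration only: `summarize_targets` is a hypothetical helper, `sorted(set(...))` stands in for numpy.unique(), and treating the last sorted class as the “positive” label is an assumption, since the source does not state how that label is chosen.

```python
def summarize_targets(target_columns):
    """Build 'classes', 'pos_labels' and 'pos_labels_ind', one entry per target."""
    classes, pos_labels, pos_labels_ind = [], [], []
    for values in target_columns:
        # Sorted unique labels (stand-in for numpy.unique()).
        unique_sorted = sorted(set(values))
        # Assumption: the "positive" label is the last sorted class.
        positive = unique_sorted[-1]
        classes.append(unique_sorted)
        pos_labels.append(positive)
        # Position of the positive label within the sorted unique labels.
        pos_labels_ind.append(unique_sorted.index(positive))
    return {
        'classes': classes,
        'pos_labels': pos_labels,
        'pos_labels_ind': pos_labels_ind,
    }


# Two targets: a binary numeric one and a three-class string one.
meta_part = summarize_targets([[0, 1, 1, 0], ['cat', 'dog', 'bird']])
```

All three lists have length n_outputs, and each ‘pos_labels_ind’ entry indexes into the matching ‘classes’ entry.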

Notes

The class inherits from dict, so the attributes section describes its keys.
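A dict subclass can expose selected keys as read-only properties. The sketch below illustrates that pattern under stated assumptions and is not mlshell's implementation: `DatasetSketch` is a hypothetical stand-in, the key names ‘data’ and ‘meta’ are taken from the description above, and a plain dict of column lists replaces pandas.DataFrame so the snippet needs no third-party packages.

```python
class DatasetSketch(dict):
    """Hypothetical stand-in for a dict-based dataset (not mlshell code)."""

    @property
    def data(self):
        # Assumption: the underlying table is stored under the 'data' key.
        return self['data']

    @property
    def meta(self):
        # Assumption: auxiliary information is stored under the 'meta' key.
        return self['meta']

    @property
    def x(self):
        # Feature columns are the ones listed in meta['features'].
        return {name: self.data[name] for name in self.meta['features']}

    @property
    def y(self):
        # Target columns are the ones listed in meta['targets'].
        return {name: self.data[name] for name in self.meta['targets']}


ds = DatasetSketch(
    data={'age': [31, 45], 'city': ['A', 'B'], 'label': [0, 1]},
    meta={'features': ['age', 'city'], 'targets': ['label']},
)
features = ds.x
targets = ds.y
```

Because the object is still a dict, `ds['data']`, `ds.keys()` and the rest of the dict API keep working alongside the properties.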

Attributes

data: pandas.DataFrame : Access data.
meta: dict: Access meta.
oid: str: Dataset identifier.
x: pandas.DataFrame : Extracted features columns.
y: pandas.DataFrame : Extracted targets columns.

Methods

`clear`()
`copy`()
`dump_pred`(filepath, y_pred, **kwargs)	Dump columns to disk.
`fromkeys`(iterable[, value])	Create a new dictionary with keys from iterable and values set to value.
`get`(key[, default])	Return the value for key if key is in the dictionary, else default.
`items`()
`keys`()
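`pop`(k[, d])	Remove the specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.
`popitem`()	Remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.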
`setdefault`(key[, default])	Insert key with a value of default if key is not in the dictionary.
`subset`(subset_id)	`mlshell.Dataset` : Access subset.
`update`([E, ]**F)	If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
`values`()

__init__(*args, **kwargs): Initialize self. See help(type(self)) for accurate signature.

property oid: str: Dataset identifier.

property x: pandas.DataFrame : Extracted features columns.

property y: pandas.DataFrame : Extracted targets columns.

property meta: dict: Access meta.

property data: pandas.DataFrame : Access data.

subset(subset_id): mlshell.Dataset : Access subset.

dump_pred(filepath, y_pred, **kwargs)

Dump columns to disk.

Parameters

filepath (str) – File path without extension.
y_pred (array-like) – pipeline.predict() result.
**kwargs (dict) – Additional keyword arguments passed to .to_csv(**kwargs).

Returns

fullpath – Full filepath.

Return type

str
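The contract described above (a path without extension goes in, the full path comes out) can be sketched with the standard library. This is a stand-in, not the mlshell method: `dump_pred_sketch` is hypothetical, the `.csv` suffix and the `y_pred` column header are assumptions, and the extra keyword arguments are forwarded to `csv.writer` here rather than to `.to_csv()`.

```python
import csv
import os
import tempfile


def dump_pred_sketch(filepath, y_pred, **kwargs):
    """Write predictions to '<filepath>.csv' and return the full path."""
    # Assumption: the extension appended to the path is '.csv'.
    fullpath = f"{filepath}.csv"
    with open(fullpath, 'w', newline='') as handle:
        # In this sketch the kwargs configure csv.writer, not DataFrame.to_csv.
        writer = csv.writer(handle, **kwargs)
        writer.writerow(['y_pred'])  # hypothetical column header
        writer.writerows([value] for value in y_pred)
    return fullpath


with tempfile.TemporaryDirectory() as tmp:
    fullpath = dump_pred_sketch(os.path.join(tmp, 'pred'), [0, 1, 1], delimiter=';')
    # Read the file back before the temporary directory is removed.
    with open(fullpath, newline='') as handle:
        rows = list(csv.reader(handle, delimiter=';'))
```

Returning the full path lets the caller log or reopen the file without having to know which extension was appended.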

clear() → None. Remove all items from D.

copy() → a shallow copy of D
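A shallow copy duplicates the top-level mapping but shares the nested values. The snippet below uses a plain dict to show the consequence and contrasts it with `copy.deepcopy`; the key names are illustrative only.

```python
import copy

original = {'meta': {'targets': ['label']}}

shallow = original.copy()        # new outer dict, same inner objects
deep = copy.deepcopy(original)   # fully independent copy

# Mutating a nested object through the shallow copy affects the original too.
shallow['meta']['targets'].append('extra')

# Rebinding a top-level key in the shallow copy does not affect the original.
shallow['oid'] = 'train'
```

When nested values such as the meta dictionary must stay untouched, a deep copy is the safer choice.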

fromkeys(iterable, value=None, /): Create a new dictionary with keys from iterable and values set to value.
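One detail of `fromkeys` is worth a short illustration: every key receives the same value object, so a mutable default ends up shared between keys. The example uses a plain dict, and the key names are illustrative.

```python
# Every key points at the same list object.
shared = dict.fromkeys(['train', 'test'], [])
shared['train'].append(0)

# A comprehension creates a separate list per key.
separate = {key: [] for key in ['train', 'test']}
separate['train'].append(0)
```

For immutable values such as None or numbers the sharing is harmless.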

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items

keys() → a set-like object providing a view on D’s keys

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
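The difference between `pop` with and without a default, and the behaviour of `popitem` on an empty mapping, can be seen in a few lines. A plain dict is used, and the key names are illustrative.

```python
d = {'data': 'table', 'meta': {}}

value = d.pop('data')              # key present: value is removed and returned
fallback = d.pop('missing', None)  # key absent: the default is returned

try:
    d.pop('missing')               # key absent, no default
    pop_raised = False
except KeyError:
    pop_raised = True

# Since Python 3.7, popitem() removes the most recently inserted pair.
last = d.popitem()

try:
    d.popitem()                    # mapping is now empty
    popitem_raised = False
except KeyError:
    popitem_raised = True
```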

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.
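A typical use of `setdefault` is to fetch a nested container and create it on first access. The sketch uses a plain dict with illustrative key names.

```python
d = {'meta': {'targets': ['label']}}

# Key exists: the stored value is returned and nothing is inserted.
existing = d.setdefault('meta', {})

# Key is missing: the default is inserted and then returned.
created = d.setdefault('subsets', {})
created['train'] = [0, 1]
```

Since the returned object is the one stored in the dictionary, changes made through `created` are visible in `d['subsets']`.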

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
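The three cases handled by `update` can be exercised on a plain dict; the keys and values are illustrative.

```python
d = {'a': 1}

# E has a .keys() method: entries are copied key by key.
d.update({'b': 2})

# E lacks .keys(): it is treated as an iterable of (key, value) pairs.
d.update([('c', 3), ('a', 10)])

# Keyword arguments F are applied after E, so they win on conflicts.
d.update({'c': 30}, c=300)
```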

values() → an object providing a view on D’s values