mlshell.Dataset

class mlshell.Dataset(*args, **kwargs)[source]

Bases: dict

Unified data interface.

Implements an interface for accessing arbitrary data.

Interface: x, y, data, meta, subset, dump_pred, and the full dict API.

Parameters
  • *args (list) – Passed to parent class constructor.

  • **kwargs (dict) – Passed to parent class constructor.

data

Underlying data.

Type

pandas.DataFrame

subsets

{‘subset_id’: array-like subset indices, ...}.

Type

dict

meta

Extracted auxiliary information from data: {

‘index’: list

List of index column label(s).

‘features’: list

List of feature column label(s).

‘categoric_features’: list

List of categorical feature column label(s).

‘targets’: list

List of target column label(s).

‘indices’: list

List of row indices.

‘classes’: list of numpy.ndarray

List of sorted unique labels for each target (n_outputs, n_classes).

‘pos_labels’: list

List of “positive” label(s) for target(s) (n_outputs,).

‘pos_labels_ind’: list

List of indices of the “positive” label(s) in the numpy.unique() output for each target (n_outputs,).

‘categoric_ind_name’: dict

Dictionary with categorical feature indices as key, and tuple (‘feature_name’, categories) as value: {‘column_index’: (‘feature_name’, [‘cat1’, ‘cat2’])}.

‘numeric_ind_name’: dict

Dictionary with numeric feature indices as key, and tuple (‘feature_name’,) as value: {‘column_index’: (‘feature_name’,)}.

}

Type

dict
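As a rough illustration of how the label-related meta entries relate to each other, the sketch below derives ‘classes’, ‘pos_labels’ and ‘pos_labels_ind’ from a small hypothetical target array. The choice of the last sorted label as the “positive” one is an assumption made for this example only, and the integer keys in the two index dictionaries are likewise assumed; none of this is taken from the mlshell source.

```python
import numpy as np

# Hypothetical targets, shape (n_samples, n_outputs) with n_outputs == 2.
y = np.array([["no", "b"], ["yes", "a"], ["no", "c"], ["yes", "a"]])

# 'classes': sorted unique labels for each target column.
classes = [np.unique(y[:, j]) for j in range(y.shape[1])]

# Assumption for this sketch only: the last sorted label is the "positive" one.
pos_labels = [c[-1] for c in classes]

# 'pos_labels_ind': index of each positive label within the np.unique() output.
pos_labels_ind = [
    int(np.flatnonzero(c == p)[0]) for c, p in zip(classes, pos_labels)
]

# Hypothetical feature layout; integer column positions as keys are assumed.
categoric_ind_name = {0: ("color", ["blue", "red"])}
numeric_ind_name = {1: ("width",)}
```

Each list has one entry per target, which matches the (n_outputs,) shapes given in the descriptions above.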

Notes

Inherits from dict, so the attributes section describes dictionary keys.
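The pattern described in this note can be sketched with a small dict subclass whose properties read from its own keys. SketchDataset is a hypothetical stand-in written for this page, not the mlshell implementation, and the sample values are made up.

```python
class SketchDataset(dict):
    """Hypothetical stand-in showing keys exposed as properties."""

    @property
    def data(self):
        # Reads the 'data' key; raises KeyError if it was never set.
        return self["data"]

    @property
    def meta(self):
        # Reads the 'meta' key in the same way.
        return self["meta"]


ds = SketchDataset(
    data=[[1, 2], [3, 4]],
    meta={"features": ["a"], "targets": ["b"]},
)

# The whole dict API remains available alongside the properties.
ds["subsets"] = {"train": [0]}
```

With this layout, ds.data and ds["data"] refer to the same object, and ordinary dict methods such as keys() or update() keep working.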

Attributes
data

pandas.DataFrame : Access data.

meta

dict: Access meta.

oid

str: Dataset identifier.

x

pandas.DataFrame : Extracted features columns.

y

pandas.DataFrame : Extracted targets columns.

Methods

clear()

copy()

dump_pred(filepath, y_pred, **kwargs)

Dump columns to disk.

fromkeys(iterable[, value])

Create a new dictionary with keys from iterable and values set to value.

get(key[, default])

Return the value for key if key is in the dictionary, else default.

items()

keys()

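pop(k[,d])

Remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.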

setdefault(key[, default])

Insert key with a value of default if key is not in the dictionary.

subset(subset_id)

mlshell.Dataset : Access subset.

update([E, ]**F)

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values()

__init__(*args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

property oid

str: Dataset identifier.

property x

pandas.DataFrame : Extracted features columns.

property y

pandas.DataFrame : Extracted targets columns.

property meta

dict: Access meta.

property data

pandas.DataFrame : Access data.
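One plausible reading of the x and y properties is a column selection on data driven by the ‘features’ and ‘targets’ lists in meta. The sketch below shows that selection with plain pandas; the column names and values are hypothetical, and the actual property implementation may differ.

```python
import pandas as pd

# Hypothetical underlying data and the matching meta entries.
data = pd.DataFrame(
    {
        "id": [10, 11, 12],
        "f1": [0.1, 0.2, 0.3],
        "f2": ["a", "b", "a"],
        "target": [0, 1, 0],
    }
)
meta = {"features": ["f1", "f2"], "targets": ["target"]}

# Selecting with a list of labels always yields a DataFrame,
# even when only one target column is requested.
x = data[meta["features"]]
y = data[meta["targets"]]
```

Passing a list of labels rather than a single label is what keeps y a pandas.DataFrame, consistent with the documented type.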

subset(subset_id)[source]

mlshell.Dataset : Access subset.
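A minimal sketch of subset access, assuming the ‘subsets’ entry maps each subset_id to positional row indices. The helper take_subset is hypothetical and returns a bare DataFrame, whereas the documented method returns an mlshell.Dataset.

```python
import pandas as pd

# Hypothetical data and subset definitions.
data = pd.DataFrame({"f1": [1, 2, 3, 4], "target": [0, 1, 0, 1]})
subsets = {"train": [0, 1, 2], "test": [3]}


def take_subset(data, subsets, subset_id):
    """Return the rows listed under subset_id.

    Assumption: indices are positional; label-based indices
    would use data.loc instead of data.iloc.
    """
    return data.iloc[subsets[subset_id]]


train_part = take_subset(data, subsets, "train")
test_part = take_subset(data, subsets, "test")
```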

dump_pred(filepath, y_pred, **kwargs)[source]

Dump columns to disk.

Parameters
  • filepath (str) – File path without extension.

  • y_pred (array-like) – pipeline.predict() result.

  • **kwargs (dict) – Additional kwargs to pass in .to_csv(**kwargs).

Returns

fullpath – Full filepath.

Return type

str
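The documented contract (a path without extension goes in, extra keyword arguments are forwarded to .to_csv(), the full path comes back) can be sketched as follows. dump_pred_sketch is a hypothetical helper; the appended ".csv" extension and the "y_pred" column name are assumptions, not details confirmed by the source.

```python
import os
import tempfile

import pandas as pd


def dump_pred_sketch(filepath, y_pred, **kwargs):
    """Hypothetical helper: write predictions to '<filepath>.csv'."""
    fullpath = f"{filepath}.csv"  # assumption: a .csv extension is appended
    # Extra keyword arguments are forwarded unchanged to DataFrame.to_csv().
    pd.DataFrame({"y_pred": list(y_pred)}).to_csv(fullpath, **kwargs)
    return fullpath


with tempfile.TemporaryDirectory() as tmp:
    path = dump_pred_sketch(os.path.join(tmp, "pred"), [0, 1, 1], index=False)
    written = pd.read_csv(path)  # read back before the directory is removed
```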

clear() → None. Remove all items from D.
copy() → a shallow copy of D
fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.
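Because copy() and fromkeys() are inherited from dict, their behaviour on a subclass follows standard Python rules, which may be surprising. The sketch below uses a hypothetical SketchDataset subclass and assumes neither method is overridden.

```python
class SketchDataset(dict):
    """Hypothetical dict subclass standing in for a dataset class."""


# fromkeys() is a classmethod, so it builds an instance of the calling class.
blank = SketchDataset.fromkeys(["data", "meta", "subsets"])

# Without an override, copy() is dict.copy() and returns a plain dict.
original = SketchDataset(meta={"targets": ["y"]})
duplicate = original.copy()

# The copy is shallow: nested objects are shared with the original.
duplicate["meta"]["targets"].append("z")
```

After the last line, original["meta"]["targets"] also contains "z", since both mappings point at the same nested dict.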

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
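The difference between pop() with and without a default, and the behaviour of popitem() on an emptied mapping, is shown below on a plain dict with hypothetical keys.

```python
d = {"data": 1, "meta": 2}

value = d.pop("data")              # key present: removed and returned
fallback = d.pop("missing", None)  # key absent: the default is returned

try:
    d.pop("missing")               # key absent, no default: KeyError
except KeyError:
    pop_raised = True
else:
    pop_raised = False

pair = d.popitem()                 # removes the only remaining pair

try:
    d.popitem()                    # mapping is now empty: KeyError
except KeyError:
    popitem_raised = True
else:
    popitem_raised = False
```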

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.
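A common use of setdefault() is to create a container on first access and reuse it afterwards. The example uses a plain dict with a hypothetical ‘train’ key.

```python
subsets = {}

# First call inserts the empty list and returns it.
subsets.setdefault("train", []).extend([0, 1])

# Later calls return the existing list; the new default is discarded.
subsets.setdefault("train", []).append(2)
existing = subsets.setdefault("train", "ignored")
```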

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]
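The three cases for update() can be exercised on a plain dict as follows; the keys and values are arbitrary.

```python
d = {"a": 0}

# E has a .keys() method: values are looked up by key.
d.update({"a": 1, "b": 2})

# E lacks .keys(): it is treated as an iterable of (key, value) pairs.
d.update([("c", 3), ("d", 4)])

# Keyword arguments (F) are applied after E, so d=6 wins over {"d": 5}.
d.update({"d": 5}, d=6, e=7)
```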

values() → an object providing a view on D’s values