deeptab.data_utils

class deeptab.data_utils.MambularDataset(*args: Any, **kwargs: Any)

Custom dataset for handling structured data with separate categorical and numerical features, tailored for both regression and classification tasks.

Parameters:

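cat_features_list (list of Tensors) – Tensors holding the categorical features.
num_features_list (list of Tensors) – Tensors holding the numerical features.
embeddings_list (list of Tensors, optional) – Tensors holding the embeddings, if available.
labels (Tensor, optional) – Target labels.
regression (bool, optional) – A flag indicating if the dataset is for a regression task. Defaults to True.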
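
To illustrate the layout described above (separate categorical and numerical features, optional labels, and a regression flag), here is a minimal sketch in plain Python. It is not the library's implementation: `ToyTabularDataset` is a hypothetical name, its argument names are chosen for the sketch, plain lists stand in for tensors, embeddings are left out, and the structure of the returned item is an assumption made for illustration.

```python
from typing import List, Optional, Sequence


class ToyTabularDataset:
    """Hypothetical stand-in: one list entry per feature, one value per sample."""

    def __init__(
        self,
        cat_features_list: List[Sequence[int]],
        num_features_list: List[Sequence[float]],
        labels: Optional[Sequence[float]] = None,
        regression: bool = True,
    ) -> None:
        self.cat_features_list = cat_features_list
        self.num_features_list = num_features_list
        self.labels = labels
        self.regression = regression

    def __len__(self) -> int:
        # Every feature column is assumed to hold the same number of samples.
        columns = self.cat_features_list or self.num_features_list
        return len(columns[0])

    def __getitem__(self, idx: int):
        cat = [col[idx] for col in self.cat_features_list]
        num = [col[idx] for col in self.num_features_list]
        if self.labels is None:
            # No labels: return the features only.
            return cat, num
        # Assumption: float targets for regression, integer class ids otherwise.
        label = float(self.labels[idx]) if self.regression else int(self.labels[idx])
        return (cat, num), label


dataset = ToyTabularDataset(
    cat_features_list=[[0, 1, 2]],
    num_features_list=[[0.5, 1.5, 2.5], [10.0, 20.0, 30.0]],
    labels=[1.0, 2.0, 3.0],
)
features, label = dataset[1]
```

Keeping the two feature groups in separate lists lets a model route categorical columns to embedding layers and numerical columns elsewhere without inspecting column types at batch time.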

class deeptab.data_utils.MambularDataModule(*args: Any, **kwargs: Any)

A PyTorch Lightning data module for managing training and validation data loaders in a structured way.

This class simplifies the process of batch-wise data loading for training and validation datasets during the training loop, and is particularly useful when working with PyTorch Lightning’s training framework.

Parameters:

preprocessor (object) – An instance of your preprocessor class.
batch_size (int) – Size of batches for the DataLoader.
shuffle (bool) – Whether to shuffle the training data in the DataLoader.
X_val (DataFrame or None, optional) – Validation features. If None, a validation set is split off from the training data.
y_val (array-like or None, optional) – Validation labels. If None, a validation set is split off from the training data.
val_size (float, optional) – Proportion of data to include in the validation split if X_val and y_val are None.
random_state (int, optional) – Random seed for reproducibility in data splitting.
regression (bool, optional) – Whether the problem is regression (True) or classification (False).
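
The effect of `batch_size` and `shuffle` can be sketched without any framework. The helper below is hypothetical (`iter_batches` and its `seed` argument are not part of deeptab) and only mimics, under simplifying assumptions, what a batching loader does with these two settings.

```python
import random
from typing import Iterator, List, Optional, Sequence


def iter_batches(
    samples: Sequence,
    batch_size: int,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List]:
    """Yield consecutive batches of at most `batch_size` samples."""
    indices = list(range(len(samples)))
    if shuffle:
        # Reorder the sample indices before batching.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


# Ten samples in batches of four: the last batch is smaller.
batches = list(iter_batches(list(range(10)), batch_size=4))
shuffled = list(iter_batches(list(range(10)), batch_size=4, shuffle=True, seed=0))
```

Shuffling changes which samples share a batch but not the number or size of batches, which is why it is usually applied to training data only.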

preprocess_data(X_train, y_train, X_val=None, y_val=None, embeddings_train=None, embeddings_val=None, val_size=0.2, random_state=101)

Preprocesses the training and validation data.

Parameters:

X_train (DataFrame or array-like, shape (n_samples_train, n_features)) – Training feature set.
y_train (array-like, shape (n_samples_train,)) – Training target values.
embeddings_train (array-like or list of array-like, optional) – Training embeddings if available.
X_val (DataFrame or array-like, shape (n_samples_val, n_features), optional) – Validation feature set. If None, a validation set will be created from X_train.
y_val (array-like, shape (n_samples_val,), optional) – Validation target values. If None, a validation set will be created from y_train.
embeddings_val (array-like or list of array-like, optional) – Validation embeddings if available.
val_size (float, optional) – Proportion of data to include in the validation split if X_val and y_val are None. Defaults to 0.2.
random_state (int, optional) – Random seed for reproducibility in data splitting. Defaults to 101.

Return type:

None
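
The fallback described for `X_val` and `y_val` can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: `split_if_needed` is a hypothetical helper, it uses the standard library's `random` module where the library may well rely on a different splitting routine, and it ignores embeddings.

```python
import random


def split_if_needed(X_train, y_train, X_val=None, y_val=None,
                    val_size=0.2, random_state=101):
    """Return (X_train, y_train, X_val, y_val), splitting only when no validation set is given."""
    if X_val is not None and y_val is not None:
        # A validation set was supplied: use it unchanged.
        return list(X_train), list(y_train), list(X_val), list(y_val)

    # Otherwise hold out a reproducible random fraction of the training rows.
    indices = list(range(len(X_train)))
    random.Random(random_state).shuffle(indices)
    n_val = max(1, round(len(indices) * val_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    return (
        [X_train[i] for i in train_idx],
        [y_train[i] for i in train_idx],
        [X_train[i] for i in val_idx],
        [y_train[i] for i in val_idx],
    )


X = list(range(10))
y = [2 * x for x in X]
X_tr, y_tr, X_va, y_va = split_if_needed(X, y)
```

Because the shuffle is seeded with `random_state`, repeated calls on the same data yield the same split, which keeps validation metrics comparable across runs.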

setup(stage)

Transform the data and create DataLoaders.

test_dataloader()

Returns the test dataloader.

Returns:

DataLoader instance for the test dataset.

Return type:

DataLoader

train_dataloader()

Returns the training dataloader.

Returns:

DataLoader instance for the training dataset.

Return type:

DataLoader

val_dataloader()

Returns the validation dataloader.

Returns:

DataLoader instance for the validation dataset.

Return type:

DataLoader
