mambular.data_utils
- class mambular.data_utils.MambularDataset(*args: Any, **kwargs: Any)
Custom dataset for handling structured data with separate categorical and numerical features, tailored for both regression and classification tasks.
- Parameters:
num_features_list (list of Tensors) –
embeddings_list (list of Tensors, optional) –
labels (Tensor, optional) –
regression (bool, optional) – A flag indicating if the dataset is for a regression task. Defaults to True.
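The idea of a dataset that keeps categorical and numerical features separate can be sketched in plain NumPy, without PyTorch. Everything specific here is an assumption made for illustration and is not taken from mambular's source: the class name TabularDatasetSketch, the cat_features_list argument, the layout of the returned tuple, and the float/int label casting tied to the regression flag.

```python
import numpy as np


class TabularDatasetSketch:
    """Hypothetical stand-in; this is NOT mambular's MambularDataset."""

    def __init__(self, cat_features_list, num_features_list,
                 labels=None, regression=True):
        # One array per feature; all arrays share the sample dimension.
        self.cat_features_list = [np.asarray(f) for f in cat_features_list]
        self.num_features_list = [np.asarray(f) for f in num_features_list]
        self.labels = None if labels is None else np.asarray(labels)
        self.regression = regression

    def __len__(self):
        # Number of samples, read from the first available feature array.
        first = (self.cat_features_list + self.num_features_list)[0]
        return len(first)

    def __getitem__(self, idx):
        cat = [f[idx] for f in self.cat_features_list]
        num = [f[idx] for f in self.num_features_list]
        if self.labels is None:
            return cat, num
        # Assumption: float targets for regression, integer class ids otherwise.
        raw = self.labels[idx]
        label = float(raw) if self.regression else int(raw)
        return (cat, num), label


ds = TabularDatasetSketch(
    cat_features_list=[np.array([0, 1, 2])],
    num_features_list=[np.array([0.5, 1.5, 2.5])],
    labels=[10, 20, 30],
)
(cat, num), label = ds[1]
print(len(ds), cat, num, label)
```

Keeping the two feature groups in separate lists lets a downstream model route categorical values to embedding layers and numerical values to a different input path.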
- class mambular.data_utils.MambularDataModule(*args: Any, **kwargs: Any)
A PyTorch Lightning data module for managing training and validation data loaders in a structured way.
This class simplifies the process of batch-wise data loading for training and validation datasets during the training loop, and is particularly useful when working with PyTorch Lightning’s training framework.
- Parameters:
preprocessor – object An instance of your preprocessor class.
batch_size – int Size of batches for the DataLoader.
shuffle – bool Whether to shuffle the training data in the DataLoader.
X_val – DataFrame or None, optional Validation features. If None, uses train-test split.
y_val – array-like or None, optional Validation labels. If None, uses train-test split.
val_size – float, optional Proportion of data to include in the validation split if X_val and y_val are None.
random_state – int, optional Random seed for reproducibility in data splitting.
regression – bool, optional Whether the problem is regression (True) or classification (False).
- preprocess_data(X_train, y_train, X_val=None, y_val=None, embeddings_train=None, embeddings_val=None, val_size=0.2, random_state=101)
Preprocesses the training and validation data.
- Parameters:
X_train (DataFrame or array-like, shape (n_samples_train, n_features)) – Training feature set.
y_train (array-like, shape (n_samples_train,)) – Training target values.
embeddings_train (array-like or list of array-like, optional) – Training embeddings if available.
X_val (DataFrame or array-like, shape (n_samples_val, n_features), optional) – Validation feature set. If None, a validation set will be created from X_train.
y_val (array-like, shape (n_samples_val,), optional) – Validation target values. If None, a validation set will be created from y_train.
embeddings_val (array-like or list of array-like, optional) – Validation embeddings if available.
val_size (float, optional) – Proportion of data to include in the validation split if X_val and y_val are None.
random_state (int, optional) – Random seed for reproducibility in data splitting.
- Return type:
None
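The fallback described above, where a validation set is carved out of the training data when X_val and y_val are None, can be illustrated with a seeded hold-out split. This is a minimal NumPy sketch, not mambular's implementation: the helper name holdout_split is hypothetical, and the shuffle-then-slice strategy is an assumption about how such a split is typically done. The defaults mirror the documented val_size=0.2 and random_state=101.

```python
import numpy as np


def holdout_split(X, y, val_size=0.2, random_state=101):
    """Hypothetical helper: seeded shuffle, then hold out a val_size fraction."""
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    # A fixed seed makes the permutation, and so the split, reproducible.
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(X))
    n_val = int(np.ceil(val_size * len(X)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)
X_train, X_val, y_train, y_val = holdout_split(X, y)
print(X_train.shape, X_val.shape)
```

Calling the helper twice with the same random_state returns the same partition, which is the property the random_state parameter is documented to provide.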
- test_dataloader()
Returns the test dataloader.
- Returns:
DataLoader instance for the test dataset.
- Return type:
DataLoader