Configurations#
- class mambular.configs.DefaultMambularConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=64, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.SiLU, cat_encoding='int', n_layers=4, d_conv=4, dilation=1, expand_factor=2, bias=False, dropout=0.0, dt_rank='auto', d_state=128, dt_scale=1.0, dt_init='random', dt_max=0.1, dt_min=0.0001, dt_init_floor=0.0001, norm='RMSNorm', conv_bias=False, AD_weight_decay=True, BC_layer_norm=False, shuffle_embeddings=False, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=torch.nn.SELU, head_use_batch_norm=False, pooling_method='avg', bidirectional=False, use_learnable_interaction=False, use_cls=False, use_pscan=False, mamba_version='mamba-torch')[source]#
Configuration class for the Default Mambular model with predefined hyperparameters.
- Parameters:
d_model (int, default=64) – Dimensionality of the model.
n_layers (int, default=4) – Number of layers in the model.
expand_factor (int, default=2) – Expansion factor for the feed-forward layers.
bias (bool, default=False) – Whether to use bias in the linear layers.
dropout (float, default=0.0) – Dropout rate for regularization.
d_conv (int, default=4) – Size of convolution over columns.
dilation (int, default=1) – Dilation factor for the convolution.
dt_rank (str, default="auto") – Rank of the Δ (time-step) projection in the state-space model; "auto" derives it from d_model.
d_state (int, default=128) – Dimensionality of the state in recurrent layers.
dt_scale (float, default=1.0) – Scaling factor for the Δ (time-step) initialization.
dt_init (str, default="random") – Initialization method for the Δ (time-step) projection.
dt_max (float, default=0.1) – Maximum value for Δ (time-step) initialization.
dt_min (float, default=1e-04) – Minimum value for Δ (time-step) initialization.
dt_init_floor (float, default=1e-04) – Floor value for Δ (time-step) initialization.
norm (str, default="RMSNorm") – Type of normalization used (‘RMSNorm’, etc.).
activation (callable, default=nn.SiLU()) – Activation function for the model.
shuffle_embeddings (bool, default=False) – Whether to shuffle embeddings before being passed to Mamba layers.
head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to use (‘avg’, ‘max’, etc.).
bidirectional (bool, default=False) – Whether to process data bidirectionally.
use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.
use_cls (bool, default=False) – Whether to append a CLS token to the input sequences.
use_pscan (bool, default=False) – Whether to use PSCAN for the state-space model.
mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).
conv_bias (bool, default=False) – Whether to use a bias in the 1D convolution before each Mamba block.
AD_weight_decay (bool, default=True) – Whether to apply weight decay also to the A and D matrices in Mamba.
BC_layer_norm (bool, default=False) – Whether to apply layer normalization to the B and C matrices.
- AD_weight_decay: bool = True#
- BC_layer_norm: bool = False#
- batch_norm: bool = False#
- bias: bool = False#
- bidirectional: bool = False#
- cat_encoding: str = 'int'#
- conv_bias: bool = False#
- d_conv: int = 4#
- d_model: int = 64#
- d_state: int = 128#
- dilation: int = 1#
- dropout: float = 0.0#
- dt_init: str = 'random'#
- dt_init_floor: float = 0.0001#
- dt_max: float = 0.1#
- dt_min: float = 0.0001#
- dt_rank: str = 'auto'#
- dt_scale: float = 1.0#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- expand_factor: int = 2#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.5#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- mamba_version: str = 'mamba-torch'#
- n_frequencies: int = 48#
- n_layers: int = 4#
- norm: str = 'RMSNorm'#
- plr_lite: bool = False#
- pooling_method: str = 'avg'#
- shuffle_embeddings: bool = False#
- use_cls: bool = False#
- use_embeddings: bool = False#
- use_learnable_interaction: bool = False#
- use_pscan: bool = False#
- weight_decay: float = 1e-06#
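A minimal usage sketch for this config. It only assumes DefaultMambularConfig is importable from mambular.configs as named above; all keyword names and defaults are taken from the signature, and fields not overridden keep their documented defaults.

    from mambular.configs import DefaultMambularConfig

    # Override a few Mamba-specific defaults; everything else keeps the
    # documented values (d_state=128, dt_rank='auto', norm='RMSNorm', ...).
    cfg = DefaultMambularConfig(
        d_model=128,
        n_layers=6,
        bidirectional=True,
        pooling_method="cls",
        use_cls=True,
        lr=3e-4,
    )
    print(cfg.d_model, cfg.norm)  # 128 RMSNorm

The matching estimators in mambular.models (e.g. MambularRegressor) are assumed to expose the same hyperparameter names as keyword arguments; verify against their own signatures before relying on that.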
- class mambular.configs.DefaultFTTransformerConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.SELU, cat_encoding='int', n_layers=4, n_heads=8, attn_dropout=0.2, ff_dropout=0.1, norm='LayerNorm', transformer_activation=torch.nn.Module, transformer_dim_feedforward=256, norm_first=False, bias=True, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=torch.nn.SELU, head_use_batch_norm=False, pooling_method='avg', use_cls=False)[source]#
Configuration class for the FT Transformer model with predefined hyperparameters.
- Parameters:
d_model (int, default=128) – Dimensionality of the transformer model.
n_layers (int, default=4) – Number of transformer layers.
n_heads (int, default=8) – Number of attention heads in the transformer.
attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.
ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.
norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).
activation (callable, default=nn.SELU()) – Activation function for the transformer layers.
transformer_activation (callable, default=ReGLU()) – Activation function for the transformer feed-forward layers.
transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.
layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization to improve numerical stability.
norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.
bias (bool, default=True) – Whether to use bias in linear layers.
head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).
use_cls (bool, default=False) – Whether to use a CLS token for pooling.
cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).
- attn_dropout: float = 0.2#
- batch_norm: bool = False#
- bias: bool = True#
- cat_encoding: str = 'int'#
- d_model: int = 128#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- ff_dropout: float = 0.1#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.5#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- n_frequencies: int = 48#
- n_heads: int = 8#
- n_layers: int = 4#
- norm: str = 'LayerNorm'#
- norm_first: bool = False#
- plr_lite: bool = False#
- pooling_method: str = 'avg'#
- transformer_dim_feedforward: int = 256#
- use_cls: bool = False#
- use_embeddings: bool = False#
- weight_decay: float = 1e-06#
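A short, hedged sketch of adjusting the attention-related defaults of this config; only keyword names from the signature above are used, and the import path is assumed from the class name on this page.

    from mambular.configs import DefaultFTTransformerConfig

    # Wider feed-forward blocks and CLS-token pooling; other fields keep defaults.
    cfg = DefaultFTTransformerConfig(
        n_heads=4,
        attn_dropout=0.1,
        transformer_dim_feedforward=512,
        pooling_method="cls",
        use_cls=True,
    )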
- class mambular.configs.DefaultResNetConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.SELU, cat_encoding='int', layer_sizes=<factory>, skip_layers=False, dropout=0.5, norm=False, use_glu=False, skip_connections=True, num_blocks=3, average_embeddings=True)[source]#
Configuration class for the default ResNet model with predefined hyperparameters.
- Parameters:
layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the ResNet.
activation (callable, default=nn.SELU()) – Activation function for the ResNet layers.
skip_layers (bool, default=False) – Whether to skip layers in the ResNet.
dropout (float, default=0.5) – Dropout rate for regularization.
norm (bool, default=False) – Whether to use normalization in the ResNet.
use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the ResNet.
skip_connections (bool, default=True) – Whether to use skip connections in the ResNet.
num_blocks (int, default=3) – Number of residual blocks in the ResNet.
average_embeddings (bool, default=True) – Whether to average embeddings during the forward pass.
- average_embeddings: bool = True#
- batch_norm: bool = False#
- cat_encoding: str = 'int'#
- d_model: int = 32#
- dropout: float = 0.5#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- frequencies_init_scale: float = 0.01#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- layer_sizes: list#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- n_frequencies: int = 48#
- norm: bool = False#
- num_blocks: int = 3#
- plr_lite: bool = False#
- skip_connections: bool = True#
- skip_layers: bool = False#
- use_embeddings: bool = False#
- use_glu: bool = False#
- weight_decay: float = 1e-06#
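A minimal sketch of shrinking the default ResNet; it assumes only that the class is importable as shown above, and all field names come from the signature.

    from mambular.configs import DefaultResNetConfig

    # A smaller residual stack; layer_sizes replaces the documented
    # factory default of (256, 128, 32).
    cfg = DefaultResNetConfig(
        layer_sizes=[128, 64],
        num_blocks=2,
        dropout=0.3,
        skip_connections=True,
    )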
- class mambular.configs.DefaultMLPConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.ReLU, cat_encoding='int', layer_sizes=<factory>, skip_layers=False, dropout=0.2, use_glu=False, skip_connections=False)[source]#
Configuration class for the default Multi-Layer Perceptron (MLP) model with predefined hyperparameters.
- Parameters:
layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the MLP.
activation (callable, default=nn.ReLU()) – Activation function for the MLP layers.
skip_layers (bool, default=False) – Whether to skip layers in the MLP.
dropout (float, default=0.2) – Dropout rate for regularization.
use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the MLP.
skip_connections (bool, default=False) – Whether to use skip connections in the MLP.
- batch_norm: bool = False#
- cat_encoding: str = 'int'#
- d_model: int = 32#
- dropout: float = 0.2#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- frequencies_init_scale: float = 0.01#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- layer_sizes: list#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- n_frequencies: int = 48#
- plr_lite: bool = False#
- skip_connections: bool = False#
- skip_layers: bool = False#
- use_embeddings: bool = False#
- use_glu: bool = False#
- weight_decay: float = 1e-06#
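A hedged sketch of a larger MLP variant with feature embeddings enabled; only documented keyword names are used, under the assumption that the class is importable as named above.

    from mambular.configs import DefaultMLPConfig

    # Deeper MLP with GLU blocks and per-feature embeddings of width d_model.
    cfg = DefaultMLPConfig(
        layer_sizes=[512, 256, 128],
        dropout=0.1,
        use_glu=True,
        use_embeddings=True,
        d_model=64,
    )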
- class mambular.configs.DefaultTabTransformerConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.SELU, cat_encoding='int', n_layers=4, n_heads=8, attn_dropout=0.2, ff_dropout=0.1, norm='LayerNorm', transformer_activation=torch.nn.Module, transformer_dim_feedforward=512, norm_first=True, bias=True, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=torch.nn.SELU, head_use_batch_norm=False, pooling_method='avg')[source]#
Configuration class for the default Tab Transformer model with predefined hyperparameters.
- Parameters:
n_layers (int, default=4) – Number of layers in the transformer.
n_heads (int, default=8) – Number of attention heads in the transformer.
d_model (int, default=128) – Dimensionality of embeddings or model representations.
attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.
ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.
norm (str, default="LayerNorm") – Normalization method to be used.
activation (callable, default=nn.SELU()) – Activation function for the transformer layers.
transformer_activation (callable, default=ReGLU()) – Activation function for the transformer layers.
transformer_dim_feedforward (int, default=512) – Dimensionality of the feed-forward layers in the transformer.
norm_first (bool, default=True) – Whether to apply normalization before other operations in each transformer block.
bias (bool, default=True) – Whether to use bias in the linear layers.
head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).
cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).
- attn_dropout: float = 0.2#
- batch_norm: bool = False#
- bias: bool = True#
- cat_encoding: str = 'int'#
- d_model: int = 128#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- ff_dropout: float = 0.1#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.5#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- n_frequencies: int = 48#
- n_heads: int = 8#
- n_layers: int = 4#
- norm: str = 'LayerNorm'#
- norm_first: bool = True#
- plr_lite: bool = False#
- pooling_method: str = 'avg'#
- transformer_dim_feedforward: int = 512#
- use_embeddings: bool = False#
- weight_decay: float = 1e-06#
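A minimal sketch of overriding a few TabTransformer defaults; keyword names are taken from the signature above, and the import path is assumed from the class name.

    from mambular.configs import DefaultTabTransformerConfig

    # Fewer, narrower transformer layers with post-norm instead of pre-norm.
    cfg = DefaultTabTransformerConfig(
        n_layers=2,
        n_heads=4,
        norm_first=False,
        ff_dropout=0.2,
    )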
- class mambular.configs.DefaultMambaTabConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=64, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.ReLU, cat_encoding='int', n_layers=1, expand_factor=2, bias=False, d_conv=16, conv_bias=True, dropout=0.05, dt_rank='auto', d_state=128, dt_scale=1.0, dt_init='random', dt_max=0.1, dt_min=0.0001, dt_init_floor=0.0001, axis=1, head_layer_sizes=<factory>, head_dropout=0.0, head_skip_layers=False, head_activation=torch.nn.ReLU, head_use_batch_norm=False, norm='LayerNorm', use_pscan=False, mamba_version='mamba-torch', bidirectional=False)[source]#
Configuration class for the Default MambaTab model with predefined hyperparameters.
- Parameters:
d_model (int, default=64) – Dimensionality of the model.
n_layers (int, default=1) – Number of layers in the model.
expand_factor (int, default=2) – Expansion factor for the feed-forward layers.
bias (bool, default=False) – Whether to use bias in the linear layers.
d_conv (int, default=16) – Dimensionality of the convolutional layers.
conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.
dropout (float, default=0.05) – Dropout rate for regularization.
dt_rank (str, default="auto") – Rank of the Δ (time-step) projection in the state-space model; "auto" derives it from d_model.
d_state (int, default=128) – Dimensionality of the state in recurrent layers.
dt_scale (float, default=1.0) – Scaling factor for the Δ (time-step) initialization.
dt_init (str, default="random") – Initialization method for the Δ (time-step) projection.
dt_max (float, default=0.1) – Maximum value for Δ (time-step) initialization.
dt_min (float, default=1e-04) – Minimum value for Δ (time-step) initialization.
dt_init_floor (float, default=1e-04) – Floor value for Δ (time-step) initialization.
activation (callable, default=nn.ReLU()) – Activation function for the model.
axis (int, default=1) – Axis along which operations are applied, if applicable.
head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.
head_dropout (float, default=0.0) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.ReLU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).
use_pscan (bool, default=False) – Whether to use PSCAN for the state-space model.
mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).
bidirectional (bool, default=False) – Whether to process data bidirectionally.
- axis: int = 1#
- batch_norm: bool = False#
- bias: bool = False#
- bidirectional: bool = False#
- cat_encoding: str = 'int'#
- conv_bias: bool = True#
- d_conv: int = 16#
- d_model: int = 64#
- d_state: int = 128#
- dropout: float = 0.05#
- dt_init: str = 'random'#
- dt_init_floor: float = 0.0001#
- dt_max: float = 0.1#
- dt_min: float = 0.0001#
- dt_rank: str = 'auto'#
- dt_scale: float = 1.0#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- expand_factor: int = 2#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.0#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- mamba_version: str = 'mamba-torch'#
- n_frequencies: int = 48#
- n_layers: int = 1#
- norm: str = 'LayerNorm'#
- plr_lite: bool = False#
- use_embeddings: bool = False#
- use_pscan: bool = False#
- weight_decay: float = 1e-06#
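A hedged sketch of a slightly deeper MambaTab configuration; it only assumes the class is importable from mambular.configs as named above, with keyword names taken from the signature.

    from mambular.configs import DefaultMambaTabConfig

    # Stack two Mamba layers, shorten the 1D convolution kernel,
    # and add a small hidden layer to the prediction head.
    cfg = DefaultMambaTabConfig(
        n_layers=2,
        d_conv=8,
        dropout=0.1,
        head_layer_sizes=[64],
    )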
- class mambular.configs.DefaultTabulaRNNConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.SELU, cat_encoding='int', model_type='RNN', n_layers=4, rnn_dropout=0.2, norm='RMSNorm', residuals=False, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=torch.nn.SELU, head_use_batch_norm=False, pooling_method='avg', norm_first=False, bias=True, rnn_activation='relu', dim_feedforward=256, d_conv=4, dilation=1, conv_bias=True)[source]#
Configuration class for the TabulaRNN model with predefined hyperparameters.
- Parameters:
model_type (str, default="RNN") – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.
n_layers (int, default=4) – Number of layers in the RNN.
rnn_dropout (float, default=0.2) – Dropout rate for the RNN layers.
d_model (int, default=128) – Dimensionality of embeddings or model representations.
norm (str, default="RMSNorm") – Normalization method to be used.
activation (callable, default=nn.SELU()) – Activation function for the RNN layers.
residuals (bool, default=False) – Whether to include residual connections in the RNN.
head_layer_sizes (list, default=()) – Sizes of the layers in the head of the model.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘cls’, etc.).
norm_first (bool, default=False) – Whether to apply normalization before other operations in each block.
layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization.
bias (bool, default=True) – Whether to use bias in the linear layers.
rnn_activation (str, default="relu") – Activation function for the RNN layers.
dim_feedforward (int, default=256) – Size of the feedforward network.
d_conv (int, default=4) – Size of the convolutional layer for embedding features.
dilation (int, default=1) – Dilation factor for the convolution.
conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.
- batch_norm: bool = False#
- bias: bool = True#
- cat_encoding: str = 'int'#
- conv_bias: bool = True#
- d_conv: int = 4#
- d_model: int = 128#
- dilation: int = 1#
- dim_feedforward: int = 256#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.5#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- model_type: str = 'RNN'#
- n_frequencies: int = 48#
- n_layers: int = 4#
- norm: str = 'RMSNorm'#
- norm_first: bool = False#
- plr_lite: bool = False#
- pooling_method: str = 'avg'#
- residuals: bool = False#
- rnn_activation: str = 'relu'#
- rnn_dropout: float = 0.2#
- use_embeddings: bool = False#
- weight_decay: float = 1e-06#
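A minimal sketch of switching the recurrent backbone; the model_type values are the ones listed in the parameter description above, and the import path is assumed from the class name.

    from mambular.configs import DefaultTabulaRNNConfig

    # LSTM backbone with residual connections and lighter recurrent dropout.
    cfg = DefaultTabulaRNNConfig(
        model_type="LSTM",
        n_layers=2,
        rnn_dropout=0.1,
        residuals=True,
    )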
- class mambular.configs.DefaultMambAttentionConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=64, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.SiLU, cat_encoding='int', n_layers=4, expand_factor=2, n_heads=8, last_layer='attn', n_mamba_per_attention=1, bias=False, d_conv=4, conv_bias=True, dropout=0.0, attn_dropout=0.2, dt_rank='auto', d_state=128, dt_scale=1.0, dt_init='random', dt_max=0.1, dt_min=0.0001, dt_init_floor=0.0001, norm='LayerNorm', head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=torch.nn.SELU, head_use_batch_norm=False, pooling_method='avg', bidirectional=False, use_learnable_interaction=False, use_cls=False, shuffle_embeddings=False, AD_weight_decay=True, BC_layer_norm=False, use_pscan=False, n_attention_layers=1)[source]#
Configuration class for the Default Mambular Attention model with predefined hyperparameters.
- Parameters:
d_model (int, default=64) – Dimensionality of the model.
n_layers (int, default=4) – Number of layers in the model.
expand_factor (int, default=2) – Expansion factor for the feed-forward layers.
n_heads (int, default=8) – Number of attention heads in the model.
last_layer (str, default="attn") – Type of the last layer (e.g., ‘attn’).
n_mamba_per_attention (int, default=1) – Number of Mamba blocks per attention layer.
bias (bool, default=False) – Whether to use bias in the linear layers.
d_conv (int, default=4) – Dimensionality of the convolutional layers.
conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.
dropout (float, default=0.0) – Dropout rate for regularization.
attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.
dt_rank (str, default="auto") – Rank of the Δ (time-step) projection in the state-space model; "auto" derives it from d_model.
d_state (int, default=128) – Dimensionality of the state in recurrent layers.
dt_scale (float, default=1.0) – Scaling factor for the Δ (time-step) initialization.
dt_init (str, default="random") – Initialization method for the Δ (time-step) projection.
dt_max (float, default=0.1) – Maximum value for Δ (time-step) initialization.
dt_min (float, default=1e-04) – Minimum value for Δ (time-step) initialization.
dt_init_floor (float, default=1e-04) – Floor value for Δ (time-step) initialization.
norm (str, default="LayerNorm") – Type of normalization used in the model.
activation (callable, default=nn.SiLU()) – Activation function for the model.
head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘max’, etc.).
bidirectional (bool, default=False) – Whether to process input sequences bidirectionally.
use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.
use_cls (bool, default=False) – Whether to append a CLS token for sequence pooling.
shuffle_embeddings (bool, default=False) – Whether to shuffle embeddings before passing to Mamba layers.
cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).
AD_weight_decay (bool, default=True) – Whether weight decay is applied to A-D matrices.
BC_layer_norm (bool, default=False) – Whether to apply layer normalization to B-C matrices.
use_pscan (bool, default=False) – Whether to use PSCAN for the state-space model.
n_attention_layers (int, default=1) – Number of attention layers in the model.
- AD_weight_decay: bool = True#
- BC_layer_norm: bool = False#
- attn_dropout: float = 0.2#
- batch_norm: bool = False#
- bias: bool = False#
- bidirectional: bool = False#
- cat_encoding: str = 'int'#
- conv_bias: bool = True#
- d_conv: int = 4#
- d_model: int = 64#
- d_state: int = 128#
- dropout: float = 0.0#
- dt_init: str = 'random'#
- dt_init_floor: float = 0.0001#
- dt_max: float = 0.1#
- dt_min: float = 0.0001#
- dt_rank: str = 'auto'#
- dt_scale: float = 1.0#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- expand_factor: int = 2#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.5#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- last_layer: str = 'attn'#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- n_attention_layers: int = 1#
- n_frequencies: int = 48#
- n_heads: int = 8#
- n_layers: int = 4#
- n_mamba_per_attention: int = 1#
- norm: str = 'LayerNorm'#
- plr_lite: bool = False#
- pooling_method: str = 'avg'#
- shuffle_embeddings: bool = False#
- use_cls: bool = False#
- use_embeddings: bool = False#
- use_learnable_interaction: bool = False#
- use_pscan: bool = False#
- weight_decay: float = 1e-06#
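A hedged sketch of mixing more attention into the Mamba stack; only fields documented above are overridden, and the import path is assumed from the class name on this page.

    from mambular.configs import DefaultMambAttentionConfig

    # Two attention layers, each preceded by two Mamba blocks,
    # with attention kept as the final layer.
    cfg = DefaultMambAttentionConfig(
        n_attention_layers=2,
        n_mamba_per_attention=2,
        attn_dropout=0.1,
        last_layer="attn",
    )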
- class mambular.configs.DefaultNDTFConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.ReLU, cat_encoding='int', min_depth=4, max_depth=16, temperature=0.1, node_sampling=0.3, lamda=0.3, n_ensembles=12, penalty_factor=1e-08)[source]#
Configuration class for the default Neural Decision Tree Forest (NDTF) model with predefined hyperparameters.
- Parameters:
min_depth (int, default=4) – Minimum depth of trees in the forest. Controls the simplest model structure.
max_depth (int, default=16) – Maximum depth of trees in the forest. Controls the maximum complexity of the trees.
temperature (float, default=0.1) – Temperature parameter for softening the node decisions during path probability calculation.
node_sampling (float, default=0.3) – Fraction of nodes sampled for regularization penalty calculation. Reduces computation by focusing on a subset of nodes.
lamda (float, default=0.3) – Regularization parameter to control the complexity of the paths, penalizing overconfident or imbalanced paths.
n_ensembles (int, default=12) – Number of trees in the forest.
penalty_factor (float, default=1e-08) – Factor by which the regularization penalty is multiplied.
- batch_norm: bool = False#
- cat_encoding: str = 'int'#
- d_model: int = 32#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- frequencies_init_scale: float = 0.01#
- lamda: float = 0.3#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- max_depth: int = 16#
- min_depth: int = 4#
- n_ensembles: int = 12#
- n_frequencies: int = 48#
- node_sampling: float = 0.3#
- penalty_factor: float = 1e-08#
- plr_lite: bool = False#
- temperature: float = 0.1#
- use_embeddings: bool = False#
- weight_decay: float = 1e-06#
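A minimal sketch of a smaller, shallower neural forest; all keyword names and value ranges come from the parameter descriptions above, and the import path is assumed from the class name.

    from mambular.configs import DefaultNDTFConfig

    # Fewer trees with a narrower depth range and a softer routing temperature.
    cfg = DefaultNDTFConfig(
        n_ensembles=6,
        min_depth=3,
        max_depth=8,
        temperature=0.2,
    )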
- class mambular.configs.DefaultNODEConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.ReLU, cat_encoding='int', num_layers=4, layer_dim=128, tree_dim=1, depth=6, norm=None, head_layer_sizes=<factory>, head_dropout=0.3, head_skip_layers=False, head_activation=torch.nn.ReLU, head_use_batch_norm=False)[source]#
Configuration class for the Neural Oblivious Decision Ensemble (NODE) model.
- Parameters:
num_layers (int, default=4) – Number of dense layers in the model.
layer_dim (int, default=128) – Dimensionality of each dense layer.
tree_dim (int, default=1) – Dimensionality of the output from each tree leaf.
depth (int, default=6) – Depth of each decision tree in the ensemble.
norm (str, default=None) – Type of normalization to use in the model.
head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.
head_dropout (float, default=0.3) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.ReLU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
- batch_norm: bool = False#
- cat_encoding: str = 'int'#
- d_model: int = 32#
- depth: int = 6#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.3#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- layer_dim: int = 128#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- n_frequencies: int = 48#
- norm: Optional[str] = None#
- num_layers: int = 4#
- plr_lite: bool = False#
- tree_dim: int = 1#
- use_embeddings: bool = False#
- weight_decay: float = 1e-06#
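A hedged sketch of a lighter NODE ensemble; it only assumes the class is importable as named above, with keyword names taken from the signature.

    from mambular.configs import DefaultNODEConfig

    # Fewer, narrower oblivious-tree layers with shallower trees.
    cfg = DefaultNODEConfig(
        num_layers=2,
        layer_dim=64,
        depth=4,
        tree_dim=2,
    )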
- class mambular.configs.DefaultTabMConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.ReLU, cat_encoding='int', layer_sizes=<factory>, dropout=0.5, norm=None, use_glu=False, ensemble_size=32, ensemble_scaling_in=True, ensemble_scaling_out=True, ensemble_bias=True, scaling_init='ones', average_ensembles=False, model_type='mini', average_embeddings=True)[source]#
Configuration class for the TabM model with batch ensembling and predefined hyperparameters.
- Parameters:
layer_sizes (list, default=(512, 512, 128)) – Sizes of the layers in the model.
activation (callable, default=nn.ReLU()) – Activation function for the model layers.
dropout (float, default=0.5) – Dropout rate for regularization.
norm (str, default=None) – Normalization method to be used, if any.
use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the model.
ensemble_size (int, default=32) – Number of ensemble members for batch ensembling.
ensemble_scaling_in (bool, default=True) – Whether to use input scaling for each ensemble member.
ensemble_scaling_out (bool, default=True) – Whether to use output scaling for each ensemble member.
ensemble_bias (bool, default=True) – Whether to use a unique bias term for each ensemble member.
scaling_init ({"ones", "random-signs", "normal"}, default="ones") – Initialization method for scaling weights.
average_ensembles (bool, default=False) – Whether to average the outputs of the ensembles.
model_type ({"mini", "full"}, default="mini") – Model type to use (‘mini’ for the reduced version, ‘full’ for the complete model).
average_embeddings (bool, default=True) – Whether to average embeddings during the forward pass.
- average_embeddings: bool = True#
- average_ensembles: bool = False#
- batch_norm: bool = False#
- cat_encoding: str = 'int'#
- d_model: int = 32#
- dropout: float = 0.5#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- ensemble_bias: bool = True#
- ensemble_scaling_in: bool = True#
- ensemble_scaling_out: bool = True#
- ensemble_size: int = 32#
- frequencies_init_scale: float = 0.01#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- layer_sizes: list#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- model_type: Literal['mini', 'full'] = 'mini'#
- n_frequencies: int = 48#
- norm: Optional[str] = None#
- plr_lite: bool = False#
- scaling_init: Literal['ones', 'random-signs', 'normal'] = 'ones'#
- use_embeddings: bool = False#
- use_glu: bool = False#
- weight_decay: float = 1e-06#
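A minimal sketch of a smaller batch ensemble; the literal values for model_type and scaling_init are the ones listed in the attribute types above, and the import path is assumed from the class name.

    from mambular.configs import DefaultTabMConfig

    # Eight ensemble members whose outputs are averaged into one prediction.
    cfg = DefaultTabMConfig(
        ensemble_size=8,
        model_type="mini",
        scaling_init="random-signs",
        average_ensembles=True,
    )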
- class mambular.configs.DefaultSAINTConfig(lr=0.0001, lr_patience=10, weight_decay=1e-06, lr_factor=0.1, use_embeddings=False, embedding_activation=torch.nn.Identity, embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=torch.nn.GELU, cat_encoding='int', n_layers=1, n_heads=2, attn_dropout=0.2, ff_dropout=0.1, norm='LayerNorm', norm_first=False, bias=True, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=torch.nn.SELU, head_use_batch_norm=False, pooling_method='cls', use_cls=True)[source]#
Configuration class for the SAINT model with predefined hyperparameters.
- Parameters:
n_layers (int, default=1) – Number of transformer layers.
n_heads (int, default=2) – Number of attention heads in the transformer.
d_model (int, default=128) – Dimensionality of embeddings or model representations.
attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.
ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.
norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).
activation (callable, default=nn.GELU()) – Activation function for the transformer layers.
norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.
bias (bool, default=True) – Whether to use bias in linear layers.
head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="cls") – Pooling method to be used (‘cls’, ‘avg’, etc.).
use_cls (bool, default=True) – Whether to use a CLS token for pooling.
cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).
- attn_dropout: float = 0.2#
- batch_norm: bool = False#
- bias: bool = True#
- cat_encoding: str = 'int'#
- d_model: int = 128#
- embedding_bias: bool = False#
- embedding_projection: bool = True#
- embedding_type: str = 'linear'#
- ff_dropout: float = 0.1#
- frequencies_init_scale: float = 0.01#
- head_dropout: float = 0.5#
- head_layer_sizes: list#
- head_skip_layers: bool = False#
- head_use_batch_norm: bool = False#
- layer_norm: bool = False#
- layer_norm_after_embedding: bool = False#
- layer_norm_eps: float = 1e-05#
- lr: float = 0.0001#
- lr_factor: float = 0.1#
- lr_patience: int = 10#
- n_frequencies: int = 48#
- n_heads: int = 2#
- n_layers: int = 1#
- norm: str = 'LayerNorm'#
- norm_first: bool = False#
- plr_lite: bool = False#
- pooling_method: str = 'cls'#
- use_cls: bool = True#
- use_embeddings: bool = False#
- weight_decay: float = 1e-06#
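A hedged sketch of a deeper SAINT backbone; only fields documented above are overridden (CLS pooling stays at its default), and the import path is assumed from the class name on this page.

    from mambular.configs import DefaultSAINTConfig

    # More transformer layers and attention heads with lighter attention dropout.
    cfg = DefaultSAINTConfig(
        n_layers=4,
        n_heads=4,
        attn_dropout=0.1,
    )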