I am trying to use TabCBM on my own tabular datasets, which are stored in CSV files. The provided model requires several sub-models to be passed in, such as:
feature_to_concept_model
concept_to_feature_model
What do these parts mean? The paper does not describe them in detail. For example, assume the training data is a table with 1000 rows and 7 feature columns, plus a 1000-element vector of binary labels. How can I use TabCBM to train a model on this data and then apply it to test data?
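For concreteness, the data layout described above could be prepared roughly as follows. This is a minimal sketch using only NumPy; the column names, the in-memory CSV stand-in, and the 80/20 split are assumptions for illustration and are not taken from the TabCBM code or paper.

```python
import io
import numpy as np

# Build an in-memory stand-in for a CSV file: 7 feature columns plus a binary
# "label" column (hypothetical names). With a real file, pass its path to
# np.loadtxt instead of the buffer.
rng = np.random.default_rng(0)
raw = np.column_stack([
    rng.integers(7, 100, size=(1000, 7)),  # 7 integer-valued features
    rng.integers(0, 2, size=1000),         # binary labels
])
buf = io.StringIO()
np.savetxt(buf, raw, delimiter=",", fmt="%d",
           header="f1,f2,f3,f4,f5,f6,f7,label", comments="")
buf.seek(0)

# Load the CSV, skipping the header row, and cast everything to float32.
data = np.loadtxt(buf, delimiter=",", skiprows=1, dtype=np.float32)

# Split into the feature matrix and the label vector.
X = data[:, :-1]
y = data[:, -1]

# Shuffled 80/20 train/test split.
idx = np.random.default_rng(42).permutation(len(X))
n_train = int(0.8 * len(X))
X_train, X_test = X[idx[:n_train]], X[idx[n_train:]]
y_train, y_test = y[idx[:n_train]], y[idx[n_train:]]

# Per-feature mean of the training set, e.g. for a mean_inputs argument.
mean_inputs = X_train.mean(axis=0)

print(X_train.shape, y_train.shape, mean_inputs.shape)
```

The resulting `X_train`, `y_train`, and `mean_inputs` have the same roles as the randomly generated arrays in the example that follows.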
I tried the following simple example, based on the repository code and the explanation in the paper:
from models.tabcbm import TabCBM
import numpy as np
import tensorflow as tf
# Generate random data for X_train and binary labels for y_train
X_train = np.random.randint(7, 100, size=(100, 7))
y_train = np.random.randint(2, size=100) # Binary labels
latent_dims = 4 # Number of latent concepts
# Define your feature-to-concepts model
features_to_concepts_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7,)),  # Input shape should match your data
    tf.keras.layers.Dense(latent_dims),
    tf.keras.layers.Softmax()
])
# Define your concepts-to-labels model
concepts_to_labels_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(latent_dims,)),
    # Sigmoid rather than Softmax: a softmax over a single unit always outputs 1.0
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
# Define your features-to-embeddings model with the correct input shape
features_to_embeddings_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7,)),  # Input shape should match your data
    tf.keras.layers.Dense(latent_dims),
    tf.keras.layers.Softmax()
])
# TabCBM parameters
tab_cbm_params = dict(
    features_to_concepts_model=features_to_concepts_model,
    features_to_embeddings_model=features_to_embeddings_model,
    concepts_to_labels_model=concepts_to_labels_model,
    mean_inputs=np.mean(X_train, axis=0),
    loss_fn=tf.keras.losses.BinaryCrossentropy(),  # Binary classification loss
    latent_dims=latent_dims,
    n_concepts=4,
    n_supervised_concepts=0,
    coherence_reg_weight=0.1,
    diversity_reg_weight=0.1,
    feature_selection_reg_weight=0.1,
    prob_diversity_reg_weight=0.1,
    concept_prediction_weight=0.1,
)
# Create the TabCBM model
ss_tabcbm = TabCBM(
    self_supervised_mode=True,
    **tab_cbm_params,
)
# Compile the model
ss_tabcbm.compile(optimizer='adam', loss='binary_crossentropy')
# Print the model summary
ss_tabcbm.summary()
# Train the model (Ensure X_train and y_train have the correct shapes)
# ss_tabcbm.fit(X_train, y_train, validation_split=0.2, epochs=10, batch_size=256)
However, I get an error when I run this code.
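Separately from the error, one detail of a single-output binary head such as concepts_to_labels_model is worth checking: a softmax normalises across the units of a layer, so over a single unit it returns 1.0 for every input, whereas a sigmoid maps one logit to a probability in (0, 1). The NumPy sketch below illustrates only that arithmetic; it does not involve TabCBM or TensorFlow, and the helper functions are written here for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

# Three example logits from a layer with ONE output unit: shape (3, 1).
logits = np.array([[-3.0], [0.0], [2.5]])

softmax_out = softmax(logits)  # normalises over a single unit
sigmoid_out = sigmoid(logits)  # varies with the logit

print("softmax:", softmax_out.ravel())
print("sigmoid:", sigmoid_out.ravel())
```

With a softmax output the prediction never changes, so a binary cross-entropy loss cannot distinguish the two classes; a single sigmoid unit, or two units with a softmax and one-hot labels, are the usual alternatives.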