I have a high-dimensional dataset X with 128 features for classification. What I would like to do is:
- Train a linear SVM on X.
- Calculate the hyperplane separating the data samples.
- Apply t-SNE to the dataset X and the hyperplane so that both can be visualized nicely in 2 dimensions.
Problems I encountered so far:
- Many methods I found in other posts make use of the meshgrid function. In detail, they don't specifically calculate the hyperplane; instead they just create a big meshgrid and use the trained SVM to assign a class to each point in the grid. The problem for me with this approach is that even a very coarse grid with 10 different values per dimension would already consist of 10^128 points, which is far too large to create.
- My other approach was to calculate random points on the hyperplane, transform these with t-SNE, and then plot them in the t-SNE plot of the original dataset X. The problem here is that I don't want the plane points to influence the t-SNE algorithm when it is applied to X, so I apply t-SNE twice. This, however, results in two dimensionality reductions that have nothing to do with one another. See the code and the output generated by it:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.manifold import TSNE
from sklearn.svm import SVC
tsne = TSNE(n_components=2, random_state=42)
# My Data
# X.shape = (1500, 128)
# Y.shape = (1500,)
# Transform the data to two dimensions and scatter plot it
X_embedded = tsne.fit_transform(X)
df_dataset = pd.DataFrame({'x1': X_embedded[:, 0], 'x2': X_embedded[:, 1], 'class': Y})
sns.scatterplot(data=df_dataset, x='x1', y='x2', hue='class', s=10)
clf = SVC(kernel='linear')
clf.fit(X, Y)
# Create 500 random points within the range of the data, leaving out the last dimension
x_min = X[train_index].numpy().min(axis=0)[:-1]
x_max = X[train_index].numpy().max(axis=0)[:-1]
x_rand = np.random.rand(500, 128 - 1) * (x_max - x_min) + x_min
# Calculate the last dimension and add it to the random points
x_last = -((clf.coef_[:, :-1] @ x_rand.T).T + clf.intercept_) / clf.coef_[:, -1]
df_plane = pd.DataFrame(np.c_[x_rand, x_last], columns=[f'x{i}' for i in range(128)])
# Apply t-SNE to the plane points (a second, independent fit) and plot the result
plane_embedded = tsne.fit_transform(df_plane)
df_plane_tsne = pd.DataFrame({'x1': plane_embedded[:, 0], 'x2': plane_embedded[:, 1]})
sns.lineplot(data=df_plane_tsne, x='x1', y='x2', color='black', alpha=0.5)
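To make the "calculate the last dimension" step easier to check, here is a minimal, self-contained sketch of the same construction using NumPy only. The weight vector `w` and offset `b` are made-up stand-ins for `clf.coef_[0]` and `clf.intercept_[0]`, and the sampling range of [-1, 1] is an assumption, not my real data range. It only confirms that the sampled points satisfy w·x + b = 0; it says nothing about the t-SNE part.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 128

# Hypothetical stand-ins for clf.coef_[0] and clf.intercept_[0]
w = rng.normal(size=n_features)
b = 0.5

# Sample the first 127 coordinates uniformly in an assumed range [-1, 1]
x_free = rng.uniform(-1.0, 1.0, size=(500, n_features - 1))

# Solve w[:-1] @ x_free + w[-1] * x_last + b = 0 for the last coordinate
x_last = -(x_free @ w[:-1] + b) / w[-1]
plane_points = np.column_stack([x_free, x_last])

# The decision values of points on the hyperplane should vanish
# up to floating-point error
residual = plane_points @ w + b
print(np.abs(residual).max())
```

With a fitted binary linear SVM, the equivalent check would be that `clf.decision_function` returns values close to zero for the generated points.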
Do you know any approaches for solving my issue?