Visualizing Linear Separability of High-Dimensional Data

20 Views Asked by Eric21 At 14 June 2023 at 08:50

I have a high-dimensional dataset X with 128 features for classification. What I would like to do is:

Train a linear SVM on X.
Calculate the hyperplane separating the data samples.
Apply t-SNE to the dataset X and the hyperplane such that it can be nicely visualized in 2 dimensions.
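
The first two steps can be sketched as follows. This is only a minimal sketch on synthetic stand-in data (the `make_classification` dataset and the `*_demo` names are assumptions, not the real X), and it assumes binary labels. For a fitted linear `SVC`, the hyperplane is the set of points x with w . x + b = 0, where w is `coef_[0]` and b is `intercept_[0]`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption: binary labels, 128 features)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=128, n_informative=20, random_state=0
)

# Step 1: train a linear SVM
clf_demo = SVC(kernel="linear")
clf_demo.fit(X_demo, y_demo)

# Step 2: the hyperplane is {x : w . x + b = 0}
w = clf_demo.coef_[0]        # normal vector, shape (128,)
b = clf_demo.intercept_[0]   # scalar offset

# Signed distance of every sample to the hyperplane;
# the sign indicates on which side of the plane a sample lies
signed_dist = (X_demo @ w + b) / np.linalg.norm(w)
```

The signed distance is a one-dimensional summary of how far each sample sits from the decision boundary, so it can be inspected without any grid over the 128 input dimensions.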

Problems I encountered so far:

Many methods I found in other posts make use of the meshgrid function. In detail, they don't explicitly calculate the hyperplane; instead, they create a large meshgrid and use the trained SVM to assign a class to each point in the grid. The problem for me with this approach is that even a very coarse grid with 10 different values per dimension would already consist of 10^128 points, which is far too many to evaluate.
My other approach was to calculate random points on the hyperplane, transform these with t-SNE, and then plot them in the t-SNE plot of the original dataset X. The problem here is that I don't want to influence the t-SNE algorithm when it is applied to X, so I apply t-SNE twice. This, however, results in two dimensionality reductions that have nothing to do with one another. See the code and the output generated by it:

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.manifold import TSNE
from sklearn.svm import SVC

tsne = TSNE(n_components=2, random_state=42)

# My Data
# X.shape = (1500, 128)
# Y.shape = (1500,)

# Transform the data to two dimensions and scatter plot it
X_embedded = tsne.fit_transform(X)
df_dataset = pd.DataFrame({'x1': X_embedded[:, 0], 'x2': X_embedded[:, 1], 'class': Y})
sns.scatterplot(data=df_dataset, x='x1', y='x2', hue='class', s=10)

clf = SVC(kernel='linear')
clf.fit(X, Y)

# Create 500 random points within the per-feature data range, leaving out the last dimension
X_np = np.asarray(X)
x_min = X_np.min(axis=0)[:-1]
x_max = X_np.max(axis=0)[:-1]
x_rand = np.random.rand(500, 128 - 1) * (x_max - x_min) + x_min

# Solve w . x + b = 0 for the last dimension and append it to the random points
x_last = -((clf.coef_[:, :-1] @ x_rand.T).T + clf.intercept_) / clf.coef_[:, -1]
df_plane = pd.DataFrame(np.c_[x_rand, x_last], columns=[f'x{i}' for i in range(128)])

# Apply t-SNE to the plane (a second fit, independent of the one on X) and plot it
plane_embedded = tsne.fit_transform(df_plane)
df_plane_tsne = pd.DataFrame({'x1': plane_embedded[:, 0], 'x2': plane_embedded[:, 1]})
sns.lineplot(data=df_plane_tsne, x='x1', y='x2', color='black', alpha=0.5)

Image of Output
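
One way to sanity-check the plane construction in the code above: points that really lie on the hyperplane should have a decision value of (numerically) zero. The sketch below uses synthetic stand-in data and assumes binary labels. Instead of solving for the last coordinate, it projects random points orthogonally onto the hyperplane via x - ((w . x + b) / ||w||^2) * w, which avoids dividing by a single coefficient that might be close to zero. The helper name `project_onto_hyperplane` and the `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def project_onto_hyperplane(points, w, b):
    """Orthogonally project each row of `points` onto {x : w . x + b = 0}."""
    offsets = (points @ w + b) / (w @ w)      # one scalar per point
    return points - np.outer(offsets, w)      # move each point along w


# Synthetic stand-in for the real data (assumption: binary labels, 128 features)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=128, n_informative=20, random_state=0
)
clf_demo = SVC(kernel="linear").fit(X_demo, y_demo)
w, b = clf_demo.coef_[0], clf_demo.intercept_[0]

# 500 random points inside the per-feature bounding box of the data
rng = np.random.default_rng(0)
x_rand = rng.uniform(X_demo.min(axis=0), X_demo.max(axis=0), size=(500, 128))

# Project them onto the hyperplane and check the decision values
plane_points = project_onto_hyperplane(x_rand, w, b)
max_abs_decision = np.abs(clf_demo.decision_function(plane_points)).max()
print(f"largest |decision value| on the plane: {max_abs_decision:.2e}")
```

This only verifies that the sampled points sit on the hyperplane in the original 128-dimensional space; it does not address the mismatch between the two separate t-SNE fits.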

Do you know any approaches for solving my issue?
