Finding embedding of a molecule dataset

80 Views Asked by Souvik Panda At 18 May 2025 at 22:35

I am computing embeddings of a drug dataset using MACAW (Molecular Autoencoding AutoWorkAround), an accessible tool for molecular embedding and inverse molecular design. I then convert the embeddings into a pandas DataFrame and save it as a .csv file that also includes the class labels from the original dataset.

But when I apply SMOTE before training an MLP or logistic regression classifier, the classification metrics (precision, recall, and F1 score) stay the same, i.e. there is no improvement after applying SMOTE.

So I suspect there is a problem in how I compute the embeddings. Please help.

The code I used, the dataset, and the paper from which I got the idea are given below.

My source code:

import sys

import pandas as pd
from google.colab import files

# Make the local copy of MACAW importable
sys.path.append('../')
import macaw
from macaw import MACAW

print(macaw.__version__)

# Upload BBBP.csv to the Colab session, then load it
files.upload()
df = pd.read_csv("BBBP.csv")
smiles = df["smiles"]
print(len(smiles))

# Fit MACAW on the SMILES strings and embed the same molecules
mcw = MACAW(random_state=42)
mcw.fit(smiles)
bbbp_embedding = mcw.transform(smiles)
print(bbbp_embedding)

# Attach the class labels (joined on the row index) and save the result.
# index=False keeps the row index from being written as an extra unnamed column.
embedding_df = pd.DataFrame(bbbp_embedding).join(df["p_np"])
embedding_df.to_csv("BBBP_embedding.csv", index=False)
files.download("BBBP_embedding.csv")
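
One thing worth ruling out on the CSV side: pandas writes the row index by default in `to_csv`, and on `read_csv` it comes back as an extra `Unnamed: 0` column that can end up among the features. The sketch below shows the effect on a small random array standing in for the MACAW output (the array shape and the label values are made up for illustration; only the `p_np` column name comes from the dataset).

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the embedding matrix and the label column.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(6, 4))
labels = pd.Series([1, 0, 1, 1, 0, 1], name="p_np")

# Same construction as in the question: embedding columns plus the label column.
table = pd.DataFrame(embedding).join(labels)

# Round trip with the default settings (the row index is written to the file).
buf = io.StringIO()
table.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)

# Round trip with index=False (only the embedding and label columns are written).
buf = io.StringIO()
table.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

When loading the saved file for classification, dropping any `Unnamed: 0` column (or reading with `index_col=0`) keeps the row number out of the feature matrix.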

Dataset link: https://moleculenet.org/

Paper link: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00229

I hope someone can spot the mistake in the code and help me correct it. Thanks!
