I am trying to generate feature vectors from the SMILES strings of the corresponding polymers. When the SMILES is like 'C1CCCC1', I get vectors. But when the SMILES is like 'C', I get NaN values. Please let me know how to tackle this problem. My Python code is as follows:
import numpy as np
from rdkit import Chem
from rdkit.Chem import PandasTools, Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
descriptors = list(np.array(Descriptors._descList)[:,0])
calculator = MoleculeDescriptors.MolecularDescriptorCalculator(descriptors)
def computeDescriptors(mol, calculator):
    res = np.array(calculator.CalcDescriptors(mol))
    if not np.all(np.isfinite(res)):
        return None  # makes it easier to identify problematic molecules (e.g. infinite descriptor values) later
    return res
smiles = ['*C*']  # , 'C1CCCC1', 'CCCCCC', 'CCCC(C)C',
# Convert SMILES to RDKit Mol objects, then compute one descriptor vector per molecule
mol_objects = [Chem.MolFromSmiles(smile) for smile in smiles]
features = [computeDescriptors(x, calculator) for x in mol_objects]
print(features)
Edit 1:
I am adding this edit after realizing that the asterisks in your SMILES are part of the string itself, not markup you added to highlight or emphasize where the problem is.
So technically your current SMILES ('*C*') is not a fully specified SMILES; it reads more like a SMARTS. SMARTS patterns are used to describe fragments/substructures of molecules. The asterisk is a wildcard, which means it could be any atom. To compute, for example, the descriptor MaxAbsPartialCharge, the identity of the wildcard atom has to be known! Otherwise it is not possible to return a finite value.
The solution to your question lies less in coding advice than in the logical approach: "Why do you want to calculate MaxAbsPartialCharge for a molecule where you don't have all atoms specified?"
Original answer
I cannot reproduce that behavior. For you it returns None because np.all(np.isfinite(res)) evaluates to False, but for me it evaluates to True.
python: 3.10.8
numpy: 1.23.1
rdkit: 2022.09.3
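To check whether the wildcard atoms described in Edit 1 are what produces the non-finite values, it can help to list the affected descriptors by name instead of discarding the whole vector. The following is a minimal sketch, not a definitive implementation: the helper names `nonfinite_descriptors`, `has_wildcard_atoms`, and `diagnose` are my own, and I am assuming that RDKit represents `*` as a dummy atom with atomic number 0.

```python
import math


def nonfinite_descriptors(names, values):
    """Return the names of descriptors whose value is NaN or +/-inf."""
    return [name for name, value in zip(names, values) if not math.isfinite(value)]


def has_wildcard_atoms(mol):
    """True if the molecule contains dummy atoms (assumed: '*' has atomic number 0)."""
    return any(atom.GetAtomicNum() == 0 for atom in mol.GetAtoms())


def diagnose(smiles):
    """Return (contains_wildcards, names_of_nonfinite_descriptors) for one SMILES."""
    # RDKit is imported here so the two helpers above stay usable without it.
    from rdkit import Chem
    from rdkit.Chem import Descriptors
    from rdkit.ML.Descriptors import MoleculeDescriptors

    names = [name for name, _ in Descriptors._descList]
    calculator = MoleculeDescriptors.MolecularDescriptorCalculator(names)

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse {smiles!r}")

    values = calculator.CalcDescriptors(mol)
    return has_wildcard_atoms(mol), nonfinite_descriptors(names, values)
```

Calling `diagnose('*C*')` and `diagnose('C1CCCC1')` side by side should show whether the non-finite entries are limited to the partial-charge descriptors. I am not quoting an expected output because the exact list may depend on the RDKit version.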