I found this code for parsing an SDF file, but it does not handle the whitespace inside the tag names, which is why the Ki (nM) output does not show up.
My file looks like this:
> <Ligand InChI Key>
CPZBLNMUGSZIPR-NVXWUHKLSA-N
> <BindingDB MonomerID>
50417287
> <BindingDB Ligand Name>
Aloxi::Aurothioglucose::PALONOSETRON::PALONOSETRON HYDROCHLORIDE
> <Target Name Assigned by Curator or DataSource>
5-hydroxytryptamine receptor 3A
> <Target Source Organism According to Curator or DataSource>
Homo sapiens
> <Ki (nM)>
0.0316
> <IC50 (nM)>
> <Kd (nM)>
> <EC50 (nM)>
---------------------------
awk -v OFS='\t' '
/^>/ { tag=$2; next }
NF { f[tag]=$1 }
$0 == "$$$$" {print f["<pH>"], f["<PMID>"], f["<Ki (nM)>"] }
' P46098.sdf
Thank you!
Please try the match() function to extract the tag between < and > inclusive.
With default field splitting, tag=$2 keeps only the text up to the first whitespace (for example <Ki instead of <Ki (nM)>), so the later lookup f["<Ki (nM)>"] finds nothing.
match($0, /<.+>/) returns a non-zero value if the regex <.+> matches $0, assigning the awk variables RSTART and RLENGTH to the start position and the length of the matched substring.
<.+> matches a substring which starts with < and ends with >. The substring may contain whitespace characters.
substr($0, RSTART, RLENGTH) returns the substring of $0 starting at RSTART with a length of RLENGTH characters. The variable tag is then assigned to it.
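The steps above can be sketched as follows. This is a minimal sketch, not a definitive script: the file name sample.sdf, its shortened record, and the two printed columns are assumptions chosen for illustration. Two changes beyond the match() call are also my own additions: the "$$$$" rule is placed before the NF rule so the terminator is not stored as a value, and the array is cleared after each record so empty fields do not inherit values from the previous record.

```shell
# Hypothetical sample record, shortened from the one in the question.
cat > sample.sdf <<'EOF'
> <BindingDB MonomerID>
50417287
> <Ki (nM)>
0.0316
> <IC50 (nM)>

$$$$
EOF

result=$(awk -v OFS='\t' '
# Tag line: keep everything from "<" to ">" inclusive, whitespace included.
/^>/ {
    if (match($0, /<.+>/))
        tag = substr($0, RSTART, RLENGTH)
    next
}
# End of record: print the chosen fields, then clear the array.
$0 == "$$$$" {
    print f["<BindingDB MonomerID>"], f["<Ki (nM)>"]
    split("", f)    # portable way to empty an awk array
    next
}
# Non-empty data line: store its first field under the current tag.
NF { f[tag] = $1 }
' sample.sdf)

printf '%s\n' "$result"
```

Note that f[tag] = $1 still keeps only the first whitespace-separated word of a value, as in the original code. That is fine for numeric fields such as Ki (nM), but a value like "Homo sapiens" would be cut to "Homo"; storing $0 instead would keep the whole line.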