I have a question about superimposing multiple mmCIF files at once and calculating their RMSD. I am writing code that downloads an entire homologous superfamily, which then needs to be trimmed down based on a specific RMSD value. I want to automate this process in Python (within JupyterLab).

The mmCIF files in question contain different proteins. So far I have tried using Bio.PDB (MMCIFParser) to parse the structure from the first .cif file (called mmcif_ref) and then the structures from a list of all the other files. I want to compare every other protein structure with the reference and calculate an RMSD. The problem is that the structures don't contain the same atoms, and from what I have read online, matching atoms is one of the main requirements for superimposition.

My current code doesn't work and gives an error:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 rmsd = calculate_rmsd(mmcif_list, mmcif_ref, mmcif_comp)

Cell In[21], line 7, in calculate_rmsd(mmcif_dir, mmcif_ref, mmcif_comp)
      4 parser = MMCIFParser()
      6 # Parse the structures from the MMCIF files
----> 7 structure1 = parser.get_structure("reference", mmcif_dir + '/' + mmcif_ref)
      8 structure2 = parser.get_structure("comparison", mmcif_dir + '/' + mmcif_comp)
     10 # Select the atoms for superimposition

TypeError: can only concatenate list (not "str") to list
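
For reference, the message itself points at the cause: the first argument passed to calculate_rmsd is mmcif_list (a list), so mmcif_dir + '/' tries to add a string to a list. The snippet below is a minimal sketch that reproduces the message and shows one way to build one path per file instead. The directory and file names are hypothetical placeholders, and build_paths is an illustrative helper, not part of Bio.PDB.

```python
import os


def build_paths(directory, file_names):
    """Return one full path per file name (illustrative helper)."""
    return [os.path.join(directory, name) for name in file_names]


# Hypothetical placeholder values, only for demonstration.
mmcif_dir = "example/input/cif_files"
mmcif_list = ["1abc.cif", "2xyz.cif"]

# Reproduce the error: a list on the left-hand side of "+" with a string.
try:
    mmcif_list + "/" + mmcif_list[0]
except TypeError as exc:
    error_message = str(exc)

# Build a separate path string for every file instead.
paths = build_paths(mmcif_dir, mmcif_list)
print(error_message)
print(paths)
```

The same applies to mmcif_comp, which is also a list: each file name in it would need its own path, for example inside a loop.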

So my question is: given my code, what would you advise me to change so that I can superimpose multiple different proteins on one reference protein and save only the .cif files that meet a specific RMSD value?

I hope someone can help me. Thanks in advance!

# Initialization
import os
from Bio.PDB import MMCIFParser, Selection, Superimposer
cur_dir = os.getcwd()
mmcif_dir = cur_dir + '/' + protein_name + '/input/cif_files'
output_dir = cur_dir + '/' + protein_name + '/prep'
mmcif_list = []
for file in os.listdir(mmcif_dir):
    if file.endswith('.cif'):
        mmcif_list.append(file)

mmcif_ref = mmcif_list[0]
mmcif_comp = mmcif_list[1:]
print(mmcif_ref)
print(mmcif_comp)

def calculate_rmsd(mmcif_dir, mmcif_ref, mmcif_comp):
    parser = MMCIFParser()

    # Parse the structures from the MMCIF files
    structure1 = parser.get_structure("reference", mmcif_dir + '/' + mmcif_ref)
    structure2 = parser.get_structure("comparison", mmcif_dir + '/' + mmcif_comp)

    # Select the atoms for superimposition
    atoms1 = Selection.unfold_entities(structure1, "N, CA, C")
    atoms2 = Selection.unfold_entities(structure2, "N, CA, C")

    # Create an instance of the Superimposer
    super_imposer = Superimposer()

    # Set the atoms for superimposition
    super_imposer.set_atoms(atoms1, atoms2)

    # Apply the transformation to the atoms of structure2
    super_imposer.apply(structure2.get_atoms())

    # Calculate the RMSD
    rmsd = super_imposer.rms

    return rmsd

rmsd = calculate_rmsd(mmcif_list, mmcif_ref, mmcif_comp)
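
One possible direction, sketched under some loud assumptions: loop over the comparison files one at a time, and give Superimposer two atom lists of equal length by keeping only the CA atoms of residues that both structures share. Here residues are matched by chain ID and residue number, which is an assumption that will often not hold across different proteins of a superfamily; a sequence or structural alignment would be the more reliable way to decide which residues correspond. The function names and the min_pairs parameter are made up for illustration. Note also that Selection.unfold_entities expects a target level such as "A" (atoms), not a list of atom names.

```python
import os


def ca_atoms_by_residue(structure):
    """Map (chain id, residue number) -> CA atom for the first model."""
    model = next(iter(structure))  # first model only
    return {
        (chain.id, residue.id[1]): residue["CA"]
        for chain in model
        for residue in chain
        # residue.id[0] == " " skips waters and hetero residues
        if residue.id[0] == " " and "CA" in residue
    }


def paired_atoms(ref_map, cmp_map):
    """Return two equal-length lists for the keys both maps share."""
    shared = sorted(ref_map.keys() & cmp_map.keys())
    return [ref_map[k] for k in shared], [cmp_map[k] for k in shared]


def rmsd_table(mmcif_dir, ref_file, comp_files, min_pairs=3):
    """Superimpose every comparison file on the reference; return {file: RMSD}."""
    # Imported here so the pure-Python helpers work without Biopython installed.
    from Bio.PDB import MMCIFParser, Superimposer

    parser = MMCIFParser(QUIET=True)
    ref_structure = parser.get_structure("reference", os.path.join(mmcif_dir, ref_file))
    ref_map = ca_atoms_by_residue(ref_structure)

    results = {}
    for name in comp_files:
        structure = parser.get_structure(name, os.path.join(mmcif_dir, name))
        fixed, moving = paired_atoms(ref_map, ca_atoms_by_residue(structure))
        if len(fixed) < min_pairs:
            continue  # too few shared positions to superimpose meaningfully
        superimposer = Superimposer()
        superimposer.set_atoms(fixed, moving)  # fits moving onto fixed
        results[name] = superimposer.rms
    return results


def files_within_cutoff(rmsd_by_file, cutoff):
    """Names of the files whose RMSD is at or below the cutoff."""
    return sorted(name for name, rmsd in rmsd_by_file.items() if rmsd <= cutoff)
```

With the variables from the question, files_within_cutoff(rmsd_table(mmcif_dir, mmcif_ref, mmcif_comp), 2.0) would give the file names to keep for a 2.0 Å cutoff (the cutoff value is only an example), and those files could then be copied to output_dir, for instance with shutil.copy.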
