Fetching Gene Ontology Terms for a List of Genes Using Python


I have a CSV file containing gene information with 600 data entries. Each entry includes ENTREZ_GENE_ID, Gene Name, Species, and Ensembl ID. I need to find the Gene Ontology (GO) terms associated with these genes, but doing this manually for 600 entries is quite impractical and time-consuming. I'm looking for a Python script that could automate the process of fetching GO terms for each gene using their Ensembl IDs. Ideally, the script would read the CSV, extract the Ensembl IDs, query an appropriate API (e.g., Ensembl REST API) to get the GO terms, and then either update the CSV or create a new file containing the original data plus the fetched GO terms. Can someone help me with the code for this task, or suggest an efficient approach to accomplish it?
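The workflow described above (read the CSV, look up GO terms for each Ensembl ID, write a copy with an extra column) can be sketched with the standard `csv` module. This is a minimal sketch under a few assumptions: the header of the ID column is literally `Ensembl ID` (taken from the question's wording), the output column name `GO Terms` and the function `annotate_csv` are hypothetical, and `fetch` is any function you supply that maps one Ensembl ID to a list of GO term strings.

```python
import csv
import time
from typing import Callable, List


def annotate_csv(
    in_path: str,
    out_path: str,
    fetch: Callable[[str], List[str]],
    id_column: str = "Ensembl ID",  # assumed header name; adjust to match the real file
    delay: float = 0.1,  # pause between lookups to stay polite to a public API
) -> int:
    """Copy in_path to out_path with an added 'GO Terms' column; return rows written."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Keep every original column and append one new column at the end
        fieldnames = list(reader.fieldnames or []) + ["GO Terms"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        count = 0
        for row in reader:
            ensembl_id = (row.get(id_column) or "").strip()
            # Rows without an Ensembl ID are kept, just with an empty GO column
            terms = fetch(ensembl_id) if ensembl_id else []
            # Join multiple terms into one cell so the output stays one row per gene
            row["GO Terms"] = "; ".join(terms)
            writer.writerow(row)
            count += 1
            if ensembl_id and delay:
                time.sleep(delay)
    return count
```

A call such as `annotate_csv("genes.csv", "genes_with_go.csv", fetch=some_lookup)` (file names and `some_lookup` are placeholders) leaves the original file untouched. Passing the lookup in as a parameter keeps the CSV handling separate from whichever API ends up working.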

I initially had only Entrez IDs and tried to get GO terms from ensembl.org, but that failed. Then I thought maybe I could get GO terms from ensembl.org using Ensembl IDs instead, so I wrote a small function to look up the Ensembl ID for each gene name. I have now tried with Ensembl IDs, but I still cannot get any GO terms. Here is my function for fetching GO terms using an Ensembl ID:

import requests


def fetch_go_terms_from_ensembl(ensembl_id):
    """Fetch GO terms for an Ensembl Gene ID using the ontology/annotations endpoint."""
    url = f"https://rest.ensembl.org/ontology/annotations/{ensembl_id}"
    # Ask the Ensembl REST server for JSON via the Content-Type header
    response = requests.get(url, headers={"Content-Type": "application/json"}, timeout=30)
    go_terms = []
    if response.ok:
        data = response.json()
        for annotation in data:
            # Checking if annotation is of type GO
            if annotation.get('ontology', '') == 'GO':
                go_id = annotation['accession']
                go_name = annotation.get('description', 'No description available')
                go_namespace = annotation.get('namespace', 'No namespace')
                go_term = f"{go_id} ({go_namespace}): {go_name}"
                go_terms.append(go_term)
    else:
        print(f"Response not ok for Ensembl ID {ensembl_id}. HTTP Status: {response.status_code}")
    return go_terms
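The `ontology/annotations` path used in the function above does not appear among the Ensembl REST endpoints I am aware of (the `ontology/*` family covers term lookups such as `ontology/id` and ancestor/descendant queries, not per-gene annotations), which would explain the non-OK responses. A route that should work is the cross-reference endpoint `xrefs/id/{id}` filtered with `external_db=GO` and `all_levels=1`, the latter so that GO xrefs attached to the gene's transcripts and translations are included. The gene-name step can go through `xrefs/symbol/{species}/{symbol}`. The sketch below is hedged: the response field names (`dbname`, `primary_id`, `description`, `type`, `id`) reflect my reading of the Ensembl REST documentation and are worth checking against a live response, all helper names are hypothetical, and it uses the standard-library `urllib` instead of `requests` so it has no third-party dependency.

```python
import json
from typing import Any, List
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def _get_json(url: str) -> Any:
    """GET a URL from the Ensembl REST server and decode the JSON body."""
    request = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:  # raises HTTPError on 4xx/5xx
        return json.load(response)


def symbol_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    # xrefs/symbol maps an external gene name (e.g. an HGNC symbol) to Ensembl stable IDs
    return f"{ENSEMBL_REST}/xrefs/symbol/{quote(species)}/{quote(symbol)}"


def gene_ids_from_symbol_xrefs(payload: List[dict]) -> List[str]:
    # The endpoint may also return transcripts or translations; keep gene IDs only
    return [entry["id"] for entry in payload if entry.get("type") == "gene"]


def go_xrefs_url(ensembl_id: str) -> str:
    # external_db=GO restricts the cross-references to Gene Ontology;
    # all_levels=1 also collects xrefs attached to transcripts/translations
    query = urlencode({"external_db": "GO", "all_levels": 1})
    return f"{ENSEMBL_REST}/xrefs/id/{quote(ensembl_id)}?{query}"


def go_terms_from_xrefs(payload: List[dict]) -> List[str]:
    """Turn xref records into unique 'GO:nnnnnnn: description' strings, order preserved."""
    seen = set()
    terms = []
    for xref in payload:
        if xref.get("dbname") != "GO":
            continue
        go_id = xref.get("primary_id") or xref.get("display_id")
        # The same GO term can appear once per transcript; keep the first occurrence
        if not go_id or go_id in seen:
            continue
        seen.add(go_id)
        terms.append(f"{go_id}: {xref.get('description') or 'No description available'}")
    return terms


def ensembl_gene_ids(symbol: str, species: str = "homo_sapiens") -> List[str]:
    return gene_ids_from_symbol_xrefs(_get_json(symbol_lookup_url(symbol, species)))


def fetch_go_terms(ensembl_id: str) -> List[str]:
    return go_terms_from_xrefs(_get_json(go_xrefs_url(ensembl_id)))
```

`fetch_go_terms(ensembl_id)` returns a list of strings, the same shape as the original function, so it can be swapped in directly. Xref records do not carry the GO namespace; if it is needed, `ontology/id/{GO accession}` can be queried once per distinct term. Ensembl rate-limits anonymous clients (about 15 requests per second), so a short pause between the 600 calls is prudent.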
