Filter triples from a turtle file using SPARQL


I have the following triples in my Turtle file, and I would like to apply a regex so that only the list subject (the first block) is kept and the term subject (the last block) is filtered out.

ttl file:

###  https://rmswi/#/lists/100000000001
<https://rmswi/#/lists/100000000001> rdf:type owl:Class ;
                                                        rdfs:label "Age Range" .


###  https://rmswi/#/lists/100000000001/terms/100000000029
<https://rmswi/#/lists/100000000001/terms/100000000029> rdf:type owl:Class ;
                                                                           rdfs:subClassOf <https://rmswi/#/lists/100000000001> ;
                                                                           <http://purl.obolibrary.org/obo/IAO_0000115> "Any human before birth." ;
                                                                           <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "Fetus" ,
                                                                                                                                          "Foetus" ,
                                                                                                                                          "In utero" ;
                                                                           rdfs:label "In utero" ;
                                                                           <https://ontology/properties/Domain> "https://rmswi/#/lists/100000000004/terms/100000000012" ;
                                                                           <https://ontology/properties/Term_Status> "CURRENT" .


I have the following SPARQL query, run with rdflib, to extract all triples whose predicate is rdfs:label:

query = """

prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
               
SELECT distinct ?s ?p ?o
WHERE {
    ?s rdfs:label ?o ;
    FILTER (strstarts(str(?s), 'https://rmswi/#/lists/')) .
    FILTER REGEX(?s, 'https:\/\/#\/lists\/\d+$' )
}

"""

qres = g.query(query)

for row in qres:
    print (row)
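
The pattern in the second FILTER can be checked on its own, outside SPARQL. The sketch below uses Python's `re` module as a stand-in for SPARQL's REGEX (an approximation that should hold for a pattern this simple) and tests the two subject IRIs from the Turtle sample. The names `posted` and `fixed` are illustrative, and `fixed` is only one possible correction, assuming the host really is `rmswi` as in the data.

```python
import re

# The two subject IRIs from the Turtle sample in the question.
LIST_IRI = "https://rmswi/#/lists/100000000001"
TERM_IRI = "https://rmswi/#/lists/100000000001/terms/100000000029"

# Pattern from the posted query, with `\/` written as a plain `/`.
# Nothing sits between "https://" and "/#/lists/", so the host is missing.
posted = re.compile(r"https://#/lists/\d+$")

# Hypothetical corrected pattern: host included, anchored at both ends,
# and [0-9] instead of \d so no backslash escaping is needed.
fixed = re.compile(r"^https://rmswi/#/lists/[0-9]+$")

for iri in (LIST_IRI, TERM_IRI):
    print(iri, bool(posted.search(iri)), bool(fixed.search(iri)))
```

With the sample IRIs, the posted pattern matches neither subject, while the anchored pattern matches the list IRI and rejects the term IRI because of the trailing `/terms/...` segment.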

The expected output is:


(rdflib.term.URIRef('https://rmswi/#/lists/100000000001'), None, rdflib.term.Literal('Age Range'))

Any help is highly appreciated.
