Query a string on array element in API with URL endpoint


I'm using the Art Institute of Chicago API (https://api.artic.edu/docs/#introduction) and there is an element called subject_titles which is an array of strings. I want to query the API to show me all the results which contain the string "landscapes" in the subject_titles, rather than scrape the API and search for the string on my end.

Some failed examples of what I have tried:

https://api.artic.edu/api/v1/artworks/search?q=[subject_titles]=landscapes

https://api.artic.edu/api/v1/artworks/search?query[terms][subject_titles]=landscape

I reckon it would be replacing '[terms]' with a different specifier, but I can't find which. All my research comes up with results that use the Elasticsearch API, but I'm pretty new to this and that seems like a can of worms I don't want to open (why do I need one API to query another API? Also, the DSL looks like a headache to learn syntax-wise), but I will learn it if I have to. Is there a way to do this using the simple REST-style URL endpoint?


There is 1 best solution below

imotov On BEST ANSWER

TL;DR: https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape

If you want a more detailed explanation: I think they tried to make the interface powerful and concise so you can do structured queries with just a URL, but I agree, it is a bit confusing.

It looks like the URL parameters are translated into top-level elements in the DSL, and a parameter name of the form foo[bar] is translated into foo with bar nested inside. So if you have foo[bar][baz]=10 it will be translated into

{
  "foo": {
    "bar": {
      "baz": 10
    }
  }
}
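
The translation described above can be sketched in a few lines of Python. This is only an illustration of the apparent rule, not the API's actual server-side code; the helper name nest_params is made up for this example.

```python
import re


def nest_params(params):
    """Expand bracketed keys such as 'foo[bar][baz]' into nested dicts.

    Hypothetical helper: it mimics the translation the API appears to do,
    it is not taken from the API's own code.
    """
    result = {}
    for key, value in params.items():
        # 'foo[bar][baz]' -> ['foo', 'bar', 'baz']
        parts = re.findall(r"[^\[\]]+", key)
        node = result
        # Walk (and create) every level except the last one
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The last part holds the value itself
        node[parts[-1]] = value
    return result


print(nest_params({"foo[bar][baz]": 10}))
# prints {'foo': {'bar': {'baz': 10}}}
```

Values that arrive in a real URL are strings, so true would come in as "true"; how the server converts them is not something this sketch tries to guess.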

With this information in mind we can reverse engineer query[term][is_public_domain]=true into

{
  "query": {
    "term": {
      "is_public_domain": true
    }
  }
}

If we now open the Elasticsearch documentation, we can see that term is the type of the query, and that this query finds all documents where the field is_public_domain contains true. We need to search a different field for a different value, so we replace is_public_domain with subject_titles and true with landscape. term works well for boolean fields such as is_public_domain, but strings are better searched with another query type, match, so we should also replace term with match. In the end we get the following query:

{
  "query": {
    "match": {
      "subject_titles": "landscape"
    }
  }
}

Now we can convert it back into URL representation: query[match][subject_titles]=landscape and if we stick it back on the URL we get

https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape
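
Going from the JSON form back to the URL form can also be done programmatically. Below is a minimal sketch using only the Python standard library; flatten_query is a hypothetical helper name, and the only thing taken from the API is the endpoint and the query shown above.

```python
from urllib.parse import urlencode


def flatten_query(node, prefix=""):
    """Yield (name, value) pairs in bracket notation from a nested dict."""
    for key, value in node.items():
        # Top-level keys stay bare; nested keys are wrapped in brackets
        name = f"{prefix}[{key}]" if prefix else key
        if isinstance(value, dict):
            yield from flatten_query(value, name)
        else:
            yield name, value


dsl = {"query": {"match": {"subject_titles": "landscape"}}}
pairs = list(flatten_query(dsl))

# safe="[]" keeps the brackets readable instead of percent-encoding them
url = "https://api.artic.edu/api/v1/artworks/search?" + urlencode(pairs, safe="[]")
print(url)
# prints https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape
```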

This will give us the first 10 hits. If we want more, we can add limit:

https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape&limit=100

and if we want even more we can start paging through the results using the page parameter

https://api.artic.edu/api/v1/artworks/search?query[match][subject_titles]=landscape&limit=100&page=2
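
If you would rather page through the results from a script, something like the following might work. It is a rough sketch: page_url and iter_hits are hypothetical names, the max_pages cap is arbitrary, and reading the hits from a top-level "data" key is an assumption about the response format that you should check against an actual response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.artic.edu/api/v1/artworks/search"


def page_url(field, value, limit=100, page=1):
    """Build the search URL for one page of a match query."""
    params = [(f"query[match][{field}]", value), ("limit", limit), ("page", page)]
    # safe="[]" keeps the brackets readable instead of percent-encoding them
    return BASE_URL + "?" + urlencode(params, safe="[]")


def iter_hits(field, value, limit=100, max_pages=3):
    """Yield hits page by page, stopping at max_pages or at an empty page."""
    for page in range(1, max_pages + 1):
        with urlopen(page_url(field, value, limit, page)) as response:
            body = json.load(response)
        # Assumption: the hits are listed under a top-level "data" key
        hits = body.get("data", [])
        if not hits:
            break
        yield from hits
```

Calling iter_hits("subject_titles", "landscape") would then request pages 1 to 3 with the same URLs as above, one HTTP request per page.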