Convert IPTC taxonomy to a Boolean expression


Can the IPTC taxonomy be converted into a Boolean expression?

"For easing the exchange of news, the International Press Telecommunications Council (IPTC) has developed the NewsML Architecture (NAR). As part of this architecture, specific controlled vocabularies, such as the IPTC News Codes, are used to categorize news items. The Subject Codes is a thesaurus of 1300 terms used for categorizing the main topics (subjects) of each news item."

As of 2021, there are more than 1,400 terms. The IPTC subjectCodes (from 2012) form a tree-like structure with three levels. My assumption is that a group of vocabulary terms defines the category of a news item.

My question: is it possible to convert the hierarchy to a Boolean expression like this?

"armed conflict" OR "armed dispute" OR "civil riots" OR (("armed" OR "weapon") AND ("right-wing" OR "left-wing" OR "extremist" OR "dangerous" OR "confrontation"))


There is 1 best solution below

Brendan Quinn

We at IPTC have looked at this question in the past when we built a rules-based classification engine as a Google News Initiative project. It's called IPTC EXTRA, and it allows users to create rules based on Boolean logic to classify documents against terms in the IPTC Media Topics controlled vocabulary (or any other CV).
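To make the idea of a Boolean classification rule concrete, here is a minimal Python sketch that evaluates the expression from the question against a piece of text. It is not EXTRA and not EQL: the nested-tuple rule format and the names `RULE`, `phrase_in` and `matches` are assumptions made up for this illustration.

```python
import re

# Hypothetical rule format (NOT EQL): a rule is either a phrase string or a
# tuple ("OR", ...), ("AND", ...) or ("NOT", rule).
RULE = (
    "OR",
    "armed conflict",
    "armed dispute",
    "civil riots",
    (
        "AND",
        ("OR", "armed", "weapon"),
        ("OR", "right-wing", "left-wing", "extremist", "dangerous", "confrontation"),
    ),
)


def phrase_in(phrase, text):
    """Return True if the phrase occurs in the text as whole words, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches(rule, text):
    """Recursively evaluate a nested Boolean rule against the text."""
    if isinstance(rule, str):
        return phrase_in(rule, text)
    op, *args = rule
    if op == "OR":
        return any(matches(arg, text) for arg in args)
    if op == "AND":
        return all(matches(arg, text) for arg in args)
    if op == "NOT":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator: {op}")
```

Calling `matches(RULE, "An armed extremist was arrested")` returns `True` because both sides of the AND branch are satisfied, while a text that mentions only "weapon" does not match.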

The rule language, Extra Query Language (EQL), is more expressive than simple Boolean and/or/not operators. We also look at proximity of words and some other characteristics: see the EXTRA User Manual for details.
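As a rough illustration of what a word-proximity test adds over plain and/or/not, the following Python sketch checks whether two words occur within a given number of tokens of each other. This is not EQL syntax or the EXTRA implementation; the function names `tokens` and `near` and the tokenization rule are assumptions for this example only, and the real operators are described in the EXTRA User Manual.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens, keeping hyphenated words together."""
    return re.findall(r"[\w-]+", text.lower())


def near(text, first, second, max_distance):
    """Return True if `first` and `second` occur within `max_distance` tokens of each other."""
    words = tokens(text)
    positions_first = [i for i, word in enumerate(words) if word == first.lower()]
    positions_second = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in positions_first
        for j in positions_second
    )
```

A rule such as "armed" AND "extremist" matches a document even when the two words appear in unrelated paragraphs; a proximity condition like `near(text, "armed", "extremist", 5)` only matches when they appear close together.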

You can see a set of test rules created for the EXTRA project on our GitHub repository. But please note that this is just a small subset of the rules that would be required to classify any content against the IPTC Media Topics vocabulary. At present, we don't know of a full set of rules for classifying all Media Topics.