Associating articles and tags efficiently

Asked by icarus on 15 November 2022 at 13:33

I have a List of articles and a Map<String, String[]> that maps each article tag to its keywords. There are about 10 tags, each with about 80 keywords.
I need to iterate through every article and check whether it contains at least 10 of a tag's keywords.
If it does, that tag should be assigned to the article.

I have come up with the solution below. I think it works, but the three nested for-loops worry me because they may hurt performance. If you have any ideas for cleaning up the code, I would appreciate the help.

// keyWords is a Map<String, String[]> field: tag name -> keywords for that tag
private List<Article> sortByKeyWords(List<Article> articles) {
    System.out.println("Starting to filter " + articles.size() + " articles");
    for (Article a : articles) {
        String contents = a.getContents();
        for (Map.Entry<String, String[]> entry : keyWords.entrySet()) {
            // Matches are counted per tag, so the counter restarts at 0 for every tag
            int matchCounter = 0;
            for (String key : entry.getValue()) {
                if (contents.contains(key)) {
                    matchCounter++;
                }
            }
            System.out.println("Tag " + entry.getKey() + ": " + matchCounter + " matches");
            // The check must sit inside the tag loop: entry is not in scope outside it
            if (matchCounter >= 10) {
                a.removeAllTags(Tag.RECHTSGEBIED);
                a.addTag(TagDao.findByName(entry.getKey(), Tag.RECHTSGEBIED));
            }
        }
    }

    return articles;
}
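If the keywords are single words, one way to cut the per-article work is to split the text into a set of words once and then look each keyword up in that set, instead of running String.contains over the whole text 800 times. The sketch below assumes exactly that; the class WordSetMatcher, its methods and the threshold parameter are hypothetical names for illustration. Note that it changes the matching semantics: String.contains also finds a keyword inside a longer word (e.g. "court" inside "courthouse"), whereas this version only matches whole words, case-insensitively.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Sketch (assumption: keywords are single words, matched case-insensitively).
// The article is split into a set of words once, so each keyword check is an
// O(1) hash lookup instead of a String.contains scan over the whole text.
// Unlike String.contains, "court" does NOT match inside "courthouse".
final class WordSetMatcher {

    // Lowercases the text and splits it on runs of non-letter characters.
    static Set<String> words(String contents) {
        Set<String> result = new HashSet<>();
        for (String w : contents.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Returns every tag that has at least `threshold` of its keywords in the text.
    static List<String> matchingTags(String contents, Map<String, String[]> keyWords, int threshold) {
        Set<String> articleWords = words(contents);   // tokenize once per article
        List<String> tags = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : keyWords.entrySet()) {
            int matches = 0;
            for (String keyword : entry.getValue()) {
                if (articleWords.contains(keyword.toLowerCase(Locale.ROOT))) {
                    matches++;
                }
            }
            if (matches >= threshold) {
                tags.add(entry.getKey());
            }
        }
        return tags;
    }
}
```

With the question's data this would be called as matchingTags(a.getContents(), keyWords, 10) for each article. The three loops are still there, but the innermost operation is now a hash lookup rather than a scan of the article text.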


Answer by Morph21 on 15 November 2022 at 14:04

You could do it this way:

1. Use a for-each loop to turn Map<Article Tag(String), keywords(String[])> into Map<keyword(String), Article Tags(String[])>.
2. Use a for-each loop over all articles (steps a, b and c run inside this loop):

a. Loop over the article's text and count the occurrences of each word.

b. Remove words with a count < 10.

c. Loop over the remaining words and get their tags from the map built in step 1.

It should be O(n^2), if my counting is right.
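A possible Java sketch of the inverted-map idea (steps 1 and 2) follows. It makes one deliberate change, flagged in the comments: instead of step b's per-word frequency filter, it counts how many distinct keywords of each tag occur in the article, which is the "at least 10 keywords" rule from the question. It also assumes single-word keywords matched case-insensitively. InvertedKeywordIndex and tagsFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Sketch of the inverted-map approach. Adaptation (assumption): each tag is
// scored by the number of DISTINCT keywords of that tag found in the article,
// rather than by dropping words that occur fewer than 10 times (step b).
final class InvertedKeywordIndex {

    private final Map<String, List<String>> tagsByKeyword = new HashMap<>();

    // Step 1: invert Map<tag, keywords[]> into Map<keyword, tags>. Built once.
    InvertedKeywordIndex(Map<String, String[]> keyWords) {
        for (Map.Entry<String, String[]> entry : keyWords.entrySet()) {
            for (String keyword : entry.getValue()) {
                List<String> tags = tagsByKeyword.computeIfAbsent(
                        keyword.toLowerCase(Locale.ROOT), k -> new ArrayList<>());
                if (!tags.contains(entry.getKey())) {
                    tags.add(entry.getKey());   // a tag is listed once per keyword
                }
            }
        }
    }

    // Step 2: one pass over an article's words, tallying keyword hits per tag.
    List<String> tagsFor(String contents, int threshold) {
        Set<String> seen = new HashSet<>();              // count each keyword only once
        Map<String, Integer> hitsPerTag = new HashMap<>();
        for (String word : contents.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            List<String> tags = tagsByKeyword.get(word);
            if (tags == null || !seen.add(word)) {
                continue;                                 // not a keyword, or already counted
            }
            for (String tag : tags) {
                hitsPerTag.merge(tag, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : hitsPerTag.entrySet()) {
            if (e.getValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);                         // deterministic order
        return result;
    }
}
```

The index is built once and reused for every article, e.g. index.tagsFor(a.getContents(), 10), so the per-article cost grows with the article's length plus the number of keyword hits, not with the total number of keywords.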

Answer by greybeard on 21 November 2022 at 08:17

First, you need a maintainable specification of what is supposed to happen:

Currently, you check the keywords for every article tag in turn.
If enough matches for the current tag are found, all tags are removed and the current one is added?!
Instead, remove the tags once, at the start of processing each article.
If you want a single tag, process the tags in reverse order (with the current code, the last matching tag is the one that survives) and continue with the next article as soon as a tag matches.
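The single-tag restructuring could look like the following sketch. chooseTag is a hypothetical helper; it keeps the question's String.contains matching and assumes keyWords has a stable iteration order (for example a LinkedHashMap), since "reverse order" means nothing for a plain HashMap. The caller would clear the article's tags once, before calling it, and then add the returned tag if one is present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.Optional;

// Sketch of the single-tag variant: tags are tried from last to first and the
// first one that reaches the threshold wins. Matching is String.contains, as
// in the question. Assumption: keyWords iterates in a stable order.
final class SingleTagChooser {

    static Optional<String> chooseTag(String contents, Map<String, String[]> keyWords, int threshold) {
        List<Map.Entry<String, String[]>> entries = new ArrayList<>(keyWords.entrySet());
        // Start at the end: in the question's loop the last matching tag is the
        // one that survives, so the first match found from the end is that tag.
        ListIterator<Map.Entry<String, String[]>> it = entries.listIterator(entries.size());
        while (it.hasPrevious()) {
            Map.Entry<String, String[]> entry = it.previous();
            int matches = 0;
            for (String keyword : entry.getValue()) {
                if (contents.contains(keyword)) {
                    matches++;
                }
            }
            if (matches >= threshold) {
                return Optional.of(entry.getKey());   // done with this article
            }
        }
        return Optional.empty();                      // no tag reached the threshold
    }
}
```

In the question's method this would run once per article, right after a.removeAllTags(Tag.RECHTSGEBIED), with the returned name (if any) passed to TagDao.findByName. Returning early also skips the remaining tags' keywords once a match is found.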

Otherwise, a lexer (table-driven, for flexibility) is an alternative to Morph21's suggestion of using a map.
Also, as I understand the requirements, you need to check how many of the keywords are matched.
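The answer does not spell out the lexer, so the following is only one possible reading: the keywords of all tags are compiled into a trie whose nodes are transition tables (character to next node), and the text is scanned against that table to collect every keyword that occurs, after which the matched keywords are counted per tag. KeywordScanner and its methods are hypothetical names. It is a simple trie walk restarted at each text position, not a full Aho-Corasick automaton, and it keeps the question's case-sensitive substring semantics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of a table-driven keyword scanner. Keywords are compiled into a trie
// of transition tables; the scan finds every keyword occurring anywhere in the
// text (same substring semantics as String.contains). Simplification: the
// trie walk restarts at each text position, so the worst case is
// O(text length x longest keyword), not a single linear pass.
final class KeywordScanner {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();  // transition table
        String keyword;                                      // non-null if a keyword ends here
    }

    private final Node root = new Node();
    private final Map<String, String[]> keyWords;

    KeywordScanner(Map<String, String[]> keyWords) {
        this.keyWords = keyWords;
        for (String[] words : keyWords.values()) {
            for (String word : words) {
                Node node = root;
                for (char c : word.toCharArray()) {
                    node = node.next.computeIfAbsent(c, k -> new Node());
                }
                node.keyword = word;
            }
        }
    }

    // Collects every keyword found in the text, restarting the walk at each position.
    Set<String> matchedKeywords(String text) {
        Set<String> found = new HashSet<>();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.next.get(text.charAt(i));
                if (node == null) {
                    break;                  // no keyword continues with this character
                }
                if (node.keyword != null) {
                    found.add(node.keyword);
                }
            }
        }
        return found;
    }

    // Tags with at least `threshold` of their keywords present in the text.
    List<String> tagsFor(String text, int threshold) {
        Set<String> found = matchedKeywords(text);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : keyWords.entrySet()) {
            int matches = 0;
            for (String keyword : entry.getValue()) {
                if (found.contains(keyword)) {
                    matches++;
                }
            }
            if (matches >= threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

The appeal of the table-driven form is that the keyword table can be rebuilt from configuration without touching the scanning code, and the article text is examined once for all tags instead of once per keyword.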
