Associating articles and tags efficiently


I have a List of articles and a Map<article tag (String), keywords (String[])>. There are about 10 tags, each with about 80 keywords.
I need to iterate over every article and check whether it contains at least 10 of a tag's keywords.
If it does, the article is assigned that tag.

I have come up with this solution. I think it works fine, but the three nested for-loops worry me because they could hurt performance. If you have any ideas for cleaning up the code, I would appreciate the help.

private List<Article> sortByKeyWords(List<Article> articles) {
    System.out.println("STARTING TO FILTER array of " + articles.size());
    for (Article a : articles) {
        for (Map.Entry<String, String[]> entry : keyWords.entrySet()) {
            System.out.println("Array name --> " + entry.getKey());
            // Reset the counter for every article and every tag,
            // otherwise matches from earlier tags and articles carry over.
            int matchCounter = 0;
            for (String key : entry.getValue()) {
                System.out.println("Searching for word --> " + key);
                if (a.getContents().contains(key)) {
                    matchCounter++;
                    System.out.println("FOUND A MATCH");
                }
            }

            System.out.println("MATCH COUNTER " + matchCounter);
            // entry is only in scope inside this loop, so the check has to happen here.
            if (matchCounter >= 10) {
                a.removeAllTags(Tag.RECHTSGEBIED);
                a.addTag(TagDao.findByName(entry.getKey(), Tag.RECHTSGEBIED));
            }
        }
    }

    return articles;
}

There are 2 best solutions below

6
Morph21 On

You could do it this way:

  1. Use a for-each loop to turn Map<article tag (String), keywords (String[])> into
    Map<keyword (String), article tags (String[])>.

  2. Use a for-each loop over all articles (a, b and c happen inside this loop):

    a. Loop over the article text and count the occurrences of each word.

    b. Remove the words with a count < 10.

    c. Loop over the remaining words and look up their tags in the map from point 1.

It should be O(n^2), if I have counted correctly.
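
A minimal Java sketch of the inverted-map idea from point 1, under a few assumptions that are not in the question: the class and method names (KeywordIndex, invert, countMatchesPerTag) are made up for illustration, keywords are single words, and matching is case-insensitive on whole words (the question's code uses contains, which also matches substrings). It counts how many distinct keywords of each tag occur in a text, which is how the question states the requirement, rather than filtering on per-word occurrence counts as in step b.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the asker's code base.
class KeywordIndex {

    // Point 1: invert tag -> keywords into keyword -> tags.
    static Map<String, Set<String>> invert(Map<String, String[]> keyWords) {
        Map<String, Set<String>> tagsByKeyword = new HashMap<>();
        for (Map.Entry<String, String[]> entry : keyWords.entrySet()) {
            for (String keyword : entry.getValue()) {
                tagsByKeyword
                        .computeIfAbsent(keyword.toLowerCase(Locale.ROOT), k -> new HashSet<>())
                        .add(entry.getKey());
            }
        }
        return tagsByKeyword;
    }

    // Point 2: one pass over the distinct words of a text,
    // counting the distinct keywords found for each tag.
    static Map<String, Integer> countMatchesPerTag(String contents,
                                                   Map<String, Set<String>> tagsByKeyword) {
        Set<String> words = new HashSet<>();
        // Assumption: a "word" is a run of letters or digits.
        for (String word : contents.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        Map<String, Integer> matchesPerTag = new HashMap<>();
        for (String word : words) {
            for (String tag : tagsByKeyword.getOrDefault(word, Collections.emptySet())) {
                matchesPerTag.merge(tag, 1, Integer::sum);
            }
        }
        return matchesPerTag;
    }
}
```

The inverted map would be built once and reused for every article. A tag then qualifies when its entry in the result is at least 10, so the cost per article grows with the length of the text instead of with the number of tags times the number of keywords.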

0
greybeard On

First, you need a maintainable specification of what is to happen:

Currently, you check keywords for every article tag in turn.
If enough matches for the current tag are found, all tags are removed and the current one is added?!
Instead, remove tags at the start of this article's processing.
If you want a single tag, process the tags in reverse order, and continue to the next article as soon as a tag is matched.
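
A rough Java sketch of that single-tag variant, with assumptions stated up front: SingleTagMatcher and findTag are hypothetical names, the threshold is passed in as a positive number, and the map is assumed to have a stable iteration order (for example a LinkedHashMap), since "reverse order" means nothing for a plain HashMap. Matching uses contains, as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the asker's code base.
class SingleTagMatcher {

    // Returns the name of the first tag, walking the map in reverse order,
    // for which at least minMatches keywords occur in contents.
    static Optional<String> findTag(String contents,
                                    Map<String, String[]> keyWords,
                                    int minMatches) {
        List<Map.Entry<String, String[]>> entries = new ArrayList<>(keyWords.entrySet());
        for (int i = entries.size() - 1; i >= 0; i--) {
            Map.Entry<String, String[]> entry = entries.get(i);
            int matchCounter = 0; // counted per tag
            for (String key : entry.getValue()) {
                if (contents.contains(key) && ++matchCounter >= minMatches) {
                    // Enough matches: stop scanning this article.
                    return Optional.of(entry.getKey());
                }
            }
        }
        return Optional.empty();
    }
}
```

In the loop from the question, this would mean calling a.removeAllTags(Tag.RECHTSGEBIED) once at the start of each article, then adding the tag via TagDao.findByName only if findTag returns a value for a.getContents().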

Otherwise, a lexer (table driven for flexibility) is an alternative to Morph21's suggestion to use a map.
And the way I understood the requirements, you need to check the number of keywords matched.