Measuring Top-1 and Top-5 Accuracy using TensorFlow Model Garden


I've been using the TensorFlow Model Garden to train a set of models on custom datasets that I've created for image classification. Now that it's time to evaluate them, I've run into an issue when trying to measure the top-k accuracies of my networks. The repo supplies a handy evaluation script, eval_image_classifier.py, which works nicely for my purposes. Near the end of the script, on line 165, the metrics for evaluation are defined, and I can add measurements of my own, which I've done here:

# Define the metrics:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'Precision': slim.metrics.streaming_precision(predictions, labels),
    'Accuracy':  slim.metrics.streaming_accuracy(predictions, labels),
    'Recall@5':  slim.metrics.streaming_recall_at_k(logits, labels, 5),
    'Recall@1':  slim.metrics.streaming_recall_at_k(logits, labels, 1)
})

The TF-slim metrics functions are found here, and provide many useful metrics for evaluating a network's performance. However, I see no way to measure top-k accuracy with the functions provided. There are recall and precision at top-k functions, which are close, but not quite the same. Moreover, recall at k and precision at k seem to be applied most often to recommender systems, so I'm not sure they're what I need at all. This article says:

Precision at k is the proportion of recommended items in the top-k set that are relevant

and

Recall at k is the proportion of relevant items found in the top-k recommendations

while top-k accuracy is the proportion of samples for which the correct label appears among the top-k predictions. To me these definitions all sound quite similar, and even after researching them I find it hard to pin down the exact differences.
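
For concreteness, here is a small NumPy sketch of what I mean by top-k accuracy (the function name and the toy data are my own, just for illustration):

import numpy as np

def top_k_accuracy(logits, labels, k):
    # Indices of the k highest-scoring classes for each sample.
    top_k = np.argsort(logits, axis=1)[:, -k:]
    # A hit if the true label is among them; average hits over all samples.
    return np.mean([label in row for label, row in zip(labels, top_k)])

logits = np.array([[0.1, 0.6, 0.2, 0.1],   # top-2 classes: {1, 2}
                   [0.5, 0.3, 0.1, 0.1],   # top-2 classes: {0, 1}
                   [0.2, 0.1, 0.3, 0.4]])  # top-2 classes: {3, 2}
labels = np.array([1, 1, 0])

print(top_k_accuracy(logits, labels, 1))   # 0.333... (only sample 0 hits)
print(top_k_accuracy(logits, labels, 2))   # 0.666... (samples 0 and 1 hit)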

So how do you measure top-k performance using the TensorFlow Model Garden? Are recall and precision at top-k the functions I'm looking for, or do they differ? If not, what would be the best approach to implementing top-k accuracy within the script I'm working with?

1 Answer

Answer by Mojtaba Moghri:

It seems like you're looking for top-k accuracy: the proportion of samples for which the correct label appears among the model's k highest-scoring predictions. The recall and precision at top-k functions are aimed at multi-label and recommender-style evaluation, so they may not be exactly what you need. To measure top-k accuracy, define your own metric in the evaluation script: for each example, check whether the true label is among the top-k predictions, and average that over the evaluation set.
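
A minimal sketch of how that could look in eval_image_classifier.py, replacing the metric map from the question (this assumes, as in the stock script, that logits is a [batch_size, num_classes] float tensor and labels is a vector of integer class indices):

# tf.nn.in_top_k returns one boolean per example that is True when the
# true label is among the k highest logits; streaming the mean of those
# booleans over the evaluation set gives top-k accuracy.
top_1 = tf.nn.in_top_k(logits, labels, 1)
top_5 = tf.nn.in_top_k(logits, labels, 5)

names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'Precision':      slim.metrics.streaming_precision(predictions, labels),
    'Accuracy':       slim.metrics.streaming_accuracy(predictions, labels),
    'Top-1 accuracy': slim.metrics.streaming_mean(tf.cast(top_1, tf.float32)),
    'Top-5 accuracy': slim.metrics.streaming_mean(tf.cast(top_5, tf.float32)),
})

Note that 'Top-1 accuracy' computed this way should match the existing 'Accuracy' metric (which compares argmax predictions against labels), which makes for a useful sanity check.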