I am trying to use Google Cloud Natural Language API to classify/categorize tweets in order to filter out tweets that are not relevant to my audience (weather related). I can understand it must be tricky for an AI solution to make a classification on a short amount of text but I would imagine it would at least have a guess on text like this:
Wind chills of zero to -5 degrees are expected in Northwestern Arkansas into North-Central Arkansas extending into portions of northern Oklahoma during the 6-9am window . #arwx #okwx
I have tested several tweets, but only very few get a categorization; the rest get no result (or "No categories found. Try a longer text input." if I try it through the GUI).
Is it pointless to hope for this to work? Or, is it possible to decrease the threshold for the categorization? An "educated guess" from the NLP-solution would be better than no filter at all. Is there an alternate solution (outside training my own NLP-model)?
Edit: In order to clarify:
I am, in the end, using the Google Cloud Platform Natural Language API to classify tweets. To test it I am using the GUI (linked above). I can see that very few of the tweets I test (in the GUI) get a categorization from GCP NLP, i.e. for most of them the category is empty.
What I want is for GCP NLP to provide a category guess for a tweet's text rather than an empty result. I assume the NLP model removes any results with a confidence less than X%. It would be interesting to know if that threshold could be configured.
I assume the categorization of tweets must have been done before. Is there any other way to solve this?
Edit 2: ClassifyTweet-code:
async function classifyTweet(tweetText) {
  const language = require('@google-cloud/language');
  // projectId and keyFilename are assumed to be defined elsewhere in the module.
  const client = new language.LanguageServiceClient({projectId, keyFilename});
  //const tweetText = "Some light snow dusted the ground this morning, adding to the intense snow fall of yesterday. Here at my Warwick station the numbers are in, New Snow 19.5cm and total depth 26.6cm. A very good snow event. Photos to be posted. #ONStorm #CANWarnON4464 #CoCoRaHSON525"
  const document = {
    content: tweetText,
    type: 'PLAIN_TEXT',
  };
  // Ask the API for content categories; the list is empty when no category is returned.
  const [classification] = await client.classifyText({document});
  console.log('Categories:');
  classification.categories.forEach(category => {
    console.log(`Name: ${category.name}, Confidence: ${category.confidence}`);
  });
  return classification.categories;
}
I have dug into the current state of Cloud Natural Language, and my answer to your main question is that, as things stand, this is not possible with classify text alone. A workaround, though, is to base your own categories on the output you get from analyzing the text of your inputs.
Assuming we are not using a custom model and are relying only on the options Cloud Natural Language offers, one tentative approach is as follows:
To start, I have adapted the code from the official samples to our needs to explain this a bit further:
output
As you can see, "classify text" does not help much (the result is empty). It is when we start to "analyze text" that we get some values. We can use those to build our own categories. The trick (and the hard work too) is to build the pool of keywords that fits each category (a category built by us), which we can then use to label the data we are analyzing. As for categorization, we can check the current list of available categories published by Google to get an idea of what categories should look like.
I don't think there is a feature to "lower the bar" implemented in current builds, but it is something that can be requested from Google as a feature.
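To make the keyword-pool idea more concrete, here is a minimal sketch of how it could look. It assumes the entities come from the client's analyzeEntities call (each with a name and a salience). The category names, the keywords, and the helper names categorizeEntities and guessTweetCategories are all made up for illustration; they are not part of the API, and the pools would need tuning against real tweets.

```javascript
// Hypothetical keyword pools: category names and keywords are illustrative
// assumptions, not Google's categories.
const CATEGORY_KEYWORDS = {
  'weather/winter': ['snow', 'wind chill', 'wind chills', 'ice', 'blizzard'],
  'weather/storm': ['storm', 'tornado', 'thunderstorm', 'hail'],
};

// Score each custom category by summing the salience of every entity whose
// name contains one of the category's keywords as a whole word or phrase.
function categorizeEntities(entities, keywordPools = CATEGORY_KEYWORDS) {
  const scores = {};
  for (const entity of entities) {
    // Pad with spaces so that "ice" does not match inside "notice".
    const name = ` ${(entity.name || '').toLowerCase()} `;
    for (const [category, keywords] of Object.entries(keywordPools)) {
      if (keywords.some(keyword => name.includes(` ${keyword} `))) {
        scores[category] = (scores[category] || 0) + (entity.salience || 0);
      }
    }
  }
  // Highest score first; an empty array means no keyword matched.
  return Object.entries(scores)
    .map(([name, score]) => ({name, score}))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical wrapper: `client` is a LanguageServiceClient created as in
// the question's classifyTweet function.
async function guessTweetCategories(tweetText, client) {
  const document = {content: tweetText, type: 'PLAIN_TEXT'};
  const [result] = await client.analyzeEntities({document});
  return categorizeEntities(result.entities);
}
```

The score here is only a rough ranking signal, not a calibrated confidence, so it cannot be compared directly with the confidence that classify text returns. You could still call classify text first and fall back to this guess only when its result is empty.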