How can I categorize tweets with Google Cloud Natural Language API - if possible?

Question

Asked by Christoffer on 20 January 2022 at 09:37 (457 views)

I am trying to use the Google Cloud Natural Language API to classify/categorize tweets so that I can filter out the ones that are not relevant to my audience (I only want weather-related tweets). I understand it must be tricky for an AI solution to classify such a short piece of text, but I would imagine it would at least have a guess for text like this:

Wind chills of zero to -5 degrees are expected in Northwestern Arkansas into North-Central Arkansas extending into portions of northern Oklahoma during the 6-9am window . #arwx #okwx

I have tested several tweets, but only very few get a categorization; the rest get no result (or "No categories found. Try a longer text input." if I try it through the GUI).

Is it pointless to hope for this to work? Or is it possible to lower the threshold for the categorization? An "educated guess" from the NLP solution would be better than no filter at all. Is there an alternative solution (other than training my own NLP model)?

Edit: To clarify:

In the end, I am using the Google Cloud Platform Natural Language API to classify tweets. To test it I am using the GUI (linked above). I can see that very few of the tweets I test (in the GUI) get a categorization from GCP NLP; for most of them the category is empty.

What I want is for GCP NLP to provide a category guess for a tweet's text rather than an empty result. I assume the NLP model removes any results with a confidence below X%. It would be interesting to know whether that threshold can be configured.

I assume the categorization of tweets must have been done before. Is there any other way to solve this?

Edit 2: ClassifyTweet-code:

// projectId and keyFilename are assumed to be defined elsewhere (not shown in the question).
const language = require('@google-cloud/language');

async function classifyTweet(tweetText) {
   const client = new language.LanguageServiceClient({projectId, keyFilename});
   //const tweetText = "Some light snow dusted the ground this morning, adding to the intense snow fall of yesterday. Here at my Warwick station the numbers are in, New Snow 19.5cm and total depth 26.6cm. A very good snow event. Photos to be posted. #ONStorm #CANWarnON4464 #CoCoRaHSON525"
   const document = {
      content: tweetText,
      type: 'PLAIN_TEXT',
   };
   // classifyText resolves to an array whose first element is the response.
   const [classification] = await client.classifyText({document});

   console.log('Categories:');
   classification.categories.forEach(category => {
     console.log(`Name: ${category.name}, Confidence: ${category.confidence}`);
   });

   return classification.categories;
}

**Betjens** · Accepted Answer · 2022-01-28T17:41:06.450000

I dug into the current state of Cloud Natural Language, and my answer to your main question is that, with classify text as it currently stands, this is not possible. A workaround, though, is to base your own categories on the output you get from analyzing the text of your inputs.

Given that we are not using a custom model here, only the options that Cloud Natural Language offers, one tentative approach is as follows:

To start, I adapted the code from the official samples to our needs, to explain this a bit further:

# Written against google-cloud-language 1.x. The `enums` module was removed in
# 2.0; there the enum types are exposed on language_v1 directly
# (for example language_v1.Document.Type.PLAIN_TEXT).
from google.cloud import language_v1
from google.cloud.language_v1 import enums


def sample_cloud_natural_language_text(text_content):
    """
    Args:
      text_content: The text content to analyze. Must include at least 20 words.
    """

    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT

    language = "en"
    document = {"content": text_content, "type": type_, "language": language}

    # Content classification: the loop prints nothing when no category is returned.
    print("=====CLASSIFY TEXT=====")
    response = client.classify_text(document)
    for category in response.categories:
        print("Category name: {}".format(category.name))
        print("Confidence: {}".format(category.confidence))

    # Entity analysis: list every entity with its type, salience, metadata and mentions.
    print("=====ANALYZE TEXT=====")
    response = client.analyze_entities(document)
    for entity in response.entities:
        print(">>>>> ENTITY {}".format(entity.name))
        print("Entity type: {}".format(enums.Entity.Type(entity.type).name))
        print("Salience score: {}".format(entity.salience))

        for metadata_name, metadata_value in entity.metadata.items():
            print("{}: {}".format(metadata_name, metadata_value))

        for mention in entity.mentions:
            print("Mention text: {}".format(mention.text.content))
            print("Mention type: {}".format(enums.EntityMention.Type(mention.type).name))


if __name__ == "__main__":
    #text_content = "That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows."
    text_content = "Wind chills of zero to -5 degrees are expected in Northwestern Arkansas into North-Central Arkansas extending into portions of northern Oklahoma during the 6-9am window"

    sample_cloud_natural_language_text(text_content)

Output:

=====CLASSIFY TEXT=====
=====ANALYZE TEXT=====
>>>>> ENTITY Wind chills
Entity type: OTHER
Salience score: 0.46825599670410156
Mention text: Wind chills
Mention type: COMMON
>>>>> ENTITY degrees
Entity type: OTHER
Salience score: 0.16041776537895203
Mention text: degrees
Mention type: COMMON
>>>>> ENTITY Northwestern Arkansas
Entity type: ORGANIZATION
Salience score: 0.07702474296092987
mid: /m/02vvkn4
wikipedia_url: https://en.wikipedia.org/wiki/Northwest_Arkansas
Mention text: Northwestern Arkansas
Mention type: PROPER
>>>>> ENTITY North
Entity type: LOCATION
Salience score: 0.07702474296092987
Mention text: North
Mention type: PROPER
>>>>> ENTITY Arkansas
Entity type: LOCATION
Salience score: 0.07088913768529892
mid: /m/0vbk
wikipedia_url: https://en.wikipedia.org/wiki/Arkansas
Mention text: Arkansas
Mention type: PROPER
>>>>> ENTITY window
Entity type: OTHER
Salience score: 0.06348973512649536
Mention text: window
Mention type: COMMON
>>>>> ENTITY Oklahoma
Entity type: LOCATION
Salience score: 0.04747137427330017
wikipedia_url: https://en.wikipedia.org/wiki/Oklahoma
mid: /m/05mph
Mention text: Oklahoma
Mention type: PROPER
>>>>> ENTITY portions
Entity type: OTHER
Salience score: 0.03542650490999222
Mention text: portions
Mention type: COMMON
>>>>> ENTITY 6
Entity type: NUMBER
Salience score: 0.0
value: 6
Mention text: 6
Mention type: TYPE_UNKNOWN
>>>>> ENTITY 9
Entity type: NUMBER
Salience score: 0.0
value: 9
Mention text: 9
Mention type: TYPE_UNKNOWN
>>>>> ENTITY -5
Entity type: NUMBER
Salience score: 0.0
value: -5
Mention text: -5
Mention type: TYPE_UNKNOWN
>>>>> ENTITY zero
Entity type: NUMBER
Salience score: 0.0
value: 0
Mention text: zero
Mention type: TYPE_UNKNOWN

As you can see, classify text does not help much (the result is empty). It is only when we analyze the text for entities that we get some values, and we can use those to build our own categories. The trick (and the hard work) is to build the pool of keywords that fits each category (a category defined by us), which we can then match against the data we are analyzing. As for the categories themselves, we can check the current list of available categories published by Google to get an idea of what categories should look like.
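As a rough illustration of the keyword-pool idea, here is a minimal sketch that scores hand-made categories against entity names and salience values. The pools, the category names, and the `score_categories` / `best_category` helpers are all hypothetical and not part of the API; the entity list is copied (rounded) from the output above so the snippet runs without credentials.

```python
# Hypothetical keyword pools: category names and keywords are illustrative only.
KEYWORD_POOLS = {
    "weather": {"wind", "chill", "chills", "snow", "degrees", "storm", "rain"},
    "sports": {"game", "score", "team", "match"},
}


def score_categories(entities, pools=KEYWORD_POOLS):
    """Sum the salience of every entity that shares a word with a pool.

    `entities` is a list of (name, salience) pairs, for example built from
    the entities returned by analyze_entities.
    """
    scores = {}
    for name, salience in entities:
        words = set(name.lower().split())
        for category, keywords in pools.items():
            if words & keywords:
                scores[category] = scores.get(category, 0.0) + salience
    return scores


def best_category(entities, min_score=0.1, pools=KEYWORD_POOLS):
    """Return the highest-scoring category, or None if nothing reaches min_score."""
    scores = score_categories(entities, pools)
    if not scores:
        return None
    category, score = max(scores.items(), key=lambda item: item[1])
    return category if score >= min_score else None


# Entity names and rounded salience values taken from the output above.
tweet_entities = [
    ("Wind chills", 0.468),
    ("degrees", 0.160),
    ("Northwestern Arkansas", 0.077),
    ("Arkansas", 0.071),
    ("window", 0.063),
    ("Oklahoma", 0.047),
]

print(best_category(tweet_entities))
```

Here `min_score` acts as a threshold you control yourself, so you can set it as low as you like to get an "educated guess". How well this works depends entirely on how good the keyword pools are.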

I don't think a feature to lower the bar is implemented in the current builds, but it is something that can be requested from Google as a feature.
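Until such a setting exists, one option is to keep the API result when there is one and fall back to your own guess otherwise. The sketch below only shows that control flow. `categorize`, `fake_classify` and `fake_fallback` are hypothetical names, and the two stubs stand in for a real classify-text call and for whatever custom matching you build.

```python
def categorize(text, classify, fallback, fallback_min_score=0.3):
    """Return (category, source), or (None, None) if nothing qualifies.

    `classify` and `fallback` are caller-supplied functions that each return
    a list of (name, confidence) pairs; they are assumptions of this sketch.
    """
    api_categories = classify(text)
    if api_categories:
        # Trust the API whenever it returns anything at all.
        return max(api_categories, key=lambda c: c[1])[0], "api"

    # Otherwise apply our own, freely adjustable threshold to the fallback.
    guesses = [g for g in fallback(text) if g[1] >= fallback_min_score]
    if guesses:
        return max(guesses, key=lambda g: g[1])[0], "fallback"
    return None, None


def fake_classify(text):
    # Stub that mimics the empty result seen for short tweets.
    return []


def fake_fallback(text):
    # Stub standing in for a custom keyword-based guess.
    return [("weather", 0.63)] if "wind chill" in text.lower() else []


result = categorize(
    "Wind chills of zero to -5 degrees are expected",
    fake_classify,
    fake_fallback,
)
print(result)
```

Returning the source alongside the category lets you treat fallback guesses with more caution than categories that came from the API.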
