Is there a language detector that can identify Arabic and Persian?


I have a dataset of tweets. Most of them are in Persian and some are in Arabic. I want to find the Arabic tweets. Is there an API or a tool that can do this for me? More specifically, I need a language detector that classifies each tweet as Persian or Arabic. Thanks.


There are 3 best solutions below

0
user22031507 On

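You can try langdetect:

! pip install langdetect
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

You can then wrap it in a function like this:

def detecting(x):
    try:
        # detect() returns a language code such as 'fa' or 'ar'
        return detect(x)
    except LangDetectException:
        # raised when the text has no detectable features (e.g. only URLs or emoji)
        return None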

Then you can store the results in a new column, so you can see the detected language of each tweet (here the tweet text is assumed to be in the tweet_language column):

df['detect'] = df['tweet_language'].apply(detecting)
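Short tweets often contain URLs and @mentions, which are Latin-script noise and may skew the detected language. The following is a minimal sketch of a cleaning step that could run before detection; `clean_tweet` is a hypothetical helper name and the regular expression is an assumption about what counts as noise, not part of langdetect.

```python
import re

# Assumption: URLs and @mentions carry no useful language signal.
_NOISE = re.compile(r"https?://\S+|@\w+")

def clean_tweet(text):
    """Strip URLs and @mentions, keep hashtag words, collapse whitespace."""
    text = _NOISE.sub(" ", text)
    # Keep the words inside hashtags, but drop '#' and '_' separators.
    text = text.replace("#", " ").replace("_", " ")
    return " ".join(text.split())
```

The cleaned text can then be passed to the detection function in place of the raw tweet. A tweet that consists only of links and mentions becomes an empty string, so it is worth skipping empty results before detection.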

Hope this helps!!!!

0
Asdoost On

There are several options that you can see in this post:

https://stackoverflow.com/a/47106810/9204500

If you are looking for Persian tweets, in my experience the results will also include some Dari, Pashto, Urdu, Arabic, Kurdish, and Azeri tweets. None of these tools identifies Persian reliably, especially when it comes to Dari, Azeri, and Kurdish tweets.
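For the narrower task of separating Persian from Arabic, one cheap cross-check is to count letters that standard spelling uses in only one of the two languages. The sketch below is a heuristic of my own, not something the tools in the linked post do; the letter sets and the name `guess_fa_or_ar` are assumptions for illustration. It will not separate Persian from Dari, Urdu, Pashto, or Kurdish, which share most of the Persian-specific letters, and it can misfire on Persian typed with an Arabic keyboard layout.

```python
# Heuristic sketch (an assumption, not any library's method):
# compare counts of letters typical of only one of the two languages.

# Persian letters absent from standard Arabic: peh, tcheh, jeh, gaf,
# plus Persian keheh (U+06A9) and Farsi yeh (U+06CC).
PERSIAN_ONLY = set("\u067e\u0686\u0698\u06af\u06a9\u06cc")

# Arabic letters normally absent from Persian spelling: teh marbuta,
# Arabic kaf (U+0643), Arabic yeh (U+064A), alef maksura (U+0649).
ARABIC_ONLY = set("\u0629\u0643\u064a\u0649")

def guess_fa_or_ar(text):
    """Return 'fa', 'ar', or 'unknown' based on script-specific letters."""
    fa = sum(ch in PERSIAN_ONLY for ch in text)
    ar = sum(ch in ARABIC_ONLY for ch in text)
    if fa == ar:
        return "unknown"  # no evidence either way, or a tie
    return "fa" if fa > ar else "ar"

print(guess_fa_or_ar("پژوهش"))   # contains peh and jeh
print(guess_fa_or_ar("مدرسة"))   # contains teh marbuta
```

Tweets that come back as "unknown", or where this guess disagrees with a statistical detector, are good candidates for manual review.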

0
Moeen Dehqan On

Sure, to detect whether a given string contains Arabic or Persian text in Python, you can use the langid library. First, install the library with:

pip install langid

Then, you can use the following code:

import langid

def detect_language(text):
    lang, confidence = langid.classify(text)
    return lang, confidence

# Example usage:
text_to_check = "Your text to detect the language"
lang, confidence = detect_language(text_to_check)

print(f"Language: {lang}, Confidence: {confidence}")

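The detect_language function takes a text input and identifies its language. The lang variable holds the detected language code (for example 'fa' or 'ar'), and confidence holds the classifier's score. By default, langid.classify returns an unnormalized log-probability, not a value between 0 and 1. To get a probability in the 0 to 1 range, build the identifier with LanguageIdentifier.from_modelstring(model, norm_probs=True) from langid.langid. Since only two languages are of interest here, you can also call langid.set_languages(['fa', 'ar']) before classifying to restrict the candidates.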

Note that this method may have some inaccuracies, especially with specific words or local expressions. For more accurate results, advanced NLP models may be necessary.