I have a database and API for a Hindi WordNet and want to access it from NLTK in Python. Is there any way to add our own WordNet to NLTK?

1.2k Views Asked by Aniruddha Tammewar At 05 June 2014 at 06:50

I have a database and an API for a Hindi WordNet. I want to access this WordNet from NLTK in Python, so that I can use the NLTK WordNet functions with it. Is there any way to add our own WordNet to NLTK? Alternatively, are there any tools for Word Sense Disambiguation in Hindi, or tools that work with any language's WordNet after some modification, that return the most suitable sense from the WordNet?

Everst On 05 June 2014 at 22:24 BEST ANSWER

If you look in your nltk_data folder, you'll see that WordNet, like every other NLTK corpus, is just a set of plain-text files. So it should be possible to format your Hindi WordNet the same way as the NLTK one and then use the same functions. Here is the excerpt from the WordNetCorpusReader class in nltk.corpus.reader.wordnet where these files are read:

#: A list of file identifiers for all the fileids used by this
#: corpus reader.
_FILES = ('cntlist.rev', 'lexnames', 'index.sense',
          'index.adj', 'index.adv', 'index.noun', 'index.verb',
          'data.adj', 'data.adv', 'data.noun', 'data.verb',
          'adj.exc', 'adv.exc', 'noun.exc', 'verb.exc', )

def __init__(self, root):
    """
    Construct a new wordnet corpus reader, with the given root
    directory.
    """
    super(WordNetCorpusReader, self).__init__(root, self._FILES,
                                              encoding=self._ENCODING)

I suppose you don't really need to generate all of these files; the important one for Word Sense Disambiguation is "index.sense". NLTK does not generate this file, so it either has to be produced in a pre-processing step or has to ship with your Hindi WordNet, in the following format: http://wordnet.princeton.edu/wordnet/man/senseidx.5WN.html.
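To make the "index.sense" format more concrete, here is a minimal sketch of a parser for one line of that file. It assumes the layout described in the senseidx documentation linked above, namely `sense_key synset_offset sense_number tag_cnt`, where the sense key looks like `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. The names `SenseIndexEntry` and `parse_sense_index_line` are made up for this illustration and are not part of NLTK, and the sample line in the usage note is only an illustration of the layout.

```python
from typing import NamedTuple


class SenseIndexEntry(NamedTuple):
    """One parsed line of index.sense (hypothetical helper type, not NLTK)."""
    lemma: str
    ss_type: int        # 1=noun, 2=verb, 3=adjective, 4=adverb, 5=adjective satellite
    lex_filenum: int
    lex_id: int
    head_word: str      # empty unless ss_type is 5
    head_id: str        # empty unless ss_type is 5
    synset_offset: int
    sense_number: int
    tag_cnt: int


def parse_sense_index_line(line):
    """Parse 'sense_key synset_offset sense_number tag_cnt' into an entry."""
    sense_key, offset, sense_number, tag_cnt = line.split()
    # The sense key is 'lemma%lex_sense'; split on the last '%'.
    lemma, _, lex_sense = sense_key.rpartition('%')
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(':')
    return SenseIndexEntry(
        lemma=lemma,
        ss_type=int(ss_type),
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word,
        head_id=head_id,
        synset_offset=int(offset),
        sense_number=int(sense_number),
        tag_cnt=int(tag_cnt),
    )
```

For example, `parse_sense_index_line("dog%1:05:00:: 02084071 1 42")` gives an entry with lemma `dog`, ss_type `1` and sense number `1`. Running a parser like this over the file that ships with the Hindi WordNet (or over your own generated file) is a quick way to check that every line has the four expected fields before handing the directory to NLTK.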

After you've done all these steps, I would go to ../nltk/corpus/reader/wordnet.py and either create a copy of it, in which you change the root, the file names, and possibly some other dependencies while keeping the functionality, or change what you need within the existing classes (not recommended).
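Before pointing a copied reader at a new root, it may help to check which of the files named in `_FILES` actually exist in your Hindi WordNet directory. The following is a rough sketch using only the standard library. The file list is copied from the excerpt above, while `WORDNET_FILES` and `missing_wordnet_files` are hypothetical names introduced here, not NLTK functions.

```python
import os

# File names copied from the _FILES tuple in nltk.corpus.reader.wordnet (see excerpt above).
WORDNET_FILES = ('cntlist.rev', 'lexnames', 'index.sense',
                 'index.adj', 'index.adv', 'index.noun', 'index.verb',
                 'data.adj', 'data.adv', 'data.noun', 'data.verb',
                 'adj.exc', 'adv.exc', 'noun.exc', 'verb.exc')


def missing_wordnet_files(root):
    """Return the expected file names that are not present directly under root."""
    return [name for name in WORDNET_FILES
            if not os.path.isfile(os.path.join(root, name))]
```

Calling `missing_wordnet_files` on your own directory returns the names that still need to be generated or renamed. An empty list means the directory at least has the same layout as the English WordNet in nltk_data; it says nothing about whether the contents of each file are in the right format.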

P.S. A little googling gave me this link: http://www.cs.utexas.edu/~rashish/cs365ppt.pdf, which references a number of other sources on the subject.