How can I get the same WordNet output from the terminal in Python/NLTK?

I have WordNet installed on my machine, and when I run the terminal command

wn funny -synsa

I get the following output:

[Image: screenshot of the terminal output]

Now I would like to get the same information within Python using the NLTK package. For example, if I run

from nltk.corpus import wordnet

synset_name = 'amusing.s.02'

for lemma in wordnet.synset(synset_name).lemmas():
    print('Lemma: {}'.format(lemma.name()))

I get all the lemmas I see in the terminal output (amusing, comic, comical, funny, laughable, mirthful, risible). However, what does the "=> humorous (vs. humorless), humourous" part of the terminal output mean, and how can I get it with NLTK? It looks like a hypernym, but adjectives don't have hypernym relationships.

Best answer, by alvas:

From the wn(1WN) manual page, https://wordnet.princeton.edu/documentation/wn1wn:

-syns(n | v | a | r): Display synonyms and immediate hypernyms of synsets containing searchstr. Synsets are ordered by estimated frequency of use. For adjectives, if searchstr is in a head synset, the cluster's satellite synsets are displayed in place of hypernyms. If searchstr is in a satellite synset, its head synset is also displayed.
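The rule quoted above can be sketched as a small helper. This is only an illustration, not part of NLTK: the name related_adjective_synsets is hypothetical, and the argument is assumed to behave like an NLTK Synset (providing pos(), similar_tos() and hypernyms()). It relies on the "similar to" pointer linking a satellite to its head and a head to its satellites.

```python
def related_adjective_synsets(ss):
    """Pick what to list under a synset, following the -syns rule.

    `ss` is assumed to behave like an NLTK Synset, i.e. to provide
    pos(), similar_tos() and hypernyms(). Hypothetical helper.
    """
    if ss.pos() == 's':
        # Satellite synset: its "similar to" pointer leads to the head synset.
        return 'head', ss.similar_tos()
    if ss.pos() == 'a':
        # Head synset: its "similar to" pointers lead to the cluster's satellites.
        return 'satellites', ss.similar_tos()
    # Any other part of speech: fall back to the immediate hypernyms.
    return 'hypernyms', ss.hypernyms()
```

With WordNet loaded through NLTK, calling related_adjective_synsets(wn.synset('amusing.s.02')) should give the 'head' label together with the synset for humorous.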

To emulate this behavior in NLTK, you'll need to:

  • filter the synsets by POS
  • loop through the synsets
  • print the .lemma_names() of each synset
  • if there are immediate hypernyms, print them
    • otherwise, for adjectives,
      • if the synset is a head synset, print the cluster's satellite synsets in place of hypernyms
      • if the synset is a satellite synset, print its head synset

In code:

import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet')

word = 'funny'

for ss in wn.synsets(word, 'a'):
  print(', '.join(ss.lemma_names()))
  # if there are immediate hypernyms, print the first one
  if ss.hypernyms():
    print(ss.hypernyms()[0])
  # if the synset is a satellite (POS 's'), print its head synset (POS 'a')
  elif ss.pos() == 's':
    head_ss = ss.similar_tos()[0]
    parts = []
    for lemma in head_ss.lemmas():
      # antonymy is a lexical relation, so it lives on the Lemma, not the Synset
      antonyms = lemma.antonyms()
      vs = f" (vs. {antonyms[0].name()})" if antonyms else ""
      parts.append(lemma.name() + vs)
    print("   ==> " + ", ".join(parts))
  print()

[out]:

amusing, comic, comical, funny, laughable, mirthful, risible
   ==> humorous (vs. humorless), humourous

curious, funny, odd, peculiar, queer, rum, rummy, singular
   ==> strange (vs. familiar), unusual

fishy, funny, shady, suspect, suspicious
   ==> questionable (vs. unquestionable)

funny
   ==> ill (vs. well), sick

Note: The (vs. ...) part comes from the lemmas, not from the synset. In WordNet, antonymy is a lexical relation between individual word senses, so NLTK exposes antonyms() on Lemma objects; Synset objects have no such method. Checking the head synset itself for antonyms therefore never finds any, which is why the code above looks them up on each lemma of the head synset.