How to augment udpipe models with a custom dictionary?

192 views. Asked by Afiq Johari on 11 June 2021 at 12:00

Is there a way to add a dictionary of custom, user-defined words to the udpipe models?

For example, in the code below, which uses the default English model, words such as R, Python, SQL, javascript, Excel and noSQL should have been identified as keywords, but none of them are.

I would like to augment the default English model with my own custom words so that the textrank_keywords function can better identify relevant keywords.

library(udpipe)
library(dplyr)
library(textrank)  # provides textrank_keywords()
tagger <- udpipe_download_model("english")
tagger <- udpipe_load_model(tagger$file_model)

# read data
rawdata <- c("Automating and R/Python package development.","You have a sound knowledge of another data analysis language (R,Python, SQL, javascript) and you don't care in which relational database, Excel, bigdata or noSQL store your data is located.")

# annotate
rawdata_annotate <- udpipe_annotate(tagger, rawdata) %>% as_tibble()

keyw <- textrank_keywords(rawdata_annotate$lemma,
                          relevant = rawdata_annotate$upos %in% c("PROPN","NOUN", "VERB", "ADJ"))

have <- keyw$terms
[1] "package"    "analysis"   "sound"      "relational"

rawdata_annotate %>% dplyr::filter(token %in% c('R', 'Python', 'SQL', 'javascript', 'Excel', 'noSQL')) %>% dplyr::select(token, lemma, upos)

  token      lemma      upos 
  <chr>      <chr>      <chr>
1 R          R          PROPN
2 Python     python     NOUN 
3 R          r          NOUN 
4 Python     python     NOUN 
5 SQL        sql        NOUN 
6 javascript javascript NOUN 
7 Excel      Excel      PROPN
8 noSQL      nosql      AUX


There is 1 best solution below

Afiq Johari On 11 June 2021 at 14:38

I think I found the answer. Basically, I need to create a custom CoNLL-U file containing the custom annotations and then train a model on it.

https://bnosac.github.io/udpipe/docs/doc3.html
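A minimal sketch of that workflow, under some assumptions. The helpers apply_dictionary and train_custom_model are hypothetical names, not part of udpipe. Tagging every dictionary term as PROPN is just one possible choice. The training step relies on as_conllu and udpipe_train from the udpipe package and is only defined here, not run, because a model trained on a couple of corrected sentences would be of little use. In practice the corrected sentences would probably be combined with a full treebank.

```r
# Hypothetical helper: force the tags of known dictionary terms in an
# annotation data frame (columns token, lemma, upos, as returned by udpipe).
apply_dictionary <- function(annotation, dictionary, upos = "PROPN") {
  hit <- tolower(annotation$token) %in% tolower(dictionary)
  annotation$upos[hit]  <- upos
  annotation$lemma[hit] <- annotation$token[hit]  # keep the original casing
  annotation
}

# Hypothetical helper: write the corrected annotation as CoNLL-U and train a
# tokenizer and tagger on it. Defined only, not called in this sketch.
# `annotation` must be a full udpipe annotation data frame.
train_custom_model <- function(annotation, conllu_file, model_file) {
  cat(udpipe::as_conllu(annotation), file = conllu_file)
  udpipe::udpipe_train(file = model_file,
                       files_conllu_training = conllu_file,
                       annotation_tokenizer = "default",
                       annotation_tagger = "default",
                       annotation_parser = "none")  # parser not needed for keywords
}

# Small stand-in for part of the annotation shown in the question
demo <- data.frame(
  token = c("SQL", "noSQL", "store"),
  lemma = c("sql", "nosql", "store"),
  upos  = c("NOUN", "AUX", "NOUN"),
  stringsAsFactors = FALSE
)
custom_words <- c("R", "Python", "SQL", "javascript", "Excel", "noSQL")
fixed <- apply_dictionary(demo, custom_words)
print(fixed)
```

The same apply_dictionary step could also be applied to rawdata_annotate directly before calling textrank_keywords, which avoids retraining altogether, although that corrects the output rather than augmenting the model.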
