I want to add the character vector EU_CFSP_INT_all <- c(...) as metadata to my dfm, so that I can later set the prevalence to EU_CFSP_INT_all when fitting an stm. The character vector contains 62 expressions, and my corpus/dfm consists of 201 documents. It might sound trivial, but how do I include EU_CFSP_INT_all as a column in the dfm so that all 62 expressions are attached to every one of the 201 rows (documents)?
The closest I have gotten was by using the following code:
EU_CFSP_INT_all_EV <- rep_len(EU_CFSP_INT_all, length.out = 201)
dfmat_PRs_trim_c$EUint <- EU_CFSP_INT_all_EV
However, this simply recycled the 62 expressions one at a time until 201 values were reached, so each document was matched with only one expression instead of all 62.
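The recycling described above is how base R's rep_len() works: it cycles through its input one element at a time. A minimal sketch with three hypothetical stand-in expressions and seven "documents" (not the real 62 and 201) shows that each position receives a single expression:

```r
# Hypothetical stand-ins for the 62 expressions in EU_CFSP_INT_all
expressions <- c("cfsp", "high representative", "pesco")
n_docs <- 7  # stands in for the 201 documents

# rep_len() recycles the vector element by element up to length n_docs
recycled <- rep_len(expressions, length.out = n_docs)
print(recycled)

# Each "document" holds exactly one expression:
# positions 1, 4 and 7 all contain "cfsp", never the full set of three
print(recycled[c(1, 4, 7)])
```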
Converting the vector to a tokens object got me closer to the goal: the resulting tokens object has 201 documents, each of length 62:
EU_CFSP_INT_all_vector <- rep(list(EU_CFSP_INT_all), 201)
EU_CFSP_vector_toks <- tokens(EU_CFSP_INT_all_vector)
summary(EU_CFSP_vector_toks)
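By contrast, rep(list(...), n) keeps the whole vector together, which is why the tokens object above has 201 documents of 62 tokens each. The base-R sketch below, again with hypothetical stand-in values, shows that difference. It also shows collapsing the vector into one string per document; that is only an assumed alternative (not from the question) for getting a single per-document value that still contains every expression:

```r
# Hypothetical stand-ins for EU_CFSP_INT_all and the number of documents
expressions <- c("cfsp", "high representative", "pesco")
n_docs <- 7

# One list element per document, each holding the complete vector
per_doc_list <- rep(list(expressions), n_docs)
print(lengths(per_doc_list))  # every element has length 3

# Assumed alternative: one character string per document with all expressions
per_doc_string <- rep(paste(expressions, collapse = "; "), n_docs)
print(per_doc_string[1])
```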
But when I then created another dfm to merge with, the values got scrambled. I feel like there must be a very easy way to do this that I am unaware of. Thanks a lot if anyone can help me out!
If you want to add EU_CFSP_INT_all to your tokens object as a docvar, it's simple:
These will then remain as docvars in any dfm you create from EU_CFSP_vector_toks.
Even without that step, however, you could have specified EU_CFSP_vector_toks as prevalence in the call to stm(), as long as you also supplied it as a data.frame in meta (the document metadata passed to the data argument of stm()).
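As a hedged sketch of the general pattern, and not necessarily the code this answer originally showed, the example below sets a docvar on a tokens object, checks that it carries over to the dfm, and converts the dfm for stm. It assumes quanteda's docvars() replacement function and convert(..., to = "stm"), which returns a list with documents, vocab and meta elements. The toy documents, the three stand-in expressions and the docvar name EUint are hypothetical. Because a docvar holds one value per document, the expressions are stored here as one collapsed string per document:

```r
library(quanteda)

# Hypothetical toy documents standing in for the 201 press releases
toks <- tokens(c(pr1 = "The council adopted a decision.",
                 pr2 = "The high representative issued a statement.",
                 pr3 = "Member states discussed pesco."))

# Hypothetical stand-ins for the 62 expressions
EU_CFSP_INT_all <- c("cfsp", "high representative", "pesco")

# A docvar holds one value per document, so store all expressions as one string each
docvars(toks, "EUint") <- rep(paste(EU_CFSP_INT_all, collapse = "; "), ndoc(toks))

# Docvars set on the tokens object carry over to any dfm built from it
dfmat <- dfm(toks)
print(docvars(dfmat))

# convert() to "stm" returns a list; its meta element is a data.frame of the docvars
out <- convert(dfmat, to = "stm")
str(out$meta)

# The stm call would then look roughly like this (not run; K = 10 is arbitrary):
# fit <- stm::stm(out$documents, out$vocab, K = 10,
#                 prevalence = ~ EUint, data = out$meta)
```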