Apologies in advance: I'm new to R and am using my school's code as a reference. I followed the example I was given closely and normalised the values (normalize=TRUE), so I expected every TF-IDF value to be at most 1, yet the maximum is 16.36. Why can the maximum TF-IDF value exceed 1 after normalisation? I'd appreciate any help, and please tell me if more information is needed. Thank you.
# Load the tm package, which provides DocumentTermMatrix() and weightTfIdf()
library(tm)
# Create Document-Term Matrix (bumble is the corpus built earlier)
dtm_bumble <- DocumentTermMatrix(bumble)
# Row indices of the documents that have at least one non-zero entry
ui <- unique(dtm_bumble$i)
# If dtm_bumble$i does not contain a row index p, then row p is empty, so keep only the rows in ui
new_dtm_bumble <- dtm_bumble[ui, ]
# Create Document-Term Matrix with TF-IDF values
dtm_tfidf_bumble <- weightTfIdf(new_dtm_bumble, normalize = TRUE)
# Info on DTM
inspect(new_dtm_bumble)
<<DocumentTermMatrix (documents: 84146, terms: 23016)>>
Non-/sparse entries: 645486/1936058850
Sparsity           : 100%
Maximal term length: 277
Weighting          : term frequency (tf)
Sample             :
       Terms
Docs    date good match messag money pay peopl profil swipe time
  33615    0    1     2      0     3   0     0      0     3    0
  36782    0    0     0      1     1   0     0      0     0    1
  37333    0    0     0      0     2   0     1      0     0    0
  40474    1    2     1      0     1   2     0      0     0    1
  49551    1    0     1      0     2   1     0      2     0    2
  58630    3    0     3      0     2   2     0      0     3    0
  63130    1    0    12      0     1   1     0      3     4    8
  66277    2    2     0      0     1   0     1      0     0    1
  73764    0    1     3      1     0   0     2      2     1    2
  83079    0    0     1      0     0   0     0      0     0    0
# Retrieve statistical summary of TF-IDF
summary(dtm_tfidf_bumble$v)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.01849  0.30264  0.50189  0.86867  0.91498 16.36061
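
One possible explanation, sketched in base R on made-up counts rather than the real corpus: according to the tm documentation for weightTfIdf, normalize=TRUE only divides each term count by the document's total term count, and the result is then multiplied by an inverse document frequency of log2(N / df), which has no upper bound of 1. The matrix `counts` and its term names below are hypothetical, and the code is a hand-rolled illustration of that formula, not tm's actual implementation.

```r
# Toy document-term counts (hypothetical data, not from the Bumble corpus):
# rows are documents, columns are terms.
counts <- matrix(
  c(1, 0, 0,   # doc1 contains only "rare", once
    0, 2, 1,   # doc2
    0, 1, 3,   # doc3
    0, 1, 1),  # doc4
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("rare", "match", "swipe"))
)

n_docs <- nrow(counts)

# Normalised term frequency: each row divided by its document length, so tf <= 1
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(N / number of documents containing the term)
df  <- colSums(counts > 0)
idf <- log2(n_docs / df)

# TF-IDF: scale each column of tf by that term's idf
tfidf <- sweep(tf, 2, idf, `*`)

max(tf)     # 1: the normalised term frequency alone never exceeds 1
max(tfidf)  # 2: log2(4 / 1) for the term that occurs in a single document

# Largest value this formula allows for the real corpus: a one-term document
# whose term appears in no other of the 84146 documents
log2(84146) # about 16.36, the same as the Max. reported by summary()
```

If this reading is right, the normalisation bounds only the term-frequency part, so a value of 16.36061 would correspond to a document whose single term occurs nowhere else in the 84146 documents.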