I am using the package intsvy to analyze PISA data. Using the merge function, I am trying to combine the 2015 student file with the school file. However, I get an error telling me there are duplicate labels.
What is strange is that this code worked for the past two months, then inexplicably stopped working and began producing the error message below. The two files do have similar labels, but it was my understanding that the merge function recognizes this and combines the two datasets.
Any insights as to why this error is suddenly occurring?
library(intsvy)
PISA2015 <- pisa.select.merge(folder = "/Users/x/Desktop/x/EPICER/Analysis/R Script and Supporting Datasets",
                              school.file = "2015_SCHQ.sav",
                              student.file = "2015_STUQ1.sav",
                              student = c("ESCS", "PARED"),
                              school = c("CLSIZE", "SCHSIZE"),
                              countries = c("PRT"))
File character set is 'WINDOWS-1252'.
Converting character set to UTF-8.
File character set is 'WINDOWS-1252'.
Converting character set to UTF-8.
Error in as.factor(x) : Duplicate labels
In addition: Warning messages:
1: 11 variables have duplicated labels:
CNTRYID, Region, STRATUM, SUBNATIO, ST011D17TA, ST011D18TA,
ST011D19TA, PROGN, OCOD1, OCOD2, OCOD3
2: 4 variables have duplicated labels:
CNTRYID, Region, STRATUM, SUBNATIO
I have tried deleting the original PISA data files and redownloading them, but the issue persists. I have also tried uninstalling and reinstalling both the package and RStudio, but that did not work either.
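For anyone trying to narrow this down, here is a minimal sketch of one way to check which variables carry duplicated value labels, independent of intsvy. It assumes the value labels sit in a "labels" attribute on each column, which is how haven stores them when reading .sav files. The helper find_dup_labels and the toy data are hypothetical and only for illustration; they are not taken from the PISA files or from any package.

```r
# Hypothetical helper: return the label names that occur more than once
# in a vector's "labels" attribute (assumed to hold the SPSS value labels).
find_dup_labels <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs)) return(character(0))
  unique(names(labs)[duplicated(names(labs))])
}

# Toy stand-in for a labelled column: two different codes share the label "North".
region <- structure(
  c(1, 2, 3),
  labels = c("North" = 1, "South" = 2, "North" = 3)
)

find_dup_labels(region)

# Applied to a whole data frame: keep only the columns that have duplicates.
toy <- data.frame(id = 1:3)
toy$region <- region
dups <- Filter(length, lapply(toy, find_dup_labels))
names(dups)
```

The same lapply/Filter pattern should work on a data frame read from one of the .sav files, provided the reader keeps the value labels as a "labels" attribute.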
I can't say why the issue is occurring (it may be a bug), but here is a reprex using data from the PISA 2015 database. You can replace the file paths with your own.
The approach outlined below bypasses the intsvy package and instead uses the dplyr and haven packages. I tried your method using intsvy and received the same error. I have never used intsvy, but perhaps some other settings need to be declared. Either way, this works: