Unable to convert serial number date to Date format in R

122 Views Asked by At

I am working with a somewhat messy dataset of dates.

Participants essentially completed anywhere from 1-10 different tests in a time period of 3-4 years. For instance, one participant could have completed 1 test in 2005-09-29. Another participant could have completed 7 tests-- 2 on 2006-08-30, 1 on 2004-03-16, 4 on 2007-04-26. I have approximately 3000 participants, each with one row. Each column represents a different type of test. Values in the columns are the dates that the participant completed the particular test.

I am attempting to identify the "average" date of each participant's tests. I was successful in identifying the "max" date of each participant's test with the following code:

qstdates$max_date <- apply(qstdates, 1, function(x) max_(c(x[2:11])))
qstdates$max_date <- as.Date(qstdates$max_date)

where qstdates is my dataset and columns 2-11 represent the 10 possible tests.

I essentially attempted to take this same code and modify it slightly to find the mean date for each participant (mean of each row). I've attempted a few ways:

qstdates$avg_date <- apply(sapply(qstdates[2:11], as.numeric), 1, function(x) round(mean_(c(x[2:11]))))
qstdates$avg_date <- sapply(qstdates$avg_date, as.Date, origin="1970-01-01")
##OR
qstdates$avg_date <- round(rowMeans(sapply(qstdates[2:11], as.numeric),na.rm=T))
qstdates$avg_date <- sapply(qstdates$avg_date, as.Date, origin="1970-01-01")

But for whatever reason, the serial number dates don't convert to date format. When I run the code, the code runs and the values either become negative or don't change at all! The serial number dates range from 12719-15460.

What could I be doing wrong?

1

There are 1 best solutions below

0
jamberhee On

Found an answer to my own question! Turns out that my question, at its core, was related to this question here.

I was able to calculate the "average date" and when I converted an individual value to a Date, it worked! For some reason, I ran into an issue when I tried to put these individual Dates into a vector. For example:

for(i in 1:length(qstdates$ParticipantID)){
    avg <- qstdates$avg_date[i]
    date <- as.Date(avg,origin="1970-01-01")
    dates <- c(dates,as.character(date))
}

In this code, "date" would return a date in the correct format ("%Y-%M-%D"), and class(date) would return "Date." BUT, "dates" would return a vector of serial date numbers in numeric format. class(dates) would return "numeric."

This led me to change my question and stackoverflow search into: Why can't I put individual dates into a vector that is of the "Date" class? I had never run into this issue before. I took this information and followed a recommended workaround provided to this other question by @mrip. According to @mrip, the issue is that R doesn't allow matrices to be populated with dates. Their workaround to explicitly force my dataset to be of class "Date" as well as "matrix" with class(df)<-c("matrix","Date") solved the issue!