I'm trying to create a table from the following dataset, of which I report the first fifty observations here.
There are some typos in the age and gender variables that I suggest fixing as follows:
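# Rename the 8th column
colnames(d)[8] <- 'COND'
# Recode gender by its first letter: "k" becomes "F", anything else "M"
d$gender <- ifelse(tolower(substr(d$gender, 1, 1)) == "k", "F", "M")
library(libr)
# Replace missing ages with 21
d <- datastep(d, {
  if (is.na(age)) {
    age <- 21
  }
})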
I'm trying to create a summary table by using the following code:
library(tableone)
library(dplyr)  # provides the %>% pipe
CreateTableOne(
  vars = c('TASK', 'COND', 't1.key', 'T1.response', 'age', 'T1.ACC'),
  strata = c('ID'),
  factorVars = c('gender'),
  data = d,
  argsApprox = list(correct = FALSE),
  smd = TRUE,
  addOverall = TRUE,
  test = TRUE) %>%
  na.omit() %>%
  kableone()
obtaining this table.
However, as you can see from this output, I have many observations for the same subject. There are only 54 unique IDs, so the reported number of females and males is incorrect.
length(unique(d$ID))
[1] 54
Does anyone know how to fix it? Furthermore, since 'age' and 'T1.ACC' are not normally distributed, does anyone know how I could report them as median with Q1 and Q3 instead, for example?

I would like to help you. However, there are the following problems with the data you provide:
- COND is missing.
- The TASK variable has only one unique value (the CreateTableOne function does not accept variables with one unique value).
- There is also a problem with age.
- ID is repeated several times.

However, even without changing your data, you can see what your problem is. With data in this form, you cannot use CreateTableOne directly! It counts every occurrence of the value m and every occurrence of the value k, and since you have multiple entries for one person, it counts each of those entries separately.

Please take a look at the solution I have proposed here: How to describe unique values of grouped observations for several vars?
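To make the counting problem concrete, here is a minimal sketch on made-up data (the data frame d_toy and all its values are hypothetical, not taken from your file). It assumes gender and age are constant within a subject, so keeping the first row of each ID is enough to get one row per person.

```r
# Toy long-format data (hypothetical values): several trials per subject
d_toy <- data.frame(
  ID     = c("P1", "P1", "P1", "P2", "P2", "P3"),
  gender = c("F", "F", "F", "M", "M", "F"),
  age    = c(21, 21, 21, 25, 25, 30)
)

# Counting rows counts trials, not people
table(d_toy$gender)

# Keep the first row of each ID so that every subject appears exactly once
subjects <- d_toy[!duplicated(d_toy$ID), c("ID", "gender", "age")]
table(subjects$gender)

# The subject-level data can then be passed to tableone, if it is installed
if (requireNamespace("tableone", quietly = TRUE)) {
  tab <- tableone::CreateTableOne(vars = c("gender", "age"),
                                  factorVars = "gender",
                                  data = subjects)
  print(tab)
}
```

Trial-level variables such as TASK or T1.ACC do not belong in this subject-level table; they would first need to be reduced to one value per subject.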
Update 1
OK. Let's take a look at your data. You have 54 patients with different IDs.
However, note that one ID appears to be incorrect.
We can leave it as it is; correct it yourself if you need to. More importantly, remember that you have as many as 8 different values of gender. Be careful, because in our country gender ideology is not well received ;-)
This, unfortunately, needs to be fixed. In addition, patient P1440 has an age entered where the gender should be. So what is the gender of P1440?
As you can see, there are more women in your data. So let P1440 be a woman. Is that OK?
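A rough sketch of how such entries could be found and recoded, again on made-up data (d_toy, its values and the column gender_clean are hypothetical). The k to F and m to M mapping follows the recoding used in the question; anything that does not start with k or m is left as NA so that it can be decided by hand.

```r
# Toy data (hypothetical values), one row per trial
d_toy <- data.frame(
  ID     = c("P1", "P1", "P2", "P2", "P3", "P3"),
  gender = c("k", "K", "m", "M ", "22", "22"),
  stringsAsFactors = FALSE
)

# Which distinct spellings of gender occur, and how often?
table(d_toy$gender, useNA = "ifany")

# Reduce each entry to its lower-case first letter
first_letter <- tolower(substr(trimws(d_toy$gender), 1, 1))

# Subjects whose entry is neither 'k' nor 'm' need a manual decision
bad_ids <- unique(d_toy$ID[!first_letter %in% c("k", "m")])
bad_ids

# Recode what can be decoded; leave everything else as NA for now
d_toy$gender_clean <- ifelse(first_letter == "k", "F",
                      ifelse(first_letter == "m", "M", NA))
table(d_toy$gender_clean, useNA = "ifany")
```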
Finally, notice that two variables have inconvenient names: 'Condition (whether a person responded)' and 'Go / Nogo (whether a person should respond)'.

Let's fix it all in one go.
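For the renaming step alone, a minimal base R sketch could look like this. The toy data frame and its values are made up, and the short name GoNogo is only a suggestion.

```r
# Toy data with the two long column names (values are hypothetical)
d_toy <- data.frame(
  ID = c("P1", "P2"),
  `Condition (whether a person responded)` = c("yes", "no"),
  `Go / Nogo (whether a person should respond)` = c("Go", "Nogo"),
  check.names = FALSE  # keep the original names exactly as they are
)

# Replace each long name with a short one that is easy to type
names(d_toy)[names(d_toy) == "Condition (whether a person responded)"] <- "Condition"
names(d_toy)[names(d_toy) == "Go / Nogo (whether a person should respond)"] <- "GoNogo"

names(d_toy)
```

Matching on the full old name rather than on a column position keeps the code working if the column order changes.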
Finally, let's change some of the variables from chr to factor, keeping the levels that are already correct. I hope I chose them sensibly.

With the data organized in this way, let's get to the heart of the problem. What do you really want to analyze? Note that for variables such as TASK, Condition, and t1.key, both valid values occur for each patient.

However, if we look at the proportions in which the different values occur in these variables, they differ from patient to patient.
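The conversion to factor and the per-patient proportions can be sketched as follows. The data and the TASK values "A" and "B" are made up purely for illustration.

```r
# Toy data (hypothetical values): four trials for each of two patients
d_toy <- data.frame(
  ID   = c("P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2"),
  TASK = c("A", "A", "A", "B", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# chr -> factor; the levels are taken from the values that are present
d_toy$TASK <- factor(d_toy$TASK)

# Counts of each TASK value per patient: both values occur for everyone
counts <- table(d_toy$ID, d_toy$TASK)
counts

# Row-wise proportions: these differ from patient to patient
props <- prop.table(counts, margin = 1)
props
```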
So please state clearly what you want to summarize and how, because it is not clear to me what result you are after.
Update 2
OK. I can see that you are beginning to understand something. Still, I don't know what you want to summarize. Look below. First, let's collect all the code that prepares the data.
And now the summary. If we summarize the data as they are, we get a summary over all observations, that is n == 41713. Since there are many observations for each patient, such a summary is of little use. At least I think so. However, we can produce the summary for a few selected patients.
This makes more sense now, but is separate for each patient.
Alternatively, you can produce this summary without using CreateTableOne.
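One possible way to do that, sketched on made-up data: first reduce the trials to one value per patient, then describe those values with the median and quartiles. Here I assume T1.ACC is a 0/1 accuracy flag and use the per-patient mean as the summary; both are assumptions, so swap in whatever fits your design.

```r
# Toy data (hypothetical values): four trials for each of three patients
d_toy <- data.frame(
  ID     = rep(c("P1", "P2", "P3"), each = 4),
  T1.ACC = c(1, 1, 0, 1,   0, 0, 1, 1,   1, 1, 1, 1)
)

# Step 1: one value per patient (mean accuracy over that patient's trials)
per_patient <- aggregate(T1.ACC ~ ID, data = d_toy, FUN = mean)
per_patient

# Step 2: median with Q1 and Q3 across patients
quantile(per_patient$T1.ACC, probs = c(0.25, 0.5, 0.75))

# With tableone, print() has a nonnormal argument that reports
# median [IQR] instead of mean (SD) for the listed variables
if (requireNamespace("tableone", quietly = TRUE)) {
  tab <- tableone::CreateTableOne(vars = "T1.ACC", data = per_patient)
  print(tab, nonnormal = "T1.ACC")
}
```

Because kableone() passes its extra arguments on to print(), the same nonnormal argument can be given there as well.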
Think carefully and write down what you really expect. Provided, of course, that this topic is still of interest to you.