Issue
I have a list with three vectors where two are categorical (Whistle_Type and Country) and one is numeric (counts of whistle types A-F) (see below), which I produced using the count() function from dplyr (see below). I want to run a Chi-Square test to determine whether there are any significant differences in whistle types between the countries Germany and France.
I want to write a function that produces a distribution table showing the p-values from the Chi-square test. I would like to produce something like this.
Desired Distribution table with p-values
          A  B  C  D  E  F
France    p  p  p  p  p  p
Germany   p  p  p  p  p  p
*p stands for p-value
I can't quite figure out how to adapt the function to produce the outcome that I would like. I don't understand this error message, as I am passing both a data frame and a list to the function:
Error in model.frame.default(formula = as.formula(paste(x, " ~ Country")), :
'data' must be a data.frame, environment, or list
Called from: model.frame.default(formula = as.formula(paste(x, " ~ Country")),
data = Count.Whistle.type_ChiSq$n)
If anyone is able to help (see the reproducible data frame below), I would be deeply appreciative.
R code
Produce a list showing counts of whistle types per country using the function count()
Count.Whistle.type_ChiSq <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country)
Count.Whistle.type_ChiSq
List of counts of whistle types per country
Whistle_Type Country n
1 A France 90
2 A Germany 70
3 B France 34
4 B Germany 10
5 C France 24
6 C Germany 9
7 D France 44
8 D Germany 25
9 E France 21
10 E Germany 39
11 F France 25
12 F Germany 32
Chi-Square function
#List of whistle types to conduct a Chi-square test
Outcomes_Whistle_Types<-c("A", "B","C", "D", "E", "F")
#Eliminate the duplicate rows present in the vector country
Country <- unique(Parameters$Country)
#Produce a distribution table with p-values for the Chi-square test
Chi_Whistle<-sapply(Outcomes_Whistle_Types, \(x) chisq.test(xtabs(as.formula(paste(x, ' ~ Country')), Count.Whistle.type_ChiSq$n))$p.value)
#Set the names for the columns and rows in the distribution table
chi_Country <- setNames(Chi_Whistle, Country)
#Chi-Square test
chi_Square_results<-lapply(chi_Country, chisq.test)
chi_Square_results
Many thanks in advance
Reproducible Dataframe
#Dummy data
#Create a two-level factor column with dummy data (levels = 2)
f1 <- gl(n = 2, k=167.5); f1
#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)
#Rename the column f2
colnames(f2)<-"Country"
#How many rows
nrow(f2)
#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")
#Add a vector called Whistle_Types
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
'E', 'F'), 335, replace=TRUE)
#Create random numbers
Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e+02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e+03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220e+02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875e+03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq, Low.Freq, High.Freq, Peak.Freq, Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)
#Rename the columns (same order as in cbind() above)
colnames(Bind)<-c('Country', 'Start.Freq', 'End.Freq', 'Low.Freq', 'High.Freq', 'Peak.Freq', 'Center.Freq',
'Delta.Freq', 'Delta.Time',"Whistle_Type")
#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
To be honest, I'm not sure about your desired output. Which p-values do you want to show for each combination of country and whistle type?
We can easily calculate one p-value that tests whether there is a difference in the distribution of whistle types by country.
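As a rough sketch of that single test starting from the count table in the question: the error there appears to come from passing `Count.Whistle.type_ChiSq$n`, which is a bare numeric vector, as the `data` argument, and from using the whistle-type letters as if they were column names. Passing the whole data frame and putting the count column `n` on the left-hand side of the formula avoids both problems. The object names `whistle_xtab` and `chi_counts` are made up for illustration, and the counts are copied by hand from the table in the question.

```r
# Count table copied from the question (the output of dplyr::count())
Count.Whistle.type_ChiSq <- data.frame(
  Whistle_Type = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Country      = rep(c("France", "Germany"), times = 6),
  n            = c(90, 70, 34, 10, 24, 9, 44, 25, 21, 39, 25, 32)
)

# `data` must be the whole data frame; the counts go on the left of the formula
whistle_xtab <- xtabs(n ~ Country + Whistle_Type, data = Count.Whistle.type_ChiSq)
whistle_xtab

# One chi-squared test of independence on the 2 x 6 table
chi_counts <- chisq.test(whistle_xtab)
chi_counts$p.value
```

This gives one p-value for the whole table rather than one per cell, which is why the desired 2 x 6 table of p-values does not fall out of `chisq.test()` directly.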
This is similar to the first example in the docs of ?chisq.test(). For this we just need the Whistle_Parameters data, and we can use table() to create a contingency table, which we can then use as input for chisq.test().
We can find the first example of the docs in ?chisq.test() in Agresti, A. (2007) on page 38.
The random data with set.seed()
Created on 2022-10-06 by the reprex package (v2.0.1)
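The `table()` approach described above might look roughly like the following. This is a minimal sketch, not the original reprex: it rebuilds only the two columns the test needs from the question's dummy data, the seed value is arbitrary, and `whistle_tab` and `chi_res` are illustrative names.

```r
# Minimal stand-in for the question's dummy data; the seed value is arbitrary
set.seed(123)
Whistle_Parameters <- data.frame(
  Country      = gl(n = 2, k = 168, length = 335, labels = c("France", "Germany")),
  Whistle_Type = sample(c("A", "B", "C", "D", "E", "F"), 335, replace = TRUE)
)

# Contingency table: countries in rows, whistle types in columns
whistle_tab <- table(Whistle_Parameters$Country, Whistle_Parameters$Whistle_Type)
whistle_tab

# Pearson chi-squared test of independence (a single p-value for the table)
chi_res <- chisq.test(whistle_tab)
chi_res
chi_res$p.value
```

If a per-cell view is what the question is after, `chi_res$stdres` holds the standardized residuals for each country and whistle type. These are not p-values, but they indicate which cells deviate most from independence.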