In R I have a data set that covers the southern Baltic Sea. Some stations were sampled only once with a single grab sample, some were sampled once a year with a single grab sample each time, and others were sampled in several years with several grab samples. My dataset therefore has three levels of station name: a unique station name for each site (Stations), a station name for each sampling event at time x (Stationname), and a station name for each grab sample (Stationname_Hol).
I would like to first draw one station at random from all stations, then draw one Stationname (sampling time) from that station, then draw one Stationname_Hol (grab sample) from that sampling time, and record the value of the variable bqi for it. In each run I would like to make as many draws as I have stations (n), and I want to repeat this 10000 times. For each of the 10000 runs I want to calculate the 20th percentile of the n drawn bqi values.
I now have code that runs, but I am not sure whether it does this correctly. Could someone give me some feedback?
My dataset consists of:
- 270 grab samples (Stationname_Hol; column stationsnamen_hol in the code)
- 96 sampling events (Stationname; column stationsnamen in the code)
- 69 unique stations (column Stations)
- the bqi values (column bqi)
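To make the layout concrete, here is a small made-up example with the same three-level structure. The station names and bqi values are invented purely for illustration and are not my real data; `toy` is just a placeholder name.

```r
# Toy data only: names and bqi values are invented for illustration.
# A: sampled in two years, two grabs in the first year
# B: sampled once with one grab
# C: sampled once a year with one grab each time
toy <- data.frame(
  Stations          = c("A", "A", "A", "B", "C", "C"),
  stationsnamen     = c("A_2010", "A_2010", "A_2011", "B_2012", "C_2010", "C_2011"),
  stationsnamen_hol = c("A_2010_1", "A_2010_2", "A_2011_1", "B_2012_1", "C_2010_1", "C_2011_1"),
  bqi               = c(8.1, 7.4, 9.0, 5.2, 6.3, 6.8),
  stringsAsFactors  = FALSE
)

# Number of units at each level of the hierarchy
n_stations <- length(unique(toy$Stations))       # unique stations
n_events   <- length(unique(toy$stationsnamen))  # sampling events
n_grabs    <- nrow(toy)                          # grab samples (one row each)
```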
# Keep only the reference area and the columns needed for the resampling
Ref_EIG4a <- subset(Data_EIG4a, Within_RefArea == "Referenz",
                    select = c(stationsnamen_hol, stationsnamen, Stations, bqi))

n_runs <- 10000             # number of resampling runs
n_stations <- 69            # draws per run (number of unique stations)
Q20_BQI <- numeric(n_runs)  # 20th percentile of each run

for (i in 1:n_runs) {
  picked_bqi <- numeric(0)  # bqi values drawn in this run
  for (j in 1:n_stations) {
    # Stage 1: station, stage 2: sampling event, stage 3: grab sample.
    # sample(x, 1) draws from 1:x when x is a single number, so this relies
    # on the names being character or factor.
    station <- sample(unique(Ref_EIG4a$Stations), 1)
    stationname <- sample(unique(Ref_EIG4a$stationsnamen[Ref_EIG4a$Stations == station]), 1)
    hol <- sample(unique(Ref_EIG4a$stationsnamen_hol[Ref_EIG4a$stationsnamen == stationname]), 1)
    picked_bqi <- c(picked_bqi, Ref_EIG4a$bqi[Ref_EIG4a$stationsnamen_hol == hol])
  }
  Q20_BQI[i] <- quantile(picked_bqi, probs = 0.2)
}
MD_Q20_Boot_Ref_EIG4a <- median(Q20_BQI)  # compute the median of the bootstrapped 20th percentiles
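
For anyone who wants to run the procedure without my data, below is a compact sketch of what I believe the loop does, applied to a tiny invented data set. The helper names `draw_one_bqi` and `boot_q20` and the `toy_ref` data are made up for this example. The sketch picks elements by index with `sample.int()`, which is a small deviation from my code that sidesteps the `sample()` behaviour for a single numeric value. It is only meant as a runnable illustration, not as a confirmed correct method.

```r
# Sketch under assumptions: helper names and toy data are hypothetical.

# One three-stage draw: station -> sampling event -> grab sample -> bqi
draw_one_bqi <- function(d) {
  # Pick one unique value by index, so a single numeric ID is never
  # interpreted as the range 1:x (as sample(x, 1) would do).
  pick <- function(x) {
    x <- unique(as.character(x))
    x[sample.int(length(x), 1)]
  }
  station <- pick(d$Stations)
  event   <- pick(d$stationsnamen[d$Stations == station])
  hol     <- pick(d$stationsnamen_hol[d$stationsnamen == event])
  d$bqi[d$stationsnamen_hol == hol]
}

# Repeat n draws per run (n = number of unique stations) and keep the
# 20th percentile of each run.
boot_q20 <- function(d, n_rep = 10000) {
  n <- length(unique(d$Stations))
  vapply(seq_len(n_rep), function(i) {
    picked <- unlist(replicate(n, draw_one_bqi(d), simplify = FALSE))
    unname(quantile(picked, probs = 0.2))
  }, numeric(1))
}

# Invented example data with the same column names as Ref_EIG4a
toy_ref <- data.frame(
  Stations          = c("A", "A", "A", "B", "C", "C"),
  stationsnamen     = c("A_2010", "A_2010", "A_2011", "B_2012", "C_2010", "C_2011"),
  stationsnamen_hol = c("A_2010_1", "A_2010_2", "A_2011_1", "B_2012_1", "C_2010_1", "C_2011_1"),
  bqi               = c(8.1, 7.4, 9.0, 5.2, 6.3, 6.8),
  stringsAsFactors  = FALSE
)

set.seed(1)  # make the toy run reproducible
q20 <- boot_q20(toy_ref, n_rep = 200)
md_q20 <- median(q20)
```

With this scheme every station has the same chance of being drawn regardless of how many times it was sampled, which is the behaviour I am aiming for; whether that is the right weighting is part of what I would like feedback on.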