How to iteratively create matrices/vectors from columns/unique row values of dataframe, and pass them to subsequent code?

107 Views Asked by At

I want to iterate a chunk of code across columns in a dataframe, to create a results matrix. I am getting stuck with how to iteratively create new objects and matrices using/named after values in a certain column (Site) in my dataframe, to then pass through to the following lines of code.

The code I am wanting to iterate is:

SiteI1_data = data %>% subset(Site == I1) %>%select(x_coord, y_cood) 
SiteI1_dists <- as.matrix(dist(SiteI1_data))
SiteI1_dists.inv <- 1/SiteI1_dists
diag(SiteI1_dists.inv) <- 0

Resid_taxa1_siteI1 <- as.vector( Resid_data %>% subset(Site == I1) %>% select(Abundance_Taxa1) )

Moran.I(Resid_taxa1_siteI1, SiteI1_dists.inv)

The "Site" names and coordinates for the first chunk of code are in the following dataframe (data):

Sample Site Treatment x_coord y_coord Abundance_Taxa_1 ... Abundance_Taxa_n
I1S1 I1 Inside 140.00 -29.00 56 ... 0
O2S1 O2 Outside 141.00 -28.00 3 ... 100
O2S2 O2 Outside 141.10 -28.10 5 ... 4

The "Abundance_Taxa_" values are stored with the associated "Site" names in a separate dataframe (Resid_data):

Site Abundance_Taxa_1 ... Abundance_Taxa_n
O1 -0.5673 --- 1.1579
I1 -0.6666 --- 1.2111

There are multiple "Samples" for each "Site". The first four lines of code aim to use the x and y coordinates to create an inverse distance weights matrix for each site.

The fifth line aims to create a vector of the values in Resid_data for each site and taxa.

The final line of code uses the taxa/site vector and the previously created distance matrix to calculate Moran's I for each site and taxa. My desired outcome is to produce the following matrix:

Site Treatment MoranI_Taxa_1 ... MoranI_Taxa_n
I1 Inside 0.1 ... 0.2
O2 Outside 0.5 ... 0.01

I am not sure how to write this to

  • iteratively create a distance matrix for each site, named after that site, to later pass through to the final line
  • iterate the 5th line of code for each site and taxa in Resid_data (i.e., all columns beginning with "Abundance_Taxa_"), and create a vector for each
  • iterate the final line for each site (e.g., for site I1, it would repeat as:
Moran.I(Resid_taxa1_siteI1, SiteI1_dists.inv)
Moran.I(Resid_taxa2_siteI1, SiteI1_dists.inv)

until it has completed for all vectors of residual values for that site).

The number of Site values is relatively small (12), but the number of "Abundance_Taxa" columns that I need to iterate through is large (~2400). Most of what I have found online has been writing very simple 'for' loops. As such, I don't even really know where to start with this.

1

There are 1 best solutions below

2
Jan On

We first define two helper functions: calculateWeights corresponds to your first four lines of provided code and getResidualData reflects the fifth line.

library(dplyr)
library(tidyr)
library(ape)

calculateWeights <- function(df, site) {
  df |>
    filter(Site == site) |>
    select(x_coord, y_coord) |>
    dist() |>
    as.matrix() |>
    (\(.) 1 / replace(., . == 0, 1))()
}

getResidualData <- function(df, site, t) {
  Resid_data |>
    filter(Site == site) |>
    select(ends_with(paste0("_", t))) |> 
    pull()
}

Then your desired result can be calculated like this:

res <- lapply(unique(data$Site), function(site) {
  weight <- calculateWeights(data, site)
  resSite <- data.frame()
  lapply(1:(sum(grepl(
    'Taxa', colnames(data)
  ))), function(t) {
    x <- getResidualData(Resid_data, site, t)
    rbind(resSite, list(
      site = site,
      t = t,
      val = Moran.I(x, weight)$observed
    ))
  }) |> bind_rows()
}) |> bind_rows() |>
  mutate(Treatment = case_match(substring(site, 1, 1), "I" ~ "Inside", "O" ~ "Outside")) |>
  pivot_wider(names_from = t,
              names_glue = "MoranI_Taxa_{t}",
              values_from = val)

This would for example look like

> res
# A tibble: 2 × 12
  site  Treatment MoranI_taxa_1 MoranI_taxa_2 MoranI_taxa_3 MoranI_taxa_4
  <chr> <chr>             <dbl>         <dbl>         <dbl>         <dbl>
1 I1    Inside           -0.393        -0.136        0.0111        0.0191
2 O2    Outside          -0.116         0.262       -0.153        -0.639 
# ℹ 6 more variables: MoranI_taxa_5 <dbl>, MoranI_taxa_6 <dbl>, MoranI_taxa_7 <dbl>,
#   MoranI_taxa_8 <dbl>, MoranI_taxa_9 <dbl>, MoranI_taxa_10 <dbl>

using the sample data

> dput(data)
structure(list(Sample = c("I1S1", "O2S1", "O2S2", "O2S3", "O2S4", 
"I1S2", "I1S3", "I1S4"), Site = c("I1", "O2", "O2", "O2", "O2", 
"I1", "I1", "I1"), Treatment = c("Inside", "Outside", "Outside", 
"Outside", "Outside", "Inside", "Inside", "Inside"), x_coord = c(140, 
141, 141.1, 139.9, 139.4, 141.2, 140.5, 139.8), y_coord = c(-29, 
-28, -28.1, -28.5, -29.1, -28.9, -28.3, -29.2), Abundance_Taxa_1 = c(42L, 
46L, 24L, 93L, 30L, 45L, 100L, 39L), Abundance_Taxa_2 = c(52L, 
85L, 53L, 43L, 97L, 26L, 62L, 6L), Abundance_Taxa_3 = c(58L, 
23L, 42L, 41L, 60L, 45L, 82L, 85L), Abundance_Taxa_4 = c(33L, 
11L, 45L, 14L, 2L, 98L, 35L, 28L), Abundance_Taxa_5 = c(45L, 
16L, 80L, 100L, 8L, 72L, 37L, 87L), Abundance_Taxa_6 = c(10L, 
60L, 75L, 91L, 23L, 33L, 86L, 15L), Abundance_Taxa_7 = c(68L, 
60L, 10L, 72L, 95L, 92L, 45L, 84L), Abundance_Taxa_8 = c(55L, 
48L, 8L, 96L, 3L, 99L, 75L, 13L), Abundance_Taxa_9 = c(18L, 85L, 
5L, 31L, 56L, 20L, 82L, 67L), Abundance_Taxa_10 = c(8L, 19L, 
10L, 79L, 61L, 12L, 35L, 52L)), class = "data.frame", row.names = c(NA, 
-8L))

> dput(Resid_data)
structure(list(Site = c("O2", "I1", "I1", "O2", "I1", "I1", "O2", 
"O2"), Abundance_Taxa_1 = c(0.77, -0.68, -0.33, -0.19, -0.39, 
0, -0.02, -0.59), Abundance_Taxa_2 = c(-0.45, 0.52, 0.66, -0.87, 
-0.4, -0.1, -0.27, 0.5), Abundance_Taxa_3 = c(-0.27, 0.48, 0.68, 
0.66, -0.31, 0.79, 0.55, -0.93), Abundance_Taxa_4 = c(0.58, -0.11, 
0.89, -0.77, -0.64, -0.43, -0.18, -0.17), Abundance_Taxa_5 = c(-0.9, 
-0.1, -0.3, 0.83, 0.05, 0.71, 0.17, 0.31), Abundance_Taxa_6 = c(0.58, 
0.79, -0.66, 0.88, 0.97, -0.36, 0.75, 0.21), Abundance_Taxa_7 = c(-0.96, 
-0.84, -0.58, 0.92, -0.68, -0.19, -0.34, -0.32), Abundance_Taxa_8 = c(-0.62, 
0.97, -0.88, -0.68, 0.19, 0.77, 0.37, -0.23), Abundance_Taxa_9 = c(0.05, 
-0.03, 0.23, -0.2, -0.99, -0.14, -0.1, -0.61), Abundance_Taxa_10 = c(0.94, 
0.61, 0.76, 0.27, 0.17, -0.34, 0.7, -0.43)), class = "data.frame", row.names = c(NA, 
-8L))