Code for weighted adjacency matrix from df with 8 columns of string data?

Question

Asked by AMM on 26 September 2023 at 00:01 (77 views)

I need help writing code to create a weighted adjacency matrix from a dataset in which some rows contain only 1 or 2 ingredients while others have more (up to 8). Based on the number of unique ingredients in the dataset, the resulting matrix will likely be at least 16x16.

My data currently looks like the example below (but with different information). Which column an ingredient appears in does not matter for this network analysis, but the co-occurrences and their weights do.

name1	name2	name3	name4	name5	name6	name7	name8
pineapple	sugar	mango	water	salt	blueberry
pineapple	asca
sugar	pineapple	water	lime
lime	asca	pepper	salt	water
blueberry	pineapple	water	salt	strawberry	banana	asca	sugar
mango

How do I write code that finds all the co-occurrences (edges) across all eight columns, not just the first two? That is one problem I ran into when building the adjacency matrix from this data directly in R. I also need to keep the ingredient names as node labels, so that the names, not numbers, appear when I create my network graph; that has been another problem.

I already have working code that builds the network graph from an adjacency matrix for this new project. Previously, because I was on a tight deadline, I calculated the weighted adjacency matrix for a sample set by hand.

Original Q&A

There are 2 best solutions below

ThomasIsCoding On 26 September 2023 at 09:59

I guess you can create an incidence matrix (one row per ingredient, one column per row of df)

> table(unlist(df), c(row(df)))

             1 2 3 4 5 6
  asca       0 1 0 1 1 0
  banana     0 0 0 0 1 0
  blueberry  1 0 0 0 1 0
  lime       0 0 1 1 0 0
  mango      1 0 0 0 0 1
  pepper     0 0 0 1 0 0
  pineapple  1 1 1 0 1 0
  salt       1 0 0 1 1 0
  strawberry 0 0 0 0 1 0
  sugar      1 0 1 0 1 0
  water      1 0 1 1 1 0
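A minimal base R sketch of this incidence step, assuming blank cells are stored as `NA` (the data frame is typed in by hand from the question rather than read from a file). `unlist()` flattens all eight columns one after another and `c(row(df))` gives the matching row number for every cell, so ingredients in every column are counted, not just the first two. `table()` drops `NA` by default, and the ingredient names become the row names.

```r
# The question's data, typed in by hand; unused cells are NA (an assumption)
df <- data.frame(
  name1 = c("pineapple", "pineapple", "sugar", "lime", "blueberry", "mango"),
  name2 = c("sugar", "asca", "pineapple", "asca", "pineapple", NA),
  name3 = c("mango", NA, "water", "pepper", "water", NA),
  name4 = c("water", NA, "lime", "salt", "salt", NA),
  name5 = c("salt", NA, NA, "water", "strawberry", NA),
  name6 = c("blueberry", NA, NA, NA, "banana", NA),
  name7 = c(NA, NA, NA, NA, "asca", NA),
  name8 = c(NA, NA, NA, NA, "sugar", NA),
  stringsAsFactors = FALSE
)

# Pair every cell with its row (recipe) number, across all eight columns
ingredient <- unlist(df, use.names = FALSE)
recipe <- c(row(df))

# Ingredients x recipes counts; NA cells are dropped by table()
inc <- table(ingredient, recipe)
print(inc)
```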

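or an adjacency matrix, where each off-diagonal entry counts the rows in which both ingredients appear and the diagonal counts how often each ingredient appears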

> tcrossprod(table(unlist(df), c(row(df))))

             asca banana blueberry lime mango pepper pineapple salt strawberry
  asca          3      1         1    1     0      1         2    2          1
  banana        1      1         1    0     0      0         1    1          1
  blueberry     1      1         2    0     1      0         2    2          1
  lime          1      0         0    2     0      1         1    1          0
  mango         0      0         1    0     2      0         1    1          0
  pepper        1      0         0    1     0      1         0    1          0
  pineapple     2      1         2    1     1      0         4    2          1
  salt          2      1         2    1     1      1         2    3          1
  strawberry    1      1         1    0     0      0         1    1          1
  sugar         1      1         2    1     1      0         3    2          1
  water         2      1         2    2     1      1         3    3          1

             sugar water
  asca           1     2
  banana         1     1
  blueberry      2     2
  lime           1     2
  mango          1     1
  pepper         0     1
  pineapple      3     3
  salt           2     3
  strawberry     1     1
  sugar          3     3
  water          3     4
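Why this works: `tcrossprod(B)` is `B %*% t(B)`, so for a 0/1 incidence matrix `B` the entry for ingredients i and j is the number of rows (recipes) containing both, which is exactly the co-occurrence weight, and the diagonal is the number of rows each ingredient appears in. The sketch below, using the same hand-typed data with `NA` for blanks, checks this against a brute-force pairwise count. It assumes no ingredient is repeated within a single row (a repeat would be counted twice in the table).

```r
df <- data.frame(
  name1 = c("pineapple", "pineapple", "sugar", "lime", "blueberry", "mango"),
  name2 = c("sugar", "asca", "pineapple", "asca", "pineapple", NA),
  name3 = c("mango", NA, "water", "pepper", "water", NA),
  name4 = c("water", NA, "lime", "salt", "salt", NA),
  name5 = c("salt", NA, NA, "water", "strawberry", NA),
  name6 = c("blueberry", NA, NA, NA, "banana", NA),
  name7 = c(NA, NA, NA, NA, "asca", NA),
  name8 = c(NA, NA, NA, NA, "sugar", NA),
  stringsAsFactors = FALSE
)

inc <- table(unlist(df), c(row(df)))
adj <- tcrossprod(inc)  # same as inc %*% t(inc)

# Brute-force check: add 1 to every pair of ingredients that share a row
ingredients <- rownames(inc)
manual <- matrix(0, length(ingredients), length(ingredients),
                 dimnames = list(ingredients, ingredients))
for (r in seq_len(nrow(df))) {
  present <- unique(na.omit(unlist(df[r, ], use.names = FALSE)))
  manual[present, present] <- manual[present, present] + 1
}

stopifnot(all(adj == manual))
adj["pineapple", "water"]  # rows 1, 3 and 5 contain both, so 3
```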

jblood94 On 26 September 2023 at 11:10 (accepted answer)

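If the row-wise incidences are wanted, you can modify the answer by @ThomasIsCoding. With the data below, blank cells are read as empty strings, which sort first in the table, so [-1,] drops that row: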

m <- tcrossprod(table(stack(as.data.frame(t(df))))[-1,])
m
#>             values
#> values       asca banana blueberry lime mango pepper pineapple salt strawberry sugar water
#>   asca          3      1         1    1     0      1         2    2          1     1     2
#>   banana        1      1         1    0     0      0         1    1          1     1     1
#>   blueberry     1      1         2    0     1      0         2    2          1     2     2
#>   lime          1      0         0    2     0      1         1    1          0     1     2
#>   mango         0      0         1    0     2      0         1    1          0     1     1
#>   pepper        1      0         0    1     0      1         0    1          0     0     1
#>   pineapple     2      1         2    1     1      0         4    2          1     3     3
#>   salt          2      1         2    1     1      1         2    3          1     2     3
#>   strawberry    1      1         1    0     0      0         1    1          1     1     1
#>   sugar         1      1         2    1     1      0         3    2          1     3     3
#>   water         2      1         2    2     1      1         3    3          1     3     4
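Because `[-1,]` removes whatever sorts first, it relies on the blank cells being empty strings. A slightly more defensive sketch, assuming blanks may be either `NA` or empty/whitespace-only strings, filters them out by value before tabulating. The data is typed in by hand with `""` for blank cells, mimicking the empty strings that `[-1,]` removes above.

```r
df <- data.frame(
  name1 = c("pineapple", "pineapple", "sugar", "lime", "blueberry", "mango"),
  name2 = c("sugar", "asca", "pineapple", "asca", "pineapple", ""),
  name3 = c("mango", "", "water", "pepper", "water", ""),
  name4 = c("water", "", "lime", "salt", "salt", ""),
  name5 = c("salt", "", "", "water", "strawberry", ""),
  name6 = c("blueberry", "", "", "", "banana", ""),
  name7 = c("", "", "", "", "asca", ""),
  name8 = c("", "", "", "", "sugar", ""),
  stringsAsFactors = FALSE
)

values <- trimws(unlist(df, use.names = FALSE))
recipe <- c(row(df))

# Keep real ingredient names only: drop NA and empty (or all-blank) cells
keep <- !is.na(values) & nzchar(values)
m <- tcrossprod(table(values[keep], recipe[keep]))
m["pineapple", "sugar"]
```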

Set the main diagonal to 0 if you don't want self-loops (the diagonal holds each ingredient's total count).

diag(m) <- 0
m
#>             values
#> values       asca banana blueberry lime mango pepper pineapple salt strawberry sugar water
#>   asca          0      1         1    1     0      1         2    2          1     1     2
#>   banana        1      0         1    0     0      0         1    1          1     1     1
#>   blueberry     1      1         0    0     1      0         2    2          1     2     2
#>   lime          1      0         0    0     0      1         1    1          0     1     2
#>   mango         0      0         1    0     0      0         1    1          0     1     1
#>   pepper        1      0         0    1     0      0         0    1          0     0     1
#>   pineapple     2      1         2    1     1      0         0    2          1     3     3
#>   salt          2      1         2    1     1      1         2    0          1     2     3
#>   strawberry    1      1         1    0     0      0         1    1          0     1     1
#>   sugar         1      1         2    1     1      0         3    2          1     0     3
#>   water         2      1         2    2     1      1         3    3          1     3     0
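To keep ingredient names (rather than numbers) on the nodes, it helps that `m` carries them as its row and column names. If you use igraph, `graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE, diag = FALSE)` should take the vertex names from them. The base R sketch below instead turns `m` into a named, weighted edge list (each undirected pair once, from the upper triangle), which igraph's `graph_from_data_frame(edges, directed = FALSE)` can also read, treating the first two columns as endpoints and `weight` as an edge attribute. The data is again typed in by hand with `NA` for blanks.

```r
df <- data.frame(
  name1 = c("pineapple", "pineapple", "sugar", "lime", "blueberry", "mango"),
  name2 = c("sugar", "asca", "pineapple", "asca", "pineapple", NA),
  name3 = c("mango", NA, "water", "pepper", "water", NA),
  name4 = c("water", NA, "lime", "salt", "salt", NA),
  name5 = c("salt", NA, NA, "water", "strawberry", NA),
  name6 = c("blueberry", NA, NA, NA, "banana", NA),
  name7 = c(NA, NA, NA, NA, "asca", NA),
  name8 = c(NA, NA, NA, NA, "sugar", NA),
  stringsAsFactors = FALSE
)

m <- tcrossprod(table(unlist(df), c(row(df))))
diag(m) <- 0

# Each undirected pair once: upper triangle, nonzero weights only
idx <- which(upper.tri(m) & m > 0, arr.ind = TRUE)
edges <- data.frame(
  from = rownames(m)[idx[, "row"]],
  to = colnames(m)[idx[, "col"]],
  weight = m[idx],
  stringsAsFactors = FALSE
)

# Heaviest co-occurrences first
edges <- edges[order(-edges$weight, edges$from, edges$to), ]
head(edges, 4)
```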

Data:

df <- data.table::fread("name1  name2   name3   name4   name5   name6   name7   name8
               pineapple    sugar   mango   water   salt    blueberry       
               pineapple    asca                        
               sugar    pineapple   water   lime                
               lime asca    pepper  salt    water           
               blueberry    pineapple   water   salt    strawberry  banana  asca    sugar
               mango                            ")
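If you would rather not depend on data.table, here is a base R sketch assuming the data is pasted as tab-separated text. `read.delim()` pads short rows (`fill = TRUE` is its default, spelled out here) and `na.strings = ""` turns the blank cells into `NA`, so `table()` ignores them and no `[-1,]` is needed.

```r
# The question's data as tab-separated text (typed in by hand)
txt <- paste(
  "name1\tname2\tname3\tname4\tname5\tname6\tname7\tname8",
  "pineapple\tsugar\tmango\twater\tsalt\tblueberry",
  "pineapple\tasca",
  "sugar\tpineapple\twater\tlime",
  "lime\tasca\tpepper\tsalt\twater",
  "blueberry\tpineapple\twater\tsalt\tstrawberry\tbanana\tasca\tsugar",
  "mango",
  sep = "\n"
)

# Short rows are padded; blank cells become NA instead of ""
df <- read.delim(text = txt, na.strings = "", fill = TRUE,
                 stringsAsFactors = FALSE)

m <- tcrossprod(table(unlist(df), c(row(df))))
diag(m) <- 0
```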
