Normalize a many-to-many table

52 Views Asked by At

I have an input data.frame combined containing a many-to-many relationship that I would like to normalize into 2 tables. This input table contains the mixture composition for several different samples and locations.

I would like to derive 2 tables from this: one (goal_1) with location, sampleid, and a mixtureid; the second table should contain the actual mixture composition (goal_2).

library(tidyverse) 

goal_1 = tribble(
  ~sampleID, ~location, ~MixtureID,
  1, "A", 1,
  2, "B", 2,
  3, "C", 3,
  4, "A", 4,
  5, "A", 1,
  6, "B", 2,
  7, "B", 2
)

goal_2 = tribble(
  ~MixtureID, ~element, ~conc_pct,
  1, "He", 0,
  1, "H", 10,
  1, "C", 0,
  1, "O", 0,
  1, "N", 0,
  1, "Ca", 90,
  1, "Cs", 0,
  2, "Si", 0,
  2, "S", 100,
  2, "V", 0,
  3, "Nb", 100,
  3, "Fe", 0,
  4, "C", 20,
  4, "H", 10,
  4, "S", 70
);

combined = left_join(goal_1, goal_2, by = "MixtureID", relationship = "many-to-many") |>
  select(-MixtureID) 

Basically I want to reverse the operation of combined = left_join(...)

I can generate the goal_1 table partially with:

goal_1a = distinct(combined, sampleID, location)

However I'm stuck on how to derive the goal_2 table from the combined table.

1

There are 1 best solutions below

1
Michael On
  1. If this is a many-to-many relationship, shouldn't this technically be a full_join instead of a left_join? (In this case, it seems to result in the same data)
combined <- full_join( goal_1, goal_2, 
    by = "MixtureID", relationship = "many-to-many" )
  1. Using your existing left_join data set, then to normalize your combined data set, I would suggest you try something like this:
( table_1 <- combined |> distinct( sampleID, location ))
( table_2 <- combined |> distinct( sampleID, element, conc_pct ))

Then you could join the two and get the same 28 rows:

table_1 |> 
    inner_join( table_2, by = 'sampleID')