R: Color scatterplot using a variable from a different data frame

I have two dataframes:

farm_production <- data.frame (
  year = c(seq(1980,2000)),
  "n11" = c(seq(80,200,length.out=21)),
  "n26" = c(seq(110,180,length.out=21)),
  "n31" = c(seq(150,56,length.out=21)),
  "n48" = c(seq(200,160,length.out=21)),
  "n59" = c(seq(198,170,length.out=21)))

farm_info <- data.frame (
  ID = c("n11", "n26", "n31", "n48", "n59"),
  type = c("wheat", "wheat", "cereal", "hay", "hay"),
  country = c("Spain", "Greece", "Italy", "Spain", "Portugal"))

The two dataframes are linked by the farm identifiers: the column names of farm_production (n11, n26, n31, n48, n59) match the values in the ID column of farm_info.

I plotted the production of these 5 farms over the years:

plot(farm_production$year, farm_production$n11, xlab = "Year", ylab = "Forage production (tons)", ylim = c(0, 200))
points(farm_production$year, farm_production$n26)
points(farm_production$year, farm_production$n31)
points(farm_production$year, farm_production$n48)
points(farm_production$year, farm_production$n59)

However, I want to color these points by "type" (3 levels: wheat, cereal, hay), but this information is in the "farm_info" dataframe. How can I relate the information in one dataframe to the other?

I am aware that I can probably do this manually, but keep in mind that this is just a small sample of a much larger dataframe with more than 100 rows and columns, so I am interested in finding a way to "automate" this process by relating the info in dataframe 1 (farm_production) to dataframe 2 (farm_info) to color these points by "type".

Any suggestions on how I can do this? Any help is greatly appreciated.

Best answer, by the-mad-statter:

Having the data in this "wide" format will make plotting difficult.

I would start by transforming your farm_production dataframe to a tidy format and then join your farm_info data to create a single dataframe from which to plot.
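Because merge() performs an inner join by default, a farm column with no matching ID in farm_info would silently drop out of the joined data. With 100+ columns it may be worth comparing the keys first. A minimal sketch on small stand-in data frames (the n99 column is made up for illustration and is not in the question's data):

```r
# Stand-in data; n99 is a hypothetical farm with no entry in farm_info
farm_production <- data.frame(
  year = 1980:1982,
  n11 = c(80, 86, 92),
  n99 = c(10, 11, 12)
)
farm_info <- data.frame(
  ID = c("n11", "n26"),
  type = c("wheat", "wheat")
)

farm_cols <- names(farm_production)[-1]            # every column except year

missing_info <- setdiff(farm_cols, farm_info$ID)   # farms without metadata
unused_info  <- setdiff(farm_info$ID, farm_cols)   # metadata without a production column

missing_info
unused_info
```

If either vector is non-empty, passing all.x = TRUE to merge() would keep the unmatched farms, with NA as their type.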

During the data preparation, I would convert your type variable to a factor so that base R can assign a color to each level automatically.
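To see why a factor works as a color, it helps to look at its integer codes: base graphics uses those codes as indices into the current palette(). The sketch below uses the type values from farm_info; the named color vector at the end is an optional alternative with arbitrarily chosen colors, not something the question asked for.

```r
type <- factor(c("wheat", "wheat", "cereal", "hay", "hay"))

levels(type)                 # levels are sorted alphabetically: cereal, hay, wheat
codes <- as.integer(type)    # 3 3 1 2 2
palette()[codes]             # the colors that col = type would select

# Optional: an explicit mapping instead of relying on palette order.
# The color names here are arbitrary choices for illustration.
type_cols <- c(cereal = "goldenrod", hay = "forestgreen", wheat = "steelblue")
point_cols <- unname(type_cols[as.character(type)])
point_cols
```

An explicit mapping like type_cols keeps the colors stable if levels are added or reordered later.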

Optionally, you might consider adding a legend.

farm_production <- data.frame (
  year = c(seq(1980,2000)),
  "n11" = c(seq(80,200,length.out=21)),
  "n26" = c(seq(110,180,length.out=21)),
  "n31" = c(seq(150,56,length.out=21)),
  "n48" = c(seq(200,160,length.out=21)),
  "n59" = c(seq(198,170,length.out=21)))

farm_info <- data.frame (
  ID = c("n11", "n26", "n31", "n48", "n59"),
  type = c("wheat", "wheat", "cereal", "hay", "hay"),
  country = c("Spain", "Greece", "Italy", "Spain", "Portugal"))

data <- merge(
  reshape(
    farm_production,
    varying = names(farm_production)[-1],
    v.names = "production",
    timevar = "farm",
    times = names(farm_production)[-1],
    direction = "long",
    sep = ""
  ),
  farm_info,
  by.x = "farm", 
  by.y = "ID"
)
data$id <- NULL                 # drop the row id column that reshape() adds
data$type <- factor(data$type)  # factor levels become the color groups

plot(
  data$year, 
  data$production, 
  xlab = "Year", 
  ylab = "Forage production (tons)", 
  ylim = c(0, 200),
  col = data$type # the factor's integer codes index into the current palette()
)

legend(
  x = "topleft",
  legend = levels(data$type),         # labels for factor levels
  col = seq_along(levels(data$type)), # integer codes of the factor levels
  pch = 1,   # point symbol; 1 (open circle) matches the default used by plot()
  cex = .7   # optionally change overall size of legend
)

Created on 2024-02-27 with reprex v2.1.0
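If reshaping is not desirable, a similar plot may be possible straight from the wide data by looking up each column's type with match() and passing one color per column to matplot(). This is a sketch of an alternative, not part of the answer above; it assumes every farm column has an entry in farm_info, and the legend position is an arbitrary choice.

```r
farm_production <- data.frame(
  year = 1980:2000,
  n11 = seq(80, 200, length.out = 21),
  n26 = seq(110, 180, length.out = 21),
  n31 = seq(150, 56, length.out = 21),
  n48 = seq(200, 160, length.out = 21),
  n59 = seq(198, 170, length.out = 21)
)
farm_info <- data.frame(
  ID = c("n11", "n26", "n31", "n48", "n59"),
  type = c("wheat", "wheat", "cereal", "hay", "hay")
)

farm_cols <- names(farm_production)[-1]

# Look up the type of each production column, in column order
type <- factor(farm_info$type[match(farm_cols, farm_info$ID)])
cols <- as.integer(type)   # one palette index per column

# matplot() draws each column of the y matrix with the matching entry of col
matplot(
  farm_production$year,
  as.matrix(farm_production[farm_cols]),
  type = "p", pch = 19, col = cols,
  xlab = "Year", ylab = "Forage production (tons)", ylim = c(0, 200)
)
legend(
  "bottomleft",
  legend = levels(type),
  col = seq_along(levels(type)),
  pch = 19, cex = 0.7
)
```

The match() lookup does not depend on the order of rows in farm_info, so the same code should carry over to a larger table as long as the IDs line up with the column names.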