ggplot() Area Chart Doesn't Represent Data Correctly


I have trade data downloaded from UN Comtrade (https://comtradeplus.un.org/TradeFlow), which I have summarized at a high level.

The ns_eu_category variable is my own division of the world into these regions: "East Asia and Pacific", "Global North", "Latin America and Caribbean", "Middle East and North Africa", "Non-EU Former Soviet Bloc Countries", "South Asia", and "Sub-Saharan Africa". I don't think this is the source of the problem, so we can ignore the exact divisions for now.

> longterm_trade_data
# A tibble: 7,364 × 6
   ns_eu_category         year sitc_code import_or_export      value sector                                  
   <chr>                 <dbl> <chr>     <chr>                 <dbl> <chr>                                   
 1 East Asia and Pacific  1962 0         Export            946694358 Food And Live Animals                   
 2 East Asia and Pacific  1962 0         Import            745286120 Food And Live Animals                   
 3 East Asia and Pacific  1962 1         Export             60846922 Beverages And Tobacco                   
 4 East Asia and Pacific  1962 1         Import             67321814 Beverages And Tobacco                   
 5 East Asia and Pacific  1962 2         Export           1479804622 Crude Materials, Inedible, Except Fuels 
 6 East Asia and Pacific  1962 2         Import            640428682 Crude Materials, Inedible, Except Fuels 
 7 East Asia and Pacific  1962 3         Export            482623764 Mineral Fuels, Lubric. And Related Mtrls
 8 East Asia and Pacific  1962 3         Import            416870707 Mineral Fuels, Lubric. And Related Mtrls
 9 East Asia and Pacific  1962 4         Export             66775599 Animal And Vegetable Oils,Fats And Waxes
10 East Asia and Pacific  1962 4         Import             42687574 Animal And Vegetable Oils,Fats And Waxes
# ℹ 7,354 more rows
# ℹ Use `print(n = ...)` to see more rows

I take these aggregated statistics and convert each trade sector's value into a percentage of its region's total for that year and trade direction, so that I can plot it as an area graph:

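library(tidyverse)  # dplyr, tidyr and ggplot2

trade_data_sector <- longterm_trade_data %>%
  drop_na() %>%  # drop incomplete rows first, so one NA value doesn't turn a whole group's total into NA
  group_by(ns_eu_category, year, import_or_export) %>%
  mutate(total_of_sectors = sum(value)) %>%  # total across all sectors for this region/year/flow
  ungroup() %>%
  mutate(percent = value / total_of_sectors)  # each sector's share of that total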
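As a quick sanity check on this step, here is a minimal sketch of the same share calculation on a small made-up table (the values and the name toy are illustrative, not Comtrade data). It then sums the shares within each region/year/flow group: the totals should be 1, but because each share is a floating-point quotient they are only guaranteed to equal 1 up to rounding, which is why the check uses all.equal() rather than ==.

```r
library(dplyr)

# Made-up example data in the same shape as longterm_trade_data (NOT real Comtrade values)
toy <- tibble(
  ns_eu_category   = "Sub-Saharan Africa",
  year             = rep(c(1962, 1963), each = 3),
  import_or_export = "Export",
  sitc_code        = rep(c("0", "1", "2"), times = 2),
  value            = c(100, 250, 650, 3, 7, 11)
)

# Share of each sector within its region/year/flow group
toy_shares <- toy %>%
  filter(!is.na(value)) %>%
  group_by(ns_eu_category, year, import_or_export) %>%
  mutate(percent = value / sum(value)) %>%
  ungroup()

# Each group's shares should add up to 1, but only up to floating-point rounding
share_totals <- toy_shares %>%
  group_by(ns_eu_category, year, import_or_export) %>%
  summarise(total = sum(percent), .groups = "drop")

print(share_totals$total - 1)  # tiny non-zero differences are possible here
```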

I then try to produce an area graph:

# "East Asia and Pacific"               "Global North"                        
# "Latin America and Caribbean"         "Middle East and North Africa"        "Non-EU Former Soviet Bloc Countries"
# "South Asia"                          "Sub-Saharan Africa"   
region <- "Sub-Saharan Africa"
ix <- "Export"
my_font <- "sans"  # placeholder: the post never shows how my_font is defined

trade_data_sector %>%
  mutate(truncated_name = substr(sector, 1L, 10L),  # first 10 characters of the sector name
         descriptor = paste0(sitc_code, ": ", truncated_name)) %>%
  filter(ns_eu_category == region, import_or_export == ix) %>%
  ggplot(aes(x = year, y = percent, fill = descriptor)) + 
  geom_area() +
  theme_minimal() + 
  labs(title = paste0(ix, "s in ", region, " Over Time"),
       caption = "Source: UN COMTRADE Database 1962-2023") +
  scale_y_continuous(breaks = seq(from = 0, to = 1, by = 0.1), labels = scales::percent, limits = c(0, 1)) +
  scale_x_continuous(breaks = 1962:2023, expand = c(0, 0)) +  # year is numeric, so use a continuous scale
  theme(
    # panel.grid.major.y = element_line(color = "dark gray", linewidth = 0.1, linetype = "dashed"),
    # panel.grid.major.x = element_blank(),
    axis.ticks.x=element_line(linewidth=0.2),
    axis.text.x = element_text(size = 6, family=my_font, angle=-90, vjust=0.5),
    axis.title.x = element_text(size = 8, family=my_font),
    axis.text.y=element_text(size = 6, family=my_font),
    # axis.ticks.y=element_line(), 
    axis.title.y = element_text(size = 8, family=my_font),
    panel.grid = element_blank(),
    legend.position="bottom",
    plot.title = element_text(size = 12, family=my_font),
    plot.subtitle = element_text(size = 10, family=my_font),
    legend.title = element_text( size=8, family=my_font),
    legend.text = element_text( size=8, family=my_font),
    strip.text = element_text(size=8, family=my_font),
    legend.key.size = unit(0.3, "cm"),
    plot.caption = element_text(size = 7, color="dark gray", family=my_font)
  )

The result is this:

[Image: Graph of Exports from Sub-Saharan Africa]

Note: the data for 2010-2022 is missing right now, so ignore that part of the graph.

Not only does it look a lot more erratic than it should, but there are entire stretches where SITC code 0 (Food and Live Animals) just disappears. However, as the following graph shows, this value is never zero:

sector_code <- "0"

trade_data_sector %>%
  filter(ns_eu_category == region, import_or_export == ix, sitc_code == sector_code) %>%
  ggplot(aes(x = year, y = value)) +
  geom_line() +
  theme_minimal() +
  labs(title = paste0(ix, "s in ", region, " Over Time (Sector ", sector_code, ")"),
       caption = "Source: UN COMTRADE Database 1962-2023") +
  scale_x_continuous(breaks = 1962:2022) +
  # scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100), limits=c(0,700)) +
  theme(
    panel.grid.major.y = element_line(color = "dark gray", linewidth = 0.1, linetype = "dashed"),
    # panel.grid.major.x = element_blank(),
    # axis.ticks.x=element_blank(),
    axis.text.x = element_text(size = 6, family=my_font, angle=-90, vjust=0.5),
    axis.title.x = element_text(size = 8, family=my_font),
    axis.text.y=element_text(size = 6, family=my_font),
    # axis.ticks.y=element_line(),
    axis.title.y = element_text(size = 8, family=my_font),
    panel.grid = element_blank(),
    legend.position="bottom",
    plot.title = element_text(size = 10, family=my_font),
    plot.subtitle = element_text(size = 8, family=my_font),
    legend.title = element_text( size=8, family=my_font),
    legend.text = element_text( size=8, family=my_font),
    strip.text = element_text(size=8, family=my_font),
    legend.key.size = unit(0.3, "cm"),
    plot.caption = element_text(size = 7, color="dark gray", family=my_font)
  )

[Image: Percentage of exports in Sector 0]

What could be causing this? It isn't just happening with Sub-Saharan Africa; these gaps appear for other regions of the world as well.
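One way to narrow this down is to inspect the data ggplot2 actually draws, via layer_data(), instead of the input table. The sketch below is a hypothetical reproduction on made-up shares (toy, p_limited and built are illustrative names) using the same limits = c(0, 1) as in the question. Every individual share is at most 1, but the stacked top edge for 1963 is deliberately exaggerated to 1.05; in real data an overshoot, if any, would presumably be no more than a floating-point rounding error. With limits set, ggplot2's default out-of-bounds handling (scales::censor) replaces such edges with NA after stacking, and geom_area() should then leave a gap wherever an edge is missing.

```r
library(ggplot2)

# Made-up shares for three sectors over three years (NOT real Comtrade data).
# 1963 is deliberately exaggerated so its shares add up to 1.05.
toy <- data.frame(
  year       = rep(1962:1964, each = 3),
  descriptor = rep(c("0: Food", "1: Bev", "2: Crude"), times = 3),
  percent    = c(0.2, 0.3, 0.5,
                 0.25, 0.3, 0.5,
                 0.3, 0.3, 0.4)
)

# Same y-scale settings as in the question
p_limited <- ggplot(toy, aes(x = year, y = percent, fill = descriptor)) +
  geom_area() +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1))

# layer_data() returns layer 1's data after stacking and scale mapping
built <- layer_data(p_limited)

# Stacked edges that fell outside c(0, 1) have been replaced by NA
print(built[is.na(built$ymax), c("x", "group", "ymin", "ymax")])
```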

Answer by saladmobster:

The problem was caused by setting limits = c(0, 1) within scale_y_continuous. After removing the limits, the graph looks normal.
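If you still want the axis to run from 0% to 100%, here is a sketch of three alternatives, again on made-up shares where 1963 is exaggerated to total 1.05 (all object names are illustrative). Dropping limits lets the axis expand to fit the stacked data; coord_cartesian(ylim = c(0, 1)) zooms the view without removing any data; and oob = scales::squish keeps the scale limits but clamps out-of-range values to the nearest limit instead of replacing them with NA. If the real overshoot is only a rounding error, as seems likely, all three should look the same.

```r
library(ggplot2)

# Made-up shares (NOT real Comtrade data); 1963 deliberately sums to 1.05
toy <- data.frame(
  year       = rep(1962:1964, each = 3),
  descriptor = rep(c("0: Food", "1: Bev", "2: Crude"), times = 3),
  percent    = c(0.2, 0.3, 0.5, 0.25, 0.3, 0.5, 0.3, 0.3, 0.4)
)

base_plot <- ggplot(toy, aes(x = year, y = percent, fill = descriptor)) +
  geom_area()

# 1. No limits: the y axis expands to fit the stacked data
p_no_limits <- base_plot +
  scale_y_continuous(labels = scales::percent)

# 2. Zoom with the coordinate system: data outside c(0, 1) is kept, just not shown
p_coord <- base_plot +
  scale_y_continuous(breaks = seq(0, 1, by = 0.1), labels = scales::percent) +
  coord_cartesian(ylim = c(0, 1))

# 3. Keep the scale limits, but clamp out-of-range values to the nearest limit
p_squish <- base_plot +
  scale_y_continuous(breaks = seq(0, 1, by = 0.1), labels = scales::percent,
                     limits = c(0, 1), oob = scales::squish)

# None of these variants should leave NA edges in the built data
for (p in list(p_no_limits, p_coord, p_squish)) {
  print(anyNA(layer_data(p)$ymax))
}
```

Of the three, coord_cartesian() is the closest to what limits = c(0, 1) was presumably meant to do here: it only changes what is visible, not which data is drawn.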