Modification needed to handle missing data cell (containg dash '-')

42 Views Asked by At

I have created a function called create_bar_plot, which takes the Excel file as input, followed by modification of the data frame and creation of a stacked bar plot.

 create_bar_plot <- function(file_path) {
  # Read the Excel file
  excel_data <- read_excel(file_path)
# Extract required columns
  states <- excel_data$State
  fifth_percentile <- excel_data$`5th Percentile`
  mean <- excel_data$Mean
  ninety_fifth_percentile <- excel_data$`95th Percentile`
  threeColData <- data.frame(fifth_percentile, mean, ninety_fifth_percentile) 
#ValueData code
ValueData <- c(rbind(threeColData$fifth_percentile,threeColData$mean,threeColData$ninety_fifth_percentile))
# Create plot_data dataframe
plot_data <- data.frame(State = rep(states, each = 3),
    Metric = factor(rep(c("5th Percentile", "Mean", "95th Percentile"), times = 13),
    levels = c("95th Percentile", "Mean", "5th Percentile")),
    Value = ValueData
  )
  plot_data <- plot_data |>
  arrange(desc(Metric)) |>
  mutate(value_bar = Value - lag(Value, default = 0), .by = State) |>
  arrange(State)
options(repr.plot.width=10, repr.plot.height=6)

allStatePlot = ggplot(plot_data, aes(x = State, y = value_bar, fill = Metric)) +
    geom_col(color = "white", linewidth = .25, width = .75 ) +
    geom_text(aes(label = Value), position = position_stack(vjust = 0.5)) +
    scale_fill_manual( values = c( "5th Percentile" = "forestgreen", "Mean" = "pink",
        "95th Percentile" = "orange") ) +
    scale_y_continuous( expand = c(0, 0, .05, 0), limits = \(x) {
        c(0, range(scales::breaks_extended(only.loose = TRUE)(x))[2])
      }) +
    labs(x = " ",  y = " ", fill = " ") + theme_bw() +
    theme(panel.grid = element_blank(),
      axis.text.x = element_text(angle = 45, hjust = 1, size = 14, face = "bold"),
      axis.text.y = element_text(size = 12, face = "bold"),
      legend.position = "top",
      legend.text = element_text(size = 15, face = "bold")
    )
    print(allStatePlot)

}

 

The above code works very fine for the Excel file containing all the row data, but some Excel files have missing data, the missing place is filled with dash '-'.

Below I have pasted the tabled from one excel file

State   Sample Size Mean    S.D.    Range   5th Percentile  95th Percentile
All India   3423    218 46  93-419  142 294
Arunachal Pradesh   -   -   -   -   -   -
Gujarat 75  227 43  148-299 157 297
Jammu & Kashmir 485 239 43  138-375 167 310
Madhya Pradesh  825 227 43  105-419 156 298
Maharashtra 1249    202 45  93-391  128 276
Meghalaya   -   -   -   -   -   -
Mizoram -   -   -   -   -   -
Orissa  171 236 44  131-335 155 298
Punjab  -   -   -   -   -   -
Uttar Pradesh   -   -   -   -   -   -
Tamil Nadu  618 219 46  116-375 143 295
West Bengal -   -   -   -   -   -

In the above table, you can observe, that 'Arunachal Pradesh' , 'Meghalaya', 'Mizoram', 'Punjab', 'Uttar Pradesh', and 'West Bengal' don't have data instead the cell is filled with a dash '-'.

Now The above program is working when all the data (file without dash) is provided, but when the file has missing data(dash) it is not working (plot is not created).

I need a modified version of the code such that, the function 'create_bar_plot' creates the plot for the file with and without data. If the file doesn't have data for a certain state it should exclude those states and create a plot for the state containing data. KINDLY HELP ME

Table of the excel file with missing value

The structure of data is below

    str(PULL_STRENGTH_BOTH_HANDS_STANDING)
tibble [13 × 7] (S3: tbl_df/tbl/data.frame)
 $ State          : chr [1:13] "All India" "Arunachal Pradesh" "Gujarat" "Jammu & Kashmir" ...
 $ Sample Size    : chr [1:13] "3423" "-" "75" "485" ...
 $ Mean           : chr [1:13] "218" "-" "227" "239" ...
 $ S.D.           : chr [1:13] "46" "-" "43" "43" ...
 $ Range          : chr [1:13] "93-419" "-" "148-299" "138-375" ...
 $ 5th Percentile : chr [1:13] "142" "-" "157" "167" ...
 $ 95th Percentile: chr [1:13] "294" "-" "297" "310" ...
0

There are 0 best solutions below