I have created a function called create_bar_plot, which takes the Excel file as input, followed by modification of the data frame and creation of a stacked bar plot.
create_bar_plot <- function(file_path) {
# Read the Excel file
excel_data <- read_excel(file_path)
# Extract required columns
states <- excel_data$State
fifth_percentile <- excel_data$`5th Percentile`
mean <- excel_data$Mean
ninety_fifth_percentile <- excel_data$`95th Percentile`
threeColData <- data.frame(fifth_percentile, mean, ninety_fifth_percentile)
#ValueData code
ValueData <- c(rbind(threeColData$fifth_percentile,threeColData$mean,threeColData$ninety_fifth_percentile))
# Create plot_data dataframe
plot_data <- data.frame(State = rep(states, each = 3),
Metric = factor(rep(c("5th Percentile", "Mean", "95th Percentile"), times = 13),
levels = c("95th Percentile", "Mean", "5th Percentile")),
Value = ValueData
)
plot_data <- plot_data |>
arrange(desc(Metric)) |>
mutate(value_bar = Value - lag(Value, default = 0), .by = State) |>
arrange(State)
options(repr.plot.width=10, repr.plot.height=6)
allStatePlot = ggplot(plot_data, aes(x = State, y = value_bar, fill = Metric)) +
geom_col(color = "white", linewidth = .25, width = .75 ) +
geom_text(aes(label = Value), position = position_stack(vjust = 0.5)) +
scale_fill_manual( values = c( "5th Percentile" = "forestgreen", "Mean" = "pink",
"95th Percentile" = "orange") ) +
scale_y_continuous( expand = c(0, 0, .05, 0), limits = \(x) {
c(0, range(scales::breaks_extended(only.loose = TRUE)(x))[2])
}) +
labs(x = " ", y = " ", fill = " ") + theme_bw() +
theme(panel.grid = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1, size = 14, face = "bold"),
axis.text.y = element_text(size = 12, face = "bold"),
legend.position = "top",
legend.text = element_text(size = 15, face = "bold")
)
print(allStatePlot)
}
The above code works very fine for the Excel file containing all the row data, but some Excel files have missing data, the missing place is filled with dash '-'.
Below I have pasted the tabled from one excel file
State Sample Size Mean S.D. Range 5th Percentile 95th Percentile
All India 3423 218 46 93-419 142 294
Arunachal Pradesh - - - - - -
Gujarat 75 227 43 148-299 157 297
Jammu & Kashmir 485 239 43 138-375 167 310
Madhya Pradesh 825 227 43 105-419 156 298
Maharashtra 1249 202 45 93-391 128 276
Meghalaya - - - - - -
Mizoram - - - - - -
Orissa 171 236 44 131-335 155 298
Punjab - - - - - -
Uttar Pradesh - - - - - -
Tamil Nadu 618 219 46 116-375 143 295
West Bengal - - - - - -
In the above table, you can observe, that 'Arunachal Pradesh' , 'Meghalaya', 'Mizoram', 'Punjab', 'Uttar Pradesh', and 'West Bengal' don't have data instead the cell is filled with a dash '-'.
Now The above program is working when all the data (file without dash) is provided, but when the file has missing data(dash) it is not working (plot is not created).
I need a modified version of the code such that, the function 'create_bar_plot' creates the plot for the file with and without data. If the file doesn't have data for a certain state it should exclude those states and create a plot for the state containing data. KINDLY HELP ME
The structure of data is below
str(PULL_STRENGTH_BOTH_HANDS_STANDING)
tibble [13 × 7] (S3: tbl_df/tbl/data.frame)
$ State : chr [1:13] "All India" "Arunachal Pradesh" "Gujarat" "Jammu & Kashmir" ...
$ Sample Size : chr [1:13] "3423" "-" "75" "485" ...
$ Mean : chr [1:13] "218" "-" "227" "239" ...
$ S.D. : chr [1:13] "46" "-" "43" "43" ...
$ Range : chr [1:13] "93-419" "-" "148-299" "138-375" ...
$ 5th Percentile : chr [1:13] "142" "-" "157" "167" ...
$ 95th Percentile: chr [1:13] "294" "-" "297" "310" ...
