How to change data from discrete to continuous in R?

I have two columns in my "Review" dataset that are causing me issues: one, "Year", has years formatted like "2001/02"; the other, "Hour", has the hour of the day formatted like "01-02". Whenever I try to use these columns in graphs, I see "Error: Discrete value supplied to continuous scale". How do I fix this? Sorry if the answer is obvious; I am a total beginner and couldn't find the answer anywhere else.

Here is my code for my "Year" column:

ggplot(review_data, aes(x = YEAR, colour = CAUSE)) +
  geom_point() +
  geom_line() +
  labs(title = "Incidents",
       subtitle = "By year and cause",
       x = NULL,
       y = "Cause") +
  scale_colour_brewer(palette = "Dark2", 
                      labels = c(),
                      name = NULL) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

And for my "Hour" column:

ggplot(review_data, mapping = aes(x = HOUR, fill = NUMBER)) +
  geom_histogram(binwidth = 1, colour = "black") +
  scale_fill_brewer(palette = "Dark2", name = NULL) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  labs(title = "Number per hour")

Answer by uke

The problem is probably that R does not recognize your year and hour values as dates or times; instead, it interprets them as plain text (character) variables. As you dive deeper into R, it is worth learning about its object types and classes.

Simply put, the class of an object defines what you are allowed to do with it and how it is processed.

You can check the class of your variable with: class(review_data$YEAR).

If I guessed your problem right, it will say "character". This class is used for text, and also as a fallback for everything else. Text is discrete by nature, so ggplot is right to refuse to place it on a continuous scale.
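A quick way to see this for yourself is to build a small toy data frame in the same format. The values below are invented for illustration; only the column names YEAR and HOUR come from the question.

```r
# Toy data in the question's format (values invented for illustration)
review_data <- data.frame(
  YEAR = c("2001/02", "2002/03", "2003/04"),
  HOUR = c("01-02", "02-03", "13-14"),
  stringsAsFactors = FALSE
)

# Both columns are stored as text, which ggplot treats as discrete
print(class(review_data$YEAR))       # "character"
print(class(review_data$HOUR))       # "character"
print(is.numeric(review_data$YEAR))  # FALSE
```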

The solution is to convert your variables into a suitable class that tells R they represent date or time information; ggplot should then be able to handle them.
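As a sketch of what a suitable class buys you: once the x column is a Date, ggplot uses a date scale (which is continuous), and geom_point() and geom_line() work. The data frame toy and its columns month and incidents are hypothetical. Note also that geom_point() needs a y aesthetic as well as x, so a real version of the year plot would have to map one.

```r
library(ggplot2)

# Hypothetical data: one row per month, with an invented incident count
toy <- data.frame(
  month = as.Date(c("2001-02-01", "2001-03-01", "2001-04-01")),
  incidents = c(4, 7, 5)
)

# month is a Date, so ggplot picks a continuous date scale for x
p <- ggplot(toy, aes(x = month, y = incidents)) +
  geom_point() +
  geom_line() +
  labs(title = "Incidents", x = NULL, y = "Number of incidents")

# Building the plot computes scales and layers; a scale mismatch would error here
invisible(ggplot_build(p))
```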

Here is the conversion process:

For the Year/Month Variable:

To convert from "2001/02" format into a date, consult this question where the possibilities are covered in detail: Converting year and month ("yyyy-mm" format) to a date?

An easy way with the lubridate package would be:

review_data$year_month <- lubridate::ym(review_data$YEAR)
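If you would rather avoid a package dependency, base R can do a similar conversion. This sketch assumes, as the answer does, that "2001/02" means year 2001, month 02, and pastes on a day so that as.Date() receives a complete date. If the column is instead a financial year running from 2001 to 2002 (the question does not say which), a numeric starting year may be all you need for a continuous x axis. The variable names are hypothetical.

```r
# Toy values in the question's "2001/02" format (invented for illustration)
year_text <- c("2001/02", "2002/03", "2003/04")

# Reading 1 (the answer's assumption): year/month -> first day of that month
year_month <- as.Date(paste0(year_text, "/01"), format = "%Y/%m/%d")
print(year_month)  # "2001-02-01" "2002-03-01" "2003-04-01"

# Reading 2 (an alternative assumption): financial year -> numeric starting year
year_start <- as.numeric(substr(year_text, 1, 4))
print(year_start)  # 2001 2002 2003
```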

For the Hour/Minute variable:

Using the lubridate package:

review_data$hour_minute <- lubridate::hm(review_data$HOUR)
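Note that lubridate::hm() returns a period (a span of time) rather than a plain number. If you just want a numeric x axis for a histogram, a base R sketch is to split the text at the hyphen. Which reading is right depends on your data: "01-02" could mean 1 hour and 2 minutes, as hm() assumes, or an hour band from 01:00 to 02:00, in which case the starting hour is a natural value to plot. The names hours_decimal and hour_start, and the example values, are hypothetical.

```r
# Toy values in the question's "01-02" format (invented for illustration)
hour_text <- c("01-02", "02-03", "13-14")

# Split each value at the hyphen into a two-column character matrix
parts <- do.call(rbind, strsplit(hour_text, "-", fixed = TRUE))

# Reading 1 (as hm() assumes): hours and minutes -> decimal hours
hours_decimal <- as.numeric(parts[, 1]) + as.numeric(parts[, 2]) / 60

# Reading 2 (an alternative assumption): hour band -> starting hour
hour_start <- as.numeric(parts[, 1])
print(hour_start)  # 1 2 13
```

Either numeric vector can then go on the x axis, for example as a column mapped with aes(x = hour_start) and drawn with geom_histogram(binwidth = 1).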