Function to determine "pretty" cuts for an arbitrary sequence


I want to cut an arbitrary sequence of numbers into intervals using cut. The limits should be derived automatically by providing just the number of intervals I want to have in the end.

The interval limits should be "nice", i.e. typically a multiple of 1, 2, 2.5 or 5 times a power of 10.

The function pretty does a decent job of determining "pretty" ticks for an axis:

set.seed(14112023)
x <- runif(200, 0, 100)
pretty(x, 5L)
# [1]   0  20  40  60  80 100

But by design it uses equidistant cuts:

pretty(c(x, 10000), 5L)
# [1]     0  2000  4000  6000  8000 10000

which is totally fine for what it is meant to do (axis ticks) but is less useful for my use case as the intervals become very unevenly filled:

y <- c(x, 10000)
table(cut(y, pretty(y, 5L)))

#     (0,2000] (2000,4000] (4000,6000] (6000,8000] (8000,10000] 
#          200           0           0           0            1

I could use quantile to get nicely distributed limits, but the values are "ugly":

round(quantile(y, seq(0, 1, length.out = 5)), 2)
#       0%      25%      50%      75%     100% 
#     0.61    32.61    58.47    79.91 10000.00 

Thus, ideally I am looking for a function which takes a range of values (like the quantiles) and looks for the next "nice" number, e.g.:

pretty_limits(quantile(y, seq(0, 1, length.out = 5)))
# [1] 0 25 50 75 10000

table(cut(y, c(0, 25, 50, 75, 10000)))
#     (0,25]    (25,50]    (50,75] (75,10000] 
#         36         49         56         60

Here, I have an even distribution with nice limits. My data will be potentially skewed, so I don't care if the buckets are not perfectly evenly distributed, but I don't want an extreme case where only 2 buckets are filled.

N.B. I am fully aware that "nice" is a rather vague concept, but I can live very well with the definition used by pretty and want similar functionality for cuts rather than for equidistant ticks on an axis. Also, my choice of nice values is subjective; 0 20 40 60 80 10000 would be another valid choice. Any heuristic will be fine as long as the cuts are multiples of 1, 2 or 5 times a power of 10 (just as pretty does).
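
To make the requirement concrete, here is a minimal sketch of one possible heuristic, not a definitive implementation. The name pretty_limits and its single argument are hypothetical (taken from the wish above, not from any package). It assumes one global step is acceptable: the largest value of the form 1, 2 or 5 times a power of 10 that does not exceed the smallest gap between neighbouring limits. The inner limits are rounded to the nearest multiple of that step, the lowest limit is rounded down and the highest is rounded up, so that all data stay inside the range. The quantile values are typed in by hand from the rounded output above.

```r
# Hypothetical helper (not part of base R): snap each limit to a multiple
# of a "nice" step, i.e. 1, 2 or 5 times a power of 10.
pretty_limits <- function(limits) {
  limits <- sort(unique(limits))
  n <- length(limits)
  stopifnot(n >= 2)

  # Assumption: the smallest gap between neighbours sets the resolution.
  gap <- min(diff(limits))

  # Candidate steps around the magnitude of the gap; two decades are
  # considered so that at least one candidate is <= gap.
  expo <- floor(log10(gap))
  cand <- as.vector(outer(c(1, 2, 5), 10^c(expo - 1, expo)))
  step <- max(cand[cand <= gap])

  # Inner limits: nearest multiple of the step.
  out <- round(limits / step) * step
  # Outer limits: round outwards so no observation falls outside.
  out[1] <- floor(limits[1] / step) * step
  out[n] <- ceiling(limits[n] / step) * step
  out
}

# Rounded quantiles of y as printed above
q <- c(0.61, 32.61, 58.47, 79.91, 10000)
pretty_limits(q)
# values: 0 40 60 80 10000
```

With these limits, cut(y, pretty_limits(q), include.lowest = TRUE) keeps the outlier in the last bucket instead of letting it stretch every interval. Because the step comes from the smallest gap, tightly clustered quantiles produce a fine step, which may be finer than desired for the widely spaced limits.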
