How to get consistent decimal places for interval bounds using `cut()`


How does one get consistent interval bound formatting when using cut() in R?

For example, the interval bounds produced by the following code are printed with anywhere from 2 to 5 decimal places.

# Sample data
set.seed(1)
data <- runif(600, -1, 1.5)

n_intervals <- 15
# Cut the data into 15 equal-width intervals (16 break points);
# the count gets its own name so the factor below does not overwrite it
intervals <- cut(data, seq.int(min(data), max(data), length.out = n_intervals + 1), include.lowest = TRUE)

# Display the intervals
intervals 

What I want is for the interval bounds to be formatted consistently with 3 decimal places. The closest I get is by rounding the breaks to 3 decimal places:

rounded_breaks <- round(
  seq(min(data), max(data), length.out = 15 + 1),  # 15 intervals need 16 breaks
  3)
intervals <- cut(data, breaks = rounded_breaks, include.lowest = TRUE)

Although rounded_breaks holds values rounded to 3 decimal places, cut() seems to drop the third decimal digit when it is a 0, so those bounds are shown with only 2 decimal places.
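The dropped zero can be reproduced in isolation. The following is a minimal sketch using made-up values and breaks (`x`, `toy_breaks`) rather than the data above. cut() formats its labels to a number of significant digits (the dig.lab argument, default 3), so trailing zeros are not kept, and raising dig.lab does not bring them back.

```r
# Toy values and breaks, chosen only for illustration (not from the question)
x <- c(0.1, 0.5, 0.9)
toy_breaks <- c(0, 0.830, 1)

# Default labels: dig.lab = 3 significant digits
default_labels <- levels(cut(x, breaks = toy_breaks))

# Asking for more significant digits still does not pad with zeros
more_digits <- levels(cut(x, breaks = toy_breaks, dig.lab = 5))

print(default_labels)  # "(0,0.83]" "(0.83,1]"
print(more_digits)     # same labels as above
```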

How can this be adjusted so that the 0 is still shown in the third decimal place of the intervals?

Answer by MrFlick

You can extract and reformat the numbers in the labels. Here's one way to do that:

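# Locate every decimal number (optionally negative) in each label
m <- gregexpr(r"{-?\d+\.\d+}", levels(intervals))
nums <- regmatches(levels(intervals), m)
# Write each number back with exactly three decimal places
regmatches(levels(intervals), m) <- lapply(nums, \(x) sprintf("%0.3f", as.numeric(x)))
levels(intervals)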
#  [1] "[-0.995,-0.830]" "(-0.830,-0.664]" "(-0.664,-0.498]" "(-0.498,-0.333]"
#  [5] "(-0.333,-0.167]" "(-0.167,-0.001]" "(-0.001,0.165]"  "(0.165,0.330]"  
#  [9] "(0.330,0.496]"   "(0.496,0.662]"   "(0.662,0.827]"   "(0.827,0.993]"  
# [13] "(0.993,1.160]"   "(1.160,1.320]"   "(1.320,1.490]" 

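gregexpr() finds every number in the labels, regmatches() extracts them, and sprintf("%0.3f", ...) writes each one back with exactly three decimal places, so trailing zeros are kept. Note that the raw string r"{...}" needs R 4.0.0 or later and the \(x) lambda shorthand needs R 4.1.0 or later, and that the pattern only matches numbers that already contain a decimal point.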
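One caveat: reformatting existing labels can only pad what cut() already printed. Because the labels are built with 3 significant digits by default, a break such as 1.159 would be shown as 1.16 and then padded to 1.160. An alternative, sketched here as a different technique from the one above, is to build the labels from the numeric breaks and pass them through the labels argument of cut(). The names `n_intervals`, `brks`, and `labs` are illustrative, and the breaks are left unrounded so that min(data) and max(data) stay inside the outer intervals.

```r
# Sample data, as in the question
set.seed(1)
data <- runif(600, -1, 1.5)

n_intervals <- 15  # illustrative name for the number of intervals
brks <- seq(min(data), max(data), length.out = n_intervals + 1)

# Format every break with exactly three decimal places
fmt <- sprintf("%.3f", brks)

# Pair consecutive breaks into "(lower,upper]" labels
labs <- paste0("(", head(fmt, -1), ",", tail(fmt, -1), "]")

# include.lowest = TRUE closes the first interval on the left
labs[1] <- sub("^\\(", "[", labs[1])

intervals <- cut(data, breaks = brks, labels = labs, include.lowest = TRUE)
levels(intervals)
```

The labels here are only a display format: the data are still binned on the unrounded breaks, so an observation very close to a boundary is assigned by its exact value, not by the three-decimal label.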