Generating a normally distributed random variable that has range [1, 3] in R

I want to generate a normally distributed random variable that has range [1, 3].

Specifically, I tried the following R code:

x1 <- runif(100, 1, 2)
x2 <- rnorm(100, 0, 0.3)

V <- 1 + x1 + x2

Then, V follows a normal distribution (conditional on x1) and is roughly concentrated on [1, 3].

However, I want V to have range exactly [1, 3]. That is, all elements should lie in [1, 3], not just roughly in [1, 3]:

min(V)
[1] 1
max(V)
[1] 3

I have no idea how to do this. Is there a technique for this task?

Allan Cameron On BEST ANSWER

Since the support of any normal distribution is the whole real number line, the only way to get what you are looking for is to draw a sample and then normalize it into your specified range. As r2evans points out, there are theoretical problems with any such approach. However, a simple implementation of it would be

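rnorm_limits <- function(n, min = 1, max = 3) {
  x <- rnorm(n)
  # Scale the sample so that its width equals max - min
  x <- (max - min) * x/diff(range(x))
  # Shift so that the smallest value lands exactly on min
  # (min(x) still calls base::min: R skips non-function objects when looking up a call)
  return(x - min(x) + min)
}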

Testing, we have:

set.seed(1)

hist(rnorm_limits(100))

And of course the range will be exactly that specified:

range(rnorm_limits(100))
#> [1] 1 3
uke On

Here is another approach, but you would have to give up the requirement that 100% of values always lie in [1, 3], and be satisfied with something like at least 99.99% of values lying in [1, 3].

This is different from rescaling a standard normal sample so that it always fits into [1, 3] completely, because with rescaling the resulting distribution has a varying width (standard deviation), depending on whether or not the random draw from the standard normal distribution contained an "outlier". The whole sample would be rescaled so that the outlier lies within [1, 3].
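To illustrate the varying width, here is a minimal sketch that rescales many independent samples and looks at the spread of their standard deviations. The helper name `rescale_to_range` and the choice of 1000 replications are assumptions made for this illustration only.

```r
# Sketch: how much does the spread of a range-rescaled normal sample vary?
set.seed(42)

# Rescale a standard normal sample so its minimum is `lo` and its maximum is `hi`
# (hypothetical helper, same idea as rescaling into [1, 3])
rescale_to_range <- function(n, lo = 1, hi = 3) {
  x <- rnorm(n)
  lo + (hi - lo) * (x - min(x)) / diff(range(x))
}

# Standard deviation of 1000 independent rescaled samples of size 100
sds <- replicate(1000, sd(rescale_to_range(100)))

# The standard deviation differs from sample to sample
summary(sds)
```

Every rescaled sample spans exactly [1, 3], but the standard deviation is itself random, because it depends on how extreme the smallest and largest draws happened to be.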

My approach is to set the width (standard deviation) so that, for an infinitely large sample, e.g. 99.99% of values lie in [1, 3].

You would have to reformulate the range as a percentage of the area under the normal distribution that you wish to cover. An area of 100% always corresponds to a range from -infinity to +infinity, so you have to step down on the area of the normal distribution that you want to lie within [1, 3]. Let's say you want 99% of the area to be between 1 and 3.

You would then use the sd argument of rnorm() to supply a standard deviation that defines the normal distribution in such a way that 99% of the area is between 1 and 3.

How do we calculate that specific standard deviation? We can use qnorm() to get the limit value for a certain area. This area ranges from -infinity to the returned value and has size p. When we put p = 0.005, we say: give me the x value to the left of which lies 0.5% of the distribution.

As the normal distribution is symmetrical, we determine p by halving what is left over after taking 99% from 100%, i.e. 1%. We split this 1% in half and say: 0.5% should lie below 1 and 0.5% should lie above 3.

All that is left is to supply the mean of our distribution to qnorm(). It should be the center of your given range. In your case, this is 2.

So we know the desired outcome of qnorm(p = 0.005, mean = 2, sd = ???): It should be 1. We have to set sd in a way that the result is 1.

I did this with trial and error, approximating 1 and got to this point:

qnorm(0.005, mean = 2, sd = 0.388223)
#> 1.000004
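The trial and error can also be replaced by a direct calculation, since qnorm(p, mean, sd) equals mean + sd * qnorm(p). The following is a sketch under that identity; the variable names (`lower`, `upper`, `coverage`, `centre`, `sd_needed`) are chosen here for illustration and are not part of the answer above.

```r
# Target: a fraction `coverage` of the distribution lies inside [lower, upper]
lower <- 1
upper <- 3
coverage <- 0.99

centre <- (lower + upper) / 2   # centre of the range, here 2
p <- (1 - coverage) / 2         # area in each tail, here 0.005

# qnorm(p, mean, sd) equals mean + sd * qnorm(p); solve that for sd
sd_needed <- (lower - centre) / qnorm(p)
sd_needed

# Check: the p-quantile should now sit at the lower limit
qnorm(p, mean = centre, sd = sd_needed)
```

The result agrees with the value found by trial and error to about five decimal places.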

So, in turn:

rnorm(mean = 2, sd = 0.388223, n = 100)

should give you random values of which ~99% fall within the range [1, 3].

[Figure: histogram of normal distribution ranging from 1 to 3]

You could go more extreme by saying 99.99% should be inside [1, 3], approximating your goal of 100%.

  • p = (100% - 99.99%)/2 = 0.01% / 2 = 0.005% = 0.00005
  • same steps as above, optimize sd so that
qnorm(0.00005, mean = 2, sd = ???) == 1
  • plug sd into your rnorm() call.

For the 99.99% example, sd would be ~ 0.25703. This is likely a bit extreme, because the sampled values then span something more like [1.4, 2.8], but if you want to be sure, you can choose it. This would be suitable for large samples.

rnorm(mean = 2, sd = 0.25703, n = 10000) |> hist()
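How often a sample still contains a value outside [1, 3] depends on the sample size. As a rough sketch (the variable names are illustrative), the probability that at least one of n independent draws falls outside the range is 1 - 0.9999^n:

```r
# sd chosen so that 99.99% of the distribution lies in [1, 3]
sd_9999 <- (1 - 2) / qnorm(0.00005)   # about 0.25703

# Probability that a single draw lands inside [1, 3]
p_inside <- pnorm(3, mean = 2, sd = sd_9999) - pnorm(1, mean = 2, sd = sd_9999)

# Probability that at least one of n independent draws lands outside [1, 3]
n <- c(100, 10000)
p_any_outside <- 1 - p_inside^n
p_any_outside
```

For n = 100 this is roughly 1%, while for n = 10000 it is roughly 63%, so with very large samples an even smaller tail area may be needed.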

[Figure: histogram of normal distribution with 10000 observations range 1 to 3 and sd = 0.25...]

Stéphane Laurent On

If you want a distribution with a bell-shaped density function centered at 2 and with endpoints at 1 and 3, you can construct it from a Beta(a,a) distribution:

a <- 4
simulations <- 2*rbeta(100, a, a) + 1

By increasing the value of a, the distribution becomes more concentrated around 2.
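To see how quickly the spread shrinks, one can use the fact that Beta(a, a) has variance 1 / (4 (2a + 1)), so 2 * Beta(a, a) + 1 has standard deviation 1 / sqrt(2a + 1). A small sketch, with the values of a picked arbitrarily for illustration:

```r
# Standard deviation of 2 * Beta(a, a) + 1 for several values of a
a <- c(2, 4, 10, 50)

# Var(Beta(a, a)) = 1 / (4 * (2a + 1)); multiplying by 2 doubles the sd
sd_theory <- 1 / sqrt(2 * a + 1)

# Empirical check by simulation
set.seed(1)
sd_sim <- sapply(a, function(ai) sd(2 * rbeta(1e5, ai, ai) + 1))

round(cbind(a, sd_theory, sd_sim), 3)
```

For a = 4 the theoretical standard deviation is exactly 1/3.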

Here is the theoretical density:

a <- 4
x <- seq(1, 3, length.out = 200)
y <- dbeta((x-1)/2, a, a)/2
plot(x, y, type = "l")
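As a quick sanity check, the sketch below confirms numerically that this density integrates to 1 over [1, 3] and overlays it on a histogram of simulated values. The function name `dens` and the sample size of 10000 are assumptions for this example.

```r
a <- 4

# Density of 2 * Beta(a, a) + 1 on [1, 3]
dens <- function(x) dbeta((x - 1) / 2, a, a) / 2

# The density should integrate to (numerically) 1 over [1, 3]
total_area <- integrate(dens, lower = 1, upper = 3)$value
total_area

# Simulated values never leave [1, 3]
set.seed(1)
simulations <- 2 * rbeta(1e4, a, a) + 1
range(simulations)

# Histogram on the density scale with the theoretical curve on top
hist(simulations, freq = FALSE, breaks = 40)
curve(dens(x), from = 1, to = 3, add = TRUE)
```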
