Shape estimation in truncated Pareto in R

463 Views Asked by At
library(VGAM)
library(fitdistrplus)

fitdist(u_NI$k_u, 'truncpareto',
         start = list(lower=1,
                      upper=42016,
                      shape=1)) -> fit.k_u

length(u_NI$k_u) = 637594

I got this error:

<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  : 
  the function mle failed to estimate the parameters, 
                with the error code 100
In addition: Warning messages:
1: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The dtruncpareto function should return a zero-length vector when input has length zero
2: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The ptruncpareto function should return a zero-length vector when input has length zero

Is the problem in the excessive size of the dataset or in the start parameters?

REPRODUCIBILE EXAMPLE:

library(VGAM)
library(fitdistrplus)

rtruncpareto(100,1,100,1.5) -> a
fitdist(a, "truncpareto",
        start = list(lower=1,
                     upper=100,
                     shape=1.5))

This is not going to work, and I don't understand why.

It seems it has a problem here:

argument 'lower' must be positive

2

There are 2 best solutions below

7
Ben Bolker On BEST ANSWER

Part of your problem is a more general issue, i.e. that for a truncated distribution the MLE of the boundary parameter(s) is in general equal to the observed min/max of the data set. So you should always be able to do at least as well by setting the values of the lower/upper bounds equal to the min/max (based on my experience trying this they have to be slightly below/above the observed bounds). (I also found I had to set lower = 0 to stop the algorithm from trying negative values of the shape parameter.)

library(VGAM); library(fitdistrplus)
set.seed(101)
rtruncpareto(100,1,100,1.5) -> a
eps <- 1e-8
fitdist(a, "truncpareto",
        start = list(shape=1.5),
        fix.arg = list(lower = min(a) - eps, upper = max(a) + eps),
        lower = 0)
Parameters:
      estimate Std. Error
shape 1.349885  0.1554436
Fixed parameters:
          value
lower  1.006844
upper 25.906577

An alternative to fitdist would be bbmle:

library(bbmle)
m1 <- mle2(a ~ dtruncpareto(lower = min(a) - eps,
                            upper = max(a) + eps,
                            shape = exp(lshape)),
           start = list(lshape = 0),
           data = data.frame(a),
         method = "BFGS")
exp(coef(m1))   
  lshape 
1.349884 

bbmle is a little bit more flexible and allows you to fit the shape parameter on the log scale, which is generally more robust (and makes the standard deviation estimates more useful). Using method = "BFGS" here because the default Nelder-Mead algorithm works poorly for 1-dimensional optimization; could also use method = "Brent" (which would be more efficient and robust) but would then need to provide explicit lower and upper bounds for the lshape parameter.

6
jde On

I am not exactly sure, but this might be what your after.

library(fitdistrplus)
library(VGAM)
library(ggplot2)

u_NI <- data.frame("k_u" = rtruncpareto(10000,
                                        lower = 1,
                                        upper = 100,
                                        shape = 1.5))

fit <- vglm(k_u ~ 1,
            truncpareto(lower=1, upper=max(u_NI$k_u) + 1),
            data = u_NI,
            trace = TRUE)

x <- seq(1, max(u_NI$k_u), length.out = 10000)
y <- dtruncpareto(x, shape = fit@coefficients[["(Intercept)"]],
                  lower=fit@extra$lower,
                  upper=fit@extra$upper)
pareto <- cbind.data.frame(x, y)

ggplot()+
  geom_density(data = u_NI, aes(k_u))+
  geom_line(data = pareto, aes(x = x, y = y, color = "truncpareto"),linetype = 5, size = 1.3)+
  theme(legend.position = c(.95, .95),
        legend.justification = c(1, 1),
        legend.title = element_blank())

########################## Edit:

# fix lower and upper boundary, estimate may still fail, depending on     the start value of shape
fitdist(u_NI$k_u, 'truncpareto',
        method = "mle",
        start = list(shape=1),
        fix.arg=list(lower=1, upper=max(u_NI$k_u) + 0.01),
        control=list(trace=1, REPORT=1)) -> fit.k_u

# use different method to estimate parameter, mge seems to work
fitdist(u_NI$k_u, 'truncpareto',
        method = "mge",
        start = list(shape=1,
                     lower=1,
                     upper=max(u_NI$k_u) + 1),
        control=list(trace=1, REPORT=1)) -> fit.k_u

The errors that remain refer to this:

> pnorm(numeric(), 1,10)
numeric(0)
> ptruncpareto(numeric(), 1,10,5)
[1] NA

####### Edit2: I think i found the original error and its a bit confusing. However, there is also a lower parameter for the mle, that is separate from the lower parameter of the distribution. It should restrain parameter estimates to >= 0. Hence, this should work, however it takes a good while, even for just 10,000 values:

fitdist(u_NI$k_u, 'truncpareto',
        method = "mle",
        start = list(shape=1,
                     lower=1,
                     upper=max(u_NI$k_u) +1 ),
        control=list(trace=1, REPORT=1),
        lower = 0) -> fit.k_u