My code for fitting the generalized additive model with the betar family is as follows.

library(mgcv)
b1 <- gam(ssim_exp ~ s(stage, k = 4, fx = TRUE, by = comparison_type) + comparison_type, data = df, family = betar(link = "logit", eps=.Machine$double.eps*1000))

Output

saturated likelihood may be inaccurate
Family: Beta regression(0.434) 
Link function: logit 

Formula:
ssim_exp_scale ~ s(stage, k = 4, fx = TRUE, by = comparison_type) + 
    comparison_type

Parametric coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -0.5572     0.1607  -3.468 0.000524 ***
comparison_typefunctions   2.0598     0.1988  10.362  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                  edf Ref.df Chi.sq  p-value    
s(stage):comparison_typecomplete    3      3  19.07 0.000265 ***
s(stage):comparison_typefunctions   3      3   0.88 0.830160    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  -0.00757   Deviance explained = -16.4%
-REML = -1035.1  Scale est. = 1         n = 171
saturated likelihood may be inaccurate
saturated likelihood may be inaccurate

I tried decreasing eps, but I still get the same warning, "saturated likelihood may be inaccurate", and a negative deviance explained. Any idea why this happens, and how can I fix it?

For context: I do have some exact 0s and 1s in the data. My dependent variable is a percentage from 0 to 100%, rescaled to lie between 0 and 1. It is a similarity measure like Jaccard similarity - https://www.learndatasci.com/glossary/jaccard-similarity/ .
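
To make the data setup concrete, here is a minimal sketch of the rescaling and of a check for exact boundary values. The vector ssim_pct and its values are hypothetical and are not taken from my data. The clamping step only illustrates restricting values to [eps, 1 - eps], which is my understanding of how the eps argument of betar treats out-of-range responses; it is not meant as a reproduction of what mgcv does internally.

```r
# Hypothetical similarity scores as percentages (0-100), including exact 0 and 100.
ssim_pct <- c(0, 12.5, 40, 87.5, 100)

# Rescale from the 0-100 percentage scale to the unit interval.
ssim <- ssim_pct / 100

# Count exact 0s and 1s; the beta density is only defined on the open interval (0, 1).
n_boundary <- sum(ssim == 0 | ssim == 1)

# Illustrative clamp to [eps, 1 - eps], using the same eps as in the gam() call.
eps <- .Machine$double.eps * 1000
ssim_clamped <- pmin(pmax(ssim, eps), 1 - eps)

print(n_boundary)
print(range(ssim_clamped))
```

With boundary values present, every exact 0 or 1 ends up sitting at eps or 1 - eps, so the choice of eps directly affects those observations.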

This is the distribution of the dependent variable in my data:

[image: distribution of the dependent variable]
