I am currently using the Hugging Face transformers library to train a LayoutLM model, but I am running into overfitting on a token classification task. My dataset contains only 400 documents. I know that is a very small dataset, but I have no way to collect more data.
My results are in the table below. I have tried weight_decay=0.1, which is already a high value in my opinion, and I have also tried early stopping based on the F1 score and on the loss separately, but neither helped.
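For context, here is a minimal sketch of the training setup I described, using the Trainer API. The checkpoint name, `num_labels`, the dataset variables, and `compute_metrics` are placeholders for my actual pipeline:

```python
from transformers import (
    LayoutLMForTokenClassification,
    Trainer,
    TrainingArguments,
    EarlyStoppingCallback,
)

# Placeholders: num_labels, train_dataset, eval_dataset, and
# compute_metrics come from my own data pipeline.
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=num_labels
)

args = TrainingArguments(
    output_dir="layoutlm-out",
    evaluation_strategy="epoch",  # evaluate each epoch so early stopping can trigger
    save_strategy="epoch",
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
    metric_for_best_model="f1",   # I also tried "eval_loss" separately
    weight_decay=0.1,             # the high weight decay mentioned above
    num_train_epochs=30,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```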

Which additional regularisation techniques should I try? Do you have any solutions for overfitting to a small dataset with BERT-like models?