I'm trying to build a model in Python to predict an operational parameter, ROP (Rate of Penetration), while drilling an oil well. I'm working with a neural network trained with PSO (particle swarm optimization) using the pyswarms library. The input layer consists of 11 neurons and the output layer of just 1 neuron (ROP). I'm still searching for the "right" number of hidden layers. I don't have much machine learning knowledge, so any suggestion is welcome. The loss function to minimize is MAE, because it is less sensitive to outliers than MSE.
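To show what I mean about outliers, here is a minimal sketch with made-up residuals (toy numbers, not my drilling data; the helper names are just for illustration). A single large error raises MAE moderately but inflates MSE much more:

```python
def mae(errors):
    """Mean absolute error of a list of residuals (prediction - actual)."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error of a list of residuals."""
    return sum(e * e for e in errors) / len(errors)


# Toy residuals: the second list is identical except for one outlier.
clean = [1.0, -1.0, 1.0, -1.0]
with_outlier = [1.0, -1.0, 1.0, -21.0]

mae_clean, mse_clean = mae(clean), mse(clean)
mae_out, mse_out = mae(with_outlier), mse(with_outlier)

print("clean:        MAE =", mae_clean, " MSE =", mse_clean)
print("with outlier: MAE =", mae_out, " MSE =", mse_out)
```

So MAE still reacts to outliers, just linearly instead of quadratically.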
I'm not sure which metric I should use to track the performance of the model. That's why, after every run, I print MAE, MSE, RMSE, R2 and R. The problem is that on the training data the error metrics are "high" and the correlation scores (R or R2) are "low", and for the validation data they are quite close.
I would like your opinion on my work. I'm not sure whether the model is overfitting or underfitting, or whether the data quality is simply low.
The whole dataset consists of 6 wells (F-1A, F-1B, F-1C, F-11A, F-11B, F-11T2). For each well we have 12 parameters (including ROP, which is the target). The number of samples is different for each well, for instance:
Well F-1A: approx. 60 000 samples
Well F-1B: approx. 20 000 samples
Well F-1C: approx. 25 000 samples
So I think it is enough to train my model on one well, for example Well F-11A, and then validate it on a different one, for example Well F-1B.
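A minimal sketch of this kind of well-based split, assuming every sample is tagged with its well name. The row layout, the `split_by_well` helper and the toy values are hypothetical, not my actual code or data:

```python
from collections import defaultdict

# Hypothetical rows: (well_name, features, rop).
# Real rows would carry 11 input features instead of 2.
rows = [
    ("F-11A", [0.1, 0.2], 12.0),
    ("F-11A", [0.3, 0.1], 15.0),
    ("F-1B", [0.2, 0.4], 30.0),
    ("F-1C", [0.5, 0.5], 22.0),
]


def split_by_well(rows, train_wells, val_wells):
    """Group samples by well, then collect whole wells into each split."""
    by_well = defaultdict(list)
    for well, features, rop in rows:
        by_well[well].append((features, rop))
    train = [sample for w in train_wells for sample in by_well[w]]
    val = [sample for w in val_wells for sample in by_well[w]]
    return train, val


train, val = split_by_well(rows, train_wells=["F-11A"], val_wells=["F-1B"])
print(len(train), "training samples,", len(val), "validation samples")
```

Keeping whole wells together means no sample from the validation well is ever seen during training.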
On one of those runs I got this result:

Input layer: 11
Hidden layers: 2 (8 neurons and 10 neurons)
Output layer: 1
Options : {'c1': 0.68, 'c2': 0.7, 'w': 0.73}
n_particles = 100
iters = 100
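For a sense of the search space, here is a minimal sketch that counts how many values each particle has to encode for this architecture. It assumes fully connected layers with one bias per neuron, which is an assumption about the network rather than a description of my exact code:

```python
# Layer sizes from the run above: input, two hidden layers, output.
layer_sizes = [11, 8, 10, 1]


def count_parameters(layer_sizes):
    """Total weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per neuron
    return total


dimensions = count_parameters(layer_sizes)
print("values per particle:", dimensions)
```

Under that assumption, this count is what would go into the `dimensions` argument of the pyswarms optimizer, so 100 particles are searching a space of that many dimensions.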
The results for loss functions, R2 and R for each dataset are:
ROP Train Data r^2= 0.4955
ROP Train Data r= 0.7039
ROP Train Data MAE = 3.272725
ROP Train Data MSE = 19.528535
ROP Train Data RMSE = 4.41911
ROP Validation Data r^2= 0.5169
ROP Validation Data r= 0.719
ROP Validation Data MAE = 10.755544
ROP Validation Data MSE = 124.405781
ROP Validation Data RMSE = 11.153734
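One thing I noticed: the printed r^2 is exactly the square of r (0.7039^2 ≈ 0.4955 and 0.719^2 ≈ 0.517), so it appears to be the squared Pearson correlation rather than the coefficient of determination (1 - SS_res / SS_tot). A minimal sketch of both definitions next to the error metrics, using toy values and a hypothetical `regression_metrics` helper (not my actual code):

```python
import math


def regression_metrics(y_true, y_pred):
    """Error metrics plus both common definitions of 'R squared'."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    r = cov / math.sqrt(ss_tot * ss_pred)  # Pearson correlation
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "r": r,
        "r_squared": r * r,  # squared Pearson correlation
        "R2_determination": 1.0 - (n * mse) / ss_tot,  # 1 - SS_res / SS_tot
    }


# Toy data: predictions follow the trend exactly but sit 5 units too low.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [5.0, 7.0, 9.0, 11.0]
m = regression_metrics(y_true, y_pred)
for name, value in m.items():
    print(name, "=", round(value, 4))
```

In this toy case the correlation is perfect while the coefficient of determination is negative, because correlation ignores a constant offset and the error-based metrics do not.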
I don't really know how to interpret these values. What should I do next? I have noticed that on the right plot the predicted validation curve (green) follows the trend of the actual validation curve (blue), but the predicted values seem to be lower, as if the whole curve had been shifted down.
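To quantify that apparent shift, one option would be to look at the mean signed error (bias) and at how much of the MSE it accounts for, since MSE = bias^2 + variance of the errors. A minimal sketch with made-up numbers; the `bias_decomposition` helper is hypothetical and the values are not from my wells:

```python
def bias_decomposition(y_true, y_pred):
    """Split MSE into a constant-offset part (bias^2) and the remaining spread."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    bias = sum(errors) / n  # mean signed error, negative = under-prediction
    spread = sum((e - bias) ** 2 for e in errors) / n  # variance of the errors
    mse = sum(e * e for e in errors) / n
    return bias, spread, mse


# Toy data: predictions track the trend but are about 10 units too low.
actual = [20.0, 25.0, 30.0, 25.0]
predicted = [11.0, 14.0, 21.0, 14.0]

bias, spread, mse = bias_decomposition(actual, predicted)
print("bias =", bias, " bias^2 =", bias ** 2, " spread =", spread, " MSE =", mse)
```

If bias^2 makes up most of the MSE on the validation well, the error is dominated by a constant offset between wells rather than by the model missing the trend.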