I can't find a solution to this problem anywhere, so I'm asking here in the hope that someone has one.
The problem is the following:
given an image, is it possible to find the "real size" of a rectangle drawn in the image, meaning the size of its projection onto the horizontal plane of the road?
I probably have all the information needed to solve this problem:
- focal length of the camera in millimetres (mm)
- height of the camera above the road surface in cm
- inclination of the camera relative to the road surface in °
- sensor size in inches
- size of the image in px
- size of the rectangle in px
- position of the rectangle within the image in px, retrieved using OpenCV
However, I'm missing the formulas needed to even start writing the code.
Here are 2 screenshots taken from Google. In both of them the rectangle is the same size; it has only been moved upward, which creates another big issue: perspective.


Looking around, no one seems to even mention the projection onto the horizontal plane, nor perspective. Honestly, I can't even figure out where to start; I've never dealt with anything like this before. If anything is unclear, don't hesitate to ask.
EDIT 1: Here are 2 pictures to better illustrate the problem.
- The first one as a context image:

- The second one as a visual example from https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html of what the point of all this is (sorry for the low quality). The blue indicates the road, while the orange rectangle is the projection of the red one onto the road plane:

Your problem essentially boils down to "what ground position corresponds to a given pixel coordinate", because once you have that, determining the ground positions of the four corners of a quadrilateral is easy.
If you assume a pinhole camera, there is a straight ray of light from each point on the ground through the pinhole to the sensor. So the most important part is to embed your sensor coordinate system correctly into 3d space. Given the parameters you listed, that embedding is a chain of basic transformations:

1. Convert pixel coordinates (x, y) to physical sensor coordinates using the sensor size and the image size in pixels.
2. Shift the origin to the image centre (the principal point, assuming no offset).
3. Place the sensor in 3d at the focal distance from the pinhole.
4. Rotate the whole assembly by the camera's inclination angle.
5. Translate it upward by the camera's height above the road.

Steps 1 through 5 allow you to take an (x, y) position in the image and turn it into the 3d position of the corresponding point on the correctly positioned sensor. Taking the pinhole as the origin, any multiple of that 3d vector is a point on the line spanned by the point on the sensor and the pinhole. Pick the multiple that also lies in the ground plane, i.e. whose height coordinate matches that of the ground. That is your position on the ground.
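The steps above fit in a few lines of code. Here is a sketch; all numeric parameters are made-up placeholders, not values from your setup, and the world frame is chosen with the pinhole above the origin, the road as the plane z = 0, y pointing away from the camera and z pointing up:

```python
import numpy as np

# Hypothetical camera parameters -- substitute your own.
F_MM = 4.0                              # focal length (mm)
SENSOR_W_MM, SENSOR_H_MM = 6.17, 4.55   # sensor size (mm), e.g. a 1/2.3" sensor
IMG_W, IMG_H = 1920, 1080               # image size (px)
CAM_HEIGHT_M = 1.5                      # camera height above the road (m)
TILT_DEG = 30.0                         # downward tilt of the optical axis (deg)

def pixel_to_ground(u, v):
    """Return the (x, y) road position (in metres) seen at pixel (u, v)."""
    # Steps 1-2: pixel -> physical sensor coordinates, origin at image centre.
    xs = (u - IMG_W / 2) * SENSOR_W_MM / IMG_W
    ys = (v - IMG_H / 2) * SENSOR_H_MM / IMG_H   # image v grows downward

    # Step 3: ray direction in the camera frame (forward = +y, up = +z);
    # the sensor sits at focal distance F_MM from the pinhole.
    d_cam = np.array([xs, F_MM, -ys])

    # Step 4: rotate the ray downward by the tilt angle (about the x axis).
    t = np.radians(TILT_DEG)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(t), np.sin(t)],
                      [0.0, -np.sin(t), np.cos(t)]])
    d = rot_x @ d_cam

    if d[2] >= 0:
        raise ValueError("Pixel is at or above the horizon; no ground point.")

    # Step 5: the pinhole is at height CAM_HEIGHT_M; intersect the ray
    # pinhole + scale * d with the road plane z = 0.
    scale = -CAM_HEIGHT_M / d[2]
    return scale * d[0], scale * d[1]
```

As a sanity check, the image centre looks straight down the optical axis, so `pixel_to_ground(IMG_W / 2, IMG_H / 2)` should return a point at ground distance `CAM_HEIGHT_M / tan(TILT_DEG)` directly ahead. Feed the four corners of your detected rectangle through this function and you get its real footprint on the road, from which side lengths and area follow.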
All of this takes care of perspective automatically: different positions in the image correspond to different positions on the ground, so equal pixel distances on the sensor do not correspond to equal distances on the ground.
OpenCV might have some tools to help with all of this, but understanding how to use those tools correctly might be harder than multiplying the 4 matrices that combine the basic transformations outlined above. If you can't assume the camera acts like a pinhole camera (e.g. because of lens distortion), things become a lot more complicated, though.
Related questions: see "Finding the transform matrix from 4 projected points" for how to compute the matrix when, instead of camera parameters, you have 4 matching points, and "How to calculate true lengths from perspective projection?" when, instead of camera parameters, you know the size of a visible object in the plane.
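For the 4-matching-points route: the map between the image plane and the ground plane is a 3×3 homography, which OpenCV computes with `cv2.getPerspectiveTransform`. The underlying direct linear transform is small enough to write out yourself; the point coordinates below are made-up sample values:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: the 3x3 matrix H mapping each src point to
    the matching dst point (what cv2.getPerspectiveTransform computes).
    Fixes H[2,2] = 1 and solves the resulting 8x8 linear system."""
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        b += [X, Y]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, x, y):
    """Map one point through H, dividing out the projective scale."""
    X, Y, w = H @ np.array([x, y, 1.0])
    return X / w, Y / w

# Made-up example: four image corners of a known ground rectangle (src)
# and their measured positions on the road in metres (dst).
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [(0.0, 0.0), (4.0, 0.0), (3.0, 3.0), (1.0, 3.0)]
H = homography_from_points(src, dst)
```

Once `H` is known, any pixel inside the calibrated region can be mapped to the ground with `apply_homography(H, u, v)`, so the camera parameters are never needed.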