How to convert a screen coordinate into a translation for a projection matrix?

Question

Asked by KiraHoneybee on 27 August 2022 at 21:55

(More info at end)----->

I am trying to render a small picture-in-picture display over my scene. The PiP is just a smaller texture, but it is intended to reveal secret objects in the scene when it is placed over them.

To do this, I want to render my scene, then render the SAME scene on the smaller texture, but with the exact same positioning as the main scene. The intended result would be something like this:

My problem is... I cannot get the scene on the smaller texture to match up 1:1. I keep trying various kludges, but ultimately I suspect that I need to do something to the projection matrix to pan it over to the location of the frame. I can get it to zoom correctly...just can't get it to pan.

Can anyone suggest what I need to do to my projection matrix to render my scene 1:1 (but panned by x,y) onto a smaller texture?

The data I have:

- Resolution of the full-screen framebuffer
- Resolution of the smaller texture
- XY coordinate where I want to draw the smaller texture as an overlay sprite
- The world/view/projection matrices from the original full-screen scene
- The viewport from the original full-screen scene

(Edit) Here is the function I use to produce the 3D camera:

void Make3DCamera(Vector theCameraPos, Vector theLookAt, Vector theUpVector, float theFOV, Point theRez, Matrix& theViewMatrix,Matrix& theProjectionMatrix)
{
    Matrix aCombinedViewMatrix;
    Matrix aViewMatrix;
    aCombinedViewMatrix.Scale(1,1,-1);

    theCameraPos.mZ*=-1;
    theLookAt.mZ*=-1;
    theUpVector.mZ*=-1;
    aCombinedViewMatrix.Translate(-theCameraPos);

    Vector aLookAtVector=theLookAt-theCameraPos;
    Vector aSideVector=theUpVector.Cross(aLookAtVector);

    theUpVector=aLookAtVector.Cross(aSideVector);
    aLookAtVector.Normalize();
    aSideVector.Normalize();
    theUpVector.Normalize();

    aViewMatrix.mData.m[0][0] = -aSideVector.mX;
    aViewMatrix.mData.m[1][0] = -aSideVector.mY;
    aViewMatrix.mData.m[2][0] = -aSideVector.mZ;
    aViewMatrix.mData.m[3][0] = 0;

    aViewMatrix.mData.m[0][1] = -theUpVector.mX;
    aViewMatrix.mData.m[1][1] = -theUpVector.mY;
    aViewMatrix.mData.m[2][1] = -theUpVector.mZ;
    aViewMatrix.mData.m[3][1] = 0;

    aViewMatrix.mData.m[0][2] = aLookAtVector.mX;
    aViewMatrix.mData.m[1][2] = aLookAtVector.mY;
    aViewMatrix.mData.m[2][2] = aLookAtVector.mZ;
    aViewMatrix.mData.m[3][2] = 0;

    aViewMatrix.mData.m[0][3] = 0;
    aViewMatrix.mData.m[1][3] = 0;
    aViewMatrix.mData.m[2][3] = 0;
    aViewMatrix.mData.m[3][3] = 1;

    if (gG.mRenderToSprite) aViewMatrix.Scale(1,-1,1);
    aCombinedViewMatrix*=aViewMatrix;

    // Projection Matrix

    float aAspect = (float) theRez.mX / (float) theRez.mY;
    float aNear = gG.mZRange.mData1;
    float aFar = gG.mZRange.mData2;

    float aWidth = gMath.Cos(theFOV / 2.0f);
    float aHeight = gMath.Cos(theFOV / 2.0f);

    if (aAspect > 1.0) aWidth /= aAspect;
    else aHeight *= aAspect;

    float s = gMath.Sin(theFOV / 2.0f);
    float d = 1.0f - aNear / aFar;

    Matrix aPerspectiveMatrix;
    aPerspectiveMatrix.mData.m[0][0] = aWidth;
    aPerspectiveMatrix.mData.m[1][0] = 0;
    aPerspectiveMatrix.mData.m[2][0] = gG.m3DOffset.mX/theRez.mX/2;
    aPerspectiveMatrix.mData.m[3][0] = 0;
    aPerspectiveMatrix.mData.m[0][1] = 0;
    aPerspectiveMatrix.mData.m[1][1] = aHeight;
    aPerspectiveMatrix.mData.m[2][1] = gG.m3DOffset.mY/theRez.mY/2;
    aPerspectiveMatrix.mData.m[3][1] = 0;
    aPerspectiveMatrix.mData.m[0][2] = 0;
    aPerspectiveMatrix.mData.m[1][2] = 0;
    aPerspectiveMatrix.mData.m[2][2] = s / d;
    aPerspectiveMatrix.mData.m[3][2] = -(s * aNear / d);
    aPerspectiveMatrix.mData.m[0][3] = 0;
    aPerspectiveMatrix.mData.m[1][3] = 0;
    aPerspectiveMatrix.mData.m[2][3] = s;
    aPerspectiveMatrix.mData.m[3][3] = 0;

    theViewMatrix=aCombinedViewMatrix;
    theProjectionMatrix=aPerspectiveMatrix;
}

Edit to add more information: Just playing and tweaking numbers, I have come to a "close" result. However, the "close" result requires multiplying by some kludge numbers that I don't understand.

Here's what I'm doing to the perspective matrix to produce my close result:

//Before calling Make3DCamera, adjusting FOV:
aFOV*=smallerTexture.HeightF()/normalRenderSize.HeightF(); // Zoom it
aFOV*=1.02f; // <- WTH is this?

//Then, to pan the camera over to the x/y position I want, I do:
Matrix aPM=GetCurrentProjectionMatrix();
float aX=(screenX-normalRenderSize.WidthF()/2.0f)/2.0f;
float aY=(screenY-normalRenderSize.HeightF()/2.0f)/2.0f;

aX*=1.07f; // <- WTH is this?
aY*=1.07f; // <- WTH is this?

aPM.mData.m[2][0]=-aX/normalRenderSize.HeightF();
aPM.mData.m[2][1]=-aY/normalRenderSize.HeightF();

SetCurrentProjectionMatrix(aPM);

When I do this, my new picture is VERY close... but not exactly perfect-- the small render tends to drift away from "center" the further the "magic window" is from the center. Without the kludge number, the drift away from center with the magic window is very pronounced.

The kludge numbers 1.02f for zoom and 1.07f for pan reduce the inaccuracies and drift to a fraction of a pixel, but those numbers must be a ratio from somewhere, right? They work at ANY RESOLUTION, though: if I have a 1280x800 screen and a 256x256 magic window texture and then change the screen to 1024x768, it all still works.

Where the heck are these numbers coming from?

**jh100** · Answer 1 · 2022-08-28T00:48:07.810000

If you don't care about sub-optimal performance (i.e., drawing the whole scene twice) and if you don't need the smaller scene in a texture, an easy way to obtain the overlay with pixel perfect precision is:

1. Set up main scene (model/view/projection matrices, etc.) and draw it as you are now.
2. Use glScissor to set the rectangle for the overlay. glScissor takes the screen-space x, y, width, and height and discards anything outside that rectangle. It looks like you have those four data items already, so you should be good to go.
3. Call glEnable(GL_SCISSOR_TEST) to actually turn on the test.
4. Set the shader variables (if you're using shaders) for drawing the greyscale scene/hidden objects/etc. You still use the same view and projection matrices that you used for the main scene.
5. Draw the greyscale scene/hidden objects/etc.
6. Call glDisable(GL_SCISSOR_TEST) so you won't be scissoring at the start of the next frame.
7. Draw the red overlay border, if desired.
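The steps above can be sketched roughly as follows. One detail worth checking is the origin: glScissor measures y from the bottom of the framebuffer, while sprite positions are often stored with a top-left origin. The sketch assumes the latter (that is an assumption about the asker's engine, not something stated in the question). `ScissorRect`, `overlayToScissor`, and the draw functions named in the comments are hypothetical; the GL calls appear only in comments because they need a live context.

```cpp
// Hypothetical helper type: the four integers glScissor expects.
struct ScissorRect { int x, y, width, height; };

// Convert an overlay rectangle stored with a top-left origin (an assumption
// about how the sprite position is kept) to glScissor's bottom-left origin.
ScissorRect overlayToScissor(int overlayX, int overlayTopY,
                             int overlayW, int overlayH, int framebufferH)
{
    return { overlayX, framebufferH - (overlayTopY + overlayH), overlayW, overlayH };
}

// With a current GL context, a frame would then look roughly like this:
//   drawMainScene();                            // hypothetical
//   ScissorRect r = overlayToScissor(x, y, w, h, framebufferHeight);
//   glScissor(r.x, r.y, r.width, r.height);
//   glEnable(GL_SCISSOR_TEST);
//   drawHiddenObjects();                        // hypothetical, same view/projection
//   glDisable(GL_SCISSOR_TEST);
//   drawOverlayBorder();                        // hypothetical
```

For an 800-pixel-high framebuffer and a 256x256 overlay whose top-left corner is at (100, 50), this gives glScissor(100, 494, 256, 256). If the overlay position is already measured from the bottom left, the conversion is unnecessary.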

Now, if you actually need the overlay in its own texture for some reason, this probably won't be adequate. It could be made to work with framebuffer objects and/or pixel readback, but this would be less efficient.

**derhass** · Answer 2 · 2022-08-28T18:10:59.643000

Most people completely overcomplicate such issues. There is absolutely no magic to applying transformations after applying the projection matrix.

If you have a projection matrix P (and I'm assuming default OpenGL conventions here where P is constructed in a way that the vector is post-multiplied to the matrix, so for an eye space vector v_eye, we get v_clip = P * v_eye), you can simply pre-multiply some other translate and scale transforms to cut out any region of interest.

Assume you have a viewport of size w_view * h_view pixels, and you want to find a projection matrix which renders only a tile of w_tile * h_tile pixels, beginning at pixel location (x_tile, y_tile) (again, assuming default GL conventions here: the window space origin is bottom left, so y_tile is measured from the bottom). Also note that the _tile coordinates are to be interpreted relative to the viewport. In the typical case, the viewport would start at (0,0) and have the size of your full framebuffer, but this is by no means required nor assumed here.

Since after applying the projection matrix we are in clip space, we need to transform our coordinates from window space pixels to clip space. Note that clip space is a 4D homogeneous space, but we can use any w value we like (except 0) to represent any point (as a point in the 3D space we care about forms a line in the 4D space we work in), so let's just use w=1 for simplicity's sake.

The view volume in clip space is denoted by the [-w,w] range, so in the w=1 hyperplane, it is [-1,1]. Converting our tile into this space yields:

x_clip = 2 * (x_tile / w_view) -1
y_clip = 2 * (y_tile / h_view) -1
w_clip = 2 * (w_tile / w_view)
h_clip = 2 * (h_tile / h_view)

We now just need to translate the objects such that the center of the tile is moved to the center of the view volume, which by definition is the origin, and scale the w_clip * h_clip sized region to the full [-1,1] extent in each dimension.

That means:

T = translate(-(x_clip + 0.5*w_clip), -(y_clip + 0.5 *h_clip), 0)
S = scale(2.0/w_clip, 2.0/h_clip, 1.0)

We can now create the modified projection matrix P' as P' = S * T * P, and that's all there is. Rendering with P' instead of P will render exactly the region of your tile to whatever viewport you are using, so for it to be pixel-exact with respect to your original viewport, you must now render with a viewport which is also w_tile * h_tile pixels big.

Note that there is also another approach: the viewport is not clamped against the framebuffer you're rendering to, and it is valid to provide negative values for x and y. If the framebuffer you render your tile into is exactly w_tile * h_tile pixels, you can keep the viewport at its original w_view * h_view size and simply shift it, setting glViewport(-x_tile, -y_tile, w_view, h_view), and render with the unmodified projection matrix P instead. Anything that falls outside the small framebuffer is simply not drawn.
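A small sketch of the shifted-viewport idea, under the assumption that the viewport keeps the full w_view * h_view size and is only offset by the tile position, so that every pixel keeps its original scale. `ViewportRect`, `tileViewport`, and `ndcToWindow` are hypothetical helpers; `ndcToWindow` just restates the standard OpenGL viewport transform so the offset can be checked without a GL context.

```cpp
// Hypothetical helper type: the four integers glViewport expects.
struct ViewportRect { int x, y, width, height; };

// Viewport for rendering the tile into a w_tile x h_tile framebuffer with the
// unmodified projection matrix: full-size viewport, shifted so the tile's
// bottom-left pixel lands on pixel (0, 0) of the small framebuffer.
ViewportRect tileViewport(int xTile, int yTile, int wView, int hView)
{
    return { -xTile, -yTile, wView, hView };
}

// Window-space coordinate of a normalized device coordinate along one axis,
// following the OpenGL viewport transform: origin + (ndc + 1) / 2 * size.
double ndcToWindow(double ndc, int origin, int size)
{
    return origin + (ndc + 1.0) * 0.5 * size;
}

// With a current GL context and the small framebuffer bound:
//   ViewportRect v = tileViewport(xTile, yTile, wView, hView);
//   glViewport(v.x, v.y, v.width, v.height);
//   drawScene();   // hypothetical, using the original projection matrix P
```

For example, with a 1280-pixel-wide view, the screen centre (NDC x = 0) sits at window x = 640 in the full render and at x = 540 in a tile framebuffer whose tile starts at x_tile = 100, which is exactly the 100-pixel shift expected.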
