Background
I want to compute the homography between arbitrary images captured during an ARSession so that I can stitch them into a 360-degree panorama using a cylindrical/spherical projection.
For each image captured during an ARSession, I have the following metadata provided by ARKit:
- Camera intrinsics (focal length, principal point)
- Camera transform (camera position and orientation in world coordinate space)
The metadata is represented in JSON as follows:
{
  "transform": [
    [
      [0.061746232, -0.62273073, -0.7799959, 0],
      [0.9924069, 0.12159537, -0.01851775, 0],
      [0.10637547, -0.7729301, 0.62551045, 0],
      [-0.13529362, 0.21586405, 0.4334107, 1]
    ]
  ],
  "intrinsics": [
    [
      [1447.54, 0, 0],
      [0, 1447.54, 0],
      [936.6673, 707.42883, 1]
    ]
  ],
  "eulerAngleY": 9.65149,
  "eulerAngleX": 50.617744,
  "eulerAngleZ": -78.951355
}
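For clarity, the intrinsics above are stored column-major, so transposing them gives the familiar pinhole matrix. A quick check with the values copied from the JSON (NumPy assumed):

import numpy as np

# Intrinsics exactly as they appear in the JSON above (column-major)
intrinsics = [[1447.54, 0, 0],
              [0, 1447.54, 0],
              [936.6673, 707.42883, 1]]

# After transposing: [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
K = np.array(intrinsics).T
print(K)  # fx = fy = 1447.54, cx ≈ 936.67, cy ≈ 707.43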
Here is the code that extracts the camera parameters from an image's metadata and computes the homography between two images from those parameters:
import numpy as np
from collections import namedtuple

# Simple container for the per-image camera parameters
CameraParams = namedtuple('CameraParams', ['R', 'T', 'K'])

def extract_camera_params(metadata):
    # Transposed because ARKit stores its matrices column-major
    transform = np.array(metadata['transform'][0]).T
    # Extract the rotation matrix (3x3) and translation vector (3,)
    R = transform[:3, :3]
    T = transform[:3, 3]
    # Transposed because ARKit stores its matrices column-major
    K = np.array(metadata['intrinsics'][0], dtype=np.float32).T
    return CameraParams(R, T, K)

def FindHomography(BaseCameraParams, SecCameraParams):
    R1, tvec1 = BaseCameraParams.R, BaseCameraParams.T
    R2, tvec2 = SecCameraParams.R, SecCameraParams.T
    K = SecCameraParams.K
    # Relative rotation and translation from camera 1 to camera 2
    R_1to2 = np.dot(R2, R1.T)
    tvec_1to2 = np.dot(R2, np.dot(-R1.T, tvec1)) + tvec2
    # Plane normal expressed in camera-1 coordinates and plane distance
    normal = np.array([0, 0, 1]).reshape((3, 1))
    normal1 = np.dot(R1, normal)
    d = normal1.T.dot(tvec1)
    # Euclidean homography, then projective homography in pixel coordinates
    homography_euclidean = R_1to2 + (np.outer(tvec_1to2, normal1.T) / d)
    print("Homography Euclidean:\n", homography_euclidean)
    print("\nK:\n", K)
    homography = np.dot(np.dot(K, homography_euclidean), np.linalg.inv(K))
    homography /= homography[2, 2]
    print("\nHomography:\n", homography)
    return homography
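For reference, the relation this code is trying to implement is the homography-from-camera-displacement formula, with the plane normal n1 expressed in camera-1 coordinates and d the distance from camera 1 to the plane:

H_euclidean = R_1to2 + (t_1to2 * n1^T) / d,    H = K * H_euclidean * K^-1

A minimal sketch of how the two functions get wired together (the file and image names here are placeholders; the actual warping and blending follow the repo linked in the appendix):

import json
import cv2

# Hypothetical file names, just to show the call sequence
with open('meta1.json') as f1, open('meta2.json') as f2:
    base = extract_camera_params(json.load(f1))
    sec = extract_camera_params(json.load(f2))

H = FindHomography(base, sec)

img1 = cv2.imread('img1.jpg')  # base image
img2 = cv2.imread('img2.jpg')  # second image
# H should map base-image pixels toward the second image, so warp the
# base image into the second image's pixel frame
warped = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))
cv2.imwrite('warped.jpg', warped)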
However, the resulting stitched image looks like this:

[stitched result image: the second image is badly misaligned with the first]
My questions
- Are there any logical mistakes in my FindHomography function? I suspect the homography code has a severe flaw, given that the second image is not even remotely aligned with the first.
- ARKit images are captured in landscape orientation even when the device is held in portrait mode. Should I adjust my code to account for this, and if so, how? (A tentative correction I have considered is sketched after the quote below.) For context, this article mentions the following:
The transform matrix, also known as the camera pose, can be obtained via the ARCamera property transform. It shows the position and orientation of the camera in the world coordinate system. The trick is that the initial camera orientation is landscape right, which means that if the device is held in portrait mode, the transform matrix will have a rotation component that corresponds to that.
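One correction I have considered (sketched below, though I have not verified it, and the sign or axis of the extra rotation may well be wrong) is to post-multiply each ARKit rotation by a 90° rotation about the camera's local z-axis before computing the homography, so that the pose matches a portrait convention:

import numpy as np

def rotate_about_camera_z(R, angle_deg=-90.0):
    # Tentative landscape-right correction (an assumption, not verified):
    # post-multiply the ARKit rotation by a rotation about the camera's
    # local z-axis. angle_deg = -90 is a guess and may need to be +90.
    a = np.deg2rad(angle_deg)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    return R @ Rz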
Appendices
For the rest of the stitching pipeline, I'm using this GitHub repo as a reference: https://github.com/KEDIARAHUL135/PanoramaStitchingP2
References used when computing homography from camera displacement: