Background
I want to compute the homography between arbitrary images captured during an ARSession so that I can stitch them into a 360-degree panorama using a cylindrical/spherical projection.
For each image captured during an ARSession, I have the following metadata provided by ARKit:
- Camera intrinsics (focal length, principal point)
- Camera transform (camera position and orientation in world coordinate space)
The metadata is represented in JSON as follows:
{
  "transform": [
    [
      [0.061746232, -0.62273073, -0.7799959, 0],
      [0.9924069, 0.12159537, -0.01851775, 0],
      [0.10637547, -0.7729301, 0.62551045, 0],
      [-0.13529362, 0.21586405, 0.4334107, 1]
    ]
  ],
  "intrinsics": [
    [
      [1447.54, 0, 0],
      [0, 1447.54, 0],
      [936.6673, 707.42883, 1]
    ]
  ],
  "eulerAngleY": 9.65149,
  "eulerAngleX": 50.617744,
  "eulerAngleZ": -78.951355
}
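For clarity, the intrinsics above are stored column-major, so transposing them gives the familiar pinhole matrix. A quick check with the values copied from the JSON (NumPy assumed):

import numpy as np

# Intrinsics exactly as they appear in the JSON above (column-major)
intrinsics = [[1447.54, 0, 0],
              [0, 1447.54, 0],
              [936.6673, 707.42883, 1]]

# After transposing: [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
K = np.array(intrinsics).T
print(K)  # fx = fy = 1447.54, cx ≈ 936.67, cy ≈ 707.43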
Here is the code that extracts the camera parameters from an image's metadata and computes the homography between two images from those parameters:
import numpy as np
from collections import namedtuple

# Simple container for the per-image camera parameters
CameraParams = namedtuple('CameraParams', ['R', 'T', 'K'])

def extract_camera_params(metadata):
    # Transposed because ARKit stores its matrices column-major
    transform = np.array(metadata['transform'][0]).T
    # Extract the rotation matrix (3x3) and translation vector (3,)
    R = transform[:3, :3]
    T = transform[:3, 3]
    # Transposed because ARKit stores its matrices column-major
    K = np.array(metadata['intrinsics'][0], dtype=np.float32).T
    return CameraParams(R, T, K)

def FindHomography(BaseCameraParams, SecCameraParams):
    R1, tvec1 = BaseCameraParams.R, BaseCameraParams.T
    R2, tvec2 = SecCameraParams.R, SecCameraParams.T
    K = SecCameraParams.K
    # Relative rotation and translation from camera 1 to camera 2
    R_1to2 = np.dot(R2, R1.T)
    tvec_1to2 = np.dot(R2, np.dot(-R1.T, tvec1)) + tvec2
    # Plane normal expressed in camera-1 coordinates and plane distance
    normal = np.array([0, 0, 1]).reshape((3, 1))
    normal1 = np.dot(R1, normal)
    d = normal1.T.dot(tvec1)
    # Euclidean homography, then projective homography in pixel coordinates
    homography_euclidean = R_1to2 + (np.outer(tvec_1to2, normal1.T) / d)
    print("Homography Euclidean:\n", homography_euclidean)
    print("\nK:\n", K)
    homography = np.dot(np.dot(K, homography_euclidean), np.linalg.inv(K))
    homography /= homography[2, 2]
    print("\nHomography:\n", homography)
    return homography
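For reference, the relation this code is trying to implement is the homography-from-camera-displacement formula, with the plane normal n1 expressed in camera-1 coordinates and d the distance from camera 1 to the plane:

H_euclidean = R_1to2 + (t_1to2 * n1^T) / d,    H = K * H_euclidean * K^-1

A minimal sketch of how the two functions get wired together (the file and image names here are placeholders; the actual warping and blending follow the repo linked in the appendix):

import json
import cv2

# Hypothetical file names, just to show the call sequence
with open('meta1.json') as f1, open('meta2.json') as f2:
    base = extract_camera_params(json.load(f1))
    sec = extract_camera_params(json.load(f2))

H = FindHomography(base, sec)

img1 = cv2.imread('img1.jpg')  # base image
img2 = cv2.imread('img2.jpg')  # second image
# H should map base-image pixels toward the second image, so warp the
# base image into the second image's pixel frame
warped = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))
cv2.imwrite('warped.jpg', warped)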
However, the resulting stitched image looks like this:

[stitched result image: the second image is badly misaligned with the first]
My questions
- Are there any logical mistakes in my FindHomography function? I suspect the homography code has a severe flaw, given that the second image is not even remotely aligned with the first.
- ARKit images are captured in landscape orientation even when the device is held in portrait mode. Should I adjust my code to account for this, and if so, how? (A tentative correction I have considered is sketched after the quote below.) For context, this article mentions the following:
The transform matrix, also known as the camera pose, can be obtained via the ARCamera property transform. It shows the position and orientation of the camera in the world coordinate system. The trick is that the initial camera orientation is landscape right, which means that if the device is held in portrait mode, the transform matrix will have a rotation component that corresponds to that.
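One correction I have considered (sketched below, though I have not verified it, and the sign or axis of the extra rotation may well be wrong) is to post-multiply each ARKit rotation by a 90° rotation about the camera's local z-axis before computing the homography, so that the pose matches a portrait convention:

import numpy as np

def rotate_about_camera_z(R, angle_deg=-90.0):
    # Tentative landscape-right correction (an assumption, not verified):
    # post-multiply the ARKit rotation by a rotation about the camera's
    # local z-axis. angle_deg = -90 is a guess and may need to be +90.
    a = np.deg2rad(angle_deg)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    return R @ Rz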
Appendices
For the rest of the stitching pipeline, I'm using this GitHub repo as a reference: https://github.com/KEDIARAHUL135/PanoramaStitchingP2
References used when computing homography from camera displacement: