An extrinsic camera matrix (E) is formed by a Rotation (R) matrix (3x3) and a Translation (t) vector (3x1).
E = [R|t]
If I have a 3D point A = [x, y, z]', the transformation from world to camera coordinates is E applied to A in homogeneous form, [x, y, z, 1]'. That is, on the x-axis, v_xrt = r_11 * x + r_12 * y + r_13 * z + t_1. This transformation is correct if rotation is performed before translation. The complete transformation for all axes is:
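$$
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$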
This extrinsic matrix is well-known in the literature.
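To make the rotate-then-translate case concrete, here is a minimal sketch in plain Python. The rotation (90 degrees about the z-axis), the translation, and the point are example values I made up for illustration, and the helper names `rotation_z`, `extrinsic`, and `apply_extrinsic` are hypothetical, not from any library.

```python
import math

def rotation_z(theta):
    """3x3 rotation about the z-axis by theta radians (example R only)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def extrinsic(R, t):
    """Build the 3x4 matrix E = [R|t] by appending t as a fourth column."""
    return [R[i] + [t[i]] for i in range(3)]

def apply_extrinsic(E, A):
    """Multiply E (3x4) by the homogeneous point [x, y, z, 1]'."""
    A_h = list(A) + [1.0]
    return [sum(E[i][j] * A_h[j] for j in range(4)) for i in range(3)]

R = rotation_z(math.pi / 2)  # assumed example rotation
t = [1.0, 2.0, 3.0]          # assumed example translation
A = [1.0, 0.0, 0.0]          # assumed example point

E = extrinsic(R, t)
v_rt = apply_extrinsic(E, A)  # same as R*A + t, component by component
print(v_rt)
```

Each component of `v_rt` is exactly the row expression above, for example r_11 * x + r_12 * y + r_13 * z + t_1 for the x-axis.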
If translation is done before rotation, the transformation on the x-axis is v_xtr = r_11 * (x + t_1) + r_12 * (y + t_2) + r_13 * (z + t_3) (note that v_xrt != v_xtr). The complete transformation for all axes is:
$$
\begin{bmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{bmatrix}
\begin{bmatrix} x + t_1 \\ y + t_2 \\ z + t_3 \end{bmatrix}
$$
How can I compute an extrinsic camera matrix in this case (Translation Before Rotation)?
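One observation that may help, offered as a sketch rather than a definitive answer: distributing R over the sum gives R(A + t) = RA + Rt, so the translate-first transform appears to fit the usual form [R|t'] with t' = Rt. The plain-Python check below uses made-up example values for R, t, and A, and the helper names `matvec` and `rotation_z` are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rotation_z(theta):
    """3x3 rotation about the z-axis by theta radians (example R only)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

R = rotation_z(math.pi / 2)  # assumed example rotation
t = [1.0, 2.0, 3.0]          # assumed example translation
A = [1.0, 0.0, 0.0]          # assumed example point

# Translation before rotation: R * (A + t)
v_tr = matvec(R, [a + b for a, b in zip(A, t)])

# Candidate extrinsic for this case: E_tr = [R | R*t]
Rt = matvec(R, t)
E_tr = [R[i] + [Rt[i]] for i in range(3)]
v_tr_from_E = matvec(E_tr, A + [1.0])

# Rotation before translation, for comparison: R * A + t
v_rt = [r + ti for r, ti in zip(matvec(R, A), t)]

print(v_tr, v_tr_from_E, v_rt)
```

With these example values, `v_tr` and `v_tr_from_E` agree up to floating-point rounding, while `v_rt` is different, which matches v_xrt != v_xtr.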