Matrix 4x4 multiplication (floats)
Submitted by markos on Sat, 01/03/2008 - 20:08
(Please see Matrix 4x4 addition/subtraction (floats) for the typedefs and definitions used.)
Matrix multiplication is done on a column x row basis. Given two input matrices m2, m3 we do the multiplication and store the result back to an output matrix m1. Hence the function prototype:
void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3);
The calculation of matrix m1 is done on a per column basis. Since we operate on 128-bit vectors, each vector can hold 4 32-bit floats, or one line of a 4x4 matrix. When working with columns, we need to calculate 4 128-bit vectors, but we only produce results for the first column. To calculate the full matrix product, we add the column results (again 4 128-bit vectors, but each time only the relevant column is non-zero) together.
void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3) { vector float zero; vector float vA1, vA2, vA3, vA4, vB1, vB2, vB3, vB4; vector float vC1, vC2, vC3, vC4; // Load matrices and multiply the first row while we wait for the next row zero = (vector float) vec_splat_u32(0); LOAD_ALIGNED_MATRIX(m2, vA1, vA2, vA3, vA4); LOAD_ALIGNED_MATRIX(m3, vB1, vB2, vB3, vB4); // Calculate the first column of m1 vC1 = vec_madd( vec_splat( vA1, 0 ), vB1, zero ); vC2 = vec_madd( vec_splat( vA2, 0 ), vB1, zero ); vC3 = vec_madd( vec_splat( vA3, 0 ), vB1, zero ); vC4 = vec_madd( vec_splat( vA4, 0 ), vB1, zero ); // By now we should have loaded both matrices and be done with the first row // Multiply vA x vB2, add to previous results, vC vC1 = vec_madd( vec_splat( vA1, 1 ), vB2, vC1 ); vC2 = vec_madd( vec_splat( vA2, 1 ), vB2, vC2 ); vC3 = vec_madd( vec_splat( vA3, 1 ), vB2, vC3 ); vC4 = vec_madd( vec_splat( vA4, 1 ), vB2, vC4 ); // Multiply vA x vB3, add to previous results, vC vC1 = vec_madd( vec_splat( vA1, 2 ), vB3, vC1 ); vC2 = vec_madd( vec_splat( vA2, 2 ), vB3, vC2 ); vC3 = vec_madd( vec_splat( vA3, 2 ), vB3, vC3 ); vC4 = vec_madd( vec_splat( vA4, 2 ), vB3, vC4 ); // Multiply vA x vB3, add to previous results, vC vC1 = vec_madd( vec_splat( vA1, 3 ), vB4, vC1 ); vC2 = vec_madd( vec_splat( vA2, 3 ), vB4, vC2 ); vC3 = vec_madd( vec_splat( vA3, 3 ), vB4, vC3 ); vC4 = vec_madd( vec_splat( vA4, 3 ), vB4, vC4 ); // Store back the result STORE_ALIGNED_MATRIX(m1, vC1, vC2, vC3, vC4); }