Published on *freevec.org* (http://www.freevec.org)

Submitted by markos on Sat, 23/08/2008 - 22:55

While completing Eigen2 AltiVec support (should be almost complete now), I noticed that the 32-bit integer multiplication didn't work correctly all of the time. As AltiVec does not really include any instruction to do 32-bit integer multiplication, I used Apple's routine from the Apple Developer's site [1]. But this didn't work and some results were totally off. With some debugging, I found out that this routine works for unsigned 32-bit integers, where Eigen2 uses signed integers! So, I had to search more, and to my surprise, I found no reference of any similar work. So I had 2 choices: a) ditch AltiVec integer vectorisation from Eigen2 (not acceptable!) b) implement my own method! It is obvious which choice I followed :)

UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D

UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D

Anyway, here is the code with some explanations, but just as a quick explanation, I can just say that there were 3 stages to the algorithm:

- get the signs of the multiplication using xor, in particular, if
v1 = | A1 | A2 | A3 | A4 |, v2 = | B1 | B2 | B3 | B4 | (v1 xor v2) -> | sgn(A1*B1) | sgn(A2*B2) | sgn(A3*B3) | sgn(A4*B4) |

and we just have to use compare each 32-bit quantity with 0 and produce the needed mask.

- Get the absolute values of each vector and do the multiplication as per the Apple method for unsigned integers.
- Change the signs of the required elements, according to the mask, using basic binary arithmetic. A negative number is the two's complement +1: -A = ~A +1. So we just NOR the negative elements, add 1 to them and merge the results back to the final vector.
Here is the code (taken and modified from the Eigen source [2]).

vector int vmulws(const vector int& a, const vector int& b) { v4i bswap, low_prod, high_prod, prod, prod_, a1, b1, v1sel; // Get the absolute values a1 = vec_abs(a); b1 = vec_abs(b); // Get the signs using xor v4bi sgn = (v4bi) vec_cmplt(vec_xor(a, b), v0i); // Do the multiplication for the asbolute values. bswap = (v4i) vec_rl((v4ui) b1, (v4ui) v16i_ ); low_prod = vec_mulo((vector short)a1, (vector short)b1); high_prod = vec_msum((vector short)a1, (vector short)bswap, v0i); high_prod = (v4i) vec_sl((v4ui) high_prod, (v4ui) v16i_); prod = vec_add( low_prod, high_prod ); // NOR the product and select only the negative elements according to the sign mask prod_ = vec_nor(prod, prod); prod_ = vec_sel(v0i, prod_, sgn); // Add 1 to the result to get the negative numbers v1sel = vec_sel(v0i, v1i, sgn); prod_ = vec_add(prod_, v1sel); // Merge the results back to the final vector. prod = vec_sel(prod, prod_, sgn); return prod; }

**Links**

[1] http://developer.apple.com/hardwaredrivers/ve/algorithms.html#Multiply32

[2] https://bitbucket.org/eigen/eigen/src/2ffba8f5dde409ff7d8ce408f8044f756a165458/Eigen/src/Core/arch/AltiVec/PacketMath.h?at=default

[3] http://www.freevec.org/category/simd/architecture/altivec

[4] http://www.freevec.org/category/simd/algorithms/algebra