Search
Primary links
Please donate to libfreevec to ensure its continuing development! Donations are done via Paypal.
32-bit *signed* integer multiplication with AltiVec
markos — Sat, 23/08/2008 - 21:55
While completing Eigen2 AltiVec support (should be almost complete now), I noticed that the 32-bit integer multiplication didn't work correctly all of the time. As AltiVec does not really include any instruction to do 32-bit integer multiplication, I used Apple's routine from the Apple Developer's site. But this didn't work and some results were totally off. With some debugging, I found out that this routine works for unsigned 32-bit integers, where Eigen2 uses signed integers! So, I had to search more, and to my surprise, I found no reference of any similar work. So I had 2 choices: a) ditch AltiVec integer vectorisation from Eigen2 (not acceptable!) b) implement my own method! It is obvious which choice I followed :)
UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D
Anyway, here is the code with some explanations, but just as a quick explanation, I can just say that there were 3 stages to the algorithm:
- get the signs of the multiplication using xor, in particular, if
v1 = | A1 | A2 | A3 | A4 |, v2 = | B1 | B2 | B3 | B4 | (v1 xor v2) -> | sgn(A1*B1) | sgn(A2*B2) | sgn(A3*B3) | sgn(A4*B4) |
and we just have to use compare each 32-bit quantity with 0 and produce the needed mask.
- Get the absolute values of each vector and do the multiplication as per the Apple method for unsigned integers.
- Change the signs of the required elements, according to the mask, using basic binary arithmetic. A negative number is the two's complement +1: -A = ~A +1. So we just NOR the negative elements, add 1 to them and merge the results back to the final vector.
Here is the code (taken and modified from the Eigen2 source).
vector int vmulws(const vector int& a, const vector int& b) { v4i bswap, low_prod, high_prod, prod, prod_, a1, b1, v1sel; // Get the absolute values a1 = vec_abs(a); b1 = vec_abs(b); // Get the signs using xor v4bi sgn = (v4bi) vec_cmplt(vec_xor(a, b), v0i); // Do the multiplication for the asbolute values. bswap = (v4i) vec_rl((v4ui) b1, (v4ui) v16i_ ); low_prod = vec_mulo((vector short)a1, (vector short)b1); high_prod = vec_msum((vector short)a1, (vector short)bswap, v0i); high_prod = (v4i) vec_sl((v4ui) high_prod, (v4ui) v16i_); prod = vec_add( low_prod, high_prod ); // NOR the product and select only the negative elements according to the sign mask prod_ = vec_nor(prod, prod); prod_ = vec_sel(v0i, prod_, sgn); // Add 1 to the result to get the negative numbers v1sel = vec_sel(v0i, v1i, sgn); prod_ = vec_add(prod_, v1sel); // Merge the results back to the final vector. prod = vec_sel(prod, prod_, sgn); return prod; }
ppc profiling on linux
sanjay — Mon, 25/08/2008 - 20:50Hi Markos,
Hopefully, this won't be taken as an unwanted advertisement...but as a note from one PPC hacker to another:
If you're interested in a profiler and code analyzer for Linux PPC, check out: http://www.rotateright.com .
--
Sanjay
I'll definitely check it
markos — Tue, 26/08/2008 - 19:36I'll definitely check it out! This might be just what I needed!!
Thanks!