freevec.org

32-bit *signed* integer multiplication with AltiVec

markos — Sat, 23/08/2008 - 21:55

While completing Eigen2 AltiVec support (should be almost complete now), I noticed that 32-bit integer multiplication didn't always work correctly. As AltiVec has no instruction for 32-bit integer multiplication, I used Apple's routine from the Apple Developer site. But some results were totally off. With some debugging, I found out that this routine works for unsigned 32-bit integers, whereas Eigen2 uses signed integers! So I had to search further and, to my surprise, I found no reference to any similar work. That left me with 2 choices: a) ditch AltiVec integer vectorisation from Eigen2 (not acceptable!), or b) implement my own method! It is obvious which choice I followed :)
UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D

Anyway, here is the code with some explanations. In short, the algorithm has 3 stages:

  • Get the signs of the products using xor. In particular, if
    v1 = | A1 | A2 | A3 | A4 |, v2 = | B1 | B2 | B3 | B4 |

    then the sign bit of each element of (v1 xor v2) is the sign of the corresponding product:
    (v1 xor v2) -> | sgn(A1*B1) | sgn(A2*B2) | sgn(A3*B3) | sgn(A4*B4) |

    and we just have to compare each 32-bit quantity with 0 to produce the needed mask.

  • Get the absolute values of each vector and do the multiplication as per the Apple method for unsigned integers.
  • Change the signs of the required elements, according to the mask, using basic binary arithmetic. In two's complement, the negative of a number is its one's complement plus 1: -A = ~A + 1. So we just invert the product (NOR it with itself), add 1 to the elements that must be negative, and merge the results back into the final vector.

    Here is the code (taken and modified from the Eigen2 source).

    // v4i, v4ui and v4bi are Eigen2 typedefs (presumably vector int,
    // vector unsigned int and vector bool int); v0i, v1i and v16i_ are its
    // constants for 0, 1 and a per-element shift count of 16.
    vector int vmulws(const vector int& a, const vector int& b)
    {
      v4i a1, b1, bswap, prod, prod_, v1sel;
      v4ui low_prod, high_prod;

      // Get the absolute values
      a1 = vec_abs(a);
      b1 = vec_abs(b);

      // Get the signs using xor: the mask is all ones where the product is negative
      v4bi sgn = (v4bi) vec_cmplt(vec_xor(a, b), v0i);

      // Do the multiplication for the absolute values. The 16-bit halves are
      // multiplied as unsigned, so a low half >= 0x8000 is not read as negative.
      bswap = (v4i) vec_rl((v4ui) b1, (v4ui) v16i_);
      low_prod = vec_mulo((vector unsigned short) a1, (vector unsigned short) b1);
      high_prod = vec_msum((vector unsigned short) a1, (vector unsigned short) bswap, (v4ui) v0i);
      high_prod = vec_sl(high_prod, (v4ui) v16i_);
      prod = (v4i) vec_add(low_prod, high_prod);

      // NOR the product with itself (bitwise NOT) and keep only the negative elements according to the sign mask
      prod_ = vec_nor(prod, prod);
      prod_ = vec_sel(v0i, prod_, sgn);

      // Add 1 to those elements to complete the negation (-A = ~A + 1)
      v1sel = vec_sel(v0i, v1i, sgn);
      prod_ = vec_add(prod_, v1sel);

      // Merge the results back to the final vector.
      prod = vec_sel(prod, prod_, sgn);
      return prod;
    }
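
To make the three stages easier to follow, here is a minimal scalar sketch in plain C of what happens in a single 32-bit element. It is only a model for checking the arithmetic, not the AltiVec routine itself: the function name mul32_signed_model is hypothetical, and the masks and selects are written with ordinary integer operations instead of vec_cmplt() and vec_sel().

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of the routine above.
 * The name mul32_signed_model is hypothetical and used only for illustration. */
static int32_t mul32_signed_model(int32_t a, int32_t b)
{
    /* Stage 1: the sign bit of (a xor b) is the sign of the product.
     * Build an all-ones mask where the result must be negative. */
    uint32_t sgn = ((a ^ b) < 0) ? 0xFFFFFFFFu : 0u;

    /* Stage 2a: absolute values, computed in unsigned arithmetic so that
     * INT32_MIN wraps around instead of overflowing. */
    uint32_t ua = (a < 0) ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = (b < 0) ? 0u - (uint32_t)b : (uint32_t)b;

    /* Stage 2b: unsigned 32-bit product built from 16-bit halves.
     * Only the low 32 bits are kept, so the a_hi * b_hi term drops out. */
    uint32_t a_lo = ua & 0xFFFFu, a_hi = ua >> 16;
    uint32_t b_lo = ub & 0xFFFFu, b_hi = ub >> 16;
    uint32_t low_prod  = a_lo * b_lo;
    uint32_t high_prod = (a_hi * b_lo + a_lo * b_hi) << 16;
    uint32_t prod = low_prod + high_prod;

    /* Stage 3: negate with -A = ~A + 1, then pick the negated value
     * only where the sign mask is set (the scalar analogue of a select). */
    uint32_t neg = ~prod + 1u;
    uint32_t res = (prod & ~sgn) | (neg & sgn);

    /* Converting back to signed assumes the usual two's complement
     * behaviour of mainstream compilers. */
    return (int32_t)res;
}
```

For example, mul32_signed_model(-70000, 3) returns -210000 and mul32_signed_model(32768, 1) returns 32768. The second case is worth checking against any vector version, because the low 16-bit half of 32768 has its top bit set.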

  • Algebra
  • AltiVec
  • Code

ppc profiling on linux

sanjay — Mon, 25/08/2008 - 20:50

Hi Markos,

Hopefully, this won't be taken as an unwanted advertisement...but as a note from one PPC hacker to another:
If you're interested in a profiler and code analyzer for Linux PPC, check out: http://www.rotateright.com .

--
Sanjay


I'll definitely check it

markos — Tue, 26/08/2008 - 19:36

I'll definitely check it out! This might be just what I needed!!

Thanks!


Copyright (c)2008 by CODEX.