We aim to provide an up-to-date library of information on SIMD architectures: algorithms, tutorials, comparisons, benchmarks and more.

libfreevec NG!!

Submitted by markos on Tue, 03/24/2009 - 23:24.

I'm in the process of rewriting libfreevec and porting it to other SIMD platforms besides AltiVec (which I consider dead or dying, unfortunately, thanks to the Big Powers that decided it's no longer important, along with PowerPC, but that should be another topic). Anyway, the main platforms chosen are AltiVec (of course :), SSE (SSE2, SSE3 and possibly SSE4), ARM NEON and Cell SPU.

32-bit *signed* integer multiplication with AltiVec

Submitted by markos on Sat, 08/23/2008 - 21:55.

While completing Eigen2 AltiVec support (should be almost complete now), I noticed that 32-bit integer multiplication didn't always work correctly. As AltiVec does not include any instruction for 32-bit integer multiplication, I used Apple's routine from the Apple Developer site. But this didn't work and some results were totally off. With some debugging, I found out that this routine works for unsigned 32-bit integers, whereas Eigen2 uses signed integers! So I had to search more and, to my surprise, I found no reference to any similar work. So I had two choices: a) ditch AltiVec integer vectorisation from Eigen2 (not acceptable!), or b) implement my own method! It is obvious which choice I followed :)
UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D
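The routine itself is not reproduced in this summary, so here is only a scalar C sketch of the general idea the update hints at: take absolute values, do an unsigned 32-bit multiply built from 16-bit partial products (the widest integer multiplies AltiVec offers), then put the sign back. The function names mul32_unsigned and mul32_signed are made up for illustration, and this is not the actual AltiVec code.

```c
#include <stdint.h>

/* Low 32 bits of a*b, built only from 16x16-bit partial products.
   Hypothetical helper: a scalar stand-in for what a SIMD unit without
   a 32-bit multiply has to do per element. */
uint32_t mul32_unsigned(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t low   = a_lo * b_lo;               /* bits 0..31 of lo*lo     */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi; /* may wrap; only its low  */
                                                /* 16 bits matter below    */
    /* a_hi*b_hi would land entirely above bit 31, so it is dropped. */
    return low + (cross << 16);
}

/* Signed multiply: strip the signs, multiply the magnitudes as
   unsigned values, then negate the product if the signs differed.
   Assumption: the usual two's-complement conversion back to int32_t. */
int32_t mul32_signed(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t abs_a = (a < 0) ? 0u - ua : ua;    /* |a| without overflow */
    uint32_t abs_b = (b < 0) ? 0u - ub : ub;    /* |b| without overflow */

    uint32_t p = mul32_unsigned(abs_a, abs_b);
    if ((a < 0) != (b < 0))
        p = 0u - p;                             /* restore the sign */
    return (int32_t)p;
}
```

In a vector version each of these steps would be applied to four elements at once, with the absolute value step being where vec_abs() comes in.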

libfreevec 1.0.4 benchmarks updated!

Submitted by markos on Thu, 08/21/2008 - 11:23.

Hello again,

I managed to find time to update all of the libfreevec benchmarks to the latest version, 1.0.4. I also included more complete tests and added a non-PPC architecture (an Athlon X2 5000 @ 2.6 GHz), where the same tests were run (as 32-bit apps on 64-bit Linux) for comparison. This is important for two reasons:

  • to find how PowerPC CPUs compare to a current popular x86 CPU (the same benchmarks will be done on an Intel CPU soon)
  • to find any deficiencies in glibc itself (as you will see there are many).

All benchmarks were run on openSUSE 11.0, except for the G5, which runs Debian Lenny/testing. The compiler used was gcc 4.3.2. All functions have been tested to work correctly on each platform.

HOWTO: Using libfreevec with LD_PRELOAD

Submitted by markos on Tue, 08/19/2008 - 13:18.

Ok, let's suppose you've downloaded libfreevec, built it successfully and now you want to use it for the whole system, without recompiling everything against the library! Is it possible? Thanks to a glibc feature, it is!

There are two ways to do that:

  • set the LD_PRELOAD environment variable, e.g. at boot time;
  • list the library in the /etc/ld.so.preload file, which is most likely distro-agnostic, so that the dynamic loader ld.so loads libfreevec before any other library (including libc.so).

    The second is a much more elegant solution, IMHO, and I've been using it for months with no problems whatsoever. So, after you install the library somewhere (by default it's installed in /usr/local/lib/), you could just do the following as root:

    echo /usr/local/lib/libfreevec_libc.so > /etc/ld.so.preload

    Beware: it has to be libfreevec_libc.so and not libfreevec.so, as the latter prefixes each function with vec_ and is only useful to someone who wants to use the library explicitly, for whatever reason.

    The next application you load will use the AltiVec functions in libfreevec! Enjoy! :)

    Note: This has a slight overhead, which takes away some of the functions' performance gain, but it should still prove a good move in most cases.
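If you want to confirm that a process really picked up the preloaded library, one way on Linux is to look for it in /proc/self/maps, which lists every file mapped into the current process. The following is a minimal sketch under that assumption; library_is_mapped is a hypothetical helper name, not something libfreevec ships.

```c
#include <stdio.h>
#include <string.h>

/* Scans a maps-style file (normally "/proc/self/maps" on Linux) for a
   line mentioning `needle`, e.g. "libfreevec_libc.so".
   Returns 1 if found, 0 if not found, -1 if the file cannot be opened.
   Hypothetical helper, for illustration only. */
int library_is_mapped(const char *maps_path, const char *needle)
{
    char line[4352];            /* room for an address range plus a long path */
    int found = 0;
    FILE *f = fopen(maps_path, "r");

    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, needle) != NULL) {
            found = 1;          /* one mapping from the library is enough */
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling library_is_mapped("/proc/self/maps", "libfreevec_libc.so") from a test program should return 1 once the preload is active, and 0 before.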

Inverse of Matrix 4x4 using partitioning

Submitted by markos on Fri, 04/18/2008 - 17:31.

We tackle 4x4 matrix inversion using the matrix partitioning method, as described in the "Numerical Recipes in C" book (2nd ed., though I guess it will be similar in the 3rd edition). Using the AltiVec SIMD unit, we achieve an almost 300% increase in performance, making the routine the fastest matrix inversion method, at least among those known to us!
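For readers who have not met the partitioning method, here is a plain scalar C sketch of it (no AltiVec, and not the routine from the article). The 4x4 matrix is split into 2x2 blocks [P Q; R S]; with S' = (S - R P^-1 Q)^-1, the blocks of the inverse are P' = P^-1 + P^-1 Q S' R P^-1, Q' = -P^-1 Q S' and R' = -S' R P^-1. All names below (Mat2, invert4x4_partitioned, and so on) are made up for illustration, and row-major storage is an assumption.

```c
typedef struct { double a, b, c, d; } Mat2;     /* [a b; c d] */

static Mat2 m2_mul(Mat2 x, Mat2 y)
{
    Mat2 r = { x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
               x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d };
    return r;
}

static Mat2 m2_sub(Mat2 x, Mat2 y)
{
    Mat2 r = { x.a - y.a, x.b - y.b, x.c - y.c, x.d - y.d };
    return r;
}

static Mat2 m2_neg(Mat2 x)
{
    Mat2 r = { -x.a, -x.b, -x.c, -x.d };
    return r;
}

/* 2x2 inverse via the adjugate; returns 0 if the block is singular. */
static int m2_inv(Mat2 x, Mat2 *out)
{
    double det = x.a * x.d - x.b * x.c;
    if (det == 0.0)
        return 0;
    out->a =  x.d / det;  out->b = -x.b / det;
    out->c = -x.c / det;  out->d =  x.a / det;
    return 1;
}

/* Read / write the 2x2 block whose top-left corner is (row, col). */
static Mat2 load_block(const double m[16], int row, int col)
{
    Mat2 r = { m[4 * row + col],       m[4 * row + col + 1],
               m[4 * (row + 1) + col], m[4 * (row + 1) + col + 1] };
    return r;
}

static void store_block(double m[16], int row, int col, Mat2 x)
{
    m[4 * row + col]       = x.a;  m[4 * row + col + 1]       = x.b;
    m[4 * (row + 1) + col] = x.c;  m[4 * (row + 1) + col + 1] = x.d;
}

/* Inverts a row-major 4x4 matrix by partitioning. Returns 1 on success,
   0 if P or the Schur complement S - R P^-1 Q is singular. */
int invert4x4_partitioned(const double m[16], double out[16])
{
    Mat2 P = load_block(m, 0, 0), Q = load_block(m, 0, 2);
    Mat2 R = load_block(m, 2, 0), S = load_block(m, 2, 2);
    Mat2 Pinv, Sn;

    if (!m2_inv(P, &Pinv))
        return 0;

    Mat2 RPinv = m2_mul(R, Pinv);               /* R P^-1 */
    Mat2 PinvQ = m2_mul(Pinv, Q);               /* P^-1 Q */

    if (!m2_inv(m2_sub(S, m2_mul(RPinv, Q)), &Sn))
        return 0;

    Mat2 Rn = m2_neg(m2_mul(Sn, RPinv));        /* -S' R P^-1            */
    Mat2 Qn = m2_neg(m2_mul(PinvQ, Sn));        /* -P^-1 Q S'            */
    Mat2 Pn = m2_sub(Pinv, m2_mul(PinvQ, Rn));  /* P^-1 + P^-1 Q S' R P^-1 */

    store_block(out, 0, 0, Pn);  store_block(out, 0, 2, Qn);
    store_block(out, 2, 0, Rn);  store_block(out, 2, 2, Sn);
    return 1;
}
```

One caveat of this simple form: it gives up when the top-left block P is singular, even if the full matrix is invertible, so a robust routine needs a fallback for that case.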
