PowerBook G4 12" revamping, part 1

I decided to give my trusty PowerBook G4 a second chance, but I thought it might be a good idea to upgrade some of its parts in the meantime. As things stand, I can't upgrade the CPU or RAM (the G4 is fixed at 1GHz and the RAM at 1.2GB), but I can upgrade the disk and the screen. This time I upgraded the disk and replaced the thermal toothpaste with something much more efficient, so it wouldn't get as hot.

I won't go into the details of the upgrades themselves; these are covered by the excellent ifixit.com articles:

freevec.org back online!

After a long period of inactivity, I've finally put some effort into modernizing freevec.org and bringing it back to life. I've also decided to make it my single point of information for my current and past projects, and for any other technical stuff I deem interesting enough to post. Although my focus has changed significantly over time (I no longer write AltiVec code 100% of my time), I will still post material relevant to vectorization, AltiVec and NEON (not so much SSE; there is plenty of info on that out there). I might also post some OpenCL material, as I have lately started experimenting with it a bit.

32-bit *signed* integer multiplication with AltiVec

While completing Eigen2's AltiVec support (which should be almost complete now), I noticed that 32-bit integer multiplication didn't always work correctly. As AltiVec has no instruction for 32-bit integer multiplication, I used Apple's routine from the Apple Developer site. But this didn't work, and some results were totally off. After some debugging, I found out that this routine works for unsigned 32-bit integers, whereas Eigen2 uses signed integers! So I searched further and, to my surprise, found no reference to any similar work. That left me with two choices: a) ditch AltiVec integer vectorization from Eigen2 (not acceptable!), or b) implement my own method! It's obvious which choice I made :)
UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D
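The AltiVec code itself isn't part of this teaser. Below is a hedged, plain-C model of the approach described here: take per-lane absolute values (what vec_abs() does), run an unsigned 32-bit multiply, then negate the lanes whose operands had different signs. The sketch assumes the unsigned multiply is assembled from 16-bit partial products (low×low plus the shifted cross terms), since AltiVec has no 32-bit multiply. mul_u32_via_halves() and mul_s32x4() are hypothetical names, and each array of four stands in for one vector register.

```c
#include <stdint.h>

/* Unsigned 32x32 -> low 32 bits, assembled from 16-bit halves.
   Assumption: this mirrors the low*low plus cross-term structure an
   AltiVec routine builds from 16-bit multiplies. */
static uint32_t mul_u32_via_halves(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
    uint32_t low   = a_lo * b_lo;               /* fits in 32 bits */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi; /* only its low 16 bits survive the shift */
    return low + (cross << 16);                 /* hi*hi would land above bit 31 */
}

/* Hypothetical 4-lane signed multiply: |a| * |b| on the unsigned path,
   then negate the lanes whose operand signs differ. */
void mul_s32x4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* absolute values in unsigned arithmetic; INT32_MIN maps to
           0x80000000, the same wrap-around vec_abs() gives */
        uint32_t ua = a[i] < 0 ? 0u - (uint32_t)a[i] : (uint32_t)a[i];
        uint32_t ub = b[i] < 0 ? 0u - (uint32_t)b[i] : (uint32_t)b[i];
        uint32_t p  = mul_u32_via_halves(ua, ub);
        if ((a[i] < 0) != (b[i] < 0))
            p = 0u - p;                         /* restore the sign */
        out[i] = (int32_t)p;                    /* two's complement wrap (gcc/clang behavior) */
    }
}
```

Doing the absolute values in unsigned arithmetic keeps the INT32_MIN lane well defined in C while matching what the vector unit does with that value.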

Flags: 

Inverse of Matrix 4x4 using partitioning in AltiVec

We tackle 4x4 matrix inversion using the matrix partitioning method, as described in the "Numerical Recipes in C" book (2nd ed., though I expect it is similar in the 3rd edition). Using the AltiVec SIMD unit, we achieve an almost 300% increase in performance, making this, at least as far as we know, the fastest 4x4 matrix inversion routine!
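The AltiVec routine itself isn't shown here. As a hedged scalar sketch of the partitioning method, the 4x4 matrix is split into 2x2 blocks [P Q; R S] and inverted with the block formulas S' = (S - R P^-1 Q)^-1, R' = -S' R P^-1, Q' = -P^-1 Q S', P' = P^-1 + P^-1 Q S' R P^-1. Mat44InverseTo(), Mat22, inv22() and mul22() are hypothetical names, and Mat44 is assumed to be a row-major float[4][4].

```c
typedef float Mat44[4][4];   /* assumed row-major layout */
typedef float Mat22[2][2];

/* 2x2 inverse; returns 0 if the block is singular. */
static int inv22(Mat22 a, Mat22 out)
{
    float det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
    if (det == 0.0f)
        return 0;
    float r = 1.0f / det;
    out[0][0] =  a[1][1] * r;  out[0][1] = -a[0][1] * r;
    out[1][0] = -a[1][0] * r;  out[1][1] =  a[0][0] * r;
    return 1;
}

/* out = a * b for 2x2 blocks (out must not alias a or b). */
static void mul22(Mat22 a, Mat22 b, Mat22 out)
{
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            out[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
}

/* m1 = inverse(m2) by partitioning m2 = [P Q; R S] into 2x2 blocks.
   Returns 0 if P or the Schur complement S - R P^-1 Q is singular. */
int Mat44InverseTo(Mat44 m1, Mat44 m2)
{
    Mat22 P, Q, R, S, Pi, RPi, PiQ, T, Sn, Rn, Qn;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            P[i][j] = m2[i][j];      Q[i][j] = m2[i][j + 2];
            R[i][j] = m2[i + 2][j];  S[i][j] = m2[i + 2][j + 2];
        }
    if (!inv22(P, Pi))
        return 0;
    mul22(R, Pi, RPi);                      /* R P^-1 */
    mul22(Pi, Q, PiQ);                      /* P^-1 Q */
    mul22(R, PiQ, T);                       /* R P^-1 Q */
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            T[i][j] = S[i][j] - T[i][j];    /* Schur complement */
    if (!inv22(T, Sn))                      /* S' */
        return 0;
    mul22(Sn, RPi, Rn);                     /* S' R P^-1 (negated below) */
    mul22(PiQ, Sn, Qn);                     /* P^-1 Q S' (negated below) */
    mul22(PiQ, Rn, T);                      /* P^-1 Q S' R P^-1 */
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            m1[i][j]         = Pi[i][j] + T[i][j];  /* P' */
            m1[i][j + 2]     = -Qn[i][j];           /* Q' */
            m1[i + 2][j]     = -Rn[i][j];           /* R' */
            m1[i + 2][j + 2] = Sn[i][j];            /* S' */
        }
    return 1;
}
```

All four blocks are read before anything is written, so m1 may alias m2. One limitation of plain partitioning is that it gives up when the top-left block P is singular, even if the whole matrix is invertible (a permutation matrix, for example).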

AltiVec runtime detection in Linux

After a quick Google search for how to detect AltiVec at runtime on Linux (using keywords such as "runtime altivec detection"), I found that there is no single clear article anywhere describing something so simple. Thankfully, I got a few good answers from benh and dwmw2 in #mklinux on FreeNode, and I decided to write them down in a cleaned-up form.
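This is a hedged sketch of one way to do this on current systems. It is not necessarily the exact method benh and dwmw2 described. The kernel advertises AltiVec through the PPC_FEATURE_HAS_ALTIVEC bit of AT_HWCAP in the ELF auxiliary vector, and glibc 2.16 and later expose that vector through getauxval(). have_altivec() is a hypothetical name. On older C libraries, the same AT_HWCAP entry can be read from /proc/self/auxv.

```c
#if defined(__linux__) && (defined(__powerpc__) || defined(__powerpc64__))
#include <sys/auxv.h>                 /* getauxval(), AT_HWCAP (glibc 2.16+) */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL  /* bit from the kernel's asm/cputable.h */
#endif
#endif

/* Returns 1 if the kernel reports AltiVec in the auxiliary vector, else 0. */
int have_altivec(void)
{
#if defined(__linux__) && (defined(__powerpc__) || defined(__powerpc64__))
    return (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_ALTIVEC) != 0;
#else
    return 0;                         /* not PowerPC Linux: no AltiVec to detect */
#endif
}
```

The check is guarded by architecture because AT_HWCAP bits mean different things on other CPUs (on x86, for example, bit 28 is unrelated to AltiVec).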

SIMD: 

Matrix 4x4 Identity matrix

The nice thing about the identity matrix is that we don't have to read the matrix at all, since the form of the identity matrix is already known:
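Here is a minimal scalar sketch of that idea. It assumes Mat44 is a row-major float[4][4] and uses a hypothetical Mat44Identity() name. The four rows are constants, so the routine only writes and never reads. An AltiVec version would presumably store four constant vectors instead, but that is an assumption, not the post's code.

```c
#include <string.h>

typedef float Mat44[4][4];   /* assumed row-major layout */

/* Writes the identity into m without reading it: the rows are
   compile-time constants, copied out in one go. */
void Mat44Identity(Mat44 m)
{
    static const float rows[4][4] = {
        {1.0f, 0.0f, 0.0f, 0.0f},
        {0.0f, 1.0f, 0.0f, 0.0f},
        {0.0f, 0.0f, 1.0f, 0.0f},
        {0.0f, 0.0f, 0.0f, 1.0f},
    };
    memcpy(m, rows, sizeof rows);
}
```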

Matrix 4x4 Transpose (floats)

For the theory behind matrix transposition, please see here.

So, the 4x4 transpose would be:
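The post's own listing isn't included here. As a hedged sketch, a common AltiVec approach performs the 4x4 float transpose with two rounds of vec_mergeh()/vec_mergel(). In the code below, those two intrinsics are modeled in plain C as merge_hi()/merge_lo(), with a vec4 struct standing in for vector float, so the data movement can be followed and tested on any machine. Mat44TransposeTo() and the row-major Mat44 layout are assumptions that follow the naming of the other posts. The rows of m2 are called a, b, c, d in the comments.

```c
typedef float Mat44[4][4];                /* assumed row-major layout */
typedef struct { float f[4]; } vec4;      /* stand-in for vector float */

/* Scalar models of vec_mergeh / vec_mergel on four floats. */
static vec4 merge_hi(vec4 a, vec4 b) { vec4 r = {{a.f[0], b.f[0], a.f[1], b.f[1]}}; return r; }
static vec4 merge_lo(vec4 a, vec4 b) { vec4 r = {{a.f[2], b.f[2], a.f[3], b.f[3]}}; return r; }

static vec4 load_row(Mat44 m, int i) { vec4 r = {{m[i][0], m[i][1], m[i][2], m[i][3]}}; return r; }
static void store_row(Mat44 m, int i, vec4 v) { for (int j = 0; j < 4; j++) m[i][j] = v.f[j]; }

/* m1 = transpose(m2) using two rounds of merges. */
void Mat44TransposeTo(Mat44 m1, Mat44 m2)
{
    vec4 r0 = load_row(m2, 0), r1 = load_row(m2, 1);
    vec4 r2 = load_row(m2, 2), r3 = load_row(m2, 3);
    vec4 t0 = merge_hi(r0, r2);           /* a0 c0 a1 c1 */
    vec4 t1 = merge_hi(r1, r3);           /* b0 d0 b1 d1 */
    vec4 t2 = merge_lo(r0, r2);           /* a2 c2 a3 c3 */
    vec4 t3 = merge_lo(r1, r3);           /* b2 d2 b3 d3 */
    store_row(m1, 0, merge_hi(t0, t1));   /* a0 b0 c0 d0 */
    store_row(m1, 1, merge_lo(t0, t1));   /* a1 b1 c1 d1 */
    store_row(m1, 2, merge_hi(t2, t3));   /* a2 b2 c2 d2 */
    store_row(m1, 3, merge_lo(t2, t3));   /* a3 b3 c3 d3 */
}
```

All four rows are loaded before any store, so the transpose also works in place (m1 == m2).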

Matrix 4x4 multiplication (floats)

Matrix multiplication is done on a row-by-column basis: each element of the result is the dot product of a row of the first matrix and a column of the second. Given two input matrices m2 and m3, we multiply them and store the result in an output matrix m1. Hence the function prototype:

void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3);
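Here is a hedged scalar sketch of Mat44MulTo(), assuming Mat44 is a row-major float[4][4] and m1 = m2 × m3. It is organized the way SIMD versions usually are. Each output row is built as a sum of the rows of m3, each scaled by one element of the matching row of m2. With AltiVec, every term would then map to a vec_splat() feeding a vec_madd(). That mapping is an assumption, not the post's code.

```c
#include <string.h>

typedef float Mat44[4][4];   /* assumed row-major layout */

/* m1 = m2 * m3: row i of the result is sum_k m2[i][k] * (row k of m3). */
void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3)
{
    Mat44 tmp;                              /* lets m1 alias m2 or m3 */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++)
            tmp[i][j] = 0.0f;
        for (int k = 0; k < 4; k++) {
            float s = m2[i][k];             /* the value a SIMD version would splat */
            for (int j = 0; j < 4; j++)
                tmp[i][j] += s * m3[k][j];  /* one multiply-add per lane */
        }
    }
    memcpy(m1, tmp, sizeof tmp);
}
```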

Matrix 4x4 scaling (floats)

Scaling a matrix means multiplying each element by a float. Assume the following prototype:

void Mat44ScaleTo(Mat44 m1, Mat44 m2, float f);

where we multiply matrix m2 by the float f and store the result in m1.
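Here is a minimal scalar sketch of Mat44ScaleTo(), again assuming Mat44 is a row-major float[4][4]. As far as I know, the original AltiVec instruction set has no plain single-precision multiply. A vector version would therefore splat f and apply vec_madd() with a -0.0 addend to each row. The loop below simply performs the same sixteen products one at a time.

```c
typedef float Mat44[4][4];   /* assumed row-major layout */

/* m1 = f * m2, element by element (safe when m1 == m2). */
void Mat44ScaleTo(Mat44 m1, Mat44 m2, float f)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            m1[i][j] = m2[i][j] * f;
}
```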

Pages

Subscribe to Front page feed