Submitted by markos on Mon, 28/10/2013 - 01:48
This paper is hosted here, actually! The first version used DruTeX to render LaTeX excerpts of the original paper; today I'm using MathJax, which is much better. I wrote this paper while attempting to optimize MySQL with AltiVec as part of a Genesi project. Unfortunately, it didn't amount to much in terms of accelerating MySQL, but I did invent an algorithm to vectorize a whole family of hashing functions.
The result was this paper.
Submitted by markos on Mon, 28/10/2013 - 01:43
Actually, that one is already on this site :)
In 2008, I tried to revive my original idea of vectorizing the world for AltiVec, and I actually made good progress. Then I made the mistake of taking on a completely unrelated project (Java EE, ugh), which eventually forced me to shut down my company and cost me two years of potential progress on AltiVec and vectorization.
Check here for the paper.
Submitted by markos on Mon, 28/10/2013 - 01:33
Back in 2005, I was convinced that I could rewrite most, if not all, of the system's vital but unoptimized core routines to use AltiVec. Sadly, I was wrong: it was a huge task, and it wasn't even my full-time job. I did, however, manage to optimize *some* routines, if only as a proof of concept. The Adler32 hashing function was the first of them, and to prove my point, I wrote a short paper about it. It wasn't entirely rigorous as a mathematical proof, but it was correct, and the code really was much faster.
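For readers who haven't seen Adler32: it keeps two running sums modulo 65521, A (1 plus the sum of all bytes) and B (the sum of every successive value of A). The sketch below is not the AltiVec code from the paper. It is a portable C illustration, under my own assumptions, of the block-wise rewrite that makes the checksum vectorizable: over a block of n bytes d[0..n-1], A grows by the plain sum of the bytes and B grows by n*A plus the weighted sum of (n - i)*d[i]. Both sums are independent across i, so they can be spread over SIMD lanes. The names `adler32_scalar`, `adler32_blocked` and the block size `BLOCK = 16` (one 128-bit AltiVec register's worth of bytes) are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define ADLER_MOD 65521u
#define BLOCK 16  /* hypothetical block size: 16 bytes = one 128-bit vector */

/* Reference byte-at-a-time Adler-32 (as defined in RFC 1950). */
uint32_t adler32_scalar(const uint8_t *buf, size_t len) {
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Block-wise form. For a block d[0..n-1]:
 *   A' = A + sum(d[i])
 *   B' = B + n*A + sum((n - i) * d[i])
 * The inner loop contains only a plain sum and a weighted sum, which is
 * the part a SIMD unit can compute in parallel lanes. */
uint32_t adler32_blocked(const uint8_t *buf, size_t len) {
    uint32_t a = 1, b = 0;
    size_t i = 0;
    while (len - i >= BLOCK) {
        uint32_t s = 0, w = 0;
        for (size_t j = 0; j < BLOCK; j++) {
            s += buf[i + j];                          /* plain sum */
            w += (uint32_t)(BLOCK - j) * buf[i + j];  /* weighted sum */
        }
        b = (b + BLOCK * a + w) % ADLER_MOD;  /* uses A from before the block */
        a = (a + s) % ADLER_MOD;
        i += BLOCK;
    }
    for (; i < len; i++) {  /* leftover tail bytes, handled one at a time */
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

With 16-byte blocks, none of the intermediate values come close to overflowing 32 bits, so reducing modulo 65521 once per block is enough.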
Submitted by markos on Fri, 10/07/2009 - 14:26
Here's the link to the announcement:
From the press release:
YDL 6.2 now offers libfreevec, a (LGPL) library with replacement routines for GLIBC, such as memcpy(), strlen(), etc. These routines, which have been rewritten and optimized to use the AltiVec vector engine found in the G4/G4+ PowerPC CPUs, can provide for up to 25% increase in application performance.
Submitted by markos on Sat, 23/08/2008 - 22:55
While completing Eigen2 AltiVec support (it should be almost complete now), I noticed that 32-bit integer multiplication didn't always work correctly. As AltiVec does not include any instruction for 32-bit integer multiplication, I used Apple's routine from the Apple developer site. But it didn't work, and some results were totally off. After some debugging, I found out that this routine works for unsigned 32-bit integers, whereas Eigen2 uses signed integers! So I had to search further, and to my surprise, I found no reference to any similar work. That left me with two choices: a) ditch AltiVec integer vectorisation from Eigen2 (not acceptable!) or b) implement my own method! It is obvious which choice I followed :)
UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D
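AltiVec's integer multiplies only work on 8- and 16-bit elements, so a 32-bit product has to be assembled from 16-bit pieces. The update hints that absolute values were part of the fix: vec_sub() followed by vec_max() computes max(x, 0 - x), which is what vec_abs() returns directly. Below is a scalar, lane-by-lane sketch of that sign-magnitude idea (take magnitudes, multiply as unsigned, restore the sign). It is my reconstruction for illustration, not the actual Eigen2 code, and all names (`mul_u32_from_u16`, `as_s32`, `mul_s32`, `mul4_s32`, `LANES`) are hypothetical.

```c
#include <stdint.h>

#define LANES 4  /* a 128-bit AltiVec register holds four 32-bit integers */

/* Low 32 bits of an unsigned 32x32 product, built only from 16x16->32
 * multiplies, since AltiVec has no 32-bit integer multiply instruction. */
uint32_t mul_u32_from_u16(uint32_t a, uint32_t b) {
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    uint32_t low = al * bl;              /* exact: fits in 32 bits */
    uint32_t cross = ah * bl + al * bh;  /* may wrap; only its low 16 bits matter */
    return low + (cross << 16);          /* ah*bh would be scaled by 2^32, so it vanishes */
}

/* Portable two's-complement reinterpretation of a uint32_t as int32_t. */
int32_t as_s32(uint32_t u) {
    return u <= INT32_MAX ? (int32_t)u : -(int32_t)(~u) - 1;
}

/* Signed multiply via sign-magnitude: |a| * |b| as unsigned, then restore
 * the sign. Like a vector lane, the result wraps modulo 2^32. */
int32_t mul_s32(int32_t a, int32_t b) {
    /* magnitudes computed in unsigned arithmetic, so INT32_MIN is safe */
    uint32_t ma = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t mb = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t p = mul_u32_from_u16(ma, mb);
    if ((a < 0) != (b < 0))
        p = 0u - p;  /* signs differ: negate the product */
    return as_s32(p);
}

/* Lane-by-lane application, mimicking a single vector operation. */
void mul4_s32(const int32_t a[LANES], const int32_t b[LANES], int32_t out[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = mul_s32(a[i], b[i]);
}
```

Taking magnitudes first means the 16-bit pieces fed to the multiplier are always treated as unsigned values, which is exactly the case the unsigned routine handles correctly.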