SIMD book, update 3! Addition/Subtraction mostly finished

Finally. Apologies for the delay, but it's been a busy month. This time I will hold true to my word and upload a PDF for people to see (attached to this page).

So, what's new? Here is a list of things done:

* Finished ALL addition/subtraction related instructions for all engines and major derivatives (SSE*/AVX, VMX/VSX, NEON/ARMv8 NEON), with diagrams (these are the reason it has taken so long).
* Reorganized the structure: the book is now split into Parts I and II. Part I will carry the design analysis of each SIMD engine, and Part II will hold the instruction index.
* Added a TOC/index.
* So far, with just the Addition/Subtraction chapter written and the remaining sections still empty, it has reached 175 pages (B5; again, I'm not fixed on the size, it might actually change)! My estimate is that the whole book will surpass 800 pages with everything included.


SIMD book, second update!

From the Indiegogo page:

  • Added titlepage (simple, but will do the job)
  • Reorganized ALL instructions to include both unsigned/signed in the same entity
  • Added Saturated Addition, Subtraction and Saturated Subtraction
  • Added ARMv8 NEON instructions taken from ARM infocenter draft
  • Fixed some instructions (added 64-bit arithmetic for NEON)
  • Added some special addition/subtraction instructions, like add/sub with carry (VMX/VSX) and addsub (SSE3/AVX)
  • Added some in-vector sum additions, sum reductions but no descriptions yet
  • Added diagram for 8-bit addition/subtraction (still need lots more).
  • Removed VMX128. I couldn't find enough information, and an email to some IBM toolchain developers was left unanswered, so I guess no one will really care if that engine is left out. If enough people insist on it, please also be kind enough to provide some documentation on it.

SIMD book, first draft published!

Check activity here:

From the update:

Ok, I've been busy the past days, I started writing the book (using LaTeX :), and I'd like to say that progress has been good. I fixed the current list of SIMD engines that I'm going to include and it's a long one:

Crowdfunded campaign for a SIMD Comparison Reference Book on Indiegogo!

This is my first ever attempt to do a project using crowdfunding. The project is something I've always wanted to do, a SIMD Engines Comparison Reference, that is a reference book that compares all three major SIMD engines (Altivec, NEON, SSE*). If, like me, you're tired of googling all the time to find information about a specific SSE instruction and how to do addition/multiplication/shuffling/etc on more than one SIMD engine, please help fund this project!

Powerbook G4 12" revamping, part 1

I decided to give my trusty Powerbook G4 a second chance. But I thought it might be a good idea to upgrade some parts of it in the meantime. As it stands, I can't upgrade the CPU or RAM (the G4 is fixed at 1GHz and the RAM at 1.2GB), but I could upgrade the disk and screen. This time I upgraded the disk, and I also replaced the thermal paste with something much more efficient so it wouldn't get as hot.

I won't go into the actual details of doing the upgrades, these are covered by the excellent articles:

back online!

After a long time of inactivity, I've finally put some effort into modernizing the site and bringing it back to life. I've also decided to make it my single point of information for my current/past projects, or any other technical stuff I may deem interesting to post. Though my focus has changed significantly in the past (I do not write Altivec code 100% of my time), I will still post stuff relevant to vectorization, Altivec and NEON (SSE not so much, there is plenty of info for that out there). I might also post OpenCL stuff online, as I lately started messing with that a bit.

32-bit *signed* integer multiplication with AltiVec

While completing Eigen2 AltiVec support (should be almost complete now), I noticed that the 32-bit integer multiplication didn't work correctly all of the time. As AltiVec does not really include any instruction to do 32-bit integer multiplication, I used Apple's routine from the Apple Developer's site. But this didn't work and some results were totally off. With some debugging, I found out that this routine works for unsigned 32-bit integers, whereas Eigen2 uses signed integers! So I had to search more, and to my surprise, I found no reference to any similar work. So I had two choices: a) ditch AltiVec integer vectorisation from Eigen2 (not acceptable!) or b) implement my own method! It is obvious which choice I followed :)
UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D
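As a rough illustration of the idea (not the routine from the article), here is a scalar C model: take absolute values, multiply them as unsigned numbers using only 16-bit partial products, then restore the sign. The function names are made up for this sketch. In plain C with wrapping unsigned arithmetic the detour through absolute values is not strictly needed; it is kept only to mirror the steps a vector version built on unsigned multiplies would take.

```c
#include <stdint.h>

/* Low 32 bits of an unsigned 32x32 product, built from 16-bit halves.
   Hypothetical helper: models a unit that only has 16-bit multiplies. */
static uint32_t mul32_unsigned(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t low   = a_lo * b_lo;                /* fits in 32 bits */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;  /* wraps mod 2^32, which is fine */

    /* a_hi * b_hi only affects bits 32 and above, so it is dropped. */
    return low + (cross << 16);
}

/* Signed multiply via |a| * |b| and a sign fix-up (hypothetical name). */
static int32_t mul32_signed(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t abs_a = (a < 0) ? 0u - ua : ua;     /* the vec_abs() step */
    uint32_t abs_b = (b < 0) ? 0u - ub : ub;

    uint32_t p = mul32_unsigned(abs_a, abs_b);

    if ((a < 0) != (b < 0))                      /* operand signs differ */
        p = 0u - p;                              /* negate the product */

    /* Assumes the usual two's-complement conversion back to int32_t. */
    return (int32_t)p;
}
```

For example, `mul32_signed(-3, 7)` gives `-21` and `mul32_signed(-46341, -2)` gives `92682`.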


Inverse of Matrix 4x4 using partitioning in Altivec

We tackle the 4x4 matrix inversion using the matrix partitioning method, as described in the "Numerical Recipes in C" book (2nd ed., though I guess it will be similar in the 3rd edition). Using the AltiVec SIMD unit, we achieve an almost 300% increase in performance, making the routine the fastest matrix inversion method, at least as far as we know!
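To make the partitioning idea concrete, here is a plain scalar C sketch (no AltiVec, and not the article's code). It splits a row-major 4x4 matrix into four 2x2 blocks P, Q, R, S and uses the standard block-inverse identities based on the Schur complement P - Q S^-1 R. The type and function names are assumptions for illustration, and the sketch simply gives up if S or the Schur complement is singular.

```c
typedef struct { float a, b, c, d; } Mat2;   /* 2x2 block: [a b; c d] */

static Mat2 m2_mul(Mat2 x, Mat2 y)
{
    Mat2 r = { x.a*y.a + x.b*y.c, x.a*y.b + x.b*y.d,
               x.c*y.a + x.d*y.c, x.c*y.b + x.d*y.d };
    return r;
}

static Mat2 m2_add(Mat2 x, Mat2 y)
{
    Mat2 r = { x.a + y.a, x.b + y.b, x.c + y.c, x.d + y.d };
    return r;
}

static Mat2 m2_sub(Mat2 x, Mat2 y)
{
    Mat2 r = { x.a - y.a, x.b - y.b, x.c - y.c, x.d - y.d };
    return r;
}

static Mat2 m2_neg(Mat2 x)
{
    Mat2 r = { -x.a, -x.b, -x.c, -x.d };
    return r;
}

/* Returns 0 if the block is singular, 1 on success. */
static int m2_inv(Mat2 x, Mat2 *out)
{
    float det = x.a * x.d - x.b * x.c;
    if (det == 0.0f)
        return 0;
    float s = 1.0f / det;
    out->a =  x.d * s;  out->b = -x.b * s;
    out->c = -x.c * s;  out->d =  x.a * s;
    return 1;
}

/* Inverts row-major m into out; returns 0 if this partitioning fails. */
static int mat4_inverse_partitioned(const float m[16], float out[16])
{
    Mat2 P = { m[0],  m[1],  m[4],  m[5]  };
    Mat2 Q = { m[2],  m[3],  m[6],  m[7]  };
    Mat2 R = { m[8],  m[9],  m[12], m[13] };
    Mat2 S = { m[10], m[11], m[14], m[15] };
    Mat2 Sinv, Pt;

    if (!m2_inv(S, &Sinv))
        return 0;
    Mat2 QSi = m2_mul(Q, Sinv);                       /* Q S^-1 */
    Mat2 SiR = m2_mul(Sinv, R);                       /* S^-1 R */
    if (!m2_inv(m2_sub(P, m2_mul(QSi, R)), &Pt))      /* (P - Q S^-1 R)^-1 */
        return 0;

    Mat2 Qt = m2_neg(m2_mul(Pt, QSi));                /* top-right block */
    Mat2 Rt = m2_neg(m2_mul(SiR, Pt));                /* bottom-left block */
    Mat2 St = m2_add(Sinv, m2_mul(m2_mul(SiR, Pt), QSi)); /* bottom-right */

    out[0]  = Pt.a; out[1]  = Pt.b; out[2]  = Qt.a; out[3]  = Qt.b;
    out[4]  = Pt.c; out[5]  = Pt.d; out[6]  = Qt.c; out[7]  = Qt.d;
    out[8]  = Rt.a; out[9]  = Rt.b; out[10] = St.a; out[11] = St.b;
    out[12] = Rt.c; out[13] = Rt.d; out[14] = St.c; out[15] = St.d;
    return 1;
}
```

The appeal for SIMD is that each 2x2 block fits in a single four-float vector register, so the block products and sums map naturally onto vector operations.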

AltiVec runtime detection in Linux

After a little search I did on Google to find how to detect AltiVec at runtime in Linux (I used keywords such as runtime altivec detection and similar), I found that there is no single nice article anywhere that describes something so simple. Thankfully, I got a few good answers from benh and dwmw2 in #mklinux/FreeNode, and I decided to put these down in a cleaned up form.
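One way to do this on a current system, which may well differ from what the article describes, is to ask the kernel for the hardware capability word through the ELF auxiliary vector. The sketch below assumes glibc 2.16 or later for `getauxval()`; the helper names are made up, and on anything other than Linux/PowerPC it simply reports no AltiVec.

```c
#if defined(__linux__) && defined(__powerpc__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Bit value from the kernel's asm/cputable.h, repeated here so the
   file also builds where that header is not available. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif

/* Hypothetical helper: test the AltiVec bit in a PowerPC AT_HWCAP word. */
static int hwcap_has_altivec(unsigned long hwcap)
{
    return (hwcap & PPC_FEATURE_HAS_ALTIVEC) != 0;
}

/* Hypothetical helper: 1 if the running CPU and kernel report AltiVec. */
static int cpu_has_altivec(void)
{
#if defined(__linux__) && defined(__powerpc__)
    return hwcap_has_altivec(getauxval(AT_HWCAP));
#else
    return 0;   /* the bit only has this meaning on PowerPC */
#endif
}
```

A program would typically call `cpu_has_altivec()` once at startup and pick the vector or scalar code path based on the result.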


Matrix 4x4 Identity matrix

The nice thing about the identity matrix is that we don't have to do any reading of the matrix. And since the form of the identity matrix is already known:
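A minimal scalar sketch of that observation, assuming a row-major `float[16]` layout and a made-up function name: the destination is only ever written, never read. It is an illustration, not the article's AltiVec code.

```c
#include <string.h>

/* Store-only: overwrites m with the 4x4 identity without reading it. */
static void mat4_set_identity(float m[16])
{
    static const float identity[16] = {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    };
    memcpy(m, identity, sizeof identity);
}
```

A vector version would follow the same pattern, with one constant row per vector register and four stores.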

