memchr()
Submitted by markos on Tue, 01/29/2008 - 17:39.
Description

According to the man page:

The memchr() function scans the first n bytes of the memory area pointed to by s for the character c. The first byte to match c (interpreted as an unsigned character) stops the operation. It returns a pointer to the matching byte or NULL if the character does not occur in the given memory area.

The glibc implementation searches for the character using 32-bit (or 64-bit, depending on the architecture) blocks and bitmasking. In libfreevec we do that too but, as we have already stated, we also use modern SIMD units (AltiVec for PowerPC CPUs and, in the future, SSE for x86 CPUs). As a result, performance is the same for smaller sizes but increases dramatically for larger sizes.

Each CPU in detail:

[charts]

And for comparison, here is the result of the same benchmark run on an Athlon X2 5000 (2.5 GHz), running 32-bit code:

[chart]

Results/Comments

The Athlon's fast integer units show a substantial gain in all cases (almost 2x as fast as the G5), but of course this also has to do with the fact that glibc includes an assembly-optimized version of memchr() for that platform. However, once AltiVec kicks in, the G5 is almost 300% faster than the asm i686 version (remember that we are running the benchmarks in 32-bit mode; 64-bit mode will follow).

A performance increase is also expected when SSE is used but, due to the nature of this SIMD unit, we don't expect it to outperform the G5/AltiVec combination. Also, the all-CPU chart shows only aligned (best-case) scenarios. In the Athlon-specific chart, memchr() performs inconsistently on unaligned accesses, which does not happen with the PowerPC functions.
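To make the "32-bit blocks and bitmasking" approach from the description more concrete, here is a minimal sketch in C. It is not the glibc or libfreevec source. The function name memchr_word is made up for illustration, and the zero-byte test is the well-known generic bit trick, assumed here to be similar in spirit to what those libraries do. The idea is to XOR each word with the search byte replicated four times, so that a matching byte becomes zero, and then test the whole word for a zero byte in one step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: an illustrative sketch, not glibc's or libfreevec's code. */
static void *memchr_word(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;

    /* Scan byte by byte until p sits on a 32-bit boundary. */
    while (n > 0 && ((uintptr_t)p % sizeof(uint32_t)) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Replicate the search byte into all four bytes of a word. */
    const uint32_t pattern = (uint32_t)(0x01010101u * ch);

    /* Scan one 32-bit block at a time. */
    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, p, sizeof w);   /* aligned load, free of aliasing issues */
        uint32_t x = w ^ pattern;  /* bytes equal to ch become 0x00 */

        /* Nonzero if and only if at least one byte of x is zero. */
        if ((x - 0x01010101u) & ~x & 0x80808080u)
            break;                 /* the byte loop below pinpoints the match */
        p += sizeof w;
        n -= sizeof w;
    }

    /* Remaining tail bytes, or the word that was flagged above. */
    while (n > 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

An AltiVec or SSE version follows the same outline with 16-byte vectors in place of 32-bit words, which would explain why the gain only shows up once the buffer is large enough for the vector loop to dominate.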