memccpy()
markos — Wed, 05/03/2008 - 21:47
Description
According to the man page, the memccpy() function copies no more than n bytes from memory area src to memory area dest, stopping when the character c is found (c itself is copied). It returns a pointer to the next character in dest after c, or NULL if c was not found in the first n characters of src.
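For reference, the semantics described above can be sketched as a plain per-byte loop in C. The name memccpy_bytewise is hypothetical, chosen so it does not clash with the libc memccpy() declared in <string.h>; this is a minimal illustration of the behavior, not glibc's or libfreevec's code.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most n bytes from src to dest, stopping right after the first
 * byte equal to (unsigned char)c has been copied. Returns a pointer to the
 * byte in dest following that copy of c, or NULL if c was not found. */
void *memccpy_bytewise(void *restrict dest, const void *restrict src,
                       int c, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    const unsigned char uc = (unsigned char)c;

    while (n-- > 0) {
        unsigned char ch = *s++;
        *d++ = ch;              /* c itself is copied too */
        if (ch == uc)
            return d;           /* one past the copied c */
    }
    return NULL;                /* c not in the first n bytes */
}
```

Every byte costs a load, a store and a compare, which is exactly the per-byte overhead the faster implementations try to avoid.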
The plain reference implementation processes one byte at a time, which is quite slow and does not take advantage of modern CPUs, whether 32- or 64-bit. In libfreevec we chose to exploit the CPU's features and use modern SIMD units (AltiVec on PowerPC CPUs, and in the future SSE on x86 CPUs). At the very least, the processing is done in 32-bit quantities (or 64-bit quantities on 64-bit CPUs). The performance gains are obvious from the graphs below.
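Processing data in 32- or 64-bit quantities can be illustrated with the classic word-at-a-time trick: XOR each word with c replicated into every byte, then test whether any byte of the result is zero. The sketch below assumes a 64-bit word and uses the hypothetical names memccpy_word and has_zero_byte; it is a generic illustration of the technique, not libfreevec's actual implementation, and it does not use AltiVec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if and only if at least one byte of w is zero. */
static inline int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

void *memccpy_word(void *restrict dest, const void *restrict src,
                   int c, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    const unsigned char uc = (unsigned char)c;
    /* c replicated into all 8 bytes of a 64-bit word. */
    const uint64_t pattern = 0x0101010101010101ULL * uc;

    /* Copy 8 bytes at a time as long as the word does not contain c. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);               /* load that is safe when unaligned */
        if (has_zero_byte(w ^ pattern))
            break;                      /* c is somewhere in this word */
        memcpy(d, &w, 8);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail, or the word containing c: finish byte by byte. */
    while (n > 0) {
        unsigned char ch = *s++;
        *d++ = ch;
        n--;
        if (ch == uc)
            return d;
    }
    return NULL;
}
```

Only the word that contains c (and any tail shorter than 8 bytes) falls back to the byte loop. A SIMD version would apply the same compare-then-copy idea to wider vectors, such as AltiVec's 128-bit registers.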
Each CPU in detail:
And for comparison, here is the result of the same benchmark run on an Athlon X2 5000 (2.5 GHz) running 32-bit code:
Results/Comments
Though the Athlon is a much faster CPU than the G5, it actually performs worse than the G5 (and when AltiVec is used, even the G4 beats it), due to glibc's really naive implementation of memccpy(). True, in a real application overall speed depends on much more than a single function. But that is no reason for a function to be that slow. We also see that using a SIMD unit like AltiVec breathes new life into the older CPUs. In particular, the new MPC8610 CPU is a formidable contender, given its much faster memory bus (533 MHz vs. 133 MHz on the older G4).