Port amd64 SIMD libc optimizations to aarch64
Student: GetzMikalsen (getz@FreeBSD.com)
Mentor: RobertClausecker (fuz@FreeBSD.com), EdMaste (emaste@FreeBSD.com)
Project description
The goal of the project is to port the SIMD-optimized routines written for amd64 to aarch64 using Arm NEON instructions. Several string functions already had SIMD routines in /contrib/arm-optimized-routines, but some were less than optimal; in those cases the amd64 variants were ported instead, with great success.
Code to be ported is located at /src/lib/libc/amd64/string
https://github.com/freebsd/freebsd-src/tree/main/lib/libc/amd64/string
New code is located in /src/lib/libc/aarch64/string
https://github.com/freebsd/freebsd-src/tree/main/lib/libc/aarch64/string
Project outcome
Almost all string functions are now SIMD enhanced for aarch64, pending an exp-run before merge into -CURRENT.
A bug was also discovered in the existing memccpy implementation that could cause an overread, resulting in a segfault.
What's left to do
str(c)spn would benefit from a SIMDized check of which bytes are in a set.
NEON has no convenient instruction for this, unlike pcmpistri on amd64 or MATCH in SVE, so such an algorithm could work well.
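For reference, the set-membership test at the core of strspn and strcspn can be sketched in portable scalar C with a 256-bit bitmap. This is only an illustration of the check that would have to be vectorized, not the algorithm referred to here and not the FreeBSD implementation. The names build_set, in_set, sketch_strcspn and sketch_strspn are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: set one bit per byte value that occurs in set. */
static void
build_set(uint32_t bitmap[8], const char *set)
{
	const unsigned char *p;

	memset(bitmap, 0, 8 * sizeof(bitmap[0]));
	for (p = (const unsigned char *)set; *p != '\0'; p++)
		bitmap[*p >> 5] |= UINT32_C(1) << (*p & 31);
}

/* Hypothetical helper: nonzero if byte c is a member of the bitmap. */
static int
in_set(const uint32_t bitmap[8], unsigned char c)
{
	return ((bitmap[c >> 5] >> (c & 31)) & 1);
}

/* Length of the initial segment of s containing no bytes from set. */
size_t
sketch_strcspn(const char *s, const char *set)
{
	uint32_t bitmap[8];
	size_t i;

	assert(s != NULL && set != NULL);
	build_set(bitmap, set);
	bitmap[0] |= 1;	/* treat NUL as a member so the loop stops there */
	for (i = 0; !in_set(bitmap, (unsigned char)s[i]); i++)
		;
	return (i);
}

/* Length of the initial segment of s consisting only of bytes from set. */
size_t
sketch_strspn(const char *s, const char *set)
{
	uint32_t bitmap[8];
	size_t i;

	assert(s != NULL && set != NULL);
	build_set(bitmap, set);	/* NUL is never a member, so the loop ends */
	for (i = 0; in_set(bitmap, (unsigned char)s[i]); i++)
		;
	return (i);
}
```

For example, sketch_strcspn("hello, world", ", ") yields 5. The scalar loop performs one lookup per byte; a SIMD version would need to evaluate the same membership test for a whole vector of bytes per iteration.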
Related DRs
Reviews
Progress reports
Deliverables
HEADER    FUNCTION   NOTES
string.h  stpcpy     String copy functions
          stpncpy
          strcat
          strncat
          strcpy
          strncpy
          strlcpy
          strlcat
          strchrnul
          strrchr
          strcspn
          strspn
          strpbrk
          strsep     String tokenisation functions
          strtok_r
          strcmp     String comparison functions
          strncmp
          memcpy     Memory copy functions
          memccpy
          memset     Memory initialisation functions
          memchr     Memory search functions
          memrchr
          memmem
          memcmp     Memory comparison function
          strlen     String length
Milestones
- May 31st: Start of coding
- June 3rd: Second week
- July 8th - July 12th: Mid-term Evaluations
- August 19th - August 26th: Final week
Test Plan
Code will be tested on a Raspberry Pi 5 using the available FreeBSD tests and the ones borrowed from NetBSD. Additional tests will be written if needed. Performance will be measured using fuz's strperf tool (https://github.com/clausecker/strperf), and the results will be analyzed using benchstat from devel/go-perf.
The Code
https://git.sr.ht/~getz/aarch64_string.h
https://github.com/soppelmann/freebsd-src
Notes
I will publish progress reports here and in-depth writeups of interesting solutions on my blog (https://df.lth.se/~getz or https://getz.sdf.org).
Useful links
https://danlark.org/2023/06/06/csinc-the-arm-instruction-you-didnt-know-you-wanted/
https://www.corsix.org/content/whirlwind-tour-aarch64-vector-instructions