Stage III: Finishing up

Sukhbeer Dhillon
Dec 14, 2019

In the last part of my project series for my software optimization course, I made a minor change that would allow xz to call the 8-byte memcmplen method with unaligned access on Aarch64 platforms. The results did not show that the change made compression any faster. In this post, I want to profile the changed version and compare it against the profile I got in the first part of this series. I will also touch on unaligned access support in ARM processors.

I wanted to confirm whether my directive was even working, so I tried to produce the preprocessed code for memcmplen.h. I got the following error:

sysdefs.h: No such file or directory

This header file was in a different directory, so I had to add that directory to the include search path when calling the preprocessor, using the -iquote option followed by the folder that contains the header file. Here's what worked:

cpp memcmplen.h  -iquote ~/project/git/xz/src/common
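Another way to see which branch the preprocessor picks, without reading through the preprocessed output, is a tiny probe function. This is only a sketch: selected_path is a hypothetical name, and the #if conditions are my assumption of what such a check could look like, not the actual conditions in memcmplen.h.

```c
/* Hypothetical probe: reports which comparison path the preprocessor
 * selects. TUKLIB_FAST_UNALIGNED_ACCESS is the macro discussed in this
 * post; the conditions below are illustrative assumptions and may not
 * match the ones in memcmplen.h. */
static const char *selected_path(void)
{
#if defined(TUKLIB_FAST_UNALIGNED_ACCESS) && defined(__aarch64__)
	return "8-byte unaligned (aarch64)";
#elif defined(TUKLIB_FAST_UNALIGNED_ACCESS) && defined(__x86_64__)
	return "8-byte unaligned (x86_64)";
#else
	return "byte-by-byte fallback";
#endif
}
```

Printing the return value from a small main, once with and once without -DTUKLIB_FAST_UNALIGNED_ACCESS=1 on the compiler command line, shows whether the macro reaches the code at all.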

It turned out that the code I wanted to run was not even selected by the preprocessor directives I had added. I tried two things. First, I passed the --enable-unaligned-access flag to the configure script, so that TUKLIB_FAST_UNALIGNED_ACCESS would be set to 1. Second, I modified the source so that the unaligned-access path would always be taken.
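To make it concrete what an 8-byte comparison with unaligned access means, here is a minimal sketch of the general idea. It is not the code from xz: match_len is a hypothetical helper, and it assumes a little-endian target and a GCC or Clang style compiler for __builtin_ctzll.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not xz's implementation: returns the index of the
 * first byte at which a and b differ, starting at offset len and never
 * going past limit. Assumes a little-endian target and a compiler that
 * provides __builtin_ctzll (GCC, Clang). */
static size_t match_len(const uint8_t *a, const uint8_t *b,
		size_t len, size_t limit)
{
	/* Compare 8 bytes per iteration while a full word still fits. */
	while (len + 8 <= limit) {
		uint64_t x, y;
		/* memcpy expresses an unaligned load without undefined
		 * behavior; compilers usually emit a single load for it. */
		memcpy(&x, a + len, sizeof(x));
		memcpy(&y, b + len, sizeof(y));
		uint64_t diff = x ^ y;
		if (diff != 0) {
			/* On little-endian, the lowest set bit lies in the
			 * first differing byte; divide by 8 for its index. */
			return len + (size_t)(__builtin_ctzll(diff) >> 3);
		}
		len += 8;
	}
	/* Finish the remaining tail one byte at a time. */
	while (len < limit && a[len] == b[len])
		++len;
	return len;
}
```

The word-at-a-time loop only pays off if the processor handles loads from arbitrary addresses cheaply, which is why the choice is gated behind a configure option in the first place.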

To my surprise, the run time didn't change in either case.

/*Modify source code to have unaligned access*/
real 35m36.629s
user 35m11.003s
sys 0m12.343s
/*Include --enable-unaligned-access flag while configuring*/
real 35m38.890s
user 35m14.585s
sys 0m11.185s

[Figure: Flat profile from gprof for the first build]
[Figure: Visual call graph for the second build]

These results suggest that, for this software, using unaligned memory access while comparing the length of two buffers does not improve efficiency. However, as we noticed in my first blog post, there is a large difference between compression time on x86_64 and Aarch64. Maybe it is caused by some other piece of code that falls outside the scope I looked at. From the call graphs and the earlier profiling results, one can also see that memcmplen is not in the list of hotspot functions.

I do not have anything to contribute to this project's upstream. The project is not dead: if you check the git log, development may have slowed down, but it is still active. This project gave me an opportunity to look at unaligned memory access on processors. I had to read up a lot to understand what that even means. I first heard about it in my Parallel Processing for GPU class, but I didn't quite get it then.

This course has been very helpful. Even though I may not have given it my best, I learned a lot through it. I have become much more comfortable using the command line, and I rarely boot into Windows on my laptop anymore. Chris Tyler's approach of showing us how he works on the systems we were expected to use, and how he overcomes errors as they come up, taught me more than any lecture. Learning about sysadmin commands, the introduction to assembly, and the SIMD material broadened my knowledge of computer architecture overall.

Below are some references if you'd like to read more about unaligned access on ARM and where it stands now.

https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt

https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html

https://stackoverflow.com/questions/32062894/take-advantage-of-arm-unaligned-memory-access-while-writing-clean-c-code

https://fgiesen.wordpress.com/2013/10/18/bit-scanning-equivalencies/
