Stage 1: Finding software to optimize

Sukhbeer Dhillon
3 min read · Nov 11, 2019

My first exposure to open source came in January this year, in my Parallel Processing course (GPU610) at Seneca. I had to find a program and then port it to run on a GPU using CUDA. I found it very challenging because I took on a project that I couldn't finish. Because of that, I was scared to start looking for a project for my Software Optimization class.

But Chris gave a very good demonstration of project selection in class this Tuesday, which gave me some confidence to start looking. The approaching deadline scared me into action, too.

Hence began my search. I started with the rpm -qa command and looked up each project in the list. That was daunting, so I began looking at what others had been working on. I remembered Chris talking about the Fedora project, and I searched for all packages containing the keyword devel.
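For reference, the search commands looked roughly like this (the exact filter is from memory, so treat this as a sketch):

# List every installed package
rpm -qa

# Narrow the list to packages with "devel" in the name
rpm -qa | grep devel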

I landed on xz, a file compression utility from the LZMA/XZ Utils collection. I downloaded the source code from their remote git repository.

Installation and Build

The installation included the standard steps:

# Installation steps:
# Download from their git repository
git clone https://git.tukaani.org/xz.git

# Create a build directory
mkdir xzbuild

# Run the autogen.sh script inside the cloned directory
cd xz
./autogen.sh

# Run configure, pointing --prefix at the build directory
./configure --prefix=/absolute/path/to/build/directory CFLAGS="-g -O2 -pg"

# Build and install, optionally with parallel jobs
make install [-j number-of-cores]

The --prefix option is required to place the executables in that directory instead of the system bin directory. I didn't realize for quite a while that the installation was going into my `/usr/bin/` directory, which I didn't want. The CFLAGS variable needs the -pg option if you want to use gprof for profiling.

If you want to profile with perf instead, do not include the -pg option: it adds instrumentation code that perf does not need.
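For the gprof path, the workflow looks roughly like this; the binary path and file name below are placeholders rather than my exact invocation:

# Run the instrumented binary; on exit it writes profiling data to gmon.out
./xzbuild/bin/xz -k testfile.txt

# Produce the flat profile and call graph from gmon.out
gprof ./xzbuild/bin/xz gmon.out > gprof-analysis.txt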

Analysis with gprof

With that, I began my initial testing on my laptop, which is an x86 machine. Compression was fast: under three seconds for a 3.9M text file.

I did the same setup on the class AArch64 system, aarchie. There, the same file took almost five times as long.
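The numbers below come from runs like this one (the file name is a placeholder; -k keeps the original file around so the test can be repeated):

time ./xzbuild/bin/xz -k testfile.txt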

Compressing a 3.9M file

x86
real 0m2.815s
user 0m2.763s
sys 0m0.052s

AArch64
real 0m13.237s
user 0m12.967s
sys 0m0.180s

Compressing a 10GB file

x86
real 63m30.922s
user 63m10.830s
sys 0m10.385s

AArch64
real 366m25.701s
user 361m44.554s
sys 1m59.391s

You might wonder why I went to the trouble of compressing a 10GB file. Here is the call graph for the smaller 3.9M file:

Isn’t this horrendously suspicious?

I know this picture is going to blow up on the Planet (our class blog aggregator). That's why we're building a new one. Contribute!!

I had a feeling this result wasn't correct, and Chris confirmed it in his Friday demonstration. So I created a 10GB file in a tmux session on both xerxes and aarchie. As you've seen above, compressing it took ages on both systems, with AArch64 still taking about six times as long as x86.
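One way to build a file that size (not necessarily the exact commands I used) is to keep appending the smaller sample to itself:

# About 2560 copies of a ~4M text file is roughly 10GB
for i in $(seq 1 2560); do cat sample.txt >> bigfile.txt; done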

Here are the resulting call graphs. In both, the reading and writing functions account for the most run time.

Call graph from the x86_64 system xerxes
Call graph from the AArch64 system aarchie

Now the hard part. What can I optimize?

This call graph isn't too helpful on its own: it doesn't point to exact lines of code that I could change and then measure the effect of. So I ran the perf profiler to sample the binary on both systems. With perf annotate, I was able to find some interesting hotspots in the most frequently called functions.
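For anyone following along, the basic perf workflow is below; the binary path and file name are placeholders:

# Sample the binary while it compresses the test file (writes perf.data)
perf record ./xzbuild/bin/xz -k testfile.txt

# Show which functions the samples landed in
perf report

# Drill into the annotated source/assembly of the hot functions
perf annotate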

Follow my next blog post for more information.
