Stage 1: Finding software to optimize

Sukhbeer Dhillon
3 min read · Nov 11, 2019

My first exposure to open source came in January this year, in my Parallel Processing course (GPU610) at Seneca. I had to find a program and then port it to run on a GPU using CUDA. I found it very challenging because I took on a project that I couldn't finish. Because of that, I was scared to start looking for a project for my Software Optimization class.

But Chris gave a very good demonstration of project selection in class this Tuesday, which gave me some confidence to start looking. The approaching deadline scared me into action, too.

Hence began my search. I started with the rpm -qa command and looked up each project in the list. That was daunting, so I began looking at what others had been working on. I remembered Chris talking about the Fedora project, and I searched for all packages containing the keyword devel.
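For reference, the search commands looked roughly like this (the exact filter is from memory, so treat this as a sketch):

# List every installed package
rpm -qa

# Narrow the list to packages with "devel" in the name
rpm -qa | grep devel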

I landed on xz, a file compression utility from the LZMA/XZ Utils collection. I downloaded the source code from their remote git repository.

Installation and Build

The installation included the standard steps:

# Installation steps:
# Download from their git repository
git clone https://git.tukaani.org/xz.git

# Create a build directory
mkdir xzbuild

# Run the autogen.sh script inside the cloned directory
cd xz
./autogen.sh

# Run configure, pointing --prefix at the build directory
./configure --prefix=/absolute/path/to/build/directory CFLAGS="-g -O2 -pg"

# Build and install, optionally with parallel jobs
make install [-j number-of-cores]

The --prefix option is required to place the executables in that directory instead of the system bin directory. I didn't realize for quite a while that the installation was going into my `/usr/bin/` directory, which I didn't want. The CFLAGS variable needs the -pg option if you want to use gprof for profiling.

If you want to profile with perf instead, do not include the -pg option: it adds instrumentation code that perf does not need.
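For the gprof path, the workflow looks roughly like this; the binary path and file name below are placeholders rather than my exact invocation:

# Run the instrumented binary; on exit it writes profiling data to gmon.out
./xzbuild/bin/xz -k testfile.txt

# Produce the flat profile and call graph from gmon.out
gprof ./xzbuild/bin/xz gmon.out > gprof-analysis.txt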

Analysis with gprof

With that, I began my initial testing on my laptop, which is an x86 machine. Compression was fast: under three seconds for a 3.9M text file.

I did the same setup on the class AArch64 system, aarchie. There, the same file took almost five times as long.
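The numbers below come from runs like this one (the file name is a placeholder; -k keeps the original file around so the test can be repeated):

time ./xzbuild/bin/xz -k testfile.txt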

Compressing a 3.9M file

x86
real 0m2.815s
user 0m2.763s
sys 0m0.052s

AArch64
real 0m13.237s
user 0m12.967s
sys 0m0.180s

Compressing a 10GB file

x86
real 63m30.922s
user 63m10.830s
sys 0m10.385s

AArch64
real 366m25.701s
user 361m44.554s
sys 1m59.391s

You might wonder why I went to the trouble of compressing a 10GB file. Here is the call graph for the smaller 3.9M file:

Isn’t this horrendously suspicious?

I know this picture is going to blow up on the Planet (our class blog aggregator). That's why we're building a new one. Contribute!!

I had a feeling this result wasn't correct, and Chris confirmed it in his Friday demonstration. So I created a 10GB file in a tmux session on both xerxes and aarchie. As you've seen above, compressing it took ages on both systems, with AArch64 still taking about six times as long as x86.
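One way to build a file that size (not necessarily the exact commands I used) is to keep appending the smaller sample to itself:

# About 2560 copies of a ~4M text file is roughly 10GB
for i in $(seq 1 2560); do cat sample.txt >> bigfile.txt; done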

Here are the resulting call graphs. In both, the reading and writing functions account for the most run time.

Call graph from the x86_64 system xerxes
Call graph from the AArch64 system aarchie

Now the hard part. What can I optimize?

This call graph isn't too helpful on its own: it doesn't point to exact lines of code that I could change and then measure the effect of. So I ran the perf profiler to sample the binary on both systems. With perf annotate, I was able to find some interesting hotspots in the most frequently called functions.
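For anyone following along, the basic perf workflow is below; the binary path and file name are placeholders:

# Sample the binary while it compresses the test file (writes perf.data)
perf record ./xzbuild/bin/xz -k testfile.txt

# Show which functions the samples landed in
perf report

# Drill into the annotated source/assembly of the hot functions
perf annotate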

Follow my next blog post for more information.
