Stage 1: Finding a software to optimize
My first exposure to open source was in January this year, in my Parallel Processing course (GPU610) at Seneca, where I had to find a program and port it to run on a GPU using CUDA. I felt very challenged because I took on a project that I couldn't finish. Because of that, I was apprehensive about picking a project for my Software Optimization class.
But Chris gave a very good demonstration of project selection in class this Tuesday, which gave me some confidence to start looking. The approaching deadline helped, too.
Hence began my search. I started with the `rpm -qa` command and looked up each project in the list. That was daunting, so I began looking at what others had been working on. I remembered Chris talking about the Fedora project, and I searched for all packages containing the keyword `devel`.
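That search can be done in one pipeline; this is roughly what it looks like (assuming an RPM-based distro such as Fedora):

```shell
# List every installed package, then keep only the ones whose
# name mentions "devel", sorted for easier browsing.
rpm -qa | grep -i devel | sort
```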
I stumbled upon `xz`, a utility in the LZMA collection of file-compression utilities, and downloaded the source code from its remote git repository.
Installation and Build
The installation included the standard steps:
```shell
# Installation steps:

# Download from their git repository
git clone https://git.tukaani.org/xz.git

# Create a build directory
mkdir xzbuild
cd xz

# Run the autogen.sh script
./autogen.sh

# Run configure inside the cloned directory
./configure --prefix=/absolute/path/to/build/directory CFLAGS="-g -O2 -pg"

# Build and install, optionally in parallel
make install [-j <number of cores>]
```
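To confirm the build landed under the prefix rather than in the system path, a quick check like this helps (the path below is the same placeholder prefix used in the configure step):

```shell
# The installed binary should live under the prefix, not /usr/bin.
/absolute/path/to/build/directory/bin/xz --version

# Shows which xz the shell would otherwise pick up
command -v xz
```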
The `--prefix` option is required to place the installed files in that directory instead of the system `bin` directory; I didn't realize for quite a while that the installation was going into my `/usr/bin/` directory, which I didn't want. The `CFLAGS` setting is required to add the `-pg` option if you want to use gprof for profiling.
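For reference, the gprof workflow looks roughly like this once the binary is built with `-pg` (the file and directory names here are illustrative):

```shell
# Running the instrumented binary drops a gmon.out file in the
# current directory as a side effect.
./bin/xz -k testfile.txt

# gprof combines gmon.out with the binary's symbols to produce
# a flat profile and a call graph.
gprof ./bin/xz gmon.out > analysis.txt
```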
If you want to profile with `perf`, do not include the `-pg` option: it compiles in instrumentation data, which perf does not require.
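With an uninstrumented build, sampling with perf goes along these lines (standard perf usage; the input file name is illustrative):

```shell
# Sample the run and record call graphs; perf uses hardware
# performance counters, so no special compile flags are needed.
perf record -g ./bin/xz -k testfile.txt

# Interactive summary of functions ranked by sample count.
perf report
```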
Analysis with gprof
With that, I began my initial testing on my laptop, which is an x86 machine. Compression was fast, finishing in a few seconds for a 3.9M text file.
I did the same setup on aarchie, the class AArch64 system. There the same file took about six times as long.
Compressing a 3.9M file:

```
x86:
real    0m2.815s
user    0m2.763s
sys     0m0.052s

AArch64:
real    0m13.237s
user    0m12.967s
sys     0m0.180s
```

Compressing a 10GB file:

```
x86:
real    63m30.922s
user    63m10.830s
sys     0m10.385s

AArch64:
real    366m25.701s
user    361m44.554s
sys     1m59.391s
```
You may wonder why I went crazy compressing a 10GB file. Here is the call graph for my earlier file size:
I know this picture is going to blow up on the Planet. That’s why we’re building a new one. Contribute!!
I had a feeling this result wasn't correct, and Chris confirmed it in his demonstration on Friday. So I created a 10GB file in a `tmux` session on both xerxes and aarchie. As you've seen, it took ages to compress the file on both systems, with AArch64 still taking six times as long as x86. Here are the resulting call graphs.
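I won't reproduce the exact file here, but one way to generate a large, compressible test file and time its compression is a sketch like this (the size and contents are placeholders, not necessarily what I used):

```shell
# Repeat a line of text until the file reaches the target size,
# then time the compression; -k keeps the original file around.
yes 'some repetitive line of sample text' | head -c 100M > big.txt
time xz -k big.txt
```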
Now the hard part. What can I optimize?
This call graph isn't very helpful in pointing me to the exact lines of code I could change and then measure the effect of my changes. So I ran the `perf` profiler to sample the binary on both systems. With the annotate feature, I was able to find some interesting hotspots in the most frequently called functions.
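The annotate step drills from a hot function down to the individual instructions where samples landed; the invocation is roughly this (the symbol name below is a hypothetical placeholder, not the actual hotspot):

```shell
# After perf record, annotate a hot function to see per-instruction
# sample percentages (symbol name is a placeholder).
perf annotate --stdio some_hot_function
```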
Follow my next blog for more information.