abc-private (Dec 2004): Re: [abc] compile times

From: Ondrej LHOTAK <olhotak@sable.mcgill.ca>
Date: Wed Dec 15 2004 - 02:06:07 GMT

On Mon, Dec 13, 2004 at 10:12:28PM +0000, Oege de Moor wrote:
> However, I also tried compiling abc itself:
>
> javac 5.0secs
> ajc 5.4secs
> abc 374.9secs
> abc -O0 312.3secs

I've been excluding the generated directory containing the parsers and
running on various McGill machines, and none of them gets anywhere
near this bad:

magic (Quad AMD Opteron):
ajc: 8.2s
abc-O0: 100.8s (12.3x)
abc: 115.7s (14.1x)

lima (Intel P4 1.8GHz):
ajc: 20.8s
abc-O0: 165.3s (7.9x)
abc: 189.3s (9.1x)

Still not good, but well short of the roughly 70x slowdown (abc vs. ajc)
in Oege's numbers.
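
For reference, the slowdown factors quoted in this thread are just each
compiler's time divided by the ajc time on the same machine. A minimal
sketch of that arithmetic follows; the class and method names are made up
for illustration, and the times are copied from the figures above.

```java
import java.util.Locale;

final class Slowdown {
    /** Ratio of a compiler's wall-clock time to the baseline's time. */
    static double factor(double seconds, double baselineSeconds) {
        return seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        // All times are in seconds and come from the numbers in this thread.
        System.out.printf(Locale.ROOT, "Oege,  abc vs ajc:     %.1fx%n", factor(374.9, 5.4));
        System.out.printf(Locale.ROOT, "magic, abc -O0 vs ajc: %.1fx%n", factor(100.8, 8.2));
        System.out.printf(Locale.ROOT, "magic, abc vs ajc:     %.1fx%n", factor(115.7, 8.2));
        System.out.printf(Locale.ROOT, "lima,  abc -O0 vs ajc: %.1fx%n", factor(165.3, 20.8));
        System.out.printf(Locale.ROOT, "lima,  abc vs ajc:     %.1fx%n", factor(189.3, 20.8));
    }
}
```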

Using -time, I get roughly the same breakdown by phase as Oege.

I tested a couple of hypotheses about what might be causing this difference.

First, I thought that maybe the heavier-weight data structures in
Soot/Polyglot were causing it to get memory-starved, and that lots of time
was being spent in the garbage collector. However, giving it more memory
didn't really change the times, so that's probably not it.
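
One way to check the garbage-collection hypothesis more directly than by
varying the heap size is to ask the JVM how much time its collectors
report. The sketch below is only an illustration of that idea, not what
was done for the numbers above: it uses the standard
java.lang.management API and modern Java syntax, and the workload in
main is a made-up stand-in for a real compile.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcShare {
    /** Total milliseconds the JVM reports having spent in all collectors so far. */
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report a time
            if (t > 0) total += t;
        }
        return total;
    }

    /** Runs the task and returns {wall-clock ms, reported GC ms} for it. */
    static long[] measure(Runnable task) {
        long gcBefore = gcMillis();
        long start = System.nanoTime();
        task.run();
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] { wallMs, gcMillis() - gcBefore };
    }

    public static void main(String[] args) {
        // Hypothetical workload: allocate many short-lived lists.
        long[] r = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 200_000; i++) {
                List<Integer> tmp = new ArrayList<>();
                for (int j = 0; j < 50; j++) tmp.add(j);
                sum += tmp.size();
            }
            if (sum == 0) throw new IllegalStateException("workload did not run");
        });
        System.out.println("wall ms: " + r[0] + ", GC ms: " + r[1]);
    }
}
```

If the GC share of the wall-clock time is small, memory pressure is
unlikely to explain a large slowdown, which would be consistent with the
observation that more memory did not help.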

Second, I tried profiling it with hprof. Nothing obvious came up.
However, as with many Java programs, a big chunk of the time is being
spent in various places in the Java collections classes. From past
experience, I know that these are very slow compared to low-level array
operations. That got me wondering whether the Eclipse compiler also uses
these slow standard collections, so I checked the CVS. It turns out that
it doesn't use them at all, and it even uses OO features in general
sparingly. Everything is done with arrays, and often, rather than use
objects and virtual dispatch, it quite happily uses int constants and a
switch statement, just like a typical C program. It also uses char[]
where one would normally use String. As for BCEL, it does use the Java
library a little bit, but it still uses arrays for most of its data,
and int constants rather than objects.
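
To make the two styles concrete, here is a small sketch of the same
computation written both ways: once with int constants, an int[] and a
switch, and once with one object per instruction, virtual dispatch and a
List. The instruction kinds and all names are invented for illustration;
none of this is taken from Eclipse, BCEL, Soot or Polyglot. The timing in
main is a crude single run with no JIT warm-up, so it should not be read
as a benchmark; a proper harness such as JMH would be needed for that.

```java
import java.util.ArrayList;
import java.util.List;

final class StyleComparison {
    // "C style": instruction kinds are int constants stored in an int[].
    static final int PUSH = 0, POP = 1, NOP = 2;

    static int netStackC(int[] kinds) {
        int height = 0;
        for (int k : kinds) {
            switch (k) {
                case PUSH: height++; break;
                case POP:  height--; break;
                case NOP:  break;
                default: throw new IllegalArgumentException("unknown kind " + k);
            }
        }
        return height;
    }

    // "Java style": one object per instruction, virtual dispatch, a List.
    interface Insn { int stackDelta(); }
    static final class Push implements Insn { public int stackDelta() { return 1; } }
    static final class Pop  implements Insn { public int stackDelta() { return -1; } }
    static final class Nop  implements Insn { public int stackDelta() { return 0; } }

    static int netStackJava(List<Insn> insns) {
        int height = 0;
        for (Insn i : insns) height += i.stackDelta();
        return height;
    }

    public static void main(String[] args) {
        int n = 3_000_000;
        int[] kinds = new int[n];
        List<Insn> insns = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int k = i % 3;
            kinds[i] = k;
            insns.add(k == PUSH ? new Push() : k == POP ? new Pop() : new Nop());
        }
        long t0 = System.nanoTime();
        int a = netStackC(kinds);
        long t1 = System.nanoTime();
        int b = netStackJava(insns);
        long t2 = System.nanoTime();
        System.out.println("switch/int[]: " + a + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("objects/List: " + b + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods return the same result for the same instruction sequence;
only the data representation and dispatch mechanism differ.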

It's true that hprof is not always reliable. However, based on past
experience, the Java collections could plausibly be responsible for
the roughly 10-fold difference that we're seeing. The difference may
therefore have nothing to do with abc's architecture, and instead stem
from the fact that it's written in a Java style rather than in a C
style.

More generally, I find it quite odd that we're evaluating abc's
*architecture* by comparing it to ajc, a system that has not only a
different architecture but also very different components. Since the
components themselves can have vastly different performance, if we
only want to compare the architectures, shouldn't we keep the
components constant? For example, rather than compare abc to ajc, we
should be comparing abc to Polyglot/JavaToJimple/Soot. Or do we
consider the choice of components a part of the architecture?
If so, then perhaps we should have decided that we care so much about
compile-time performance before we started writing abc, benchmarked the
components, and used Eclipse/BCEL rather than Polyglot/Soot.

Ondrej
Received on Wed Dec 15 02:06:12 2004
