[abc] around weaving

From: Prof. Laurie HENDREN <hendren@cs.mcgill.ca>
Date: Sat Apr 16 2005 - 21:25:13 BST

I have been working for the last day or so trying to understand
the impact of the different around weavers on performance, and
particularly why we get a slowdown for the weka benchmark.

I have created an example program that exhibits the basic behaviour
we see in weka, but is small enough to study and play around with.

In the attached jar file, you will find a modified version of
EnforceCodingStandards.java and a test program, Test.java.

Then there are the following directories:

    abc - what you get when compiling with most recent abc, no flags
              (does some inlining)

    abcforce - most recent abc, with forcing around inlining
              (forces inlining, but does not inline proceeds)

    abcnewforce - James' version of abc, forcing around inlining by
              creating pairs of static methods, one for the
              specialized advice, and one for the proceed.

    abcnoinline - most recent abc, no around inlining

    ajc - produced by ajc1.2.1

    ajcsoot - what was produced by ajc1.2.1 followed by soot -O (this is
               to make it fair with abc wrt other opts)

    abcnewforce/dava/src - what abcnewforce produced, but decompiled
               and recompiled with javac

    abcnewforce/dava/modsrc - as above, but with types of specialized
               proceed methods made more specific and extra casts
               removed (I did this by hand)

Most directories also have dava/ subdirectories so you can look
at the woven code as source code.

Then I timed all these versions on three different machines: my
machine (abctm), tofu, and my Windows laptop.

Here are the numbers from tofu; the others follow similar trends:

============= abc =============
Count is: 600000000
18.770u 0.010s 0:18.82 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.720u 0.070s 0:18.82 99.8% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.720u 0.040s 0:18.82 99.6% 0+0k 0+0io 1371pf+0w
Count is: 600000000
18.760u 0.020s 0:18.82 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.750u 0.020s 0:18.84 99.6% 0+0k 0+0io 1469pf+0w
============= abcforce =============
Count is: 600000000
18.740u 0.020s 0:18.83 99.6% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.720u 0.050s 0:18.84 99.6% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.730u 0.020s 0:18.82 99.6% 0+0k 0+0io 1369pf+0w
Count is: 600000000
18.760u 0.030s 0:18.83 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.770u 0.020s 0:18.82 99.8% 0+0k 0+0io 1469pf+0w
============= abcnewforce =============
Count is: 600000000
10.440u 0.030s 0:10.54 99.3% 0+0k 0+0io 1371pf+0w
Count is: 600000000
10.470u 0.000s 0:10.54 99.3% 0+0k 0+0io 1469pf+0w
Count is: 600000000
10.440u 0.020s 0:10.53 99.3% 0+0k 0+0io 1469pf+0w
Count is: 600000000
10.480u 0.020s 0:10.55 99.5% 0+0k 0+0io 1469pf+0w
Count is: 600000000
10.470u 0.020s 0:10.54 99.5% 0+0k 0+0io 1469pf+0w
============= abcnoinline =============
Count is: 600000000
22.450u 0.030s 0:22.54 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
22.730u 0.010s 0:22.78 99.8% 0+0k 0+0io 1469pf+0w
Count is: 600000000
22.230u 0.020s 0:22.31 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
22.220u 0.030s 0:22.32 99.6% 0+0k 0+0io 1371pf+0w
Count is: 600000000
22.480u 0.010s 0:22.52 99.8% 0+0k 0+0io 1469pf+0w
============= ajc =============
Count is: 600000000
8.890u 0.030s 0:08.93 99.8% 0+0k 0+0io 1468pf+0w
Count is: 600000000
8.700u 0.020s 0:08.75 99.6% 0+0k 0+0io 1468pf+0w
Count is: 600000000
8.930u 0.040s 0:08.99 99.7% 0+0k 0+0io 1468pf+0w
Count is: 600000000
8.660u 0.040s 0:08.74 99.5% 0+0k 0+0io 1368pf+0w
Count is: 600000000
8.870u 0.030s 0:08.98 99.1% 0+0k 0+0io 1368pf+0w
============= ajcsoot =============
Count is: 600000000
8.630u 0.020s 0:08.74 98.9% 0+0k 0+0io 1337pf+0w
Count is: 600000000
8.660u 0.080s 0:08.75 99.8% 0+0k 0+0io 1435pf+0w
Count is: 600000000
8.650u 0.030s 0:08.74 99.3% 0+0k 0+0io 1335pf+0w
Count is: 600000000
8.670u 0.020s 0:08.74 99.4% 0+0k 0+0io 1435pf+0w
Count is: 600000000
8.690u 0.010s 0:08.73 99.6% 0+0k 0+0io 1435pf+0w
========== abcnew recompiled from dava ========
/home/research/ccl/hendren/AroundEx/abcnewforce/dava/src
Count is: 600000000
8.260u 0.010s 0:08.34 99.1% 0+0k 0+0io 1598pf+0w
Count is: 600000000
8.450u 0.010s 0:08.52 99.2% 0+0k 0+0io 1496pf+0w
Count is: 600000000
8.710u 0.020s 0:08.76 99.6% 0+0k 0+0io 1498pf+0w
Count is: 600000000
8.350u 0.020s 0:08.40 99.6% 0+0k 0+0io 1598pf+0w
Count is: 600000000
8.720u 0.040s 0:08.76 100.0% 0+0k 0+0io 1598pf+0w
========== previous one, but with casts wrt proceeds removed by hand ========
/home/research/ccl/hendren/AroundEx/abcnewforce/dava/modsrc
Count is: 600000000
8.030u 0.030s 0:08.16 98.7% 0+0k 0+0io 1598pf+0w
Count is: 600000000
8.090u 0.030s 0:08.16 99.5% 0+0k 0+0io 1598pf+0w
Count is: 600000000
7.740u 0.000s 0:07.80 99.2% 0+0k 0+0io 1498pf+0w
Count is: 600000000
8.020u 0.030s 0:08.15 98.7% 0+0k 0+0io 1598pf+0w
Count is: 600000000
7.740u 0.020s 0:07.86 98.7% 0+0k 0+0io 1598pf+0w
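
To compare the configurations at a glance, the user times above can be
averaged per configuration. This is a minimal sketch and not part of the
attached jar: the class name MeanUserTimes and its methods are made up for
illustration, and the only data in it are the user-CPU seconds copied from
the tofu runs above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not in the attached jar): averages the user-CPU
// seconds copied from the tofu runs listed above.
class MeanUserTimes {
    // Arithmetic mean of the given samples.
    static double mean(double... xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Mean user time per configuration, in the order reported above.
    static Map<String, Double> means() {
        Map<String, Double> m = new LinkedHashMap<>();
        m.put("abc",         mean(18.770, 18.720, 18.720, 18.760, 18.750));
        m.put("abcforce",    mean(18.740, 18.720, 18.730, 18.760, 18.770));
        m.put("abcnewforce", mean(10.440, 10.470, 10.440, 10.480, 10.470));
        m.put("abcnoinline", mean(22.450, 22.730, 22.230, 22.220, 22.480));
        m.put("ajc",         mean(8.890, 8.700, 8.930, 8.660, 8.870));
        m.put("ajcsoot",     mean(8.630, 8.660, 8.650, 8.670, 8.690));
        m.put("dava/src",    mean(8.260, 8.450, 8.710, 8.350, 8.720));
        m.put("dava/modsrc", mean(8.030, 8.090, 7.740, 8.020, 7.740));
        return m;
    }

    public static void main(String[] args) {
        Map<String, Double> m = means();
        double ajc = m.get("ajc");
        // Print each mean and its ratio to the ajc mean.
        for (Map.Entry<String, Double> e : m.entrySet()) {
            System.out.printf("%-12s %7.3f s  %.2fx ajc%n",
                    e.getKey(), e.getValue(), e.getValue() / ajc);
        }
    }
}
```

On these numbers the means come out at roughly 18.74s for abc and
abcforce, 10.46s for abcnewforce, 22.42s for abcnoinline, 8.81s for ajc,
8.66s for ajcsoot, 8.50s for dava/src and 7.92s for dava/modsrc.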

Some things to note:

1. there really are significant differences in performance for different weaving
   strategies. Of course, this benchmark is set up to show off those
   differences (lots of applications of around advice, and not much else
   going on in the benchmark). However, the differences are significant.

2. the non-inlining abc strategy is space efficient, but expensive. The
   calls to the methods, switches, casts, and extra params do cost.

3. the current abc inlining strategy only gets part of the way to the
     ajc performance, because it does not inline the proceed.

4. James' inlining strategy does pretty well and is close to ajc.

5. However, there is still room for improvement on James' strategy.
    The key point here is that the proceed static methods
     that get generated have
     parameters and return types of Object and then casts to the correct
     type. These casts actually do incur overhead, as you can see from
     the difference between the last two versions in the data above.

   I also note that ajc generates private final static methods, whereas
   James' version currently generates public methods.
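
To make the cost sources in points 2 and 5 concrete, here is a minimal
sketch of the two proceed shapes side by side. It is not taken from the
attached jar: the nested class A is a stand-in whose foo body is assumed,
and the method names are simplified for readability.

```java
class SharedVsSpecializedProceed {
    // Hypothetical stand-in for the benchmark's class A; the real body of
    // foo is not shown in this message, so returning the argument is an
    // assumption.
    static class A {
        A foo(A other) {
            return other;
        }
    }

    // Shared proceed in the style of the current abc output: every call
    // pays for the shadow ID switch, the Object-typed parameters (two of
    // which are unused for shadow 0), and the downcast.
    static Object sharedProceed(int shadowID, Object arg0, Object arg1, int arg2) {
        switch (shadowID) {
            case 0: {
                A a = (A) arg0;
                return a.foo(a);
            }
            case 5:
                return ((StringBuffer) arg0).toString();
            default:
                throw new RuntimeException();
        }
    }

    // Per-shadow proceed in the style of James' output: no switch and no
    // unused parameters, but the Object parameter still needs a downcast.
    static Object specializedProceed0(Object arg0) {
        A a = (A) arg0;
        return a.foo(a);
    }

    public static void main(String[] args) {
        A a = new A();
        // Both shapes compute the same result for shadow 0.
        System.out.println(sharedProceed(0, a, null, 0) == specializedProceed0(a));
    }
}
```

The specialized form removes the switch and the unused parameters; the
remaining downcast is what point 5 is about.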

Here is an example from the benchmark:

---------------------
Current abc inlining:
---------------------

SHADOW:

            EnforceCodingStandards.aspectOf();
            r38 = Test.abc$static$proceed$EnforceCodingStandards$around$0(0, $r2, null, 0);

            if (r38 == null)
            {
                System.err.println("Detected null return value after calling at line 10");
            }

PROCEED:

    public static java.lang.Object abc$static$proceed$EnforceCodingStandards$around$0(int shadowID$0, java.lang.Object contextArgFormal$3,
            java.lang.Object contextArgFormal$8, int contextArgFormal$14)
    {
        A a;
        B b;

        switch (shadowID$0)
        {
            case 0:
                a = (A) contextArgFormal$3;
                return a.foo(a);

            case 1:
                b = (B) contextArgFormal$3;
                return b.goo(b);

            case 2:
                return ((A) contextArgFormal$8).ident(contextArgFormal$3);

            case 3:
                return ((StringBuffer) contextArgFormal$3).append((String) contextArgFormal$8);

            case 4:
                return ((StringBuffer) contextArgFormal$3).append(contextArgFormal$14);

            case 5:
                return ((StringBuffer) contextArgFormal$3).toString();

            default:
                throw new RuntimeException();
        }
    }

---------------
James' Inlining
---------------

SHADOW:

            EnforceCodingStandards.aspectOf();
            a2 = (A) Test.around$0$EnforceCodingStandards$1$0$inline($r0);

SPECIALIZED ADVICE:

   public static java.lang.Object around$0$EnforceCodingStandards$1$0$inline(java.lang.Object contextArgFormal$5)
    {
        java.lang.Object lRetVal;

        lRetVal = Test.inline$abc$static$proceed$EnforceCodingStandards$around$0$Test$0(contextArgFormal$5);

        if (lRetVal == null)
        {
            System.err.println("Detected null return value after calling at line 10");
        }

        return lRetVal;
    }

SPECIALIZED PROCEED: **** NOTE that return type and param are Object ******
                     **** Also note that since it has come from the
                          case of the switch statement, it has been
                          recognized that only one param is needed ...
                          whereas ajc will use two ....

    public static java.lang.Object inline$abc$static$proceed$EnforceCodingStandards$around$0$Test$0(java.lang.Object contextArgFormal$3)
    {
        A a;

        a = (A) contextArgFormal$3;
        return a.foo(a);
    }

------------
ajc Inlining
------------

SHADOW: **** NOTE there are spurious copy statements, that's part of the reason
              that running the result through Soot cleans things up.

            r5 = r2;
            r6 = r2;
            r8 = (A) Test.foo_aroundBody1$advice(r6, r5, EnforceCodingStandards.aspectOf(), null);

SPECIALIZED ADVICE: ****** Note that they have spurious params, but the params
                           have the right type

    private static final java.lang.Object foo_aroundBody1$advice(A r0, A r1, EnforceCodingStandards r2, org.aspectj.runtime.internal.AroundClosure r3)
    {
        org.aspectj.runtime.internal.AroundClosure r4;
        A r5;

        r4 = r3;
        r5 = Test.foo_aroundBody0(r0, r1);

        if (r5 == null)
        {
            System.err.println("Detected null return value after calling at line 10");
        }

        return r5;
    }

SPECIALIZED PROCEED: **** Note the types are right, no casts needed.

    private static final A foo_aroundBody0(A r0, A r1)
    {
        return r0.foo(r1);
    }

-------------------------------------------------------------------------

So - what is the conclusion .... we want to be able to create the static
methods as James has done, but we can do even better if we can create
the static proceed methods with the correct types and avoid the extra
casting. James and Sascha - what is your opinion on this? Is it a
difficult modification of what we already have? I am finishing up the
PLDI paper tonight and tomorrow. It would be nice to have these new
numbers too ... but only if they are reasonably easy to get. In any
case we should push a bit more on this and see if we can get that
extra little bit of improvement.
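
As a rough illustration of the change being asked about, the sketch below
puts an Object-typed advice/proceed pair next to a precisely typed pair.
It is only a sketch under assumptions: the typed pair is modelled loosely
on the ajc shape shown above (typed params, private static final), the
nested class A and all method names are made up, and whether abc can
generate exactly this is the open question.

```java
class TypedProceedSketch {
    // Hypothetical stand-in for the benchmark's class A (body of foo assumed).
    static class A {
        A foo(A other) {
            return other;
        }
    }

    // --- Object-typed pair, in the style of James' current output ---

    public static Object adviceObj(Object ctx) {
        Object ret = proceedObj(ctx);
        if (ret == null) {
            System.err.println("Detected null return value");
        }
        return ret;
    }

    public static Object proceedObj(Object ctx) {
        A a = (A) ctx;      // downcast needed because the param is Object
        return a.foo(a);
    }

    // --- Precisely typed pair, the shape proposed above ---

    private static final Object adviceTyped(A a) {
        A ret = proceedTyped(a);
        if (ret == null) {
            System.err.println("Detected null return value");
        }
        return ret;
    }

    private static final A proceedTyped(A a) {
        return a.foo(a);    // types are right, no cast needed
    }

    // Shadow call sites: both cast the advice result back to A, as the
    // James and ajc shadows above do.
    static A shadowObj(A a) {
        return (A) adviceObj(a);
    }

    static A shadowTyped(A a) {
        return (A) adviceTyped(a);
    }

    public static void main(String[] args) {
        A a = new A();
        // Both pairs return the same object for the same shadow.
        System.out.println(shadowObj(a) == shadowTyped(a));
    }
}
```

The only difference between the two pairs is where types are known: the
typed proceed drops the downcast on its parameter, which is the cast the
hand-modified modsrc version removed.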

Cheers, Laurie

+-------------------------------------------------------------+
| Laurie Hendren, Professor, School of Computer Science |
| McGill University |
| 318 McConnell Engineering Building tel: (514) 398-7391 |
| 3480 University Street fax: (514) 398-3883 |
| Montreal, Quebec H3A 2A7 hendren@cs.mcgill.ca |
| CANADA http://www.sable.mcgill.ca/~hendren |
| http://wwww.sable.mcgill.ca http://aspectbench.org |
+-------------------------------------------------------------+

Received on Sun Apr 17 00:06:59 2005
