I have been working for the last day or so trying to understand
the impact of the different around weavers on performance, and
particularly why we get a slowdown for the weka benchmark.
I have created an example program that exhibits the basic behaviour
we see in weka, but is small enough to study and play around with.
In the attached jar file, you will find a modified version of
EnforceCodingStandards.java and a test program, Test.java.
Then there are the following directories:

  abc          - what you get when compiling with the most recent abc, no flags
                 (does some inlining)
  abcforce     - most recent abc, with forcing around inlining
                 (forces inlining, but does not inline proceeds)
  abcnewforce  - James' version of abc, forcing around inlining by
                 creating pairs of static methods, one for the
                 specialized advice, and one for the proceed
  abcnoinline  - most recent abc, no around inlining
  ajc          - produced by ajc1.2.1
  ajcsoot      - what was produced by ajc1.2.1 followed by soot -O (this is
                 to make it fair with abc wrt other opts)
  abcnewforce/dava/src
               - what abcnewforce produced, but decompiled
                 and recompiled with javac
  abcnewforce/dava/modsrc
               - as above, but with the types of the specialized
                 proceed methods made more specific and extra casts
                 removed (I did this by hand)

Most directories also have dava/ subdirectories so you can look
at the woven code as source code.
Then I timed all these versions on three different machines: my machine
(abctm), tofu, and my Windows laptop. Each version was run five times.
Here are the numbers from tofu; the others follow similar trends:
============= abc =============
Count is: 600000000
18.770u 0.010s 0:18.82 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.720u 0.070s 0:18.82 99.8% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.720u 0.040s 0:18.82 99.6% 0+0k 0+0io 1371pf+0w
Count is: 600000000
18.760u 0.020s 0:18.82 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.750u 0.020s 0:18.84 99.6% 0+0k 0+0io 1469pf+0w
============= abcforce =============
Count is: 600000000
18.740u 0.020s 0:18.83 99.6% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.720u 0.050s 0:18.84 99.6% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.730u 0.020s 0:18.82 99.6% 0+0k 0+0io 1369pf+0w
Count is: 600000000
18.760u 0.030s 0:18.83 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
18.770u 0.020s 0:18.82 99.8% 0+0k 0+0io 1469pf+0w
============= abcnewforce =============
Count is: 600000000
10.440u 0.030s 0:10.54 99.3% 0+0k 0+0io 1371pf+0w
Count is: 600000000
10.470u 0.000s 0:10.54 99.3% 0+0k 0+0io 1469pf+0w
Count is: 600000000
10.440u 0.020s 0:10.53 99.3% 0+0k 0+0io 1469pf+0w
Count is: 600000000
10.480u 0.020s 0:10.55 99.5% 0+0k 0+0io 1469pf+0w
Count is: 600000000
10.470u 0.020s 0:10.54 99.5% 0+0k 0+0io 1469pf+0w
============= abcnoinline =============
Count is: 600000000
22.450u 0.030s 0:22.54 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
22.730u 0.010s 0:22.78 99.8% 0+0k 0+0io 1469pf+0w
Count is: 600000000
22.230u 0.020s 0:22.31 99.7% 0+0k 0+0io 1469pf+0w
Count is: 600000000
22.220u 0.030s 0:22.32 99.6% 0+0k 0+0io 1371pf+0w
Count is: 600000000
22.480u 0.010s 0:22.52 99.8% 0+0k 0+0io 1469pf+0w
============= ajc =============
Count is: 600000000
8.890u 0.030s 0:08.93 99.8% 0+0k 0+0io 1468pf+0w
Count is: 600000000
8.700u 0.020s 0:08.75 99.6% 0+0k 0+0io 1468pf+0w
Count is: 600000000
8.930u 0.040s 0:08.99 99.7% 0+0k 0+0io 1468pf+0w
Count is: 600000000
8.660u 0.040s 0:08.74 99.5% 0+0k 0+0io 1368pf+0w
Count is: 600000000
8.870u 0.030s 0:08.98 99.1% 0+0k 0+0io 1368pf+0w
============= ajcsoot =============
Count is: 600000000
8.630u 0.020s 0:08.74 98.9% 0+0k 0+0io 1337pf+0w
Count is: 600000000
8.660u 0.080s 0:08.75 99.8% 0+0k 0+0io 1435pf+0w
Count is: 600000000
8.650u 0.030s 0:08.74 99.3% 0+0k 0+0io 1335pf+0w
Count is: 600000000
8.670u 0.020s 0:08.74 99.4% 0+0k 0+0io 1435pf+0w
Count is: 600000000
8.690u 0.010s 0:08.73 99.6% 0+0k 0+0io 1435pf+0w
========== abcnew recompiled from dava ========
/home/research/ccl/hendren/AroundEx/abcnewforce/dava/src
Count is: 600000000
8.260u 0.010s 0:08.34 99.1% 0+0k 0+0io 1598pf+0w
Count is: 600000000
8.450u 0.010s 0:08.52 99.2% 0+0k 0+0io 1496pf+0w
Count is: 600000000
8.710u 0.020s 0:08.76 99.6% 0+0k 0+0io 1498pf+0w
Count is: 600000000
8.350u 0.020s 0:08.40 99.6% 0+0k 0+0io 1598pf+0w
Count is: 600000000
8.720u 0.040s 0:08.76 100.0% 0+0k 0+0io 1598pf+0w
========== previous one, but with casts wrt proceeds removed by hand ========
/home/research/ccl/hendren/AroundEx/abcnewforce/dava/modsrc
Count is: 600000000
8.030u 0.030s 0:08.16 98.7% 0+0k 0+0io 1598pf+0w
Count is: 600000000
8.090u 0.030s 0:08.16 99.5% 0+0k 0+0io 1598pf+0w
Count is: 600000000
7.740u 0.000s 0:07.80 99.2% 0+0k 0+0io 1498pf+0w
Count is: 600000000
8.020u 0.030s 0:08.15 98.7% 0+0k 0+0io 1598pf+0w
Count is: 600000000
7.740u 0.020s 0:07.86 98.7% 0+0k 0+0io 1598pf+0w
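
To compare configurations at a glance, the user times in the listings above can be averaged. The following is only a sketch: the class name TimeSummary and its helper methods are made up for illustration, and it assumes each timing line starts with the user time followed by a "u", as in the output above. Only the abcnewforce and ajc lines are copied in here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the attached jar): averages the user
// CPU time from csh-style `time` lines like the ones listed above.
public class TimeSummary {

    // Extracts the user seconds from a line such as
    // "18.770u 0.010s 0:18.82 99.7% 0+0k 0+0io 1469pf+0w".
    static double userSeconds(String line) {
        String first = line.trim().split("\\s+")[0];      // e.g. "18.770u"
        return Double.parseDouble(first.substring(0, first.length() - 1));
    }

    // Arithmetic mean of the user seconds over all given lines.
    static double meanUserSeconds(List<String> lines) {
        double sum = 0.0;
        for (String line : lines) {
            sum += userSeconds(line);
        }
        return sum / lines.size();
    }

    public static void main(String[] args) {
        // Timing lines copied from the tofu runs above.
        List<String> abcnewforce = Arrays.asList(
            "10.440u 0.030s 0:10.54 99.3% 0+0k 0+0io 1371pf+0w",
            "10.470u 0.000s 0:10.54 99.3% 0+0k 0+0io 1469pf+0w",
            "10.440u 0.020s 0:10.53 99.3% 0+0k 0+0io 1469pf+0w",
            "10.480u 0.020s 0:10.55 99.5% 0+0k 0+0io 1469pf+0w",
            "10.470u 0.020s 0:10.54 99.5% 0+0k 0+0io 1469pf+0w");
        List<String> ajc = Arrays.asList(
            "8.890u 0.030s 0:08.93 99.8% 0+0k 0+0io 1468pf+0w",
            "8.700u 0.020s 0:08.75 99.6% 0+0k 0+0io 1468pf+0w",
            "8.930u 0.040s 0:08.99 99.7% 0+0k 0+0io 1468pf+0w",
            "8.660u 0.040s 0:08.74 99.5% 0+0k 0+0io 1368pf+0w",
            "8.870u 0.030s 0:08.98 99.1% 0+0k 0+0io 1368pf+0w");

        System.out.println(String.format(Locale.ROOT,
            "abcnewforce mean user time: %.2fs", meanUserSeconds(abcnewforce)));
        System.out.println(String.format(Locale.ROOT,
            "ajc mean user time: %.2fs", meanUserSeconds(ajc)));
    }
}
```

For the two configurations shown, this gives means of roughly 10.46s and 8.81s; the other configurations can be summarized the same way by pasting in their lines.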
Some things to note:

1. There really are significant differences in performance between the
   different weaving strategies. Of course, this benchmark is set up to
   show off those differences (lots of applications of around advice, and
   not much else going on in the benchmark). However, the differences are
   significant: abcnoinline needs roughly 22.4s of user time on tofu,
   whereas ajc needs roughly 8.8s.
2. The non-inlining abc strategy is space efficient, but expensive. The
   calls to the methods, switches, casts, and extra params do cost.
3. The current abc inlining strategy only gets part of the way to the
   ajc performance (roughly 18.7s versus 8.8s). This is because it does
   not inline the proceed.
4. James' inlining strategy does pretty well (roughly 10.5s) and is
   close to ajc.
5. However, there is still room for improvement on James' strategy.
   The key point here is that the proceed static methods that get
   generated have parameters and return types of Object, and then cast
   to the correct type. These casts actually do incur overhead, as you
   can see from the difference between the last two versions in the data
   above (roughly 8.5s versus 7.9s).
   I also note that ajc generates private final static methods, whereas
   James' version currently generates public methods.
Here is an example from the benchmark:
---------------------
Current abc inlining:
---------------------
SHADOW:
    EnforceCodingStandards.aspectOf();
    r38 = Test.abc$static$proceed$EnforceCodingStandards$around$0(0, $r2, null, 0);
    if (r38 == null)
    {
        System.err.println("Detected null return value after calling at line 10");
    }
PROCEED:
    public static java.lang.Object abc$static$proceed$EnforceCodingStandards$around$0(
        int shadowID$0, java.lang.Object contextArgFormal$3,
        java.lang.Object contextArgFormal$8, int contextArgFormal$14)
    {
        A a;
        B b;
        switch (shadowID$0)
        {
            case 0:
                a = (A) contextArgFormal$3;
                return a.foo(a);
            case 1:
                b = (B) contextArgFormal$3;
                return b.goo(b);
            case 2:
                return ((A) contextArgFormal$8).ident(contextArgFormal$3);
            case 3:
                return ((StringBuffer) contextArgFormal$3).append((String) contextArgFormal$8);
            case 4:
                return ((StringBuffer) contextArgFormal$3).append(contextArgFormal$14);
            case 5:
                return ((StringBuffer) contextArgFormal$3).toString();
            default:
                throw new RuntimeException();
        }
    }
---------------
James' Inlining
---------------
SHADOW:
    EnforceCodingStandards.aspectOf();
    a2 = (A) Test.around$0$EnforceCodingStandards$1$0$inline($r0);
SPECIALIZED ADVICE:
    public static java.lang.Object around$0$EnforceCodingStandards$1$0$inline(java.lang.Object contextArgFormal$5)
    {
        java.lang.Object lRetVal;
        lRetVal = Test.inline$abc$static$proceed$EnforceCodingStandards$around$0$Test$0(contextArgFormal$5);
        if (lRetVal == null)
        {
            System.err.println("Detected null return value after calling at line 10");
        }
        return lRetVal;
    }
SPECIALIZED PROCEED: **** NOTE that the return type and param are Object ****
                     **** Also note that since it has come from the
                          case of the switch statement, it has been
                          recognized that only one param is needed,
                          whereas ajc will use two ****
    public static java.lang.Object inline$abc$static$proceed$EnforceCodingStandards$around$0$Test$0(java.lang.Object contextArgFormal$3)
    {
        A a;
        a = (A) contextArgFormal$3;
        return a.foo(a);
    }
------------
ajc Inlining
------------
SHADOW: **** NOTE there are spurious copy statements; that's part of the
             reason that running the result through Soot cleans things up.
    r5 = r2;
    r6 = r2;
    r8 = (A) Test.foo_aroundBody1$advice(r6, r5, EnforceCodingStandards.aspectOf(), null);
SPECIALIZED ADVICE: **** Note that they have spurious params, but the params
                         have the right type
    private static final java.lang.Object foo_aroundBody1$advice(A r0, A r1, EnforceCodingStandards r2, org.aspectj.runtime.internal.AroundClosure r3)
    {
        org.aspectj.runtime.internal.AroundClosure r4;
        A r5;
        r4 = r3;
        r5 = Test.foo_aroundBody0(r0, r1);
        if (r5 == null)
        {
            System.err.println("Detected null return value after calling at line 10");
        }
        return r5;
    }
SPECIALIZED PROCEED: **** Note the types are right, no casts needed.
    private static final A foo_aroundBody0(A r0, A r1)
    {
        return r0.foo(r1);
    }
-------------------------------------------------------------------------
So, what is the conclusion? We want to be able to create the static
methods as James has done, but we can do even better if we create
the static proceed methods with the correct types and avoid the extra
casting. James and Sascha, what is your opinion on this? Is it a
difficult modification of what we already have? I am finishing up the
PLDI paper tonight and tomorrow. It would be nice to have these new
numbers too, but only if they are reasonably easy to get. In any
case we should push a bit more on this and see if we can get that
extra little bit of improvement.
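
To make the proposal concrete, here is a small self-contained sketch of the two shapes side by side. It is not output from abc or ajc: the class TypedProceedSketch, the stand-in class A, and all method names are invented for illustration, and the typed variant is only my guess at what a generated typed proceed could look like.

```java
// Hypothetical illustration; none of these names are generated by abc or ajc.
public class TypedProceedSketch {

    // Stand-in for the benchmark's class A; foo simply returns its argument.
    static class A {
        A foo(A other) {
            return other;
        }
    }

    // Shape of the current specialized proceed: Object in, Object out,
    // with a cast back to A before the real call.
    static Object proceedUntyped(Object contextArg) {
        A a = (A) contextArg;
        return a.foo(a);
    }

    // Specialized advice calling the Object-typed proceed.
    static Object adviceUntyped(Object contextArg) {
        Object ret = proceedUntyped(contextArg);
        if (ret == null) {
            System.err.println("Detected null return value");
        }
        return ret;
    }

    // Proposed shape (an assumption): the proceed carries the precise
    // types, so no cast is needed inside it.
    static A proceedTyped(A a) {
        return a.foo(a);
    }

    // Advice with a typed parameter, returning Object as the ajc version does.
    static Object adviceTyped(A a) {
        A ret = proceedTyped(a);
        if (ret == null) {
            System.err.println("Detected null return value");
        }
        return ret;
    }

    public static void main(String[] args) {
        A a = new A();
        A viaUntyped = (A) adviceUntyped(a);
        A viaTyped = (A) adviceTyped(a);
        // Both variants compute the same result; only the casts differ.
        System.out.println(viaUntyped == viaTyped);   // prints "true"
    }
}
```

In generated code the typed methods could presumably also be made private final static, as ajc does; they are left package-visible here only to keep the sketch easy to call.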
Cheers, Laurie
+-------------------------------------------------------------+
| Laurie Hendren, Professor, School of Computer Science |
| McGill University |
| 318 McConnell Engineering Building tel: (514) 398-7391 |
| 3480 University Street fax: (514) 398-3883 |
| Montreal, Quebec H3A 2A7 hendren@cs.mcgill.ca |
| CANADA http://www.sable.mcgill.ca/~hendren |
| http://wwww.sable.mcgill.ca http://aspectbench.org |
+-------------------------------------------------------------+
This archive was generated by hypermail 2.1.8 : Sun Apr 17 2005 - 00:10:05 BST