[Soot-list] Precision of Spark's Points-To-Analysis again (and much more)

Thu Nov 4 09:34:52 EST 2004

Hi,

as I stated some time ago, I'm using Soot as part of my master thesis.
Meanwhile the thesis is nearly completed and I've got a question to
the authors of the Soot-documentation:
Is it's ok to include some pictures from the presentation slides in
my thesis? Obviously I will reference the picture's source.

For my own purposes I had to do some major changes on Soot's source
code which I like to present to you in this mail.

The task was to develop a conecpt analysis for Java like descibed for C++ in:
"SNELTING, G. and TIP, F. (2000): Understanding Class Hierarchies Using
Concept Analysis, ACM Transactions on Programming Languages and Systems,
Vol. 22, No.3, May 2000."

My propram is part of KABA-Framework, see:
"STRECKENBACH, M. and SNELTING, G. (2004): Refactoring Class Hierarchies
with KABA, OOPSLA '04, Oct. 24-28, 2004."
and
"http://www.infosun.fmi.uni-passau.de/st/staff/streckenbach/"

For concept analysis one needs points-to analysis. I choosed to use Soot for
this purpose because it can handle large programs in a performant way. But
very soon I noticed that Spark's Points-To-Analysis was not precise enough
for KABA (see <a 
href="http://www.sable.mcgill.ca/listarchives/soot-list/msg01075.html">
Precision of Spark Analysis</a>).

So I tried to improve the quality of the result-set and finally I succeeded.
What follows is a detailed description of  my changes of the Soot-Framework.
To all related classes the source files and diff's are attached as well.
These files are still from the 2.1.0 final release (no subversion!).

-------------------------------------------------------------------------------
The description starts here.

Because of the conservative character of Spark's Points-To approximation it can
never happen that there's an edge missing. After running a few examples an
studying spark's source code I found out that virtual function calls are not
resolved correctly (see example in <a 
href="http://www.sable.mcgill.ca/listarchives/soot-list/msg01075.html">
Precision of Spark Analysis</a>).

I didn't want to change the whole PAG-construction-mechanism, so I searched
for another solution that was simpler to implement and I found it. For this
approach the PAG-construction-mechanism stays unchanged.

The subset based propagators of spark add the points-to set of every node
to the set of node it is assigned to in every iteration, see:
"LHOTÁK, O. and HENDREN, L. (2003): Scaling Java points-to analysis using
Spark, In G. Hedin, editor, Compiler Construction, 12th International
Conference, volume 2622 of LNCS, pages 153-169, Warsaw, Poland, Springer".

My idea is now, to check if the targets really can be reached before they are
added, with an algorithm called "staticLookup()". In order to serve Soot's
performance mission, a little optimization is included too.

For changes on the worklist-propagator (PropWorklist) I copied the source
into another file. The altered propagator is named PropKaba, the source is
included in the attatchments too.

At first I will explain the optimization and then discuss the staticLookup()
function and it's integration into the propagator.

Spark's imprecision only comes from virtual function calls. But there is no way
to find out which assignments are virual when the creation of the PAG is
finished. This is why I added another HashMap named "virtual" to the class
"AbstractPAG" that remebers virtual function calls. I gave it analoguos
function to the other Maps to access it's elements. By adding the following
three lines (marked by "//TB") of code in "AbstractPAG" the virtual edges are
saved in "virtual":

     final public void addCallTarget(AbstractMethodPAG srcmpag,
                                     AbstractMethodPAG tgtmpag,
                                     Stmt s,
                                     Object srcContext,
                                     Object tgtContext) {
         ...
         if(ie instanceof InstanceInvokeExpr) {
             ...
             if(iie instanceof VirtualInvokeExpr) { //TB
                 addVirtualEdge((VarNode) baseNode, (VarNode) thisRef); //TB
             } //TB
             ...
         }
         ...
     }

Now PAG becomes a "virtualLookup()"-function which is identical to the lookup
functions of the other Maps. A second function in needed for the optimization
that is called "isVirtual(VarNode src, VarNode tgt)". "isVirtual()" runs a
"virtualLookup()" for src and returns true if tgt is contained in the result.
The preparations for the optimization are now complete and we can focus at
the important things.

As mentioned aboved, spark's points-to sets sometimes contain to much Nodes.
A node calculates its own set by "node.makeP2Set();" but there is no way to way
to delete elements from the set. So I added a "removeAll()"-function for every
PointsToSetInternal which removes another PointsToSet from itself.
Be careful: I tried to change all implementations but I'm not sure if they are
working correcty because I did not test them much!

The basic part of my improvement is the function "staticLookup()":
     protected ClassMember staticLookup(Node y, ClassMember m) {...}
Considering node y calls method m virtual, then "staticLookup()" returns
the method that is really called by y. The source code is not so compex
but too long to show. Look at "PropKaba.staticLookup();".

It is a nice fact that virtual function calls - and therefore virtual edges
are always caused by two VarNodes. For this reason one must only seek in
method "handleVarNode()" for virtual edges. And in this method only the
"simpleTargets" need to be handled. With the following code fragments one can
compare the two variants of propagators:

     // PropWorklist:
     protected final boolean handleVarNode(final VarNode src) {
         ...
         Node[] simpleTargets = pag.simpleLookup(src);
         for (int i = 0; i < simpleTargets.length; i++) {
             if (simpleTargets[i].makeP2Set().addAll(newP2Set, null)) {
                 varNodeWorkList.add((VarNode) simpleTargets[i]);
                 ret = true;
             }
         }
         ...
     }

     ------------

     // PropKaba:
     protected final boolean handleVarNode(final VarNode src) {
         ...
         Node[] simpleTargets = pag.simpleLookup(src);
         for (int i = 0; i < simpleTargets.length; i++) {
             if(!pag.isVirtual(src, simpleTargets[i])) { //TB
                 if (simpleTargets[i].makeP2Set().addAll(newP2Set, null)) {
                     varNodeWorkList.add((VarNode) simpleTargets[i]);
                     ret = true;
                 }
             } else { //TB
                 PointsToSetInternal excludes = getExcludes(simpleTargets[i], 
newP2Set); //TB
                 PointsToSetInternal tgtP2Set = simpleTargets[i].makeP2Set(); 
//TB
                 boolean work = tgtP2Set.addAll(newP2Set, null); //TB
                 work |= tgtP2Set.removeAll(excludes); //TB
                 if (work) { //TB
                     varNodeWorkList.add((VarNode) simpleTargets[i]); //TB
                     ret = true; //TB
                 } //TB
             } //TB
             ...
         }

The method "getExcludes()" only calls "staticLookup()" for every node in
"newP2Set" and returns a set of Nodes that is not reachable from
simpleTargets[i]. Thus it is must be removed by "tgtP2Set.removeAll(excludes);"

This lines also contain what I call "tiny optimization" because my extension
is only called if there really IS a virtual edge. Otherwise the propagation
runs as known.

The last thing I want to mention is that finalize()-calls are treated as
virtual function calls as well. Because of that only one (or none) finalize()-
this-pointer can appear in the points-to-set on an object.

-------------------------------------------------------------------------------

I hope you understood my explanations and you will find them as useful as I do.
If you have some further question do not hesitate to ask me.

I've not done greater case studies till now, but the smaller tests promise that
there will be no loss in performance. I think that through "staticLookup()"
there is more calculation to do but the Points-To-Sets get much smaller as 
well.

Comments, critics and tests are very welcome.

Best regards,
    Thorsten Buckley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: src.zip
Type: application/zip
Size: 29451 bytes
Desc: 
Url : http://www.sable.mcgill.ca/pipermail/soot-list/attachments/20041104/c22de1c2/src-0001.zip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diffs.zip
Type: application/zip
Size: 8568 bytes
Desc: 
Url : http://www.sable.mcgill.ca/pipermail/soot-list/attachments/20041104/c22de1c2/diffs-0001.zip