[Soot-list] Why Spark in soot 2.2.4 slows down

Thu Jul 5 13:09:58 EDT 2007

On Thu, Jul 05, 2007 at 10:54:51AM -0400, Eric Bodden wrote:
> I tried to come up with a wrapper that implements the correct
> behavior. However it's getting really painful because we have other
> wrappers already. Manu and I both independently wrote a wrapper that
> makes Spark's points-to sets compatible to others. If we did not have
> those other wrappers around, things would be much easier...
> 
> Furthermore, I personally think that if one relies on equality by
> reference, one should use appropriate IdentityHashSets. That's why
> they exist... (However, I do realize that those did not exist when
> Spark was written.)

I wonder what you think of Frank Tip's upcoming ECOOP paper:
http://domino.research.ibm.com/comm/research_people.nsf/pages/tip.ecoop2007.html

Basically, it argues that equals and hashCode depending on mutating parts
of an object are broken. Within Spark, the contents of PointsToSetInternal 
change all the time. There's a reason it has "Internal" as part of the
name...

> 
> Eric
> 
> On 05/07/07, Manu Sridharan <manu_s at eecs.berkeley.edu> wrote:
> >Hi there,
> >
> >I just wanted to chime in and say that the performance of my
> >demand-driven points-to analysis implementation *may* also depend on
> >equals() being object identity for Spark points-to sets; I can't
> >immediately remember.  I also think the cleaner fix would be in those
> >Spark clients depending on different equals() behavior, but the
> >IdentityHashSet solution may be much easier.
> >
> >-Manu
> >
> >Ondrej Lhotak wrote:
> >> On Wed, Jul 04, 2007 at 12:00:45PM -0400, Eric Bodden wrote:
> >>
> >>> Ondrej, I actually think that actually it would be the cleanest
> >>> solution if Spark would use an IdentityHashSet at those places where
> >>> comparison on identity is required. Would that be possible?
> >>>
> >>
> >> A clean solution would avoid imposing behaviour required by some
> >> external application on Spark-internal classes; using a wrapper
> >> in your application would be cleaner. But you're right that using an
> >> IdentityHashSet inside Spark is the easiest solution, if not cleanest.
> >> It's fine with me, as long as you find all the places where Spark
> >> could be calling hashCode and equals on those classes. The worklist
> >> may be the only such place, but it may not; it was a running assumption
> >> that equals would be object identity, and I just don't know all the
> >> places in the code that rely on that assumption.
> >>
> >>
> >>> Cheers,
> >>> Eric
> >>>
> >>> On 04/07/07, Eric Bodden <eric.bodden at mail.mcgill.ca> wrote:
> >>>
> >>>> In addition, I saw that hashCode() had an implementation bug:
> >>>>
> >>>>                P2SetVisitorInt visitor = new P2SetVisitorInt(1) {
> >>>>
> >>>>                        final int PRIME = 31;
> >>>>
> >>>>                        public void visit(Node n) {
> >>>>                                intValue = PRIME * intValue + 
> >n.hashCode();
> >>>>                        }
> >>>>
> >>>>                };
> >>>>                this.forall(visitor); <<<<<<<<< WAS MISSING
> >>>>                return visitor.intValue;
> >>>>
> >>>> Due to the missing line, the hashcode returned would always be 0. :-(
> >>>> No surprise that leads to slow results...
> >>>>
> >>>> Eric
> >>>>
> >>>>
> >>>> On 04/07/07, Eric Bodden <eric.bodden at mail.mcgill.ca> wrote:
> >>>>
> >>>>> Oh, I see. Sorry about that.
> >>>>>
> >>>>> Could the same be a problem in PointsToSetReadOnly in paddle and
> >>>>> AbstractFlowSet in Soot? I added similar methods there...
> >>>>>
> >>>>> I *do* require "equal" points-to sets to be equal in terms of
> >>>>> equals/hashCode in the case of points-to sets of Spark, i.e.
> >>>>> PointsToSetInternal. However, I could maybe come up with a wrapper
> >>>>> class or something like that.
> >>>>>
> >>>>> Ondrej, feel free to commit a fix to this at your will.
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>> On 04/07/07, Ondrej Lhotak <olhotak at uwaterloo.ca> wrote:
> >>>>>
> >>>>>> I've tracked down the cause of the slowdown. It is a bug that could
> >>>>>>
> >>>> also
> >>>>
> >>>>>> cause Spark to give an incorrect (unsound) answer, though I haven't
> >>>>>> observed an input program for which this would be the case.
> >>>>>>
> >>>>>> The problem is introduced in the change:
> >>>>>>
> >>>>>>
> >>>> 
> >------------------------------------------------------------------------
> >>>>
> >>>>>> r2465 | ebodde | 2006-08-15 20:15:52 -0400 (Tue, 15 Aug 2006) | 1 
> >line
> >>>>>>
> >>>>>> implementation of equals/hashCode for PointsToSets in Spark
> >>>>>>
> >>>>>>
> >>>> 
> >------------------------------------------------------------------------
> >>>>
> >>>>>> This change adds expensive equals and hashCode methods to the class
> >>>>>> PointsToSetInternal, Spark's internal representation of a points-to
> >>>>>> set. These new methods override the object identity methods on 
> >Object,
> >>>>>> on which the Spark propagation algorithm depends. Inside Spark, a
> >>>>>> PointsToSetInternal is a mutable container that holds ever-changing
> >>>>>>
> >>>> sets
> >>>>
> >>>>>> of abstract objects during the analysis. Thus, from the point of view
> >>>>>>
> >>>> of
> >>>>
> >>>>>> Spark, the appropriate notion of equality for these containers is
> >>>>>>
> >>>> object
> >>>>
> >>>>>> identity, not equality of the contents. Specifically, Spark makes a
> >>>>>> worklist (actually a set) of PointsToSetInternal's whose contents 
> >need
> >>>>>> to be further propagated. The overridden methods cause adding to the
> >>>>>> worklist to be very slow, and may prevent PointsToSetInternal's whose
> >>>>>> contents still need to be propagated from being added, if their
> >>>>>>
> >>>> contents
> >>>>
> >>>>>> happen to be the same as another PointsToSetInternal.
> >>>>>>
> >>>>>> Removing these methods from PointsToSetInternal fixes the problem. I
> >>>>>> have not committed this change yet.
> >>>>>>
> >>>>>> Ondrej
> >>>>>>
> >>>>>> On Mon, Jun 25, 2007 at 04:58:45PM -0400, Eric Bodden wrote:
> >>>>>>
> >>>>>>> Ondrej could you do the same run with hprof enabled? I think with
> >>>>>>>
> >>>> such
> >>>>
> >>>>>>> a significant difference, this should easily show up in a profile.
> >>>>>>>
> >>>>>>> Eric
> >>>>>>>
> >>>>>>> On 22/06/07, Ondrej Lhotak <olhotak at uwaterloo.ca> wrote:
> >>>>>>>
> >>>>>>>> Strange, I also notice a big slowdown.
> >>>>>>>>
> >>>>>>>> Also, Soot 2.2.4 gives significantly different answers than 2.2.3.
> >>>>>>>>
> >>>>>>>> If I use the pre-jimplify switch, which separates Spark time from
> >>>>>>>> Jimplification time, I get the following Spark times on soot-c:
> >>>>>>>> 2.2.3:  15.3s
> >>>>>>>> 2.2.4: 113.9s
> >>>>>>>>
> >>>>>>>> Something *very* weird is going on.
> >>>>>>>>
> >>>>>>>> I'm not aware of any recent changes to Spark. Has anyone been
> >>>>>>>>
> >>>> working on
> >>>>
> >>>>>>>> Spark?
> >>>>>>>>
> >>>>>>>> I would like to do a binary search through the Subversion revisions
> >>>>>>>> between releases 2.2.3 and 2.2.4 to determine the cause of this.
> >>>>>>>> Unfortunately, the internet is dead here at the moment, so I can't
> >>>>>>>>
> >>>> do
> >>>>
> >>>>>>>> that right now (and who knows when this e-mail will even go out).
> >>>>>>>>
> >>>> I'll
> >>>>
> >>>>>>>> be travelling all next week, so I can't investigate the problem
> >>>>>>>>
> >>>> until
> >>>>
> >>>>>>>> later. If someone manages to track it down, I'd be interested to
> >>>>>>>>
> >>>> hear
> >>>>
> >>>>>>>> what the cause is.
> >>>>>>>>
> >>>>>>>> On Thu, Jun 21, 2007 at 08:39:25AM -0400, Eric Bodden wrote:
> >>>>>>>>
> >>>>>>>>> Nope. No such observation here. 50s sounds pretty fast to me. Are
> >>>>>>>>>
> >>>> you
> >>>>
> >>>>>>>>> sure it's doing the right thing? If so, maybe could you email us
> >>>>>>>>>
> >>>> you
> >>>>
> >>>>>>>>> test case and command line?
> >>>>>>>>>
> >>>>>>>>> Eric
> >>>>>>>>>
> >>>>>>>>> On 21/06/07, Joe Qian <owqian at gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>>       Recently I moved from soot 2.2.3 to soot 2.2.4. But when
> >>>>>>>>>>
> >>>> testing
> >>>>
> >>>>>>>> old
> >>>>>>>>
> >>>>>>>>>> programs (also with JDK1.4.1), i found the Spark points-to
> >>>>>>>>>>
> >>>> analysis run
> >>>>
> >>>>>>>>>> much slower than before.  For my small test case, the old
> >>>>>>>>>>
> >>>> version of
> >>>>
> >>>>>>>> spark
> >>>>>>>>
> >>>>>>>>>> run about 50s, while the new spark version have to run about
> >>>>>>>>>>
> >>>> 200s.
> >>>>
> >>>>>>>>>> I have no idea about this situation. Does any one else have
> >>>>>>>>>>
> >>>> similar
> >>>>
> >>>>>>>>>> experience?
> >>>>>>>>>>
> >>>>>>>>>>       Thanks
> >>>>>>>>>>
> >>>>>>>>>> Joe
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Soot-list mailing list
> >>>>>>>>>> Soot-list at sable.mcgill.ca
> >>>>>>>>>> http://mailman.cs.mcgill.ca/mailman/listinfo/soot-list
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Eric Bodden
> >>>>>>>>> Sable Research Group
> >>>>>>>>> McGill University, Montréal, Canada
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Soot-list mailing list
> >>>>>>>>> Soot-list at sable.mcgill.ca
> >>>>>>>>> http://mailman.cs.mcgill.ca/mailman/listinfo/soot-list
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>> --
> >>>>>>> Eric Bodden
> >>>>>>> Sable Research Group
> >>>>>>> McGill University, Montréal, Canada
> >>>>>>>
> >>>>>>>
> >>>>> --
> >>>>> Eric Bodden
> >>>>> Sable Research Group
> >>>>> McGill University, Montréal, Canada
> >>>>>
> >>>>>
> >>>> --
> >>>> Eric Bodden
> >>>> Sable Research Group
> >>>> McGill University, Montréal, Canada
> >>>>
> >>>>
> >>> --
> >>> Eric Bodden
> >>> Sable Research Group
> >>> McGill University, Montréal, Canada
> >>>
> >>>
> >> _______________________________________________
> >> Soot-list mailing list
> >> Soot-list at sable.mcgill.ca
> >> http://mailman.cs.mcgill.ca/mailman/listinfo/soot-list
> >>
> >
> 
> 
> -- 
> Eric Bodden
> Sable Research Group
> McGill University, Montréal, Canada
>