[Soot-list] Why Spark in soot 2.2.4 slows down

Thu Jul 5 10:54:51 EDT 2007

I tried to come up with a wrapper that implements the correct
behavior. However it's getting really painful because we have other
wrappers already. Manu and I both independently wrote a wrapper that
makes Spark's points-to sets compatible to others. If we did not have
those other wrappers around, things would be much easier...

Furthermore, I personally think that if one relies on equality by
reference, one should use appropriate IdentityHashSets. That's why
they exist... (However, I do realize that those did not exist when
Spark was written.)

Eric

On 05/07/07, Manu Sridharan <manu_s at eecs.berkeley.edu> wrote:
> Hi there,
>
> I just wanted to chime in and say that the performance of my
> demand-driven points-to analysis implementation *may* also depend on
> equals() being object identity for Spark points-to sets; I can't
> immediately remember.  I also think the cleaner fix would be in those
> Spark clients depending on different equals() behavior, but the
> IdentityHashSet solution may be much easier.
>
> -Manu
>
> Ondrej Lhotak wrote:
> > On Wed, Jul 04, 2007 at 12:00:45PM -0400, Eric Bodden wrote:
> >
> >> Ondrej, I actually think that actually it would be the cleanest
> >> solution if Spark would use an IdentityHashSet at those places where
> >> comparison on identity is required. Would that be possible?
> >>
> >
> > A clean solution would avoid imposing behaviour required by some
> > external application on Spark-internal classes; using a wrapper
> > in your application would be cleaner. But you're right that using an
> > IdentityHashSet inside Spark is the easiest solution, if not cleanest.
> > It's fine with me, as long as you find all the places where Spark
> > could be calling hashCode and equals on those classes. The worklist
> > may be the only such place, but it may not; it was a running assumption
> > that equals would be object identity, and I just don't know all the
> > places in the code that rely on that assumption.
> >
> >
> >> Cheers,
> >> Eric
> >>
> >> On 04/07/07, Eric Bodden <eric.bodden at mail.mcgill.ca> wrote:
> >>
> >>> In addition, I saw that hashCode() had an implementation bug:
> >>>
> >>>                P2SetVisitorInt visitor = new P2SetVisitorInt(1) {
> >>>
> >>>                        final int PRIME = 31;
> >>>
> >>>                        public void visit(Node n) {
> >>>                                intValue = PRIME * intValue + n.hashCode();
> >>>                        }
> >>>
> >>>                };
> >>>                this.forall(visitor); <<<<<<<<< WAS MISSING
> >>>                return visitor.intValue;
> >>>
> >>> Due to the missing line, the hashcode returned would always be 0. :-(
> >>> No surprise that leads to slow results...
> >>>
> >>> Eric
> >>>
> >>>
> >>> On 04/07/07, Eric Bodden <eric.bodden at mail.mcgill.ca> wrote:
> >>>
> >>>> Oh, I see. Sorry about that.
> >>>>
> >>>> Could the same be a problem in PointsToSetReadOnly in paddle and
> >>>> AbstractFlowSet in Soot? I added similar methods there...
> >>>>
> >>>> I *do* require "equal" points-to sets to be equal in terms of
> >>>> equals/hashCode in the case of points-to sets of Spark, i.e.
> >>>> PointsToSetInternal. However, I could maybe come up with a wrapper
> >>>> class or something like that.
> >>>>
> >>>> Ondrej, feel free to commit a fix to this at your will.
> >>>>
> >>>> Eric
> >>>>
> >>>> On 04/07/07, Ondrej Lhotak <olhotak at uwaterloo.ca> wrote:
> >>>>
> >>>>> I've tracked down the cause of the slowdown. It is a bug that could
> >>>>>
> >>> also
> >>>
> >>>>> cause Spark to give an incorrect (unsound) answer, though I haven't
> >>>>> observed an input program for which this would be the case.
> >>>>>
> >>>>> The problem is introduced in the change:
> >>>>>
> >>>>>
> >>> ------------------------------------------------------------------------
> >>>
> >>>>> r2465 | ebodde | 2006-08-15 20:15:52 -0400 (Tue, 15 Aug 2006) | 1 line
> >>>>>
> >>>>> implementation of equals/hashCode for PointsToSets in Spark
> >>>>>
> >>>>>
> >>> ------------------------------------------------------------------------
> >>>
> >>>>> This change adds expensive equals and hashCode methods to the class
> >>>>> PointsToSetInternal, Spark's internal representation of a points-to
> >>>>> set. These new methods override the object identity methods on Object,
> >>>>> on which the Spark propagation algorithm depends. Inside Spark, a
> >>>>> PointsToSetInternal is a mutable container that holds ever-changing
> >>>>>
> >>> sets
> >>>
> >>>>> of abstract objects during the analysis. Thus, from the point of view
> >>>>>
> >>> of
> >>>
> >>>>> Spark, the appropriate notion of equality for these containers is
> >>>>>
> >>> object
> >>>
> >>>>> identity, not equality of the contents. Specifically, Spark makes a
> >>>>> worklist (actually a set) of PointsToSetInternal's whose contents need
> >>>>> to be further propagated. The overridden methods cause adding to the
> >>>>> worklist to be very slow, and may prevent PointsToSetInternal's whose
> >>>>> contents still need to be propagated from being added, if their
> >>>>>
> >>> contents
> >>>
> >>>>> happen to be the same as another PointsToSetInternal.
> >>>>>
> >>>>> Removing these methods from PointsToSetInternal fixes the problem. I
> >>>>> have not committed this change yet.
> >>>>>
> >>>>> Ondrej
> >>>>>
> >>>>> On Mon, Jun 25, 2007 at 04:58:45PM -0400, Eric Bodden wrote:
> >>>>>
> >>>>>> Ondrej could you do the same run with hprof enabled? I think with
> >>>>>>
> >>> such
> >>>
> >>>>>> a significant difference, this should easily show up in a profile.
> >>>>>>
> >>>>>> Eric
> >>>>>>
> >>>>>> On 22/06/07, Ondrej Lhotak <olhotak at uwaterloo.ca> wrote:
> >>>>>>
> >>>>>>> Strange, I also notice a big slowdown.
> >>>>>>>
> >>>>>>> Also, Soot 2.2.4 gives significantly different answers than 2.2.3.
> >>>>>>>
> >>>>>>> If I use the pre-jimplify switch, which separates Spark time from
> >>>>>>> Jimplification time, I get the following Spark times on soot-c:
> >>>>>>> 2.2.3:  15.3s
> >>>>>>> 2.2.4: 113.9s
> >>>>>>>
> >>>>>>> Something *very* weird is going on.
> >>>>>>>
> >>>>>>> I'm not aware of any recent changes to Spark. Has anyone been
> >>>>>>>
> >>> working on
> >>>
> >>>>>>> Spark?
> >>>>>>>
> >>>>>>> I would like to do a binary search through the Subversion revisions
> >>>>>>> between releases 2.2.3 and 2.2.4 to determine the cause of this.
> >>>>>>> Unfortunately, the internet is dead here at the moment, so I can't
> >>>>>>>
> >>> do
> >>>
> >>>>>>> that right now (and who knows when this e-mail will even go out).
> >>>>>>>
> >>> I'll
> >>>
> >>>>>>> be travelling all next week, so I can't investigate the problem
> >>>>>>>
> >>> until
> >>>
> >>>>>>> later. If someone manages to track it down, I'd be interested to
> >>>>>>>
> >>> hear
> >>>
> >>>>>>> what the cause is.
> >>>>>>>
> >>>>>>> On Thu, Jun 21, 2007 at 08:39:25AM -0400, Eric Bodden wrote:
> >>>>>>>
> >>>>>>>> Nope. No such observation here. 50s sounds pretty fast to me. Are
> >>>>>>>>
> >>> you
> >>>
> >>>>>>>> sure it's doing the right thing? If so, maybe could you email us
> >>>>>>>>
> >>> you
> >>>
> >>>>>>>> test case and command line?
> >>>>>>>>
> >>>>>>>> Eric
> >>>>>>>>
> >>>>>>>> On 21/06/07, Joe Qian <owqian at gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi everyone,
> >>>>>>>>>
> >>>>>>>>>       Recently I moved from soot 2.2.3 to soot 2.2.4. But when
> >>>>>>>>>
> >>> testing
> >>>
> >>>>>>> old
> >>>>>>>
> >>>>>>>>> programs (also with JDK1.4.1), i found the Spark points-to
> >>>>>>>>>
> >>> analysis run
> >>>
> >>>>>>>>> much slower than before.  For my small test case, the old
> >>>>>>>>>
> >>> version of
> >>>
> >>>>>>> spark
> >>>>>>>
> >>>>>>>>> run about 50s, while the new spark version have to run about
> >>>>>>>>>
> >>> 200s.
> >>>
> >>>>>>>>> I have no idea about this situation. Does any one else have
> >>>>>>>>>
> >>> similar
> >>>
> >>>>>>>>> experience?
> >>>>>>>>>
> >>>>>>>>>       Thanks
> >>>>>>>>>
> >>>>>>>>> Joe
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Soot-list mailing list
> >>>>>>>>> Soot-list at sable.mcgill.ca
> >>>>>>>>> http://mailman.cs.mcgill.ca/mailman/listinfo/soot-list
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> --
> >>>>>>>> Eric Bodden
> >>>>>>>> Sable Research Group
> >>>>>>>> McGill University, Montréal, Canada
> >>>>>>>> _______________________________________________
> >>>>>>>> Soot-list mailing list
> >>>>>>>> Soot-list at sable.mcgill.ca
> >>>>>>>> http://mailman.cs.mcgill.ca/mailman/listinfo/soot-list
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> Eric Bodden
> >>>>>> Sable Research Group
> >>>>>> McGill University, Montréal, Canada
> >>>>>>
> >>>>>>
> >>>> --
> >>>> Eric Bodden
> >>>> Sable Research Group
> >>>> McGill University, Montréal, Canada
> >>>>
> >>>>
> >>> --
> >>> Eric Bodden
> >>> Sable Research Group
> >>> McGill University, Montréal, Canada
> >>>
> >>>
> >> --
> >> Eric Bodden
> >> Sable Research Group
> >> McGill University, Montréal, Canada
> >>
> >>
> > _______________________________________________
> > Soot-list mailing list
> > Soot-list at sable.mcgill.ca
> > http://mailman.cs.mcgill.ca/mailman/listinfo/soot-list
> >
>

-- 
Eric Bodden
Sable Research Group
McGill University, Montréal, Canada