abc-private (Aug 2005): Re: [abc] OOPSLA paper

From: Ondrej Lhotak <olhotak@sable.mcgill.ca>
Date: Wed Aug 03 2005 - 15:54:50 BST

On Wed, Aug 03, 2005 at 08:46:46AM -0400, Laurie Hendren wrote:
>
> Hi Ondrej,
>
> I have been thinking about theOptimisation section.
> I like the first part which deals with the situation where there are no
> variables, but something bothers me about the second column, where the
> iterator example is discussed.
>
> In considering what bothers me, here is what I can come up with:
>
> I think it is because the discussion here seems to me to be very specific to
> this particular example, and it's not clear to me what the general problem
> is. In the first part you state the problem generally. Of course,
> perhaps part of the problem is that it is hard to state a general problem,
> and so we are just giving an illustrative example ... but then I think that
> has to be very clear.

OK, I've re-read it, and I think I see what you mean.

I couldn't find the discussion that prompted me to write this in terms
of the example, but I suspect that what happened was that the general
explanation was too theoretical, and we decided to use an example to
illustrate it. I think one of the points we wanted to make was that
even for such a simple example, a fairly sophisticated analysis would be
required.

One thing I could do is write it in general terms first, then give the
example (and shorten the discussion to save space). Alternatively,
I could just make it clear that, as you say, "it is hard to state a
general problem, and so we are just giving an illustrative example".

> Secondly, I wonder why you are using the term "alias analysis" instead of
> "points-to" analysis. It seems clear that one would want to start with

I think we want a word that's more general than points-to or alias. Some
people say "pointer analysis" to mean both. But we probably want to
include "shape analyses" as well. I could use the word "pointer
analysis" and have a footnote saying it includes all this stuff.

The reason that I said "alias" in particular is that what we really want
to know is whether a pair of variables (possibly at different times)
either point to definitely distinct objects or definitely the same
objects. We don't really care what specific objects they point to, just
whether we have definitely the same or definitely a different object
at the various joinpoints that match events of the tracematch. That
is, when we get to a joinpoint matching an event of the tracematch,
do we still have the same object we had in a previous event, or do we
have a different one.

I think what we want most closely resembles alias information rather
than points-to information, but it's a bit different from both of them,
so we probably shouldn't use the word alias.

> some sort of points-to analysis, since one wants to reason about the
> pointed-to objects. As our previous round of e-mail says, the problem
> is really that since tracematches are relative to object instances.
>
> With existing may points-to analyses, it is possible to say when two variables
> don't point to the same instance,

Right, and this is just may-alias information, is it not? (Of course,
may-alias information can be computed from may-points-to information.)

> but for tracematches we will need to
> concentrate on analysis that determine when two variables definitely point
> to the same instance.

Exactly, which is must-alias information.

> Have you seen anything in recent literature that
> solves this problem?

Well, nothing that's exactly like what we would need, in particular
tracking objects through time. The closest recent thing is the POPL 05
paper by Hackett and Rugina, which I think would be a good starting
point for this kind of thing. One thing that they don't really keep
track of is which of their nodes represent a single object, and which
ones may represent multiple objects, but hopefully that could be fixed.
The TVLA stuff out of Mooly Sagiv's group might also be relevant, but
I'm not sure.

In general, I think shape analysis stuff is most likely to be closest to
what we need.

> Rakesh's work did do this using his connection/shape
> analysis for C, and Christopher Lapkowski's work is C also addressed this
> by using SSA numbering (i.e. when a pointer variable had the same primary
> and secondary SSA number, it must point to the same instance).

We'll need it to be inter-procedural.

Furthermore, I'm not sure that's exactly what we want. In the SSA stuff,
if two variables have the same numbers, they must point to the same
object instance _at any given time_. But we're talking different times. Say you
have a loop like this:
goto L1;
L2: use x;
L1: define x;
L3: use x;
goto L2;

The two uses of x would have the same SSA numbers. If you had a
tracematch with two events matching join point L3 then L2,
and the x had to be the same at the two events, it would match (because
the x from L3 flows to the x at L2). But, if the events in the
tracematch were swapped, to match L2 and then L3, it wouldn't match,
because between L2 and L3, x gets redefined.

I think the idea of the iterator example was intended to illustrate this
on a more realistic example.

Are you suggesting that we try to survey alias/pointer/shape analyses
and try to speculate on how they could be modified to give us the
information we need? I thought we didn't want to open that can of worms
at this point.

Ondrej

> Of course it would be easier to hash this out on a white board, but hopefully
> my main points come across in e-mail????
>
> Cheers, Laurie
>
> +-------------------------------------------------------------+
> | Laurie Hendren, Professor, School of Computer Science |
> | McGill University |
> | 318 McConnell Engineering Building tel: (514) 398-7391 |
> | 3480 University Street fax: (514) 398-3883 |
> | Montreal, Quebec H3A 2A7 hendren@cs.mcgill.ca |
> | CANADA http://www.sable.mcgill.ca/~hendren |
> | http://wwww.sable.mcgill.ca http://aspectbench.org |
> +-------------------------------------------------------------+
>
> On Tue, 2 Aug 2005, Ondrej Lhotak wrote:
>
> > On Tue, Aug 02, 2005 at 12:54:51PM +0100, Oege de Moor wrote:
> > > > Section 6
> > > > ---------
> > >
> > > Laurie, I've brought this up a number of times. This section is the
> > > main McGill contribution to the paper - it was written by Ondrej.
> > > If you're willing to rewrite it - as I've asked in a previous email,
> > > that would be great... if there's no time to do that, I don't
> > > think it does major harm.
> >
> > I'm rather confused by this discussion. Perhaps I missed something
> > that you talked about. You (Laurie and Oege) seem to think there is
> > something wrong with it, but I don't understand what you feel is wrong
> > with it. I'm not saying there's nothing wrong with it; just that I don't
> > understand what exactly you feel is wrong with it. If I rewrite it
> > without such an understanding, the rewritten version is likely to end up
> > very similar to the current version.
> >
> > > In the story of the whole paper, I think it is important to have
> > > the section (with some level of detail), to assure the reader
> > > that one day tracematches could be even more efficient than the
> > > pure AspectJ versions...
> >
> > I thought the reasons that you (Laurie and Oege) asked me to write this
> > section were the following:
> > 1) to show that it is possible to optimize tracematches, so that their
> > current slowness may eventually be overcome
> > 2) to explain in detail what kind of analysis information would be
> > required, without going into details about the analyses that would be
> > done to compute it
> > 3) to say that our design of tracematches lends itself well to analysis
> > because it has clear semantics and limits what the programmer can do
> > 4) that the analyses required are by no means simple, which is why we
> > leave them to future work
> >
> > I wrote the section with the above goals in mind, and I believe the
> > current version achieves them quite well. From the current comments,
> > it sounds like Oege pretty much agrees with this (correct me if I'm
> > wrong), but Laurie doesn't. Am I misunderstanding the goals for this
> > section, or do you disagree that the section achieves the goals?
> >
> > > > I'm not certain about the discussion of analyses needed to further
> > > > optimize tracematches using static analysis. In some sense, this seems
> > > > to be hitting exactly the same problems as those faced by people who
> > > > build static checkers for properties ... and the problems seem just
> > > > as hard.
> >
> > The analysis that would be required is hard, and I thought this was one
> > of the points we were trying to make.
> >
> > > > So, I'm not sure what this discussion is supposed to convey. Is it
> > > > to say that further optimizing tracematches will require a hard analysis,
> > > > yet to be defined?
> > > >
> > > > In the discussion of alias information, it is stated that one needs
> > > > to know whether a variable points to the same location at two different
> > > > points in the program. For store-based analyses, i.e. those that give
> > > > names to locations, this can be done by using the names. For example,
> > > > if s points to "new_1" at program point A and t points to "new_1" at
> > > > program point B, then there is a possible dependence. We already
> > > > do that for side-effect and dependence analysis in Soot using the alloc
> > > > sites - although these are "may" relationships.
> >
> > Right. However, for optimizing tracematches, I think we will need both
> > may and must relationships. I would even say that the must relationships
> > will be more important than the may relationships, because in most
> > tracematches, you're trying to trace a single object through the
> > execution of the program, to make sure that it's the same object in each
> > event. Moreover, not only will we need to know that things must be
> > aliased at a given point, but that the object at one event must be the
> > same object as at another event.
> >
> > > > There are also "storeless" pointer analyses, such as those developed in
> > > > my Ph.D. thesis, and then later by my student Rakesh Ghiya. In these
> > > > cases you don't have names, like "new_1", but only relationships between
> > > > variables at a program point. In his work (see POPL98 and his thesis) he
> > > > showed how to inject enough new variable copies so that the relationships
> > > > between variables at different program points could be compared, and he
> > > > then used that info for optimizing C pointer programs.
> > > >
> > > > I mention these things not because I think we need to address them in the
> > > > paper, but to point out that when we start talking about static analyses
> > > > like these there is a lot of related work ... and perhaps we just want to
> > > > say something a little more high-level and not get into details...
> >
> > Yes, there is a lot of related work on doing these kinds of analyses,
> > and I think you're right that we don't want to get into details on how
> > the analysis is done at this point (we will in the future, when we
> > actually implement an analysis). What we do want to get into details
> > on is what kinds of information we want from such an analysis. Do you
> > feel we go into too little detail on the information we want from the
> > analysis? Do we assume too much about how the analysis should compute
> > the information that we want?
> >
> > Ondrej
> >
> >
> >
> >
>
>
Received on Wed Aug 3 15:55:24 2005

This archive was generated by hypermail 2.1.8 : Wed Aug 03 2005 - 16:00:06 BST