Re: [abc] type assigner complexity in Soot (fwd)

From: Prof. Laurie HENDREN <hendren@sable.mcgill.ca>
Date: Tue Sep 21 2004 - 15:15:42 BST

perhaps an extra copy of this message ...

Laurie

+-------------------------------------------------------------+
| Laurie Hendren, Professor, School of Computer Science |
| McGill University |
| 318 McConnell Engineering Building tel: (514) 398-7391 |
| 3480 University Street fax: (514) 398-3883 |
| Montreal, Quebec H3A 2A7 hendren@cs.mcgill.ca |
| CANADA http://www.sable.mcgill.ca/~hendren |
+-------------------------------------------------------------+

---------- Forwarded message ----------
Date: Tue, 21 Sep 2004 10:08:10 -0400
From: Etienne Gagnon <gagnon.etienne_m@uqam.ca>
Reply-To: egagnon@sablevm.org
To: Prof. Laurie HENDREN <hendren@sable.mcgill.ca>
Cc: abc@comlab.ox.ac.uk, John Jorgensen <jorgnsn@lcd.uregina.ca>
Subject: Re: [abc] type assigner complexity in Soot

Hi Laurie,

>>It seems to me that the merging approach taken in TypeVariable.java is
>>problematic, and we're probably pretty close to its
>>worst-case input with our example program.
>>
>>There is, however, what seems like a reasonably easy way to
>>significantly reduce the time and memory requirements.
>>
>>The type inference algorithm constructs a directed constraint graph of
>>the static types and (at the beginning) untyped local variables. It then
>>collapses cycles in this graph by merging two adjacent nodes in a cycle
>>repeatedly until the cycle consists of only one node (at least if I
>>understand the code correctly - I haven't looked at it in *too* much
>>detail). It would seem more reasonable to replace the whole cycle at once
>>in any case, since by the time we start merging neighbours we must
>>already have detected the cycle.

It's been a long time since I looked at the code, but I think that, indeed,
the current algorithm "merges" cycles node by node.
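Collapsing an already-detected cycle in one pass, rather than node by node, could be sketched roughly as below. This is purely a hypothetical illustration (the class, the `rep` array, and the union-find representation are mine, not Soot's actual code):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: collapse a whole detected cycle in one pass,
// using a union-find representative array instead of pairwise merges.
public class CycleCollapse {
    static int[] rep;  // rep[i] is the representative (ECR) of node i

    static int find(int x) {
        while (rep[x] != x) {
            rep[x] = rep[rep[x]];  // path halving keeps lookups near-constant
            x = rep[x];
        }
        return x;
    }

    // Point every node on the cycle at one representative, in a single pass.
    static int collapseCycle(List<Integer> cycle) {
        int root = find(cycle.get(0));
        for (int node : cycle) {
            rep[find(node)] = root;
        }
        return root;
    }

    public static void main(String[] args) {
        rep = new int[]{0, 1, 2, 3, 4};
        int root = collapseCycle(Arrays.asList(1, 2, 3));
        System.out.println("cycle collapsed into node " + root);
    }
}
```

With the cycle collapsed first, the parent/child lists would only need fixing up once per cycle rather than once per merged pair.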

>>It's the algorithm for merging two TypeVariable objects that seems
>>to be the biggest problem in our case. It makes a new TreeSet containing
>>all of the elements of its "parents" list, then adds all the elements of the
>>"parents" list of the TypeVariable it's merging with, then extracts a
>>list from the TreeSet and remembers this as its new "parents" list.

The main problem is that when you merge two nodes A and B, you want to fix
their parent/child lists so that you eliminate duplicates (which is tricky,
due to fast union) and you make these lists point to the representative of
each set.
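A destructive merge that deduplicates in a single pass, without the throwaway TreeSet and list rebuild described below, might look roughly like this. The class and method names are hypothetical, not the actual TypeVariable code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: absorb B's parents into A's list destructively,
// deduplicating with one LinkedHashSet instead of building a TreeSet
// and a fresh List on every merge.
public class DestructiveMerge {
    static List<String> mergeParents(List<String> a, List<String> b) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(a);
        merged.addAll(b);   // duplicates dropped in O(1) expected time each
        a.clear();
        a.addAll(merged);   // reuse A's list in place; B is left untouched
        return a;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("Object", "Serializable"));
        List<String> b = new ArrayList<>(Arrays.asList("Serializable", "Cloneable"));
        System.out.println(mergeParents(a, b));
    }
}
```

The remaining (and trickier) step, as noted above, is repointing the surviving entries at each set's representative.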

>>Note that a lot of time and memory is wasted here - the TreeSet is
>>constructed only to remove duplicates, and is discarded immediately.

Right.

>>Also, the old "parents" list is thrown away, a new one is constructed.

Right.

>>The same procedure applies to the "children" lists of the two objects.

Right.

>>This results in nodes aggregating 6000+ parents and 3500+ children in
>>steps of 1-3 at a time, and these are then collapsed back down to
>>single-digit numbers, again, 1-3 at a time. The algorithm seems to
>>aggregate all nodes into one node, repeatedly merging it with nodes with
>>0-1 children and 1 parent. Each iteration constructs a TreeSet and a new
>>List for all of the parents and children.
>>
>>Part of the reason for this repeated construction of the list is that
>>both the parents and children lists are kept as "unmodifiableList"s.
>>Perhaps I'm missing something, but I don't see a reason
>>why this should be the case - if we really want to protect the lists
>>from improper write access, then just expose read-only interface
>>methods, and keep the rest private to the package - however, clearly it
>>would be beneficial to be able to destructively update the lists from
>>within the toolkits.typing package.
>>
>>In fact, I have a feeling that the best data structure for these two
>>lists is either a HashSet or a LinkedHashSet.

When I wrote this code, LinkedHashSet did not exist, and the worst-case
complexity of a HashSet (linear time per operation when hashing degenerates)
is worse than that of a TreeSet (logarithmic time). Maybe some profiling would
be desirable? Personally, I prefer coding with the worst case in mind when the
code is likely to be used in unforeseen ways (which is usually the case with
compiler tools), but it's a personal preference...
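For what it's worth, a variant that keeps the worst-case-safe TreeSet but makes it the *permanent*, destructively-updated parents structure would also avoid the rebuild cost. A purely hypothetical sketch (not Soot code):

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch: keep the parents as a permanent TreeSet
// (O(log n) worst case per operation), so a merge never rebuilds
// any intermediate collection.
public class TreeSetParents {
    // Ordered by name here purely for illustration; real code would
    // compare type-variable ids.
    final TreeSet<String> parents = new TreeSet<>();

    // Destructively absorb the other node's parents: O(m log n) total.
    void absorb(TreeSetParents other) {
        parents.addAll(other.parents);
        other.parents.clear();  // the absorbed node gives up its parents
    }

    public static void main(String[] args) {
        TreeSetParents a = new TreeSetParents();
        a.parents.addAll(Arrays.asList("B", "A"));
        TreeSetParents b = new TreeSetParents();
        b.parents.addAll(Arrays.asList("C", "A"));
        a.absorb(b);
        System.out.println(a.parents);  // duplicates gone, order maintained
    }
}
```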

>> Does anybody know why
>>they're implemented as unmodifiable LinkedLists?

The reason for "unmodifiable" is mostly "software engineering". Let me explain.

To obtain the list of children of a TypeVariable, you write:
   ... = sometypevar.children();
Here is the implementation of this method:

   public List children()
   {
     if(rep != this)
       {
         return ecr().children();
       }

     return children;
   }

I wanted to make sure that no client code could ever corrupt the children/parents
lists.

If you'd like, you could replace the code to use normal lists, but then you'll
have to do something like:

   public List children()
   {
     List result;

     if(rep != this)
       {
         result = ecr().children();
       }
     else
       {
         result = children;
       }

     return Collections.unmodifiableList(result);
   }
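To illustrate the defensive-wrapper idea concretely: the read-only view fails fast if client code tries to mutate it, while the backing list stays mutable inside the package. Class and field names below are illustrative, not the actual TypeVariable code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Demo of the defensive wrapper: clients get a read-only view, so any
// attempt to corrupt the children list fails fast.
public class ReadOnlyChildren {
    private final List<String> children =
        new ArrayList<>(Arrays.asList("int", "long"));

    public List<String> children() {
        return Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        ReadOnlyChildren tv = new ReadOnlyChildren();
        try {
            tv.children().add("float");  // client attempt to mutate
        } catch (UnsupportedOperationException e) {
            System.out.println("mutation rejected; list still has "
                + tv.children().size() + " entries");
        }
    }
}
```

Note the trade-off: wrapping on every call costs one small allocation per call, whereas the original design pays once by keeping the stored list permanently wrapped.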

I hope this helps,

Etienne

-- 
Etienne M. Gagnon, Ph.D.             http://www.info.uqam.ca/~egagnon/
SableVM:                                       http://www.sablevm.org/
SableCC:                                       http://www.sablecc.org/
Received on Tue Sep 21 15:15:47 2004
