perhaps an extra copy of this message ...
Laurie
+-------------------------------------------------------------+
| Laurie Hendren, Professor, School of Computer Science |
| McGill University |
| 318 McConnell Engineering Building tel: (514) 398-7391 |
| 3480 University Street fax: (514) 398-3883 |
| Montreal, Quebec H3A 2A7 hendren@cs.mcgill.ca |
| CANADA http://www.sable.mcgill.ca/~hendren |
+-------------------------------------------------------------+
---------- Forwarded message ----------
Date: Tue, 21 Sep 2004 10:08:10 -0400
From: Etienne Gagnon <gagnon.etienne_m@uqam.ca>
Reply-To: egagnon@sablevm.org
To: Prof. Laurie HENDREN <hendren@sable.mcgill.ca>
Cc: abc@comlab.ox.ac.uk, John Jorgensen <jorgnsn@lcd.uregina.ca>
Subject: Re: [abc] type assigner complexity in Soot
Hi Laurie,
>>It seems to me that the merging approach taken in TypeVariable.java is
>>problematic, and we're probably pretty close to its
>>worst-case input with our example program.
>>
>>There is, however, what seems like a reasonably easy way to
>>significantly reduce the time and memory requirements.
>>
>>The type inference algorithm constructs a directed constraint graph of
>>the static types and (at the beginning) untyped local variables. It then
>>collapses cycles in this graph by merging two adjacent nodes in a cycle
>>repeatedly until the cycle consists of only one node (at least if I
>>understand the code correctly - I haven't looked at it in *too* much
>>detail). In any case, it would seem more reasonable to replace the
>>whole cycle at once, since by the time we start merging neighbours we
>>must already have detected the cycle.
It's been a long time since I looked at the code, but I think that indeed
the current algorithm "merges" cycles node by node.
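As a point of comparison, the whole-cycle replacement John suggests amounts to computing strongly connected components and collapsing each one in a single pass. Here is a minimal, hypothetical sketch (the Map-based graph and integer nodes are illustrative stand-ins, not Soot's actual TypeVariable structures):

```java
import java.util.*;

// Sketch: collapse each strongly connected component of a constraint
// graph in one pass (Tarjan's algorithm), instead of merging adjacent
// nodes pairwise. Node/graph types are illustrative only.
public class SccCollapse {
    // Maps every node to the representative of its SCC.
    static Map<Integer, Integer> collapse(Map<Integer, List<Integer>> g) {
        Map<Integer, Integer> index = new HashMap<>(), low = new HashMap<>(),
                              rep = new HashMap<>();
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> onStack = new HashSet<>();
        int[] counter = {0};
        for (Integer v : g.keySet())
            if (!index.containsKey(v))
                dfs(v, g, index, low, rep, stack, onStack, counter);
        return rep;
    }

    static void dfs(int v, Map<Integer, List<Integer>> g,
                    Map<Integer, Integer> index, Map<Integer, Integer> low,
                    Map<Integer, Integer> rep, Deque<Integer> stack,
                    Set<Integer> onStack, int[] counter) {
        index.put(v, counter[0]); low.put(v, counter[0]); counter[0]++;
        stack.push(v); onStack.add(v);
        for (int w : g.getOrDefault(v, List.of())) {
            if (!index.containsKey(w)) {
                dfs(w, g, index, low, rep, stack, onStack, counter);
                low.put(v, Math.min(low.get(v), low.get(w)));
            } else if (onStack.contains(w)) {
                low.put(v, Math.min(low.get(v), index.get(w)));
            }
        }
        if (low.get(v).equals(index.get(v))) {  // v is the root of an SCC
            int w;
            do {                                 // pop the whole component at once
                w = stack.pop(); onStack.remove(w);
                rep.put(w, v);                   // every member gets one representative
            } while (w != v);
        }
    }

    public static void main(String[] args) {
        // Cycle 1 -> 2 -> 3 -> 1, with node 4 hanging off the cycle.
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(1, List.of(2)); g.put(2, List.of(3));
        g.put(3, List.of(1, 4)); g.put(4, List.of());
        Map<Integer, Integer> rep = collapse(g);
        System.out.println(rep.get(1).equals(rep.get(2))
                           && rep.get(2).equals(rep.get(3)));
        System.out.println(rep.get(4) == 4);
    }
}
```

Each cycle member ends up pointing at a single representative in one pass, so no intermediate two-node merges (and their throwaway collections) are needed.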
>>It's the algorithm for merging two TypeVariable objects that seems
>>to be the biggest problem in our case. It makes a new TreeSet containing
>>all of the elements of its "parents" list, then adds all the elements of the
>>"parents" list of the TypeVariable it's merging with, then extracts a
>>list from the TreeSet and remembers this as its new "parents" list.
The main problem is that when you merge two nodes A and B, you want to fix
their parent/child lists so that you eliminate duplicates (which is tricky,
due to fast union) and you make these lists point to the representative of
each set.
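To make that concrete, here is a hedged sketch of the merge step being described: a union-find style "rep" pointer per node, with the parent list rebuilt through one persistent LinkedHashSet so duplicates introduced by earlier unions collapse away. The Node class is a stand-in, not Soot's TypeVariable:

```java
import java.util.*;

// Sketch of merging two equivalence-class nodes: redirect one node's
// "rep" pointer and fold its parents into the representative's set,
// re-pointing each parent at its own representative to drop duplicates.
public class MergeSketch {
    static class Node {
        Node rep = this;                        // representative of this node's set
        Set<Node> parents = new LinkedHashSet<>();
        final String name;
        Node(String name) { this.name = name; }
        Node find() {                           // follow rep pointers, compressing paths
            if (rep != this) rep = rep.find();
            return rep;
        }
    }

    static Node merge(Node a, Node b) {
        a = a.find(); b = b.find();
        if (a == b) return a;
        b.rep = a;                              // b's set now answers to a
        for (Node p : b.parents) a.parents.add(p.find());
        b.parents = null;                       // stale list is no longer reachable
        // Normalize a's entries: point at representatives, drop self-loops.
        Set<Node> normalized = new LinkedHashSet<>();
        for (Node p : a.parents) {
            p = p.find();
            if (p != a) normalized.add(p);
        }
        a.parents = normalized;
        return a;
    }

    public static void main(String[] args) {
        Node x = new Node("x"), y = new Node("y"), p = new Node("p");
        x.parents.add(p); y.parents.add(p);     // shared parent would duplicate
        Node r = merge(x, y);
        System.out.println(r.parents.size());   // duplicate collapses to one entry
        System.out.println(y.find() == r);
    }
}
```

Because the set is updated in place, no temporary collection is built and discarded per merge.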
>>Note that a lot of time and memory is wasted here - the TreeSet is
>>constructed only to remove duplicates, and is discarded immediately.
Right.
>>Also, the old "parents" list is thrown away, a new one is constructed.
Right.
>>The same procedure applies to the "children" lists of the two objects.
Right.
>>This results in nodes aggregating 6000+ parents and 3500+ children in
>>steps of 1-3 at a time, and these are then collapsed back down to
>>single-digit numbers, again, 1-3 at a time. The algorithm seems to
>>aggregate all nodes into one node, repeatedly merging it with nodes with
>>0-1 children and 1 parent. Each iteration constructs a TreeSet and a new
>>List for all of the parents and children.
>>
>>Part of the reason for this repeated construction of the list is that
>>both the parents and children lists are kept as "unmodifiableList"s.
>>Perhaps I'm missing something, but I don't see a reason
>>why this should be the case - if we really want to protect the lists
>>from improper write access, then just expose read-only interface
>>methods, and keep the rest private to the package - however, clearly it
>>would be beneficial to be able to destructively update the lists from
>>within the toolkits.typing package.
>>
>>In fact, I have a feeling that the best data structure for these two
>>lists is either a HashSet or a LinkedHashSet.
When I wrote this code, LinkedHashSet did not exist, and the worst-case
complexity of HashSet is worse than that of TreeSet. Maybe some profiling
would be desirable? Personally, I prefer to code with the worst case in
mind when the code is likely to be used in unforeseen ways (which is
usually the case with compiler tools), but that's a personal preference...
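For what it's worth, the two styles under discussion can be contrasted in a few lines. This is a hypothetical sketch with illustrative element types, not the actual TypeVariable code:

```java
import java.util.*;

// Contrast: the existing pattern (build a TreeSet only to deduplicate,
// then copy into a fresh list) versus keeping one LinkedHashSet as the
// canonical collection and updating it destructively.
public class DedupStyles {
    // Current style: allocates a TreeSet and a new List on every merge.
    static List<Integer> mergeViaTreeSet(List<Integer> a, List<Integer> b) {
        TreeSet<Integer> tmp = new TreeSet<>(a);  // built only to drop duplicates
        tmp.addAll(b);
        return new ArrayList<>(tmp);              // tmp is discarded immediately
    }

    // Proposed style: one persistent set, updated in place.
    static void mergeInPlace(LinkedHashSet<Integer> a, Collection<Integer> b) {
        a.addAll(b);   // expected O(|b|), preserves insertion order
    }

    public static void main(String[] args) {
        List<Integer> viaTree = mergeViaTreeSet(List.of(3, 1, 2), List.of(2, 4));
        LinkedHashSet<Integer> inPlace = new LinkedHashSet<>(List.of(3, 1, 2));
        mergeInPlace(inPlace, List.of(2, 4));
        System.out.println(viaTree);   // TreeSet sorts: [1, 2, 3, 4]
        System.out.println(inPlace);   // insertion order kept: [3, 1, 2, 4]
    }
}
```

The in-place version trades TreeSet's O(log n) guaranteed bound for LinkedHashSet's expected O(1) insertion, which is the profiling question raised above.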
>> Does anybody know why
>>they're implemented as unmodifiable LinkedLists?
The reason for "unmodifiable" is mostly "software engineering". Here is an
explanation.
To obtain the list of children of a TypeVariable, you write:
... = sometypevar.children();
Here is the implementation of this method:
public List children()
{
    if(rep != this)
    {
        return ecr().children();
    }
    return children;
}
I wanted to make sure that no client code could ever corrupt the children/parents
lists.
If you'd like, you could replace the code to use normal lists, but then you'll
have to do something like:
public List children()
{
    List result;
    if(rep != this)
    {
        result = ecr().children();
    }
    else
    {
        result = children;
    }
    return Collections.unmodifiableList(result);
}
I hope this helps,
Etienne
--
Etienne M. Gagnon, Ph.D.    http://www.info.uqam.ca/~egagnon/
SableVM: http://www.sablevm.org/
SableCC: http://www.sablecc.org/

Received on Tue Sep 21 15:15:47 2004
This archive was generated by hypermail 2.1.8 : Tue Sep 21 2004 - 15:30:02 BST