[Soot-list] How to reuse CFG framework without low-level byte code.

Chris Pickett cpicke at cs.mcgill.ca
Sun Jul 10 21:44:07 EDT 2005


Hi Pavel,

Yes, you don't get to keep aggregated expressions in Jimple -- it is a 
3-address IR.
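
To make the 3-address point concrete, here is a minimal sketch of how an
aggregated statement such as a[I+J] = x + y*z gets flattened into
one-operator statements with temporaries. The classes below
(ThreeAddressSketch, Leaf, BinOp) and the temporary names t0, t1, ... are
made up for illustration. They are not Soot classes, and real Jimple
output will differ in naming and typing.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical mini-AST and lowering pass, not Soot's API.
final class ThreeAddressSketch {
    static abstract class Expr {}

    // A variable or constant, already simple enough for 3-address code.
    static final class Leaf extends Expr {
        final String name;
        Leaf(String name) { this.name = name; }
    }

    // A binary operation whose operands may themselves be nested expressions.
    static final class BinOp extends Expr {
        final String op;
        final Expr left, right;
        BinOp(String op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    private final List<String> code = new ArrayList<>();
    private int nextTemp = 0;

    // Returns the name that holds the value of e, emitting statements as needed.
    String lower(Expr e) {
        if (e instanceof Leaf) {
            return ((Leaf) e).name;
        }
        BinOp b = (BinOp) e;
        String l = lower(b.left);
        String r = lower(b.right);
        String t = "t" + nextTemp++;
        code.add(t + " = " + l + " " + b.op + " " + r);
        return t;
    }

    // Lowers "array[index] = value" so that index and value are simple names.
    void lowerArrayStore(String array, Expr index, Expr value) {
        String i = lower(index);
        String v = lower(value);
        code.add(array + "[" + i + "] = " + v);
    }

    List<String> code() { return code; }

    public static void main(String[] args) {
        ThreeAddressSketch s = new ThreeAddressSketch();
        // a[I+J] = x + y*z
        s.lowerArrayStore("a",
            new BinOp("+", new Leaf("I"), new Leaf("J")),
            new BinOp("+", new Leaf("x"),
                new BinOp("*", new Leaf("y"), new Leaf("z"))));
        for (String line : s.code()) {
            System.out.println(line);
        }
    }
}
```

Running main prints four statements: t0 = I + J, t1 = y * z,
t2 = x + t1, and a[t0] = t2. The original nesting is gone, which is the
information loss you are describing.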

In your original message, you wrote, "The purpose of the tool is to 
convert programs in proprietary legacy language to Java."  I 
misinterpreted that as "Java bytecode", although now I am reading it as 
"Java source code".

The Grimp IR, which is part of Soot, does let you keep aggregated 
expressions, although it is still not Java source code (you probably 
don't get items (2) and (3) in your list).

I assume the goto elimination is a *refactoring* taking place because 
you want maintainable Java source code.

I don't think you can easily map the Jimple CFG classes to Java source. 
If other list members know about a CFG representation for Java source 
that is good for doing refactorings like goto elimination, or know 
whether Soot's CFG classes can be reused nicely, please comment.
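
For a sense of how little is needed for a source-level CFG, here is a
rough sketch that splits a flat statement list with labels and gotos
into basic blocks and computes successor edges. Everything in it
(CfgSketch, Stmt, Kind, the string-based statements) is hypothetical and
independent of Soot. It assumes every jump target is a defined label,
and it only builds the graph, it does not do any goto elimination.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative only: hypothetical classes, unrelated to Soot's Body/Unit/Block.
final class CfgSketch {
    enum Kind { PLAIN, GOTO, IF_GOTO }

    static final class Stmt {
        final String label;   // label on this statement, or null
        final Kind kind;
        final String text;    // source text, kept opaque (expressions stay aggregated)
        final String target;  // label jumped to, or null for PLAIN
        Stmt(String label, Kind kind, String text, String target) {
            this.label = label; this.kind = kind; this.text = text; this.target = target;
        }
    }

    // Maps each label to the index of the statement that carries it.
    static Map<String, Integer> labelIndex(List<Stmt> stmts) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < stmts.size(); i++) {
            if (stmts.get(i).label != null) m.put(stmts.get(i).label, i);
        }
        return m;
    }

    // Leaders: the first statement, every jump target, and every statement after a jump.
    static List<Integer> leaders(List<Stmt> stmts) {
        Map<String, Integer> labelAt = labelIndex(stmts);
        TreeSet<Integer> leaders = new TreeSet<>();
        if (!stmts.isEmpty()) leaders.add(0);
        for (int i = 0; i < stmts.size(); i++) {
            Stmt s = stmts.get(i);
            if (s.kind == Kind.PLAIN) continue;
            Integer t = labelAt.get(s.target);
            if (t == null) throw new IllegalArgumentException("undefined label: " + s.target);
            leaders.add(t);
            if (i + 1 < stmts.size()) leaders.add(i + 1);
        }
        return new ArrayList<>(leaders);
    }

    // Successor block numbers for each block; block b starts at leaders.get(b).
    static List<List<Integer>> successors(List<Stmt> stmts) {
        List<Integer> starts = leaders(stmts);
        Map<String, Integer> labelAt = labelIndex(stmts);
        List<List<Integer>> succ = new ArrayList<>();
        for (int b = 0; b < starts.size(); b++) {
            int end = (b + 1 < starts.size() ? starts.get(b + 1) : stmts.size()) - 1;
            Stmt last = stmts.get(end);
            List<Integer> out = new ArrayList<>();
            // Jump edge to the block that starts at the target label.
            if (last.kind != Kind.PLAIN) out.add(starts.indexOf(labelAt.get(last.target)));
            // Fall-through edge, unless the block ends in an unconditional goto.
            if (last.kind != Kind.GOTO && b + 1 < starts.size()) out.add(b + 1);
            succ.add(out);
        }
        return succ;
    }
}
```

For a six-statement loop (i = 0; L1: if i >= n goto L2; s = s + i;
i = i + 1; goto L1; L2: return s) the leaders are statements 0, 1, 2 and
5, and the successor lists are [[1], [3, 2], [1], []]. Since statements
are opaque here, a loop with no goto inside it could be kept as a single
Stmt, which is the kind of "good loop as a basic block" treatment you
mentioned.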

It does seem likely that somebody else has encountered this problem, and 
you might try looking for an Eclipse plugin.  Googling turns up the 
jGRASP project, which provides graphical CFG views of Java source.

Cheers,
Chris

emujrock wrote:
> Hi Chris.
> 
> Thanks for the response,
> 
> The language X has "for", "while" and "switch", so there is no need to 
> use goto, but, just like C, it allows it. The syntax is more like a 
> mixture of FORTRAN, PL/1 and COBOL. It is from the same era, with no 
> reserved keywords and so on  ;-(
> 
> If I go through Jimple I would lose some source code information which 
> Dava will not restore (please correct me if I am wrong).
> 
> 1. a[I+J] -- array indexes as expressions, not just simple variables or 
> constants.
> 
> 2. for, while -- All existing "for", "while" and "switch" statements 
> would be broken down. Suppose that I have some good loops, i.e. loops 
> with no goto inside them. I could treat such good loops as basic blocks 
> in my analysis and skip goto elimination for them. (Maybe I would not 
> mind the useless round trip too much if Dava restored them. From what 
> I saw, Dava does not restore a "for" loop to its original form.)
> 
> 3. a+=b, a++ -- would be broken in two.
> 
> 4. x+sin(y) -- Complicated expressions including function calls.
> 
> 5. call foo(x+y) -- similar to 4, would be broken down.
> 
> I understand why Jimple is so low level: in addition to the CFG you 
> also need to analyze variables (usage and type). In my case I just need 
> to reuse the CFG part.
> 
> Sincerely,
> Pavel
> 
> 
>> Hi Pavel,
>>
>> Maybe you should consider trying to convert from your AST to Soot's 
>> representation of Java classes, with method bodies written in the 
>> Jimple intermediate representation (IR).  From there, Soot will 
>> generate bytecode for you.
>>
>> You don't need to go through bytecode to create Jimple... it's just 
>> that when Soot is analysing *existing* classfiles, it must parse the 
>> bytecode somehow to create a Jimple representation that is suitable 
>> for analysis.
>>
>> The Soot tutorials should also clear some things up:
>>
>> http://www.sable.mcgill.ca/soot/tutorial
>>
>> in particular, "Creating a Class File from Scratch."
>>
>> If your language is C-like, you might have an easier time generating 
>> Java source code directly.  If your language is bytecode-like, you 
>> might have an easier time generating bytecode directly.  In both 
>> cases, Soot can still be used to analyse your classfiles.
>>
>> Cheers,
>> Chris
>>
>> emujrock wrote:
>>
>>> I am working on a code tool for a legacy system. The purpose of the
>>> tool is to convert programs in proprietary legacy language to Java. The
>>> language has goto and I need some goto removal framework. It looks like
>>> Soot can do exactly what I need.
>>>
>>> I start from the AST of the legacy program, not from Java bytecode. My
>>> direct attempt to use Soot's goto removal algorithm failed because it
>>> relies upon the low-level packages baf.* and jimple.*. I do not
>>> understand how to build “Body”, “Unit” and “Block” instances directly
>>> without going through bytecode. From what I see in the code it looks
>>> like it may be impossible to do. If I am wrong, could somebody point me
>>> in the right direction? If it cannot be done now:
>>>
>>>
>>> Do you have any plans to separate high level CFG algorithms from
>>> the rest of the code? If you need volunteers to do this I am ready to
>>> contribute.
>>>
>>> Sincerely,
>>> Pavel
>>>



