[Soot-list] Inconsistency in handling unicode class names

Christophe Foket christophe.foket at elis.ugent.be
Sun Mar 6 06:58:16 EST 2011


Hello,

I've been using Soot for a few months now, and I recently ran into the 
following problem. I've written a SceneTransformer that assigns each 
class a randomly generated name, chosen from the Unicode character set. 
In particular, I have a class "D" that is renamed to "Ǥ" (Unicode: 
\u01E4, Latin Capital Letter G with Stroke). However, when Soot writes 
out the renamed class files, something goes wrong. It outputs a file 
"Ǥ.class", which contains the following class definition:

-bash-3.2$ javap -c Ǥ
Compiled from "D.java"
public class Ǥ extends c implements J{
public Ǥ();
  Code:
   0: aload_0
   1: invokespecial #3; //Method c."<init>":()V
   4: return

public void z();
  Code:
   0: return

}

Note: according to 
http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#7963 
converting \u01E4 to UTF-8 gives the two bytes 0xC7 0xA4, which display 
as "Ç¤" when they are read as Latin-1.
Since "Ǥ" is also referenced from other classes, I cannot run my 
application after transformation, since the class loader cannot find the 
definition of class "Ǥ".
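
To make the UTF-8 conversion mentioned above concrete, here is a minimal 
Java sketch. It encodes \u01E4 to UTF-8, then deliberately decodes those 
bytes as Latin-1 and encodes the result again, which is one way a name can 
end up encoded twice. Whether Soot does anything like this internally is 
only an assumption; the class name DoubleEncodingDemo and the helpers hex 
and misdecode are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Format bytes as space-separated uppercase hex, e.g. "C7 A4".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical failure mode: the UTF-8 bytes of the name are
    // wrongly interpreted as Latin-1 characters.
    static String misdecode(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u01E4"; // Latin Capital Letter G with Stroke
        String garbled = misdecode(name);

        // Correct encoding of the one-character name: two bytes.
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));

        // The misdecoded name is two characters long, so encoding it
        // again produces four bytes instead of two.
        System.out.println(garbled.length());
        System.out.println(hex(garbled.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A class loader asked for "Ǥ" looks for a name encoded as the first byte 
sequence, so a class file carrying the longer sequence would not match.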

On the other hand, when I make Soot output a Jasmin file "Ǥ.jasmin" and 
then run jasminclasses on it, I get a "Ǥ.class" that contains:

-bash-3.2$ javap -c Ǥ
Compiled from "D.java"
public class Ǥ extends c implements J{
public Ǥ();
  Code:
   0: aload_0
   1: invokespecial #3; //Method c."<init>":()V
   4: return

public void z();
  Code:
   0: return

}

Is there a way to make Soot output the right class file right away, 
without having to first generate jasmin files?

Kind regards,

Christophe

