[Soot-list] Inconsistency in handling unicode class names

Christophe Foket christophe.foket at elis.ugent.be
Mon Mar 7 13:06:46 EST 2011


Hi Eric,

I've used the following example class:

-bash-3.2$ cat Test.java
public class Test {

    public static void main(String[] args) {
        // \u01e4 is the Unicode escape for the letter Ǥ; javac resolves it before parsing.
        System.out.println(new \u01e4().getClass());
    }

    private static class \u01e4 {
    }
}

Compiling this class yields an inner class file named "Test$Ǥ.class",
whose name contains the non-ASCII Unicode character Ǥ (U+01E4).
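
To make the expected bytes concrete, here is a minimal sketch (needs Java 7 or
later for StandardCharsets) that prints the name "Test$Ǥ" encoded as UTF-8 and
as US-ASCII. EncodingCheck and hex are names made up for this illustration;
they are not part of Soot or jasmin, and the sketch does not claim to show what
jasmin does internally. It only shows what an ASCII conversion would do to the
name. For U+01E4, the modified UTF-8 used in class files is the same two bytes
as standard UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Soot or jasmin.
public class EncodingCheck {

    // Render a byte array as lowercase hex values separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Binary name of the inner class from Test.java above.
        String name = "Test$\u01e4";

        // UTF-8 keeps the character as the two bytes c7 a4.
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));
        // prints: 54 65 73 74 24 c7 a4

        // US-ASCII cannot represent it, so getBytes substitutes '?' (0x3f).
        System.out.println(hex(name.getBytes(StandardCharsets.US_ASCII)));
        // prints: 54 65 73 74 24 3f
    }
}
```

If the name in a generated class file ends in 3f rather than c7 a4, the
character was lost in an ASCII conversion somewhere along the way.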

Kind regards,

Christophe

Eric Bodden wrote:
> Hi Christophe.
>
> I have just tried to reproduce the problem but I am having trouble
> creating a source/class file that contains such characters. (I have
> never used unicode in class names.) Could you tell us how to produce a
> test case or even better send us an appropriate test file?
>
> Cheers,
> Eric
>
> On 6 March 2011 23:20, Christophe Foket <christophe.foket at elis.ugent.be> wrote:
>   
>> Hi Richard,
>>
>> You are probably right. I've tried the same thing with the version of jasmin
>> I got from http://sourceforge.net/projects/jasmin/files/jasmin/jasmin-2.4/
>> and everything works fine.
>> When I run the following jasmin file (correctly generated by Soot)
>>
>> cfoket@degenerate:~/Desktop/test/jasmin-original$ cat Ǥ.jasmin
>> .source D.java
>> .class public Ǥ
>> .super c
>>
>> .implements J
>> .method public <init>()V
>> .limit stack 1
>> .limit locals 1
>> aload_0
>> invokespecial c/<init>()V
>> return
>> .end method
>>
>> .method public z()V
>> .limit stack 0
>> .limit locals 1
>> return
>> .end method
>>
>> through jasmin I end up with the correct class file "Ǥ.class":
>>
>> cfoket@degenerate:~/Desktop/test/jasmin-original$ javap Ǥ
>> Compiled from "D.java"
>> public class Ǥ extends c implements J{
>> public Ǥ();
>> public void z();
>> }
>>
>> Indeed, as you pointed out, the problem lies somewhere in the version of
>> jasmin that comes with Soot.
>>
>> -Christophe
>>
>> Richard L. Halpert wrote:
>>     
>>> Actually, it sounds to me like jasmin is the culprit, since its output is
>>> clearly in ASCII instead of UTF-8, while Soot's intermediate output (the
>>> .jasmin file) seems to be correct. Jasmin must be converting strings (which
>>> Java holds as UTF-16 in memory and writes to class files as modified UTF-8)
>>> or files containing your character to ASCII before writing the final
>>> version. This could occur when converting to a byte array or saving to a
>>> file without correctly specifying the charset.
>>>
>>> -Richard
>>>
>>> On Mar 6, 2011 7:34 AM, "Eric Bodden"
>>> <bodden at st.informatik.tu-darmstadt.de
>>> <mailto:bodden at st.informatik.tu-darmstadt.de>> wrote:
>>>       
>>     
>
>
>
>   


