[Soot-list] Decoding CodeAttribute in the VM

Chris Pickett chris.pickett at mail.mcgill.ca
Wed Apr 12 14:12:40 EDT 2006


Hi Sébastien,

[[ I'm assuming you're using SableVM; however, this should still be
somewhat helpful if not. ]]
[[ I'm also CC'ing this to sablevm-devel at sablevm.org since the info might
be useful later. ]]

If you want to decode Soot-generated attributes in SableVM, you should
look at the ReturnValueUseTable and ParameterDependenceTable attributes in
my sandbox (chris/sandbox/sablevm) at the SableVM repository.

To parse these attributes, essentially clone all the steps I took; the
only difference should be how you handle the parsing of the actual table
entries.  There are quite a few files you have to touch.  Haiying in 621
did this for her purity analysis without much intervention from me, even
though she didn't end up using the code, so it's definitely possible.
grep is your friend here.  The actual decoding happens in
class_file_parser.m4.c.

It's documented a little bit in my comments (look at
ParameterDependenceTable), but basically the generic attribute structure
is:

  u2 attribute_name_index;
  u4 attribute_length;
  u2 table_length;
  u1 table[attribute_length - 2];

The table_length field and the table[] array are what Soot gives you.  The
table is a stream of bytes consisting of <pc, data[]> pairs.  Note that the
pairs may vary in length: to check that you don't overrun the end of the
attribute, use attribute_length; to check that you parse the right number
of pairs, use table_length.

The entries of table[] look like this:

  u2 pc;
  u1 data[];

where the length of data[] is either fixed and known to you ahead of time
(e.g. a u2), or encoded in the initial bytes of data[] (you probably
wouldn't want more than one length byte, since that already allows up to
255 bytes per tagged pc).  You have to set this up yourself when you
generate the tag in Soot.
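
As a rough illustration of the layout above, here is a minimal standalone
sketch in C.  It is not the SableVM parser: the names tag_entry, read_u2 and
parse_tag_table are made up for this example, and it assumes the simplest
case where data[] is a single fixed u1 per entry.  It uses attribute_length
to avoid overrunning the attribute and table_length to count the pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded entry: a tagged bytecode pc plus one byte of data.
   (Assumption: data[] is a fixed-size u1 for this attribute.) */
typedef struct {
    uint16_t pc;
    uint8_t data;
} tag_entry;

/* Class file values are big-endian: u2 = (high byte << 8) | low byte. */
static uint16_t read_u2(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Parse the attribute body, i.e. the attribute_length bytes that follow
 * the attribute_length field: a u2 table_length, then the table itself.
 * Returns the number of entries decoded, or -1 if the body is malformed
 * or does not fit in out[].
 */
static int parse_tag_table(const uint8_t *body, size_t attribute_length,
                           tag_entry *out, size_t out_capacity)
{
    size_t pos;
    uint16_t table_length, i;

    assert(body != NULL && out != NULL);

    if (attribute_length < 2)
        return -1;                      /* no room for table_length */
    table_length = read_u2(body);
    pos = 2;
    if (table_length > out_capacity)
        return -1;

    for (i = 0; i < table_length; i++) {
        if (attribute_length - pos < 3)
            return -1;                  /* entry would overrun the attribute */
        out[i].pc = read_u2(body + pos);
        out[i].data = body[pos + 2];
        pos += 3;                       /* u2 pc + u1 data */
    }

    /* table[] is attribute_length - 2 bytes, so nothing should be left. */
    if (pos != attribute_length)
        return -1;
    return (int)table_length;
}
```

Fed the eight bytes from the quoted mail below (0 2 0 18 0 0 35 0), this
yields two entries: pc 18 with data 0, and pc 35 with data 0.  Under that
reading the leading 0 of each entry is the high byte of the u2 pc rather
than a separator.  For variable-length data[] you would read a length byte
after the pc and advance by that amount instead of a fixed 3.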

I seem to remember using u1, u2, u4 because that's what the VM spec calls
unsigned 8-, 16- and 32-bit quantities when it describes code attributes.
Don't worry about parsing a u2 and storing it into a jint -- that's fine.

In order to use the actual data, you need to do some secondary parsing
inside prepare_code.c.  This is because SableVM expands and specializes
many instructions and the bytecode pc does not correspond to the internal
_svmt_code[] pc for a method body.  I also have some examples of this in
prepare_code.m4.c; note that you probably don't want to use m4 here.
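
To make the pc mismatch concrete, here is a toy sketch in C of the general
idea: while walking the bytecode, record where each bytecode pc lands in the
expanded internal array, then look the tagged pcs up in that map.  This is
purely illustrative and not taken from SableVM; build_pc_map, insn_len,
expanded_len and NO_INTERNAL_PC are all hypothetical names, and the real
bookkeeping in prepare_code.c will look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks bytecode offsets that are not the start of an instruction. */
#define NO_INTERNAL_PC ((size_t)-1)

/*
 * Hypothetical helper.  insn_len[i] is the size in bytes of the i-th
 * bytecode instruction, expanded_len[i] is the number of internal slots
 * it expands to.  Fills map[bytecode_pc] with the internal index where
 * that instruction starts.  map must have code_length elements.
 */
static void build_pc_map(const uint8_t *insn_len,
                         const uint8_t *expanded_len,
                         size_t insn_count,
                         size_t *map, size_t code_length)
{
    size_t bpc = 0;   /* pc in the original bytecode */
    size_t ipc = 0;   /* index in the expanded internal code array */
    size_t i;

    for (i = 0; i < code_length; i++)
        map[i] = NO_INTERNAL_PC;

    for (i = 0; i < insn_count; i++) {
        assert(bpc < code_length);
        map[bpc] = ipc;
        bpc += insn_len[i];
        ipc += expanded_len[i];
    }
}
```

A tag read from the attribute with bytecode pc p would then apply to
internal index map[p], provided map[p] is not NO_INTERNAL_PC.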

My code is a bit messy, I hope that's ok.  Let me know how you make out.

Cheers,
Chris





On Wed, April 12, 2006 2:13 am, Sébastien wrote:
> Okay,
>
>> From my last mail, I think I got the point...
>
> In the following byte array,
>
> 0 2 0 18 0 0 35 0
>
> 0 2 = the number of tags.
> 0 18 0 = my first tag.
> 0 35 0 = my second tag.
>
> Is that exact?
>
> Is the 0 in front of each information only a separator?
>
> Thanks,
>
> Sébastien Adam
> Email: sebastien.adam at gmail.com



