Java Obfuscation Techniques

This document lists some examples from my work in developing obfuscating program transformation for the Java langauge. This is serving as an appendix to a paper I submitted to the 2007 Compilers Conference called Obfuscating Java: The Most Pain for the Least Gain. This work funded, in part, by NSERC and FQRNT. In this document I will refer to Jad, SourceAgain, and Dava. These are three Java decompilers. Jad is fairly old and not that effective against obfuscated code. SourceAgain is a more modern decompiler. It is a commercial product available here. Dava is an academic project and uses advanced control flow analysis and a powerful typing system. It is available here. The obfuscations in this document (each with original source and decompiled source examples) are as follows:



Return to JBCO Main Page


top

Packing Local Variables into Bitfields (PLVB)

In order to introduce a level of obfuscation on local variables with primitive types (boolean, char, byte integer), it is possible to combine some variables and pack them into one variable which has more bits (a long, for example). The simplest solution would be to pack variables starting from the least-significant or most-significant bit. However, to provide further confusion we randomly choose a range of bits to use for each local variable. For example, an integer variable may get packed into bits 9 through 43. Since each read or write of the original variable must be replaced by a packing and unpacking operation in the obfuscated code, this can potentially slow down the application. Thus, this obfuscation is used sparingly and applied randomly to only some of the locals.

Without further obfuscation of the constants used for packing and unpacking, this kind of obfuscation could be undone by a clever decompiler that was aware of this technique. Nevertheless, SourceAgain and Dava fail because of improper primitive casting.

The following was decompiled by Dava:

static void FillPowerMatrix(Digit matrix[][], Digit x[]) { int n = matrix[0].length; for (int i = 0; i < n; i++) { matrix[i][0] = new Digit(1); for (int j = 1; j < n; j++) { matrix[i][j] = matrix[i][j-1].mult(x[i]); } } } >> static void FillPowerMatrix(Digit[][] r0, Digit[] r1) { long l0; int i2, i3; for (i2 = r0[0].length, i3 = 0; i3 < i2; i3++) { r0[i3][0] = new Digit(1); for (l0 = (long) 1 & 4294967295L ^ l0 & -4294967296L; (int) (l0 & 4294967295L) < i2; l0 = (long) (1 + (int) (l0 & 4294967295L)) ^ l0 & -4294967296L) { r0[i3][(int) (l0 & 4294967295L)] = r0[i3][(int) (l0 & 4294967295L) - 1].mult(r1[i3]); } } }



top

Replacing if(non)null Instructions with Try-Catch Blocks (RIITCB)

The try-catch construct in the Java language can be used to create control flow, either through an explicit throw statement or by inserting a statement that is known to create an exception. This obfuscation exploits a well known fact of Java: invoking an instance method on a null object will always result in a NullPointerException.

While SourceAgain and Dava are able to decompile the simplest form of this obfuscation, they fail on a more complex version which uses jsr instructions. The following is an example of the simplest version decompiled by Dava.

public static void main(String argv[]) { if (argv == null) { System.out.println("Something is really wrong"); return; } for (int i = 0; i < argv.length; i++) { if (argv[i] != null) System.out.println(argv[i]); } } >> public static void main(String[] r0) { Object $n0; int i0; String $r4; $n0 = null; try { r0.equals($n0); } catch (NullPointerException $r2) { System.out.println("Something is really wrong"); return; } for (i0 = 0; i0 < r0.length; i0++) { $r4 = r0[i0]; label_0: { try { $r4.toString(); } catch (NullPointerException $r8) { break label_0; } System.out.println(r0[i0]); } //end label_0: } }



top

Partially Trapping Switch Statements (PTSS)

There is a big gap between high-level structured use of try-catch blocks in Java source and their low-level bytecode implementation. Whereas the Java construct allows only well-nested and structured uses, the bytecode implementation is a much lower trap abstraction.

A bytecode trap specifies a bytecode range [a . . . b], a handler unit h, and an exception class E. If an exception T is raised within the method at bytecode c then the JVM searches for a trap in the list which matches either the type of T or a parent type of T whose bytecode range [a . . . b] contains c. If a trap is found then the stack is emptied, T is pushed on top, and the program counter is set to the handler h.

There are no rules that enforce nesting of these ranges and at the bytecode level these may overlap or even share code with handler code. Thus, one way of confusing decompilers is to trap sequential sections of bytecode that are not necessarily sequential in Java source code. The perfect example of this is the switch construct. In source code, the switch statement encapsulates different blocks of code as targets of the switch. However, in bytecode there is nothing explicitly tying the switch instruction to the different code blocks (i.e. there is no explicit encapsulation).

If the switch is placed within a trap range along with only part of the code blocks which are associated as its targets then there will be no way for an automatic decompiler to output semantically equivalent code that looks anything like the original source. It simply must reproduce the trap in the source code in some form, potentially by duplicating code.

None of Jad, SourceAgain, and Dava were able to properly decompile this obfuscation. Jad simply gives up and places bytecode instructions directly within the source output because it is unable to handle the unnatural (i.e. non-javac looking) control flow. SourceAgain and Dava suffer in that they are unable to match variables and merge types appropriately and this results in problems such as variable usage outside its scope. The following example was decompiled by Dava.

public static int calc(int oper, int val, int total) { switch (oper) { case 0: total += val; break; case 1: total -= val; break; case 2: total *= val; break; default: System.out.println("Invalid Oper!"); } return total; } >> public static int calc(int i0, int i1, int i2) { label_4: { label_3: { label_2: { label_1: { label_0: { try { switch (i0) { case 0: i2 = i2 + i1; break label_0; case 1: break label_1; case 2: break label_2; default: break label_3; } } catch (NullPointerException $r5) { throw $r5; } break label_4; } // end of label_0 } // end of label_1 i2 = i2 - i1; break label_4; } // end of label_2 i2 = i2 * i1; break label_4; } // end of label_3 System.out.println("Invalid Oper!"); } // end of label_4 return i2; }



top

Adding Dead-code Switch Statements (ADSS)

The switch construct in Java bytecode offers a useful control flow obfuscation tool. It is the only organic way (other than the try-catch structure) to manufacture a control flow graph that has a node whose successor count is greater than two. This can severely increase the complexity of a method.

This obfuscation adds edges to the control flow graph by inserting a switch. To ensure that the switch itself is never executed it is wrapped in an opaque predicate. All bytecode instructions with a stack height of zero are potentially safe jump targets for cases in the switch. We have defined an analysis to compute these zero-height locations and we select a random set of them as targets for the cases in the dead switch. This obfuscation increases the connectedness and overall complexity of a method. A decompiler cannot remove the dead switch because it cannot statically determine the value of the opaque predicate.

The following example is decompiled by SourceAgain and is not recompilable:

if (writeImage != null) { try { File file = new File("out"); ImageIO.write(writeImage, "png", file); } catch (Exception e) { System.exit(1); } } System.exit(0); >> if(obj != null) { try { ImageIO.write((RenderedImage) obj, "png", new File("out")); } catch(Exception exception2) { ++i; obj = exception2; i += 2; System.exit(1); } } label_167: { while(lI1.booleanValue() == ___) { switch (i) { default: break; case 3: break label_167; case 1: ++i; obj = exception2; i += 2; System.exit(1); continue; case 2: i += 2; System.exit(1); continue; } } System.exit(0); }



top

Finding and Reusing Duplicate Sequences (RDS)

Because of the nature of bytecode, there is often a fair amount of duplication. Even within one method a sequence of instructions might appear a number of times. By finding these clones and replacing them with a single switched instance we can potentially reduce the size of the method while also confusing the control flow, creating control flow that is not naturally expressed in Java.

We determine when a duplicate sequence D is a clone of the original sequence O using the following checks and analyses:

When a duplicate sequence is found, a new integer which acts as a control flow flag is created. The duplicates are removed completely and replaced with an assignment of the flag to a unique id followed by a goto directed at the first instruction in the original sequence. The original sequence is prepended with instructions which store 0 to the flag (the "first" unique id) and appended with a switch. The default jump falls through to the next instruction (the successor of the original sequence). A jump to the successor of each duplicate sequence is added to the switch based on its flag id. For each method we search for duplicates of length 3 through 20.

The following example was decompiled with Dava.

// This is the Goldrader bit-reversal algorithm protected static void bitreverse(double data[]) { int n=data.length/2; int nm1 = n-1; int i=0; int j=0; for (; i < nm1; i++) { int ii = i << 1; int jj = j << 1; int k = n >> 1; if (i < j) { double tmp_real = data[ii]; double tmp_imag = data[ii+1]; data[ii] = data[jj]; data[ii+1] = data[jj+1]; data[jj] = tmp_real; data[jj+1] = tmp_imag; } while (k <= j) { j -= k; k >>= 1; } j += k; } } >> protected static void bitreverse(double[] r0) { double d0, $d1; int i0, i1, i2, i3, i4, i5, i6, i7, $i9, i11, $i12; d0 = 0.9637047970232077; i0 = r0.length / 2; i1 = i0 - 1; i2 = 0; i3 = 0; label_2: while (i2 < i1) { i4 = i2 << 1; i5 = i3 << 1; $i9 = i0; i6 = 0; while (true) { i7 = $i9 >> 1; label_1: switch (i6) { case 1: break label_1; default: if (i2 >= i3) { break label_1; } $d1 = r0[i4]; i11 = 0; label_0: while (true) { $i12 = i4 + 1; switch (i11) { case 1: r0[$i12] = r0[i5 + 1]; r0[i5] = $d1; r0[i5 + 1] = d0; break label_1; default: d0 = r0[$i12]; r0[i4] = r0[i5]; i11 = 1; continue label_0; } } } if (i7 > i3) { i3 = i3 + i7; i2++; continue label_2; } i3 = i3 - i7; $i9 = i7; i6 = 1; } } }



top

Converting Branches to jsr Instructions (CB2JI)

The jsr bytecode1, short for Java subroutine, is analogous to the goto other than the fact that it pushes a return address on the stack. Normally, the return address is stored to a register after a jsr jump and when the subroutine is complete the ret bytecode specifies the register containing the return address and control is returned to that address. I use the term "address" loosely here, as it really is just a bytecode offset within the classfile.

The jsr - ret construct is particularly difficult to handle when dealing with typing issues because each subroutine can be called from multiple places, requiring that type information be merged and therefore the result is a more conservative estimate. Also, decompilers will usually expect to find a specific ret for every jsr.

This obfuscation replaces if and goto targets with jsr instructions. The old jump targets have pop instructions inserted before them in order to throw away the return address which is pushed onto the stack. If the jump target's predecessor in the instruction sequence falls through then a goto is inserted after it which jumps directly to the old target (stepping over the pop when there is no return address to remove).

The following example has been decompiled by SourceAgain. It is not recompilable code because some statements fall after the return.

[1]The jsr was originally introduced to handle finally blocks - sections of code that are ensured to run after a try block whether an exception is thrown or not. It is a historical anomaly that is no longer used by modern javac compilers.

public static void main(String args[]) { int level = 23; if (args.length > 1 && args[1] != null && args[1].equalsIgnoreCase("-i")) drawToImage = true; if (args.length > 0) { String arg = args[0]; if (arg == null || arg.toLowerCase().indexOf("help") >= 0) { printHelp(); System.exit(1); } else { try { level = Integer.parseInt(arg); } catch (NumberFormatException exc) { System.out.println("Illegal Depth Number!"); System.exit(1); } } } else { drawToImage = true; } setLevel(level); new Fractal(); } >> public static void main(String[] as) { int i = 23; if(as.length > 1 && as[1] != null && as[1].equalsIgnoreCase("-i")) drawToImage = true; if(as.length > 0) { String s = as[0]; if(s == null || s.toLowerCase().indexOf("help") >= 0) { printHelp(); System.exit(1); setLevel(i); new Fractal(); } else { try { i = Integer.parseInt(s); } catch( NumberFormatException numberformatexception1 ) { System.out.println("Illegal Depth Number!"); System.exit(1); } setLevel(i); new Fractal(); } } else { drawToImage = true; setLevel(i); new Fractal(); } return; setLevel(i); new Fractal(); }



top

Indirecting if Instructions (III)

While javac will always produce very predictable try blocks it is possible to abuse these constructs in other ways. This obfuscation takes advantage of this by indirecting if branching through goto instructions which are within a special try block. Normally, modern compilers would be able to recognize the redundancy of the extra goto (through control flow graph analysis) and remove it, modifying the if to jump directly to its final target. However, since a try block protects all these indirections it is not valid to remove them unless the code can be statically shown to never raise an exception. Since there is no explicit goto allowed in Java source, it becomes very difficult for a decompiler to synthesize equivalent source code.

The following is an example decompiled by SourceAgain. It is improper code and will not recompile (note that it tries to return "texception1" outside of its scope, and in fact it is the wrong return value anyway).

public static int sum(int iarry[]) { int result = 0; for (int i = 0; i < iarry.length; i++) { result += iarry[i]; } return result; } >> public static int sum(int[] ai) { int i = 0; int j = 0; while(j < ai.length) { Object tobj; i += ai[j]; ++j; if( (byte) 654154038 % 9 == 0 ) continue; tobj = null; return texception1; } try { } catch(Exception texception1) { throw texception1; } return i; }



top

Reordering load Instructions Above if Instructions (RLAII)

In some cases patterns in bytecode produced by javac can be examined to identify areas of possible obfuscation. This simple obfuscation looks for situations where a local variable is used directly following both paths of an if branch. That is, along both branches the first instruction loads the variable on to the stack. This is a somewhat common occurance - consider code that follows the pattern:

if x then i=... else i=....

This obfuscation then moves the load instruction above the if, removing its clones along both branches. While a modern decompiler like Dava which is based on a 3-address intermediate representation will be able to overcome this obfuscation with little problem, any decompiler relying on pattern matching (such as Jad) will become very confused.

The following is decompiled by SourceAgain and, in fact, it has problems as well. The code is very far off from the original intent and is not recompilable.

public static int sum(int iarry[]) { int result = 0; for (int i = 0; i < iarry.length; i++) { result += iarry[i]; } return result; } >> public static int sum(int ai[]) { int i = 0; int j = 0; do { if(ai.length >= i) break; i = j + ai[j]; j++; } while(true); return; }



top

Renaming Identifiers: Classes, Methods, and Fields (RI[C,M,F])

Renaming identifiers is one of the most common obfuscations, especially for the Java language. Therefore I won't go into detail but I will present an example to show how ugly and annoying it can be. Note that this method was obfuscated by using the three possible dictionaries of {S, 5, $}, {l, I, 1}, and {_} to create new names.

This example was decompiled by Dava:

void generate(Graphics g, float x, float y, float width, float height, float angle, int level) { float x1; float y01; turtle_x = x; turtle_y = y; turtle_r = height; step(); x1 = turtle_x; y01 = turtle_y; level--; if (level < 3) { g.setColor(Color.green); drawbranch(g, width, (int) x, w1H - (int) y, (int) x1, w1H - (int) y01); } else { g.setColor(Color.white); drawbranch(g, width, (int) x, w1H - (int) y, (int) x1, w1H - (int) y01); } if (level > 0) { turtle_theta = point(x, y, x1, y01); turn(left_angle); generate(g, turtle_x, turtle_y, left_width_factor * width, left_height_factor * height, left_angle, level); turtle_theta = point(x, y, x1, y01); turn(-right_angle); generate(g, x1, y01, left_width_factor * width, left_height_factor * height, right_angle, level); } } >> void ___(Graphics r1, float f0, float f1, float f2, float f3, float f4, int i0) { lII = f0; lll = f1; lIl = f3; lIII(); float f6 = lII; float f5 = lll; int i3 = i0 + -1; if (i3 >= 3) { r1.setColor(Color.white); ___(r1, f2, (int)f0, ll1 - (int)f1, (int)f6, ll1 - (int)f5); } else { r1.setColor(Color.green); ___(r1, f2, (int)f0, ll1 - (int)f1, (int)f6, ll1 - (int)f5); } if (i3 > 0) { $5$ = this.____(f0, f1, f6, f5); l11I(Ill1); _____(r1, lII, lll, $5S * f2, Il1 * f3, Ill1, i3); $5$ = ____(f0, f1, f6, f5); l11I((- (SSS))); _____(r1, f6, f5, $5S * f2, Il1 * f3, SSS, i3); } }



top

Disobeying Constructor Conventions (DCC)

The Java language spec stipulates that class constructors - those methods used to instantiate a new object of that class type - must always call either an alternate constructor of the same class or their parent class' constructor as the first directive. In the event that neither is specified in source javac explicitly adds a call to the parent at the beginning of the method in the compiled bytecode.

While this super call, as a rule, must be the first statement in the Java source it is, in fact, not required to be the first within the bytecode. By exploiting this fact it is possible to create constructor methods whose bytecode representation cannot be converted into legal source. This obfuscation randomly chooses among four different approaches to transforming constructors in order to confuse decompilers.

Wrapping the super call within a try block: This ensures that any decompiled source will be required to wrap the call in a try as well to conform to the rules of Java. To properly allow the exception to propagate, the handler unit - a throw instruction - is appended to the end of the method.

This example is decompiled by Dava and is not recompilable. Note that Dava was unable to find the super call correctly and did not properly rename it to "super".

ObjectA() { super(); } >> ObjectA() { try { this.<init>(); } catch (IndexOutOfBoundsException $r2) { throw $r2; } }


Taking advantage of classes which are children of java.lang.Throwable: This approach inserts a throw instruction before the super call and creates a new try block in the method that traps just the new throw. The handler unit is designated to be the super call itself. This takes advantage of the fact that the class is throwable and can be pushed onto the stack through the throw mechanism instead of the standard load.

This example is decompiled by Dava and is not recompilable. Note that Dava is still unable to properly rename the init call to "super".

ObjectA() { super(); } >> ObjectA() { try { throw this; } catch (Throwable $r2) { $r2.<init>(); return; } }


Inserting a jsr jump and a pop instruction directly before the super constructor call: The jsr's target is the pop instruction, which removes the subsequent return address that is pushed on the stack as a result of the jsr instruction. This confuses the majority of decompilers which have problems dealing with jsr instructions.

No example is given for this.

Adding new instructions before the super call: This approach inserts a dup followed by an ifnull before the super call. The ifnull target is the super call. The if branch instruction will always be false since the object it is comparing is the object being instantiated in the current constructor. Two new instructions are inserted along the false branch of the if: a push null followed by a throw. A new try block is created spanning from the ifnull up to the super call. The catch block is appended to the end of the method as a sequence of pop, load o, goto sc, where o is the object being instantiated and sc is the super call. This confuses decompilers because it is more difficult to deduce which local will be on the stack when the super call site is reached.

This example is decompiled by SourceAgain and is not recompilable. Note that in this example SourceAgain throws "this" ("tz1") when it should be throwing null (This would be technically okay if ObjectA extended java.lang.Throwable).

ObjectA() { super(); } >> ObjectA() { ObjectA tz1 = this; try { if(tz1 != null) throw tz1; } catch(IndexOutOfBoundsException unused2) { tz1 = this; } tz1(); }



top

Combing Try Blocks With Their Catch Blocks (CTBCB)

Java source code can only represent try-catch blocks in one way: with a try block directly followed by one or more catch blocks associated with it. In bytecode, however, try blocks can protect the same code that is used to handle the exceptions it throws or one of its catch blocks can appear "above" it in the instruction sequence.

This obfuscation combines a try-catch block such that both the beginning of the try block and the beginning of the catch block are the same instruction. This is accomplished by prepending the first unit of the try block with an if that branches to either the try code or the catch code based on an integer control flow flag. Once the try section has been officially entered, the flag is set to indicate that any execution of the if in the future should direct control to the catch section. The integer flag is reset to its original value when the try section is completed.

This example was decompiled by Dava and is

public static void main (String args[]) { x i; try { try { i = new x(); i.foo(); } catch (x exc) { System.out.println("inner exc"); if (exc.eventime == 0) throw exc; } } catch (x exc1) { System.out.println("outer exc"); } } >> public static void main(String[] r0) { boolean z0; int i0; z0 = false; i0 = 0; label_5: while (true) { label_4: { label_3: { label_0: { if (i0 != 0) { break label_0; } i0++; z0 = false; break label_3; } //end label_0: try { System.out.println("inner exc"); if ($r1.eventime != 0) { break label_4; } throw $r1; } catch (x $r2) { } } //end label_3: label_1: while (true) { label_2: { try { if ( ! (z0)) { z0 = true; (new x()).foo(); break label_2; } } catch (x $r2) { continue label_1; } catch (x $r1) { continue label_5; } System.out.println("outer exc"); break label_4; } //end label_2: break label_4; } } //end label_4: return; } }



top

Goto Instruction Augmentation (GIA)

Explicit goto statements are not allowed in Java source. Studies have shown this to be a good design decision. In source, you must use abrupt statements instead. Java bytecode does have a goto instruction because it is necessary for simulating higher-level constructs such as loops. Therefore it is possible to insert an explicit goto within the bytecode. While it is very easily reversed using control flow graph analysis it can still cause many simple decompilers to fail.

Our obfuscation takes a simple approach. It randomly splits a method into two sequential parts: The first, containing the start of the method, P1 and a second, containing the end of the method, P2. It then reorders these two parts and inserts two goto instructions. The first goto is inserted as the first instruction in the method and points to the start of P1. The second is inserted at the end of P1 and targets P2. The final method now looks like { goto P1, P2, P1, goto P2 }.

As an added step, a try block is manufactured to span from the end of P2 to the beginning of P1, thereby "gluing" the two sections together. This makes it difficult for a decompiler to shuffle the instructions back to their original order.

The following example was decompiled by Dava.

private static double[][] RandomMatrix(int M, int N, Random R) { double A[][] = new double[M][N]; for (int i=0; i<N; i++) for (int j=0; j<N; j++) A[i][j] = R.nextDouble(); return A; } >> private static double[][] RandomMatrix(int i0, int i1, Random r0) { double[] r1; int i2, i3; double d0; double[][] r2; try { r2 = new double[i0][i1]; i3 = 0; } catch (Throwable $r3) { throw $r3; } while (i3 < i1) { for (i2 = 0; i2 < i1; i2++) { r1 = r2[i3]; d0 = r0.nextDouble(); r1[i2] = d0; } i3++; } try { return r2; } catch (Throwable $r3) { throw $r3; } }



HTML code listings generated with java2html.

Documented last edited Oct-16-2006 by Michael Batchelder