Soot phase options

Soot supports the powerful, but initially confusing, notion of ``phase options''. This document aims to clear up the confusion so you can exploit the power of phase options.

Soot's execution is divided into a number of phases. For example, JimpleBody objects are built by a phase called jb, which is itself composed of subphases, such as the aggregation of local variables (jb.a).

Phase options provide a way for you to change the behaviour of a phase from the Soot command-line. They take the form -p phase.name option:value. For instance, to instruct Soot to use original names in Jimple, we would invoke Soot like this:

java soot.Main foo -p jb use-original-names:true

Multiple option-value pairs may be specified in a single -p option, separated by commas. For example:

java soot.Main foo -p cg.spark verbose:true,on-fly-cg:true
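To make the comma-separated form concrete, here is a minimal sketch of how such an argument breaks down into option/value pairs. It illustrates the syntax only: PhaseOptionSyntax and parsePairs are hypothetical names that are not part of Soot, and Soot's own parser may differ in details such as error handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Soot: splits the text that follows
// "-p <phase>" into option/value pairs, preserving their order.
class PhaseOptionSyntax {
    static Map<String, String> parsePairs(String optionList) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String item : optionList.split(",")) {
            int colon = item.indexOf(':');
            if (colon < 0) {
                // This sketch only covers the explicit option:value form.
                throw new IllegalArgumentException("expected option:value, got: " + item);
            }
            pairs.put(item.substring(0, colon), item.substring(colon + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Mirrors: java soot.Main foo -p cg.spark verbose:true,on-fly-cg:true
        System.out.println(parsePairs("verbose:true,on-fly-cg:true"));
        // prints {verbose=true, on-fly-cg=true}
    }
}
```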

There are five types of phase options:

Boolean options take the values ``true'' and ``false''; if you specify the name of a boolean option without adding a value for it, ``true'' is assumed.
Multi-valued options take a value from a set of allowed values specific to that option.
Integer options take an integer value.
Floating point options take a floating point number as their value.
String options take an arbitrary string as their value.

Each option has a default value which is used if the option is not specified on the command line.
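The five option types and the default rule can be modelled in a few lines. The sketch below is an illustration under stated assumptions, not Soot's implementation: TypedOption, its Kind enum, and the option names some-flag and some-mode are all hypothetical, and the convention of passing null for "not specified" and an empty string for "named without a value" is chosen only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the five option types; not Soot's own classes.
class TypedOption {
    enum Kind { BOOLEAN, MULTI_VALUED, INTEGER, FLOAT, STRING }

    final String name;
    final Kind kind;
    final String defaultValue;
    final List<String> allowed; // consulted only for MULTI_VALUED

    TypedOption(String name, Kind kind, String defaultValue, String... allowed) {
        this.name = name;
        this.kind = kind;
        this.defaultValue = defaultValue;
        this.allowed = Arrays.asList(allowed);
    }

    // given == null: option absent, so the default applies.
    // given == "":   option named without a value (meaningful for booleans).
    String resolve(String given) {
        if (given == null) {
            return defaultValue;
        }
        switch (kind) {
            case BOOLEAN:
                if (given.isEmpty()) return "true";
                if (!given.equals("true") && !given.equals("false")) throw bad(given);
                return given;
            case MULTI_VALUED:
                if (!allowed.contains(given)) throw bad(given);
                return given;
            case INTEGER:
                Integer.parseInt(given); // NumberFormatException if not an integer
                return given;
            case FLOAT:
                Double.parseDouble(given); // NumberFormatException if not a number
                return given;
            default: // STRING accepts anything
                return given;
        }
    }

    private IllegalArgumentException bad(String given) {
        return new IllegalArgumentException("bad value '" + given + "' for " + name);
    }

    public static void main(String[] args) {
        TypedOption flag = new TypedOption("some-flag", Kind.BOOLEAN, "false");
        TypedOption mode = new TypedOption("some-mode", Kind.MULTI_VALUED, "fast", "fast", "precise");
        System.out.println(flag.resolve(null));      // the default
        System.out.println(flag.resolve(""));        // bare name
        System.out.println(mode.resolve("precise")); // one of the allowed values
    }
}
```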

All phases and subphases accept the option ``enabled'', which must be ``true'' for the phase or subphase to execute. To save you some typing, the pseudo-options ``on'' and ``off'' are equivalent to ``enabled:true'' and ``enabled:false'', respectively. In addition, specifying any options for a phase automatically enables that phase.
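The interaction between ``enabled'', the pseudo-options, and implicit enabling can be sketched as follows. EnabledRules is a hypothetical class written for this illustration. The sketch also assumes that implicit enabling does not override an explicit ``enabled:false'' or ``off'' given in the same option list; the text above does not settle that case, so treat it as a guess rather than documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the "enabled" rules; not Soot's implementation.
class EnabledRules {
    static Map<String, String> apply(Map<String, String> defaults, String optionList) {
        Map<String, String> settings = new LinkedHashMap<>(defaults);
        boolean anyOption = false;
        boolean enabledGiven = false;
        for (String item : optionList.split(",")) {
            if (item.isEmpty()) continue;
            // Expand the pseudo-options first.
            String expanded = item.equals("on") ? "enabled:true"
                            : item.equals("off") ? "enabled:false"
                            : item;
            int colon = expanded.indexOf(':');
            String name = colon < 0 ? expanded : expanded.substring(0, colon);
            // A bare boolean option name means "true".
            String value = colon < 0 ? "true" : expanded.substring(colon + 1);
            settings.put(name, value);
            anyOption = true;
            if (name.equals("enabled")) enabledGiven = true;
        }
        // Assumption: implicit enabling does not override an explicit setting.
        if (anyOption && !enabledGiven) settings.put("enabled", "true");
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("enabled", "false"); // a phase that is off by default
        System.out.println(apply(defaults, "verbose:true")); // {enabled=true, verbose=true}
        System.out.println(apply(defaults, "off"));          // {enabled=false}
    }
}
```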

Adding your own subphases

Within Soot, each phase is implemented by a Pack. The Pack is a collection of transformers, each corresponding to a subphase of the phase implemented by the Pack. When the Pack is called, it executes each of its transformers in order.

Soot transformers are usually instances of classes that extend BodyTransformer or SceneTransformer. In either case, the transformer class must override the internalTransform method, providing an implementation which carries out some transformation on the code being analyzed.

To add a transformer to some Pack without modifying Soot itself, create your own class which changes the contents of the Packs to meet your requirements and then calls soot.Main.
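The pattern described above (a pack that runs its transformers in order, and a wrapper class that adds a transformer before handing control to the tool) can be modelled without Soot on the classpath. In the sketch below every class is a hypothetical stand-in: MiniTool plays the role of soot.Main, MiniPack the role of a Pack, and MiniTransformer the role of a transformer class with an internalTransform method. The subphase names mp.builtin and mp.custom are invented, and the real Soot classes have richer interfaces than shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: MiniTool, MiniPack and MiniTransformer only mimic
// the roles of soot.Main, Pack and the transformer classes.
class WrapperMain {
    static String runWithCustomSubphase(String input) {
        MiniTool tool = new MiniTool();
        // Step 1: change the contents of a pack.
        tool.pack.add(new MiniTransformer("mp.custom") {
            @Override
            void internalTransform(StringBuilder code) {
                code.append("[custom]");
            }
        });
        // Step 2: hand control to the tool's normal entry point.
        return tool.run(input);
    }

    public static void main(String[] args) {
        System.out.println(runWithCustomSubphase("body")); // prints body[builtin][custom]
    }
}

// Each transformer corresponds to one subphase and overrides internalTransform.
abstract class MiniTransformer {
    final String phaseName;

    MiniTransformer(String phaseName) {
        this.phaseName = phaseName;
    }

    abstract void internalTransform(StringBuilder code);
}

// A pack executes its transformers in the order they were added.
class MiniPack {
    private final List<MiniTransformer> transformers = new ArrayList<>();

    void add(MiniTransformer t) {
        transformers.add(t);
    }

    void apply(StringBuilder code) {
        for (MiniTransformer t : transformers) {
            t.internalTransform(code);
        }
    }
}

// The tool ships with one built-in subphase already in its pack.
class MiniTool {
    final MiniPack pack = new MiniPack();

    MiniTool() {
        pack.add(new MiniTransformer("mp.builtin") {
            @Override
            void internalTransform(StringBuilder code) {
                code.append("[builtin]");
            }
        });
    }

    String run(String input) {
        StringBuilder code = new StringBuilder(input);
        pack.apply(code);
        return code.toString();
    }
}
```

Because the wrapper only touches the pack's contents, the tool itself stays unmodified, which is the point of the approach.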

The remainder of this document describes the transformations belonging to Soot's various Packs and their corresponding phase options.

Jimple Body Creation (`jb`)

Jimple Body Creation creates a JimpleBody for each input method, using either coffi, to read .class files, or the jimple parser, to read .jimple files.

`iter`	Iter is a simple, iterative algorithm, which propagates everything until the graph does not change.
`worklist`	Worklist is a worklist-based algorithm that tries to do as little work as possible. This is currently the fastest algorithm.
`cycle`	This algorithm finds cycles in the PAG on-the-fly. It is not yet finished.
`merge`	Merge is an algorithm that merges all concrete field (yellow) nodes with their corresponding field reference (red) nodes. This algorithm is not yet finished.
`alias`	Alias is an alias-edge based algorithm. This algorithm tends to take the least memory for very large problems, because it does not explicitly represent the points-to sets of fields of heap objects.
`none`	None means that propagation is not done; the graph is only built and simplified. This is useful if an external solver is being used to perform the propagation.
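The idea behind a worklist-based propagator, as opposed to re-propagating everything until nothing changes, can be sketched on a deliberately simplified graph. The code below is not Soot's implementation: WorklistPropagation is a hypothetical class, nodes and abstract objects are plain strings, and the graph contains only subset edges (an edge from a to b means everything a points to, b may point to as well). Field loads and stores are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified propagator: only nodes whose points-to set has
// just grown are revisited, so unchanged parts of the graph cost nothing.
class WorklistPropagation {
    static Map<String, Set<String>> propagate(Map<String, Set<String>> seeds,
                                              Map<String, List<String>> edges) {
        Map<String, Set<String>> pts = new HashMap<>();
        Deque<String> worklist = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> seed : seeds.entrySet()) {
            pts.put(seed.getKey(), new TreeSet<>(seed.getValue()));
            worklist.add(seed.getKey());
        }
        while (!worklist.isEmpty()) {
            String src = worklist.poll();
            for (String dst : edges.getOrDefault(src, Collections.<String>emptyList())) {
                Set<String> target = pts.computeIfAbsent(dst, k -> new TreeSet<>());
                // addAll returns true only if the target set actually grew.
                if (target.addAll(pts.get(src))) {
                    worklist.add(dst);
                }
            }
        }
        return pts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> seeds = new HashMap<>();
        seeds.put("a", Collections.singleton("o1")); // a = new O1()
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("a", Arrays.asList("b")); // b = a
        edges.put("b", Arrays.asList("c")); // c = b
        edges.put("c", Arrays.asList("b")); // b = c (a cycle)
        System.out.println(propagate(seeds, edges).get("c")); // prints [o1]
    }
}
```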

`hash`	Hash is an implementation based on Java's built-in hash-set.
`bit`	Bit is an implementation using a bit vector.
`hybrid`	Hybrid is an implementation that keeps an explicit list of up to 16 elements, and switches to a bit-vector when the set gets larger than this.
`array`	Array is an implementation that keeps the elements of the points-to set in a sorted array. Set membership is tested using binary search, and set union and intersection are computed using an algorithm based on the merge step from merge sort.
`heintze`	Heintze's representation stores elements in a bit-vector plus a small 'overflow' list holding up to some maximum number of elements. The bit-vectors can be shared by multiple points-to sets, while the overflow lists are not.
`sharedlist`	Shared List stores its elements in a linked list, and might share its tail with other similar points-to sets.
`double`	Double is an implementation that itself uses a pair of sets for each points-to set. The first set in the pair stores new pointed-to objects that have not yet been propagated, while the second set stores old pointed-to objects that have been propagated and need not be reconsidered. This allows the propagation algorithms to be incremental, often speeding them up significantly.
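As an illustration of the sorted-array representation described above, the following sketch tests membership with binary search and computes union and intersection with the merge step from merge sort. SortedArraySet is a hypothetical class, and representing set elements as int identifiers is an assumption made for this example. It is not the code Soot uses.

```java
import java.util.Arrays;

// Hypothetical sketch: a set is a strictly increasing int[] of element ids.
class SortedArraySet {
    static boolean contains(int[] set, int x) {
        return Arrays.binarySearch(set, x) >= 0;
    }

    // Merge step: walk both arrays once, emitting each value a single time.
    static int[] union(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else { // present in both: emit once, advance both
                out[n++] = a[i++];
                j++;
            }
        }
        while (i < a.length) out[n++] = a[i++];
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n);
    }

    // Same walk, but only values present in both arrays are emitted.
    static int[] intersection(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i++];
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5};
        int[] b = {2, 3, 6};
        System.out.println(Arrays.toString(union(a, b)));        // prints [1, 2, 3, 5, 6]
        System.out.println(Arrays.toString(intersection(a, b))); // prints [3]
        System.out.println(contains(a, 5));                      // prints true
    }
}
```

Both operations run in time linear in the combined size of the two sets, and membership tests take logarithmic time.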

`Geom`	Geometric Encoding.
`HeapIns`	Heap Insensitive Encoding. Omits the heap context range term in the encoded representation; in turn, all contexts for this heap object are assumed to be used.
`PtIns`	Pointer Insensitive Encoding. Similar to HeapIns, but omits the pointer context range term instead.

`PQ`	Priority Queue (sorted by the last fire time and topological order)
`FIFO`	FIFO Queue

`ofcg`	Performs points-to analysis and builds call graph together, on-the-fly.
`cha`	Builds only a call graph using Class Hierarchy Analysis, and performs no points-to analysis.
`cha-aot`	First builds a call graph using CHA, then uses the call graph in a fixed-call-graph points-to analysis.
`ofcg-aot`	First builds a call graph on-the-fly during a points-to analysis, then uses the resulting call graph to perform a second points-to analysis with a fixed call graph.
`cha-context-aot`	First builds a call graph using CHA, then makes it context-sensitive using the technique described by Calman and Zhu in PLDI 04, then uses the call graph in a fixed-call-graph points-to analysis.
`ofcg-context-aot`	First builds a call graph on-the-fly during a points-to analysis, then makes it context-sensitive using the technique described by Calman and Zhu in PLDI 04, then uses the resulting call graph to perform a second points-to analysis with a fixed call graph.
`cha-context`	First builds a call graph using CHA, then makes it context-sensitive using the technique described by Calman and Zhu in PLDI 04. Does not produce points-to information.
`ofcg-context`	First builds a call graph on-the-fly during a points-to analysis, then makes it context-sensitive using the technique described by Calman and Zhu in PLDI 04. Does not perform a subsequent points-to analysis.

`auto`	When the bdd option is true, the BDD-based worklist implementation will be used. When the bdd option is false, the Traditional worklist implementation will be used.
`trad`	Normal worklist queue implementation
`bdd`	BDD-based queue implementation
`debug`	An implementation of worklists that includes both traditional and BDD-based implementations, and signals an error whenever their contents differ.
`trace`	A worklist implementation that prints out all tuples added to every worklist.
`numtrace`	A worklist implementation that prints out the number of tuples added to each worklist after each operation.

`auto`	When the bdd option is true, the BuDDy backend will be used. When the bdd option is false, the backend will be set to none, to avoid loading any BDD backend.
`buddy`	Use BuDDy implementation of BDDs.
`cudd`	Use CUDD implementation of BDDs.
`sable`	Use SableJBDD implementation of BDDs.
`javabdd`	Use JavaBDD implementation of BDDs.
`none`	Don't use any BDD backend. Any attempted use of BDDs will cause Paddle to crash.

`insens`	Builds a context-insensitive call graph.
`1cfa`	Builds a 1-CFA call graph.
`kcfa`	Builds a k-CFA call graph.
`objsens`	Builds an object-sensitive call graph.
`kobjsens`	Builds a context-sensitive call graph where the context is a string of up to k receiver objects.
`uniqkobjsens`	Builds a context-sensitive call graph where the context is a string of up to k unique receiver objects. If the receiver of a call already appears in the context string, the context string is just reused as is.
`threadkobjsens`	Experimental option for thread-entry-point sensitivity.
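The difference between a string of up to k receiver objects and a string of up to k unique receiver objects can be pictured with a small model. ReceiverContext and extend are hypothetical names, receiver objects are represented by strings, and the sketch assumes that the newest receiver is placed first and the oldest is dropped once the string exceeds k. The descriptions above do not specify the ordering or truncation, so those choices are guesses and not Paddle's actual behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of context strings made of receiver objects.
class ReceiverContext {
    // Returns the callee's context for a call made under `context` on `receiver`.
    static List<String> extend(List<String> context, String receiver, int k, boolean unique) {
        if (unique && context.contains(receiver)) {
            return context; // receiver already present: reuse the string as is
        }
        List<String> next = new ArrayList<>();
        next.add(receiver);   // assumption: newest receiver goes first
        next.addAll(context);
        while (next.size() > k) {
            next.remove(next.size() - 1); // assumption: the oldest entry is dropped
        }
        return next;
    }

    public static void main(String[] args) {
        List<String> ctx = Arrays.asList("o1", "o2");
        System.out.println(extend(ctx, "o3", 2, false)); // prints [o3, o1]
        System.out.println(extend(ctx, "o1", 2, false)); // prints [o1, o1]
        System.out.println(extend(ctx, "o1", 2, true));  // prints [o1, o2]
    }
}
```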

`auto`	When the bdd option is true, the Incremental BDD propagation algorithm will be used. When the bdd option is false, the Worklist propagation algorithm will be used.
`iter`	Iter is a simple, iterative algorithm, which propagates everything until the graph does not change.
`worklist`	Worklist is a worklist-based algorithm that tries to do as little work as possible. This is currently the fastest algorithm.
`alias`	Alias is an alias-edge based algorithm. This algorithm tends to take the least memory for very large problems, because it does not explicitly represent the points-to sets of fields of heap objects.
`bdd`	BDD is a propagator that stores points-to sets in binary decision diagrams.
`incbdd`	A propagator that stores points-to sets in binary decision diagrams, and propagates them incrementally.

`medium-grained`	Try to identify transactional regions that can employ a dynamic lock to increase parallelism. All side effects must be protected by a single object. This locking scheme aims to approximate typical Java Monitor usage.
`coarse-grained`	Insert static objects into the program for synchronization. One object will be used for each group of conflicting synchronized regions. This locking scheme achieves code-level locking.
`single-static`	Insert one static object into the program for synchronization for all transactional regions. This locking scheme is for research purposes.
`leave-original`	Analyse the existing lock structure of the program, but do not change it. With one of the print options, this can be useful for comparison between the original program and one of the generated locking schemes.

`unsafe`	Modify the visibility on code so that all inlining is permitted.
`safe`	Preserve the exact meaning of the analyzed program.
`none`	Change no modifiers whatsoever.

`unsafe`	Modify the visibility on code so that all inlining is permitted.
`safe`	Preserve the exact meaning of the analyzed program.
`none`	Change no modifiers whatsoever.

`safe`	Safe, but only considers moving additions, subtractions and multiplications.
`medium`	Unsafe in multi-threaded programs, as it may reuse the values read from field accesses.
`unsafe`	May violate Java's exception semantics, as it may move or reorder exception-throwing statements, potentially outside of `try-catch` blocks.

`optimistic`
`pessimistic`