3-Address Code / Quadruples Code / Static Single Assignment Form

OPAL enables you to transform Java bytecode into a 3-address code (TAC) which is also sometimes called "Quadruples Code". The standard representation provided by OPAL is always in SSA form and is created after performing a low-level, highly configurable data-flow analysis; based on the configured low-level analysis, the amount of additional information about the program's values may differ greatly. In the simplest case, only basic type information is provided. In more advanced cases, constant computations are perfomed, constants are propagated, must-alias information is provided, dead paths due to programming errors/context-dependent impossible paths are automatically pruned, and so on. In all cases the 3-address code (in the following called TAC) immediately provides complete def-use information and the control-flow graph is also reified.

Using the 3-Address Code (TAC)

Exploring the 3-Address Code using the command line tool TAC

A very simple way to explore the TAC generated by OPAL is to use the TAC tool (org.opalj.tac.TAC). It is part of the Abstract Interpretation Framework. To use it, start the sbt console, change to the Abstract Interpretation Framework and specify the method for which you want to have the TAC.

current_folder$ sbt shell
> project AbstracInterpretationFramework
> run -source <source jar; e.g., ../bi/target/scala-2.13/resource_managed/test/ai.jar> -class <name of the class; e.g., ai.domain.DeadVariables> -method <name of the mehtod; e.g., initialValueIsAlwaysDead>

If the method is found, you will get the control-flow graph and the TAC generated using a lightweight abstract interpretation/advanced data-flow analysis. Alternatively, you can download a pre-packaged jar containing the TAC tool from our download site (OPALTACDisassembler.jar); you can run it using java -jar OPALTACDisassembler.jar.

Using the 3-Address Code

A very straight forward way to get a method's TAC is to use the respective ProjectInformationKey as shown next.

Step 0 - import the respective classes:

import org.opalj.ai._; import org.opalj.br._ ; import org.opalj.br.analyses._; import org.opalj.br.instructions._; import org.opalj.tac._

Step 1 - Load a project's class files - here we load a test project which is part of OPAL - and also load the public API of the JDK; the latter is primarily necessary to get the complete type hierarchy.

implicit val p = Project(new java.io.File("OPAL/bi/target/scala-2.13/resource_managed/test/ai.jar"),org.opalj.bytecode.RTJar)

Step 2 - Tell the project that it should make the 3-address code of the method available, when required:

val tac = p.get(ComputeACAIKey)

Step 3 - To get the TAC of a specific method with a body just look it up:

val cf = p.classFile(ObjectType("ai/domain/IntegerValuesFrenzy")).get
val m = cf.findMethod("someSwitch").head
val taCode = tac(m)

In some cases it is desirable to exchange the underlying data-flow analysis that is used as a foundation for creating the TAC. This can easily be done by updating the project's AIDomainFactoryKey. The latter is used when computing the TAC and can be changed before the key (ComputeTACAIKey) is passed to the project (<Project>.get(ComputeTACAIKey)). E.g., to set the domain to the simplest/most basic supported domain, you can use:

p.updateProjectInformationKeyInitializationData(AIDomainFactoryKey) {
  case None               => Set(classOf[domain.l0.BaseDomainWithDefUse[_]])
  case Some(requirements) => requirements + classOf[domain.l0.BaseDomainWithDefUse[_]]
}

For example, creating the TAC using the simplest domain – and a production build of OPAL – takes roughly one third of the time of using the default domain. However, the default domain provides much more information and is therefore the recommended base analysis. To be more precise, on a Core i7 (4 Cores - Sandy Bridge, 2011) generating the standard TAC for the rt.jar of the JDK 8u121 using the simplest possible domain takes 13 seconds and for the default domain 33 seconds.

Examples

To better understand OPAL's three-address code representation, we will discuss some examples.

You can generate the 3-address code on your own using the steps discussed above.


Loops and Exceptions

Given the following very simple loop implemented in Java.

public static void endless() {
    while (true) {
        System.out.println(System.nanoTime());
    }
}

The three-address code will be:

static void endless(){
     // <start>, 3 ->
     0:/*pc=0:*/ lv0 = java.lang.System.out
     1:/*pc=3:*/ lv1 = java.lang.System.nanoTime()
     // ! - potentially uncaught exception/abnormal return

     // 1 ->
     2:/*pc=6:*/ {lv0}/*java.io.PrintStream*/.println({lv1})
     // ! - potentially uncaught exception/abnormal return

     // 2 ->
     3:/*pc=9:*/ goto 0
}

As stated above, the def-use chain and the control-flow graph is directly made available. For example, the assignment statement with index 0 initializes the local variable lv0 with the value of System.out. This local variable is then used by statement 2 as the receiver object of the println call which is given the value of the local variable 1 (lv1).

Given that the underlying analysis by default only uses intra-procedural information, it cannot determine whether the called methods potentially throw any exceptions or not and therefore has to assume that they potentially do. In the above code, the basic block boundaries of the underlying code are determined by empty lines.


On The Origin Of Values

Now, let's assume that the target stream is determined by a parameter:

public static void endless(boolean toErr) {
    PrintStream out = toErr ? System.err : System.out;
    while (true) {
        out.println(System.nanoTime());
    }
}

In this case the initial three-address code will be:

static void endless(boolean){
    /* PARAMETERS: param1: useSites={0} (origin=-2) */

    0:/*pc=1:*/ if({param1} == 0) goto 3

    // 0 ->
    1:/*pc=4:*/ lv1 = java.lang.System.err
    2:/*pc=7:*/ goto 4

    // 0 ->
    3:/*pc=10:*/ lv3 = java.lang.System.out

    // 2, 3 ->
    4:/*pc=13:*/ ; // <= NOP can be optimized away

    // 7, 4 ->
    5:/*pc=15:*/ lv5 = java.lang.System.nanoTime()
    // ! - potentially uncaught exception/abnormal return

    // 5 ->
    6:/*pc=18:*/ {lv1, lv3}/*java.io.PrintStream*/.println({lv5})
    // ! - potentially uncaught exception/abnormal return

    // 6 ->
    7:/*pc=21:*/ goto 5
}

In the above example, the static method endless defines, e.g., a parameter which is immediately used by the first statement with index 0 (useSites); the parameter it not used any further - useSites for param1 only contains one value. Def/use information is always directly available at a local-variable initialization (DVar) or usage site (UVar).

Parameters of methods always get origins in the range [-2-method.parametersCount..-2]. This way a trivial check (-512 < origin < 0) is sufficient to determine that a parameter is used. Furthermore, the origin -1 is reserved for this; if the method is an instance method. For example, a method with the parameters (Object o, int i, double d, Float[] fs) will have the origins: o -> -2, i -> -3, d -> -4 and fs -> -5 independent of the method being static or not. By mapping the explicitly declared parameters as described, an analysis can handle static and instance methods similarily.

Furthermore, given that the def-use information is explicitly reified, the information that the receiver object of the println call (statement 6) is either lv1 or lv3, i.e., System.out or System.err, is directly available.


Getting Type Information (Type Checks and Type Casts)

The following demonstrates how to get advanced type information about a specific value. Given the following code:

 package ai;
 class MethodsWithTypeChecks {
    public static FileNotFoundException requiredCast(Cloneable o) {
        if(!(o instanceof FileNotFoundException)) return null;
        return (FileNotFoundException)o;
    }
 }

We can get the 3-address code (using, e.g., the sbt console) as follows (the method is part of OPAL's test suite):

import org.opalj._
val p = br.analyses.Project(new java.io.File("OPAL/bi/target/scala-2.13/resource_managed/test/ai.jar"))
val tacAIKey = p.get(tac.ComputeTACAIKey)
val code = tacAIKey(p.classFile(br.ObjectType("ai/MethodsWithTypeChecks")).get.findMethod("requiredCast").head)

The 3-address code of the above method is shown next:

static java.io.FileNotFoundException requiredCast(java.lang.Cloneable){
     /* PARAMETERS: param1: useSites={0,4,5} (origin=-2) */

     0:/*pc=1:*/ lv0 = {param1} instanceof java.io.FileNotFoundException
     1:/*pc=4:*/ if({lv0} != 0) goto 4

     // 1 ->
     2:/*pc=7:*/ lv2 = null
     3:/*pc=8:*/ return {lv2}

     // 1 ->
     4:/*pc=10:*/ (java.io.FileNotFoundException) {param1}
     // ! - potentially uncaught exception/abnormal return

     // 4 ->
     5:/*pc=13:*/ return {param1}
}

Given the method's three-address code, we can now get the definition sites and the upper type bound of the method's return values as follows:

  1. Get all return statements using the CFG (the implicit cast _.asBasicBlock is safe, because a CatchNode cannot be a predecessor of an ExitNode):

     val returnStmts = code.cfg.normalReturnNode.predecessors.iterator.map(bb => code.stmts(bb.asBasicBlock.endPC))
    
  2. Use simple pattern matching to get the defintion sites and the value information.

     for { tac.ReturnValue(pc,tac.UVar(v,defSites)) <- returnStmts} {
         println(defSites.mkString(", ") + " => "+ v.asDomainReferenceValue)
     }
    

In this case, the def-site is 2 for the first return statement (index: 3) and -2 for the last return statement; i.e., in the latter case the value of the first (formal/explicitly declared) parameter is returned, but now the type is FileNotFoundException with Cloneable; i.e., if the method returns successfully (statement 5), then the returned value inherits from java.io.FileNotFoundException and also implements the marker interface Cloneable. Note that the precise type information that is available is determined by the underlying data-flow analysis; however, type information at the described level is generally available.


Summary

  • The definition sites of the (explicitly declared) parameters of a method have the ids [-2...-2-#Parameters].
  • A use-site always references the statement where the variable is initialized. Hence, use-sites are always in the range [0..index of the last instruction]
  • The standard TAC AST as well as all standard optimizations/transformations keep the AST flat; i.e., nested expressions are always ValueExpr.
  • Code that was identified as dead by the underlying analysis is stripped away.
  • The initial transformation from bytecode to 3-address code inherently performs advanced dead-code removal; however, it does not perform changes which require structural changes of the CFG therefore, some NOP instructions may be found in the code. The pcs of those instructions are computed by negating the original pc and subtracting 1 (-pc-1).
  • pc directly references the program counter/index into the instruction array of the original bytecode instruction. The pc can generally be used to lookup line number etc. information in the original code attributes. (Recall that negative pcs have to be transformed first if necessary.)