Skip to content

Instantly share code, notes, and snippets.

@alexander-hanel
Last active September 29, 2023 03:31
Show Gist options
  • Save alexander-hanel/832011d8124be83e932b0f3afa9b78ec to your computer and use it in GitHub Desktop.
Save alexander-hanel/832011d8124be83e932b0f3afa9b78ec to your computer and use it in GitHub Desktop.
IL - Overview

Stages

image

Source

1. Machine Code

  • disassemble (x86, ARM, MIPS, etc)
  • disassembler (capstone, etc)

2. Low Level IL

IL: An intermediate language is the language of an abstract machine designed to aid in the analysis of computer programs. source

Tail Call Resolution

Unknown: It is likely a form of anaylsis to identify tail call function or tail call optimization. In computer science, a tail call is a subroutine call performed as the final action of a procedure. Google produces no results for "Tail Call Resolution".

Stack Pointer Analysis

Likely a subset of Pointer Analysis

The following notes are from Detecting Bugs Using Decompilation and Data Flow Analysis [PDF] [video].

  • Stack pointer analysis can be used to identify local variables based on stack pointer inference, access to memory offset to the stack
  • Procedure parameters and arguments can be recovered based off offsets relative to ESP/EBP for identifying local or arguments

The following notes are based off of "Putting Pointer Analysis to Work Rakesh Ghiya and Laurie J. Hendren" thesis. The paper deals with stack and heap based pointers but only quotes related to the stack were pasted below.

  • Stack directed pointers point to information that can be found by enumerating read/write instructions.
  • Stack directed pointers: points to stack objects.
  • Resolve all pointer relationships on the stack using points-to analysis.
    • Note: Points-to analysis is a compile-time technique that helps identify relationships between pointer variables and the memory locations that they point to during program execution.
  • Stack-directed pointers exhibit the important property that their targets always possess a name. This is because all data objects allocated on the stack have compiletime names.

Constant Propagation

Constant propagation is the process of substituting the values of known constants in expressions at compile time. Such constants include those defined above, as well as intrinsic functions applied to constant values. Constant propagation is implemented in compilers using reaching definition analysis results [Source].

The following notes are from Constant Propagation with Conditional Branches.

A reaching definition for a given instruction is an earlier instruction whose target variable can reach (be assigned to) the given one without an intervening assignment. The goal of constant propagation is to discover values that are constant on all possible executions of a program and to propagate these constant values as far forward through the program as possible.

Constant propagation techniques serve several purposes in optimizing compilers:

  • Expressions evaluated at compile time need not be evaluated at execution time. If such expressions are inside loops, a single evaluation at compile time can save many evaluations at execution time.
  • Code that is never executed can be deleted. Unreachable code (a form of dead code) is discovered by identifying conditional branches that always take one of the possible branch paths.
  • Detection of paths never taken simplifies the control flow of a program. The simplified control structure can aid the transformation of the program into a form suitable for vector processing and or parallel processing
  • Since many of the parameters to procedures are constants, using constant propagation with procedure integration can avoid the expansion of code that often results from naive implementations of procedure integration.
  • Constant propagation can be done over a variety of domains, for example, over the type fields of values.

Reads

Flag Resolution

Unknown: Likely a form of analysis that identifes instructions that sets a register flag or evaluate a register flag.

3. Medium IL

Arithmetic Expression Folding

Likely a sub-task of constant folding

Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime. Terms in constant expressions are typically simple literals, such as the integer literal 2, but they may also be variables whose values are known at compile time.

Constant folding is an optimization technique that eliminates expressions that calculate a value that can already be determined before code execution. These are typically calculations that only reference constant values or expressions that reference variables whose values are constant.

source

An example of constant folding would be the following on (120 − 15 · 8) · x.

A constant folding algorithm traverses the tree describing this expression, computes and merges nodes with known values and replaces the entire expression with an equivalent but simpler one, in this case the literal 0. Figure 3.6 shows this process on the above example

Step 0.

    stateDiagram-v2
        MUL --> sub
        MUL --> x
        sub --> 120
        sub --> mul
        mul --> 15
        mul --> 8

Loading

Step 1

    stateDiagram-v2
        MUL --> SUB
        MUL --> x
        SUB --> 120
        SUB --> 120. 
Loading

Step 2

    stateDiagram-v2
        MUL --> 0
        MUL --> x
Loading

Step 3

    stateDiagram-v2
        0
Loading

source

Variable Abstraction

Uknown.

This technique replaces a “detailed” variable in the original specification (that is, a variable with a large, sometimes infinite, range of values) with a more abstract variable.

source

Type Inference/Propagation

Type inference is the ability to automatically deduce, either partially or fully, the type of an expression at compile time. The compiler is often able to infer the type of a variable or the type signature of a function, without explicit type annotations having been given.

source

Dead Store Elimination

Dead Code Elimination is an optimization that removes code which does not affect the program results.

source

CS 6120: Lesson 3: Local Optimization & Dead Code Elimination

Value Set Anaysis

Calling Convention Detection

Function Parameter Detection

Control Flow Recovery

4. High Level IL

Variable Merging

Control Flow Structures

Full Expression Folding

Dead Code Elimination

Decompilation References

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment