- disassemble (x86, ARM, MIPS, etc)
- disassembler (capstone, etc)
IL: An intermediate language is the language of an abstract machine designed to aid in the analysis of computer programs. source
Unknown: It is likely a form of anaylsis to identify tail call function or tail call optimization. In computer science, a tail call is a subroutine call performed as the final action of a procedure. Google produces no results for "Tail Call Resolution".
Likely a subset of Pointer Analysis
The following notes are from Detecting Bugs Using Decompilation and Data Flow Analysis [PDF] [video].
- Stack pointer analysis can be used to identify local variables based on stack pointer inference, access to memory offset to the stack
- Procedure parameters and arguments can be recovered based off offsets relative to ESP/EBP for identifying local or arguments
The following notes are based off of "Putting Pointer Analysis to Work Rakesh Ghiya and Laurie J. Hendren" thesis. The paper deals with stack and heap based pointers but only quotes related to the stack were pasted below.
- Stack directed pointers point to information that can be found by enumerating read/write instructions.
- Stack directed pointers: points to stack objects.
- Resolve all pointer relationships on the stack using points-to analysis.
- Note: Points-to analysis is a compile-time technique that helps identify relationships between pointer variables and the memory locations that they point to during program execution.
- Stack-directed pointers exhibit the important property that their targets always possess a name. This is because all data objects allocated on the stack have compiletime names.
Constant propagation is the process of substituting the values of known constants in expressions at compile time. Such constants include those defined above, as well as intrinsic functions applied to constant values. Constant propagation is implemented in compilers using reaching definition analysis results [Source].
The following notes are from Constant Propagation with Conditional Branches.
A reaching definition for a given instruction is an earlier instruction whose target variable can reach (be assigned to) the given one without an intervening assignment. The goal of constant propagation is to discover values that are constant on all possible executions of a program and to propagate these constant values as far forward through the program as possible.
Constant propagation techniques serve several purposes in optimizing compilers:
- Expressions evaluated at compile time need not be evaluated at execution time. If such expressions are inside loops, a single evaluation at compile time can save many evaluations at execution time.
- Code that is never executed can be deleted. Unreachable code (a form of dead code) is discovered by identifying conditional branches that always take one of the possible branch paths.
- Detection of paths never taken simplifies the control flow of a program. The simplified control structure can aid the transformation of the program into a form suitable for vector processing and or parallel processing
- Since many of the parameters to procedures are constants, using constant propagation with procedure integration can avoid the expansion of code that often results from naive implementations of procedure integration.
- Constant propagation can be done over a variety of domains, for example, over the type fields of values.
Reads
Unknown: Likely a form of analysis that identifes instructions that sets a register flag or evaluate a register flag.
Likely a sub-task of constant folding
Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime. Terms in constant expressions are typically simple literals, such as the integer literal 2, but they may also be variables whose values are known at compile time.
Constant folding is an optimization technique that eliminates expressions that calculate a value that can already be determined before code execution. These are typically calculations that only reference constant values or expressions that reference variables whose values are constant.
An example of constant folding would be the following on (120 − 15 · 8) · x
.
A constant folding algorithm traverses the tree describing this expression, computes and merges nodes with known values and replaces the entire expression with an equivalent but simpler one, in this case the literal 0. Figure 3.6 shows this process on the above example
Step 0.
stateDiagram-v2
MUL --> sub
MUL --> x
sub --> 120
sub --> mul
mul --> 15
mul --> 8
Step 1
stateDiagram-v2
MUL --> SUB
MUL --> x
SUB --> 120
SUB --> 120.
Step 2
stateDiagram-v2
MUL --> 0
MUL --> x
Step 3
stateDiagram-v2
0
Uknown.
This technique replaces a “detailed” variable in the original specification (that is, a variable with a large, sometimes infinite, range of values) with a more abstract variable.
Type inference is the ability to automatically deduce, either partially or fully, the type of an expression at compile time. The compiler is often able to infer the type of a variable or the type signature of a function, without explicit type annotations having been given.
Dead Code Elimination is an optimization that removes code which does not affect the program results.
CS 6120: Lesson 3: Local Optimization & Dead Code Elimination
- https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/20-decompilation.pdf
- https://yurichev.com/mirrors/vanEmmerik_ssa.pdf
- https://www.backerstreet.com/decompiler/introduction.php
- https://www.lockshaw.io/static/ghidra/decompiler/doc/index.html
- http://compileroptimizations.com/index.html (example code)