In an attempt to work around other bugs or to improve behavior, it may come across as a wonderful idea to have variables be uninitialized, and to initialize them in separate stages.
The following code originates from a shared library by the author, in an attempt to work around a bug where the default initialization symbols was not crossing the shared library threshold.
MyType variable = void;
variable.tupelof[2 .. $] = MyType.init.tupleof[2 .. $];
There is an obvious solution to this problem, use a method called defaults
for MyType
that is not templated to initialize the variable.
However, could any code like the above that does not do a full initialization of all fields be appropriete for a D programmer to write? No, it is unsafe and is clearly going to result in bugs.
All variable declarations put the variable into the type state reachable
. This enables writing to it, but not reading. For @trusted
code what we are looking for is the same guarantees as @safe
, in that doing a partial initialization either via .tupleof
or by field should cause an error.
Error: example.d:2 Variable `variable` is in type state reachable, but statement would partially initialization and skip fields making it remain in reachable type state post initialization.
For @system
code, keeping track of which fields have been initialized would be an appropriete solution to enable partial initialization and once all fields have been initialized move into type state initialized. This is sometimes used when working with Windows API.
D has been defined against type state analysis from the earliest days of development. An example of this is an error when you skip variable declarations using goto
.
For a forwards goto
the check is quite simply, we do a reverse iteration over the AST to find the goto
statement from the label. If a variable declaration is found and error is emitted as that variable was skipped.
In a backwards goto instead of checking, we will destroy any variables that are no longer reachable. This is a good reliable method for handling this scenario.
import std.stdio : writeln;
import std.random : uniform01;
void main()
{
Label:
SomethingWithSideEffects var;
if (uniform01() > 0.5)
goto End;
goto Label;
End:
writeln(var);
}
struct SomethingWithSideEffects {
~this() {
writeln("Make your D-Man cry :'(");
}
}
Will be transformed into:
void main()
{
Label:
SomethingWithSideEffects var = 0;
try
{
if (uniform01() > 0.5)
goto End;
goto Label;
End:
writeln(var);
}
finally
var.~this();
return 0;
}
Instead of a dedicated check for a forwards goto
, we can remove it in favor of a general use semantic pass that is used by multiple language features. This is the Data Flow Analysis (DFA) that is proposed by this DIP.
goto Label;
int variable;
Label:
const variable2 = variable;
Should error with:
Error: example.d:4 Variable `variable` is in type state unreachable, but statement would load it into ``variable2`` during initialization.
When a forword goto
is seen, the current context is split into two. The first copy continues on as normal. The second continues on by performing a jump to the label which acts as a point of convergence. This will catch any differences between these two pathways.
At a point of convergence when multiple contexts have been processed, the convergence process is undertaken which unifies them into a single context before continuing to the next point of convergence such as end of scope.
When converging the goal is take the lowest type state of each variable to unify into a single context state for all variables.
The following is an example of this problem, where D has been defined against type state analysis without actually having type state analysis implemented. It is a known bug.
import std.stdio : writeln;
void main(string[] args) {
if (args.length == 1)
goto Fail;
Success:
int var = 22;
writeln("success ", var);
return;
Fail:
writeln("fail ", var); // fail 1061300560
}
An existing DFA in D is for @live
a singular function borrow checker that lacks guarantees cross function. It is a single use DFA that cannot be applied to other memory analysis language features. This is a problem, this proposal focusses on only one aspect of, replacing its DFA to get the most out of a single set of costs.
The DFA as used by @live
is a simplified Intermediate Representation (IR) to define a sequential list with control structures. It utilizes tree walkers for expressions within a function scope. This is a simplified and highly specific solution to the DFA required for a borrow checker which would be described as a subset of of a Control Flow Graph (CFG).
By utilizing a CFG, @live
matches other borrow checker designs in premise, but it does so with only the ability of a forward pass to construct state or to proof it. For this reason it is not the basis of the DFA being proposed here. Instead it is an example of what could be rewitten to use the new DFA by taking advantage of its IR along with its value tracking state capability which replaces its own specific solution.
Each variable gains a type state.
The predefined type states are:
- Unreachable: cannot be read or written to
- Reachable: may be written to but not read
- Initialized: may be read or written to
- Default initialized: has a value that is the default for its type (
.init
) - Non-null: the type is a pointer and is not null
The order of the type states is: unreachable < reachable < initialized < default-initialized < non-null < user-defined
.
When two or more type states are seen during analysis the lowest value is chosen.
User defined type states are ordered based upon definition:
typestate {
empty = initialized,
nonempty => !empty,
}
In the above empty
is below nonempty
but above non-null
.
The first syntax change is to introduce the type state declaration syntax for structs, classes and unions. Each expression must resolve to a boolean expression involving the this pointer.
struct Foo {
typestate {
First = initialized,
Second => !method
}
bool method();
}
DeclDef:
+ TypeStateDeclaration
+ TypeStateDeclaration:
+ typestate { TypeStateSpecifiations|opt }
+ TypeStateSpecifications:
+ TypeStateSpecifications , TypeStateSpecification
+ TypeStateSpecification ,|opt
+ TypeStateSpecification:
+ Identifier = Identifier
+ Identifier => Expression
The second change is to introduce a method for specifying a known type state for a given parameter. This will always be the first function parameter attribute.
void func(?nonnull Object input);
ParameterAttributes:
+ TypeStateParameterAttribute
TypeStateParameterAttribute:
+ ? Identifier , Identifier
+ ? Identifier
The third change is to enable the this pointer and return value to be annotated as part of the parameter list.
class AnObject {
AnotherObject method(?superduperstate,nonnull this, ?unknown return) {
}
}
The first identifier is used to specify opening type state, the second for after call. If the second is missing it defaults to the former. For return values only one may be specified. These two parameters may go in any place, in any order, they are ignored as they are not truely function parameters.
Parameter:
+ TypeStateParameterAttribute this
+ TypeStateParameterAttribute return
Unfortunately this has a conflict as return
is also in ParameterStorageClass
. This is why TypeStateParameterAttribute
is always the first attribute, it allows differentiating between them.
Is expressions are expanded to check what the type state of a given variable is at that point in time.
IdentityExpression:
+ ShiftExpression is typestate Identifier
+ ShiftExpression is typestate < Identifier
+ ShiftExpression is typestate > Identifier
+ ShiftExpression is typestate <= Identifier
+ ShiftExpression is typestate >= Identifier
Lastly finate control over accepting type states that allow exact comparisons rather than >=
only. The following example shows a function whose parameter obj1
can only accept arguments that are in non-null
type state and no others, while the second parameter obj2
can accept any user defined type states but no others.
void func(Object obj1, Object obj2)
@typestate(obj1 == nonnull ; obj2 > nonnull) {
}
AtAttribute:
+ @ typestate ( TypeStateAttributeComparisons )
+ TypeStateAttributeComparisons:
+ TypeStateAttributeComparisons ; TypeStateAttributeComparison
+ TypeStateAttributeComparison
+ TypeStateAttributeComparison:
+ Identifier == Identifier
+ Identifier < Identifier
+ Identifier > Identifier
+ Identifier <= Identifier
+ Identifier >= Identifier
As a function executes, each variable changes its type state. This is analyzed with the help of Data Flow Analysis (DFA).
The primary concern of this DFA is looking at loads, stores and any action that alters a type state of a variable by a predefined transformation.
When a variable declaration is seen the type state changes from unreachable to reachable. In the following example the variable var
is accessed by two different function calls that are contained from different labels. However only one was ever initialized (and is therefore reachable).
import std.stdio : writeln;
void main(string[] args) {
if (args.length == 1)
goto Fail;
Success:
int var = 22;
// var is typestate initialized
writeln("success ", var);
return;
Fail: // var: [unreachable, initialized] = unreachable < initialized
writeln("fail ", var); // Error: variable `var` is in type state unrecahable but is being read.
}
The above code is an example to match an existing bug wherein a goto
can skip a variable declaration when more than one label is in play.
The first expression defines what the input type state of variables that are not informed by their initializer or explicitly set.
This is a straight forward forward pass to find an expression that performs an operation on each function parameter or this pointer.
func(new Object);
func(null); // Error: parameter `obj` must have a minimum type state of non-null
void func(/* ?nonnull */ Object obj) { // The following statement gives obj its type state
obj.method(); // obj: typestate >= non-null
}
All function parameters including this pointer and return values have a minimum type state of initialized
.
Object acquire(bool condition) {
return condition ? new Object : null;
}
Both the return value an function parameter have type states of initialized. You do not need to annotate any code.
This works for the this pointer as well, in the following two methods the this pointer for increment
has a type state of non-null
whereas doNothing
remains at initialized
, as it does not touch the this pointer.
class MyClass {
int field;
void increment() {
this.field++;
}
void doNothing() {
}
}
Of course if it was synchronized
that would place a load to the monitor object and would result in non-null
as well.
Previously infering of input type states for function parameters was shown, in doing so it allowed a type state to be determined that is required. However this type state might be lower than possibly required but a type state could still be determine quickly and easily that requires minimal assumptions.
Unfortunately to infer output type states is a lot tricker and would require running the entire DFA during semantic three to enable modifying the type system. This is not acceptable.
So if we want to infer outputs type states we must limit ourselves to what we can do.
- It is unlikely that we will know what the type state is if its user-defined, but we will be able to determine if its a predefined one.
- As long as a variable is not taken by pointer to a function call (
ref
,out
,&var
), any type state that is predefined will not change. - Loops and goto's are out. They alter code flow and require multiple contexts.
This would need to work in reverse iteration from each return, looking for statements that indicate a higher type state than initialized
and the prevention of lower than the currently assumed one.
Input states are known for function parameters and the this pointer, this can resolve output states.
Object method(/* ?nonnull return */, /* ?nonnull,nonnull this */) {
return this;
}
Simpler statements like if statements may be used, with each statement getting a set of values on the compiler's stack (fixed number of contexts) can be evaluated to find the least type state.
Object func(bool condition, /* ?initialized return */) {
Object obj;
if (condition) {
obj = new Object;
// obj: nonnull
} else {
// obj: initialized
} // obj: [initialized, nonnull]
// obj: initialized
return obj;
}
Function calls can also work. If cyclic dependencies are not resolved through some other method, its always possible of resolving them by defaulting back to initialized
.
Object func(/* ?somesuperduperstate return */) {
return (new Object()).method();
}
Object accepter(/* ?somesuperduperstate return */) {
Object obj = func();
return obj;
}
But if any method is called, or function takes it in, that user defined type state must be devolved to non-null
. The resulting type state is the highest provable predefined type state below the user defined type state.
Object func(/* ?somesuperduperstate return */) {
return (new Object()).method();
}
void anotherFunction(Object obj) {
obj.method();
}
Object accepter(/* ?nonnull return */) {
Object obj = func();
anotherFunction(obj);
return obj;
}
As for ref
and out
, there is no way to know what the type state is beyond initialized
unless it has been annotated explicitly or inferred.
void someFunction(ref Object obj) {
}
Object another() {
Object obj = new Object;
someFunction(obj);
return obj;
}
It may appear that the returned object from another
would be non-null
but this wouldn't be the case. Inferring of input would result the parameter for someFunction
to be initialized
not non-null
. Even though that parameter is unused.
On the other hand if it was to be used, then it will be non-null
.
void someFunction(/* ?nonnull,nonnull */ ref Object obj) {
obj.method();
}