In recent years, the D programming language has gained more and more attention and existing C and C++ codebases are starting to incrementally integrate D components.
In order to use D components, a C or C++ interface to them must be provided; in C and C++, this is done through header files. Currently, this process is entirely manual, with the responsibility of writing a header file falling on the shoulders of the programmer. The larger the D portion of a codebase is, the more tedious the task becomes. The best example is the DMD frontend, which amounts to roughly 310,000 lines of code and whose C++ header files, used by other backend implementations (GDC, LDC), are managed manually. This is a repetitive, time-consuming, and rather boring task: the perfect job for a machine.
The deliverable of the project is a tool that automatically generates C and C++ header files from D module files. This can be achieved either by a library solution using DMD as a Library, or by adding this feature in the DMD frontend through a compiler switch.
The advantage of using DMD as a Library is that it wouldn’t increase the complexity of the compiler frontend codebase. The disadvantage is that the user would be required to install a third-party tool. In contrast, adding the feature to the frontend would result in a smoother integration with all the backends that use the DMD frontend.
We have decided to go with the compiler switch approach.
One major milestone (and success marker) for the project is to automatically generate the DMD frontend headers required by GDC/LDC.
The feature will require the implementation of a `Visitor` class that will traverse the `AST` resulting from the parsing phase of the D code. For each top-level `Dsymbol` (variable, function, struct, class, etc.) the corresponding C++ declaration will be written to the header file.
The visitor will override the visiting methods of two types of nodes:
- Traversal nodes - these nodes simply implement the `AST` traversal logic: `ModuleDeclaration`, `ScopeDeclaration`, etc.
- Output nodes - these nodes will implement the actual header generation logic: `FuncDeclaration`, `StructDeclaration`, `VarDeclaration`, etc.
The header file will consist of declarations generated from the public `extern (C++)` and public `extern (C)` declarations/definitions in the D modules.
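To make the traversal/output split more concrete, here is a minimal C++ sketch of the idea. It is not DMD's code (the real generator is written in D and works on the real AST): the `Module`, `FuncDecl`, and `VarDecl` node types, the `Linkage` enum, and the `HeaderVisitor` class are all hypothetical stand-ins, and declarations are stored as plain text for brevity.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for DMD's AST classes.
enum class Linkage { D, C, Cpp };

struct Module;
struct FuncDecl;
struct VarDecl;

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const Module&) = 0;
    virtual void visit(const FuncDecl&) = 0;
    virtual void visit(const VarDecl&) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor& v) const = 0;
};

struct FuncDecl : Node {
    std::string prototype;  // e.g. "int add(int a, int b)"
    Linkage linkage;
    bool isPublic;
    FuncDecl(std::string p, Linkage l, bool pub)
        : prototype(std::move(p)), linkage(l), isPublic(pub) {}
    void accept(Visitor& v) const override;
};

struct VarDecl : Node {
    std::string declaration;  // e.g. "int counter"
    Linkage linkage;
    bool isPublic;
    VarDecl(std::string d, Linkage l, bool pub)
        : declaration(std::move(d)), linkage(l), isPublic(pub) {}
    void accept(Visitor& v) const override;
};

struct Module : Node {
    std::vector<std::unique_ptr<Node>> members;
    void accept(Visitor& v) const override;
};

void FuncDecl::accept(Visitor& v) const { v.visit(*this); }
void VarDecl::accept(Visitor& v) const { v.visit(*this); }
void Module::accept(Visitor& v) const { v.visit(*this); }

struct HeaderVisitor : Visitor {
    std::ostringstream out;

    // Only public symbols with C or C++ linkage reach the header.
    static bool isExported(Linkage l, bool isPublic) {
        return isPublic && l != Linkage::D;
    }
    // Traversal node: just walks its children.
    void visit(const Module& m) override {
        for (const auto& member : m.members) member->accept(*this);
    }
    // Output nodes: write a declaration into the header text.
    void visit(const FuncDecl& f) override {
        if (isExported(f.linkage, f.isPublic)) out << f.prototype << ";\n";
    }
    void visit(const VarDecl& v) override {
        if (isExported(v.linkage, v.isPublic)) out << "extern " << v.declaration << ";\n";
    }
};

// Builds a tiny module and returns the header text generated for it.
std::string demoHeader() {
    Module m;
    m.members.push_back(std::unique_ptr<Node>(new FuncDecl("int add(int a, int b)", Linkage::Cpp, true)));
    m.members.push_back(std::unique_ptr<Node>(new FuncDecl("void helper()", Linkage::D, true)));
    m.members.push_back(std::unique_ptr<Node>(new VarDecl("int counter", Linkage::C, true)));
    m.members.push_back(std::unique_ptr<Node>(new VarDecl("int hidden", Linkage::Cpp, false)));
    HeaderVisitor v;
    m.accept(v);
    assert(!v.out.str().empty());
    return v.out.str();
}
```

In this toy model, `helper` is skipped because it has D linkage and `hidden` is skipped because it is not public, which mirrors the filtering rule described above.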
I started by reviving DMD's PR 8591, rebasing it and converting it into a compiler switch.
The next step was to add tests for the existing code.
Test description | Link to commit |
---|---|
Test enum declarations | link |
Test free functions | link |
Test variable declarations | link |
Test alias declarations | link |
Test struct declarations | link |
Test class declarations | link |
Test template declarations | link |
The tests revealed the following issues:
- `StructDeclaration`:
- `ClassDeclaration`:
  - `align(n)` does nothing. You can use `align` on classes in C++, though it is generally regarded as bad practice and should be avoided.
- `FuncDeclaration`:
  - default arguments can be any valid D code, including a lambda function or a complex expression; we don't want to go down the path of generating C or C++ code, so for now default arguments get ignored.
- `TemplateDeclaration`:
  - templates imply code generation, so for now we don't support them.
After writing the tests, understanding the issues, and fixing the blocking ones, I got more comfortable with the codebase and moved on to the next step: generating the DMD frontend header files from DMD's `*.d` frontend modules.
This took quite some time and sweat to get going: the major pain point here was templates. There is `dmd/root/array.d`, which has a templated `Array(T)` that is used throughout the codebase. Since we don't support templates, we decided to keep the manual management of the `dmd/root/*.h` headers, but things aren't that simple.
The issue is that while we don't explicitly pass in any of the `dmd/root/*.d` modules, some of them are processed during the semantic analysis phase, which puts the definitions of some `struct`s and `enum`s from `dmd/root/*.d` into the generated frontend header. When the generated header is used in conjunction with the manually managed `dmd/root/*.h` header files, the C++ compiler reports a `struct`/`enum` redefinition error.
I kept scratching my head at how to avoid this, and in the end I went with explicitly ignoring anything that comes from a `dmd/root/*.d` module. Ideally, this special casing shouldn't be needed, and it should go away if we can add support for some simple D -> C++ templates.
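The special casing boils down to a filter applied before a declaration is emitted. The sketch below illustrates that idea in C++ under a loud assumption: it matches on the source path of the defining module, and both helper names are hypothetical. The actual generator may identify `dmd/root` modules differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when a module's source path lies under dmd/root/,
// whose headers are still maintained by hand.
bool isHandMaintainedRootModule(const std::string& modulePath) {
    const std::string marker = "dmd/root/";
    return modulePath.find(marker) != std::string::npos;
}

// A declaration is emitted only when its defining module is not under dmd/root/.
bool shouldEmit(const std::string& definingModulePath) {
    return !isHandMaintainedRootModule(definingModulePath);
}

// Small usage example with made-up paths.
void demoRootFilter() {
    assert(!shouldEmit("src/dmd/root/array.d"));
    assert(shouldEmit("src/dmd/expression.d"));
}
```

Skipping these modules avoids defining the same `struct` or `enum` both in the generated header and in the hand-written `dmd/root/*.h` files.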
At this point (roughly 8 weeks after GSoC had started) we were pretty confident in the project structure and behaviour, and we decided to tackle the final milestone: use the header generator to generate the frontend headers required by the GNU D Compiler (GDC), in order to replace the manually managed header files with an auto-generated one. While working on this I encountered some challenges that I will detail below.
After scratching my head for a couple of days at a bug, I realised that the header generator was not taking into account the base type of an `enum`.
So given the following example code

    enum TOK : ubyte { /* ... */ }
    class Expression
    {
        TOK op;
        /* ... */
    }

The enum `TOK` above gets generated as

    enum TOK { /* ... */ };
According to the C and C++ standards, the compiler is free to pick any integer type that can hold all the enum values, and most compilers pick `int` as the base type; thus `sizeof(TOK)` is typically 4.
This is a problem, as the D object files will consider `TOK` to be one byte and the C++ object files will consider it to be four bytes.
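A small C++ sketch makes the mismatch visible. The struct and enum names here are hypothetical and only model the two views of the same object; the exact size of the plain enum is implementation-defined, so the code compares against `sizeof` rather than assuming 4.

```cpp
#include <cassert>
#include <cstddef>

// As first generated: no base type, so the compiler picks one (commonly int).
enum TokPlain { TokPlainMem1, TokPlainMem2 };

// What D uses for `enum TOK : ubyte`: exactly one byte.
typedef unsigned char TokByte;

// Two views of the same object: `op` followed by another one-byte field.
struct ExprSeenByCpp { TokPlain op; unsigned char next; };
struct ExprSeenByD   { TokByte  op; unsigned char next; };

std::size_t nextOffsetCpp() { return offsetof(ExprSeenByCpp, next); }
std::size_t nextOffsetD()   { return offsetof(ExprSeenByD, next); }

// True only when both sides agree on where `next` lives.
bool layoutsAgree() { return nextOffsetCpp() == nextOffsetD(); }

void demoLayout() {
    assert(nextOffsetD() == 1);                    // D side: `op` occupies one byte
    assert(nextOffsetCpp() == sizeof(TokPlain));   // C++ side: depends on the enum's size
}
```

With a compiler that picks `int` for `TokPlain`, `layoutsAgree()` returns `false`: every field after `op` is read from the wrong offset on the C++ side.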
The manual header implementation used this clever trick to solve the problem:

    typedef unsigned char TOK;
    enum
    {
        TOKmem1,
        /* ... */
    };
    class Expression
    {
        TOK op;
        /* ... */
    };
The above takes advantage of the fact that the members of an unnamed (unscoped) enum are visible in the enclosing scope, so the values still exist by name, and since they all fit in an `unsigned char`, the code works through the typedef.
All of the above is required because C++98/03 doesn't have support for enum base types.
At first I thought that the fix should use the same trick to solve the problem, but then a community member suggested I use the `-extern-std` compiler flag to drive the header generation. So, if a user passes `-extern-std=c++98`, code similar to the one above is generated; if a user passes `c++11` or later, the C++ `enum class` feature is used. This is done through the commits from here and here.
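The two output flavours can be compared side by side in a short C++ sketch. The member names and the `98` suffixes are hypothetical (they only exist so that both variants fit in one file), and the exact text emitted by the generator may differ.

```cpp
#include <cassert>

// Flavour for c++11 and later: the base type is part of the declaration.
enum class TOK : unsigned char { mem1, mem2 };
static_assert(sizeof(TOK) == 1, "TOK must match D's one-byte enum");

// Flavour for c++98: the typedef trick, with the members in an unnamed enum.
typedef unsigned char TOK98;
enum { TOK98mem1, TOK98mem2 };

// The field has the same size in both flavours.
struct Expression11 { TOK   op; unsigned char next; };
struct Expression98 { TOK98 op; unsigned char next; };

void demoEnumFlavours() {
    assert(sizeof(TOK) == sizeof(TOK98));
    assert(static_cast<unsigned char>(TOK::mem2) == TOK98mem2);
}
```

Both flavours keep the field at one byte; the `enum class` version additionally keeps the members scoped and typed, which is why it is preferable when the target standard allows it.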
By design, the C/C++ header generator takes into consideration only `struct`s and `class`es that are declared `extern (C)` or `extern (C++)`.
Throughout the DMD codebase, there were methods that were declared as `extern (C++)`, and thus part of the manually managed header files, but the enclosing struct or class wasn't declared as `extern (C++)`. This merged commit adds the missing `extern (C++)` declarations.
In the DMD codebase, the `Dsymbol` class had a member named `namespace`.
Because `Dsymbol.namespace` is a public `extern (C++)` symbol, the C++ header generator will generate the following code

    class Dsymbol : public ASTNode
    {
    public:
        /* ... */
        CPPNamespaceDeclaration* namespace;
        /* ... */
    };
Since `namespace` is a C++ keyword, this generated code won't compile.
This issue was fixed and merged in this commit.
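The issue here was that D has function covariance, but C++ does not: in D an overriding method may add a `const` qualifier, while in C++ `foo()` and `foo() const` are two distinct signatures. What this means is that if I have the following D code

    extern (C++) class A
    {
        void foo() {}
    }
    extern (C++) class B : A
    {
        override void foo() const {}
    }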
the generated header will look like

    class A
    {
        virtual void foo();
    };
    class B : public A
    {
        void foo() const;
    };
Note that `B` has `void foo() const`.
Compiling this code with `g++ -Woverloaded-virtual` (as GDC does) will result in the following warning

    warning: 'virtual void A::foo()' was hidden
    by `void B::foo() const`
In the DMD codebase, there were two class hierarchies where this issue was present:
- RootObject
- Type
For the `RootObject` hierarchy I made the methods `toChars` and `equals` `const`, and this solved the issue.
This is the merged commit for this work.
I attempted to do the same for the Type hierarchy with this commit.
Here the situation wasn't as simple because there are a lot of `isAAA` and `getAAA` methods that modify the internal state of the object (for lazy initialization and caching), which means that we can't make all the methods in the hierarchy `const`. Because of this, I attempted in another commit to remove the `const` qualifier from the method declarations, but I wasn't really happy with this idea (nor were the members of the community, as can be seen from the commit comments).
The solution to this debacle came from the suggestion of a community member: make the header generator emit the prototype of the function as declared in the introducing base class. This was done with this commit.
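The difference between the two headers can be demonstrated in plain C++. In this sketch the methods return a string instead of `void` so that the effect is observable, and the numbered class names are hypothetical; in the real setup the headers only contain declarations and the bodies live in D.

```cpp
#include <cassert>
#include <string>

// Header as first generated: B1::foo() const has a different signature,
// so it hides A1::foo() instead of overriding it.
struct A1 {
    virtual ~A1() {}
    virtual std::string foo() { return "A::foo"; }
};
struct B1 : A1 {
    std::string foo() const { return "B::foo const"; }
};

// Header after the fix: the derived class repeats the prototype exactly as
// declared in the introducing base class, so it is a real override.
struct A2 {
    virtual ~A2() {}
    virtual std::string foo() { return "A::foo"; }
};
struct B2 : A2 {
    std::string foo() override { return "B::foo"; }
};

std::string callThroughBase(A1& a) { return a.foo(); }
std::string callThroughBase(A2& a) { return a.foo(); }

void demoHiding() {
    B1 hidden;
    B2 overridden;
    assert(callThroughBase(hidden) == "A::foo");      // virtual dispatch misses B1
    assert(callThroughBase(overridden) == "B::foo");  // virtual dispatch reaches B2
}
```

Compiling the first pair with `-Woverloaded-virtual` reproduces the kind of warning shown earlier, while the second pair compiles cleanly.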
The header generation tool is still an open PR, but it should be ready to use. Until it gets merged into master (probably sometime next month), one can use it by checking out the PR branch and building the compiler.
Through a compiler switch, the dmd compiler generates a C++ header file from the list of `.d` modules passed at compile time.
The simplest form of using the CLI switch is `dmd -HC a.d b.d`.
This will visit the ASTs of modules `a` and `b` and output a single header file to `stdout`.
By using the `-HCf=<file-name>` switch, the above result will be written to the specified file. Using `-HCd=<path>` will write the file into the specified `path`.
So, running `dmd -HCf=ab.h -HCd=mypath/ a.d b.d` will write the generated header to `mypath/ab.h`, relative to the current directory.
First, we want to finish the integration of the auto-generated header with GDC, as this serves as a test on a big, production-ready project. This will probably take two more weeks.
After the integration with GDC is done, the PR should go through a final round of code review and then it is ready to be merged.
I want to publicly thank my mentors and the community for their help and guidance, which helped me deliver this project.