@rikkimax
Last active October 27, 2023 16:52

Export, Symbols & Shared Libraries

Literature Review

LNK4217 https://learn.microsoft.com/en-us/cpp/error-messages/tool-errors/linker-tools-warning-lnk4217?view=msvc-170

LNK4286 https://learn.microsoft.com/en-us/cpp/error-messages/tool-errors/linker-tools-warning-lnk4286?view=msvc-170

https://learn.microsoft.com/en-us/cpp/build/reference/dot-obj-files-as-linker-input?view=msvc-170

https://learn.microsoft.com/en-us/cpp/cpp/dllexport-dllimport?view=msvc-170

https://wiki.dlang.org/DIP45

https://issues.dlang.org/show_bug.cgi?id=9816

Model

Picking a model on which to base D's symbols can very easily cause language-level problems and make porting harder. Intuitively you want to start with the easiest solution and then scale to the harder, more complicated problems. That approach, while usable, has not worked for D. Instead, this model is based upon what is needed for Windows, which in practice is quite involved. This is a good position to work from, as it will translate directly to other platforms without having to describe new concepts.

In this model we will describe a D symbol as being in one of three different states during any given compiler instance. These are DllImport, DllExport, and Internal. In a given binary you may associate multiple linker symbols with a given D symbol. However, the DllExport symbol may not be accessible from D code and should be considered a platform-specific implementation detail. The DllImport and Internal symbol states should be considered to be the same symbol during program execution.

Syntax

There are three sets of changes to the syntax of the export attribute. First, it is removed as a visibility modifier; without this change you would have to expose internal details to be able to export an internal symbol for access in a templated function. Next, it is re-added as a linkage attribute, because exportation is a linker concept. Lastly, a second form of export is added as a linkage attribute, this time with an identifier parameter naming a version that, when active, causes the symbol to be treated as internal rather than DllImport.

VisibilityAttribute:
- export

LinkageAttribute:
+ export
+ export ( Identifier )

Two new version prefixes are being reserved. The first, Have_, is used by build managers to indicate that a given unit of code is available, either as source being compiled or via includes. The second, Compiling_, has two use cases. Build managers may use it similarly to Have_, except that it applies only to code going into the currently compiled binary. Compilers also use it: the versions Compiling_Libc, Compiling_DRuntime, and Compiling_Phobos are reserved for the compiler to indicate that a given library is being statically linked into the given binary.
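As a rough illustration of how a build manager might emit these versions, here is a minimal Python sketch. The function name version_flags, the package names, and the rule of replacing non-identifier characters with an underscore are assumptions made for this example; only the Have_ and Compiling_ prefixes come from the proposal, and -version= is the existing dmd-style switch.

```python
import re


def version_flags(available, compiling):
    """Build dmd-style -version= flags for the two reserved prefixes.

    available: packages whose code can be imported (source or includes).
    compiling: packages whose code goes into the binary being built.
    The sanitising rule below is an assumption, not part of the proposal.
    """
    def ident(prefix, name):
        # Version identifiers must be valid identifiers, so replace
        # anything else (for example '-') with an underscore.
        return prefix + re.sub(r"[^A-Za-z0-9_]", "_", name)

    flags = ["-version=" + ident("Have_", name) for name in available]
    flags += ["-version=" + ident("Compiling_", name) for name in compiling]
    return flags
```

For example, a hypothetical build that can see a package named vibe-d while compiling a package named mylib into the current binary would pass Have_vibe_d and Compiling_mylib.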

Identifiers

For a given target's system C compiler, a compliant D compiler must meet or exceed the set of identifiers that the C compiler allows, as long as the system C compiler is compliant with C23 or above. If a given platform toolchain does not support Unicode in identifier names, the compiler must match the encoding that the system C compiler uses to support it; if it does not, the behavior is implementation defined.

The C23 standard specifies identifiers using the Unicode Technical Report 31. This requires the use of the Unicode database and therefore the character set is Unicode and not ISO 10646:2020 which is derived from it.

Some changes to the existing D support include removal of emoji support and zero-width joiners and non-joiners.

As part of TR31 some tailoring takes place to complete it.

UAX31-R1

The grammar for an identifier is as follows:

IdentifierStart:
-   Letter
-   UniversalAlpha
+   XID_Start
IdentifierChar:
-   0
-   NonZeroDigit
+   XID_Continue

Both XID_Start and XID_Continue ranges can be found in DerivedCoreProperties.txt.

UAX31-R1a

No format characters are allowed.

UAX31-R1b

Identifiers are not stable and can break in future versions.

UAX31-R2

Identifiers are not immutable, more characters may be allowed in the future.

UAX31-R3

D does not have patterns as a token.

UAX31-R4

All D identifiers must be normalized using NFC.

The compiler is allowed to either normalize (slow) or warn when a non-NFC identifier is detected (fast).
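The two permitted behaviors can be sketched in Python using the standard unicodedata module. This is only a model of the policy, not compiler code; the function name check_identifier and its normalize parameter are made up for the example.

```python
import unicodedata
import warnings


def check_identifier(name, normalize=False):
    """Apply one of the two UAX31-R4 policies to an identifier.

    normalize=True  models the slow path: return the NFC form.
    normalize=False models the fast path: warn and keep the text as is.
    """
    if unicodedata.is_normalized("NFC", name):
        return name  # already NFC, nothing to do
    if normalize:
        return unicodedata.normalize("NFC", name)
    warnings.warn(f"identifier {name!r} is not NFC-normalized")
    return name
```

With normalize=True, an identifier spelled with "e" followed by U+0301 (combining acute accent) comes back using the single precomposed character U+00E9, so both spellings name the same symbol.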

UAX31-R5

D is case sensitive, so case folding applies only to the spell checker, which is an implementation detail.

UAX31-R6 & UAX31-R7

No filtering is used.

UAX31-R8

D does not use hashtags as a token.

Command line flags

The compiler, if able, should offer a way to configure default visibility, choosing between: language defaults; export everything; hide everything; and export based on whether a symbol is exposed outside its module by its visibility attributes (public, protected, and package would be exported, and private would be hidden). It should also provide a verbosity flag to dump all symbols that it altered. The purpose of this is to audit what code changes are required to remove the visibility override flag. When a symbol has its symbol mode altered by this flag, D interface file generators should also be affected, so that they add export to all non-templated scopes.
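The four default-visibility settings can be modeled as a small decision function. The Python sketch below is illustrative only: the mode names ("language", "export-all", "hide-all", "by-visibility") are hypothetical, and two points the text leaves open are filled in as assumptions, namely that hide everything also hides explicitly exported symbols, and that an explicit export still wins in the visibility-based mode.

```python
# Visibility attributes that expose a symbol outside its module.
EXPOSED = {"public", "protected", "package"}


def is_exported(default_mode, visibility, marked_export):
    """Decide whether a symbol is exported under a default-visibility mode.

    default_mode is one of the hypothetical names used in this sketch.
    visibility is the D visibility attribute as a string.
    marked_export says whether the symbol carries an explicit export.
    """
    if default_mode == "language":
        return marked_export  # only explicit export is exported
    if default_mode == "export-all":
        return True
    if default_mode == "hide-all":
        return False  # assumption: overrides explicit export too
    if default_mode == "by-visibility":
        # assumption: an explicit export is still honored
        return marked_export or visibility in EXPOSED
    raise ValueError(f"unknown default visibility mode: {default_mode}")
```

Note that nested declarations inside functions would need to bypass this function entirely, since command line flags do not change their behavior.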

Depending upon the target, the compiler should offer to link both statically and dynamically against libc. If static, it will add the Compiling_Libc version. The default should be configurable on a per-target basis.

The default for linking against druntime/Phobos must be dynamic on all capable targets (subject to future evaluation on a per-target basis). This may be configured on the command line and in a configuration file per target. On Posix systems, the default RPATH value need not be configured beyond lookup in the local directory and the fixed paths that were given on the command line; any different behavior can be applied after linking.

To pair with the visibility override switch, a compiler should offer a dllimport flag for configuring the DllImport status of symbols. It should also provide a verbosity flag to dump all symbols that it altered. The purpose of this is to audit what code changes are required to remove the dllimport flag. The dllimport flag allows setting the symbol mode for non-compiling symbols to DllImport. It is available with the following options: none, druntime, phobos, druntime/phobos, externalOnly, all. In externalOnly mode it sets all symbols found in a module external to the binary (see -extI) to DllImport; this is for when you override visibility to export all.

For any given combination of flags relating to target and linking, the compiler needs to provide a verbosity flag that dumps a list of all shared library dependencies that are distributed with the compiler. The purpose of this is to allow build managers to copy libc, druntime, Phobos, and any other binary artifacts (like curl) that are required for distribution of a compiled binary. This is important because you cannot know which binaries are active for a given set of flags when it comes to custom builds for targets other than the host system.

When building an object file, the default name must be the module name including its package, followed by the platform-specific extension for object files. A flag may be offered to use only the module name without the package (as is currently the default in dmd).
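A minimal Python sketch of this naming rule follows. The text does not say how the package and module parts are joined, so keeping the dots of the fully qualified module name is an assumption here, as are the function name and its parameters; the .obj and .o extensions are the usual ones for Windows and Posix toolchains.

```python
def object_file_name(module, windows, strip_package=False):
    """Return the default object file name for a D module.

    module: fully qualified module name, e.g. "std.algorithm.iteration".
    windows: True selects ".obj", otherwise ".o" is used.
    strip_package: models the optional flag that drops the package part.
    """
    # Assumption: the package-qualified name keeps its dots.
    stem = module.rsplit(".", 1)[-1] if strip_package else module
    return stem + (".obj" if windows else ".o")
```

Including the package avoids collisions between modules that share a final name component, such as two different packages that each contain a module named utils.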

External Import Search Path Switch

A new import search path switch is to be added. It will be similar to the existing -I, except that it will flag modules loaded from it as being external to the binary. This switch, -extI, will add the extern attribute to all non-templated symbols and into all non-templated scopes.

This switch has benefits over other designs: it is easy for both users and build managers to provide to the compiler. It encourages writing a D file once, rather than maintaining multiple .di files that represent it, marked both with and without export and extern, when specifying import paths to a codebase that can be built as both a shared library and a static library.

When a module is external to the binary, the compiler can generate an error message when code tries to access a symbol that was not exported; this replaces a linker error message that would not be very understandable. Other behavior may be enabled when a symbol is external to the binary, such as ModuleInfo eliding.

Furthermore this switch replaces almost all use cases for the dllimport override switch. It should not cause any of the problems that the dllimport switch would cause.

Semantics

Every symbol is in one of three modes: imported from outside the binary, exported outside of the binary, or only accessed within the binary.

A symbol that is not marked as export cannot be accessed outside of the binary (shared library/dll/executable) directly. The command line flag can override this behavior to export everything by default.

The mode that a symbol is in depends on a few factors:

  1. DllExport: is marked export, not marked extern and is compiling (not passed in via -I or is templated)
  2. DllImport: is marked export
    1. Without version identifier and it is not being compiled (passed in via -I)
    2. With version identifier and that identifier was not enabled at the time of the symbol
    3. Is marked extern
  3. Internal: if no other conditions are met.
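The rules above can be read as a decision procedure. The Python sketch below is one possible reading, not the proposal's reference implementation: the Symbol fields and the symbol_mode function are names invented for this example, the rules are applied in the order listed, and "is compiling" is taken as an input rather than derived from -I or template status.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    DLL_EXPORT = "DllExport"
    DLL_IMPORT = "DllImport"
    INTERNAL = "Internal"


@dataclass(frozen=True)
class Symbol:
    marked_export: bool
    marked_extern: bool = False
    version_ident: Optional[str] = None  # Identifier in export(Identifier)
    compiling: bool = True  # code is going into the binary being built


def symbol_mode(sym, active_versions=frozenset()):
    """Classify a symbol as DllExport, DllImport, or Internal."""
    if not sym.marked_export:
        return Mode.INTERNAL  # rule 3: nothing else applies
    if not sym.marked_extern and sym.compiling:
        return Mode.DLL_EXPORT  # rule 1
    if sym.marked_extern:
        return Mode.DLL_IMPORT  # rule 2.3
    # From here on the symbol is exported but not being compiled.
    if sym.version_ident is None:
        return Mode.DLL_IMPORT  # rule 2.1
    if sym.version_ident not in active_versions:
        return Mode.DLL_IMPORT  # rule 2.2
    return Mode.INTERNAL  # version active: treated as internal
```

Under this reading, a symbol declared with export(Compiling_Phobos) and seen through an import path is DllImport by default, and becomes Internal once Compiling_Phobos is active, which matches the static linking case described under Syntax.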

If a struct, class or union includes members that are exported, any generated symbol must also be exported. ModuleInfo must be exported.

If export is on a struct, class, union, or template all members including generated symbols are exported.

Nested declarations inside of functions do not inherit export; they are always internally accessed unless explicitly specified to be exported. Command line flags do not override this behavior.

If a thread local variable (TLS) is exported, it may require a wrapper function to retrieve its pointer depending on the target.

Given an inlineable function inside an out-of-binary module, if its body refers to a non-inlined function or a symbol that is not exported, it must not be inlined.

Notable Changes

Currently, a function that is marked export and has no body is in DllImport mode. After this proposal, it will be in internal mode. This is of note for bindings to shared libraries, such as system libraries like the Windows API.

This change will not cause breakage. The Microsoft linker (and any compatible linker such as LLD) will generate a wrapper function that is one instruction long: an indirect jump that performs the dereference needed to get from the internal symbol to the DllImport symbol.
