This doc captures the results of investigating how to build the Arrow R bindings on Windows. The following options were explored:
- Building Arrow in MSys and bindings in RTools.
- Building Arrow and bindings in RTools.
The most promising long-term solution is to write a CMake generator that can be run from RTools; in the short term, we can keep making progress by compiling Arrow in MSys.
This document explains the other approaches considered and provides additional details. The first option, building Arrow in MSys and the bindings in RTools, was tried in the following variants:
- MSys with GCC 8+ and RTools
- MSys and Patched RTools with GCC 8+
- MSys with GCC 4.9
- MSys with GCC 5.3
An earlier project, rarrow, successfully built Arrow and the R bindings by compiling in MSys and linking with backwards ABI compatibility.
This investigation showed that the approach can be replicated against Arrow 0.12 with rarrow and modified versions of the current R bindings.
However, once all the new R bindings functionality was considered, the build required non-header-only Boost dependencies and other libraries to be linked statically. For instance, libstdc++ is required, as are threading libraries (potentially winpthread, which is currently under investigation).
It might be possible to ensure that all dependent libraries are statically linked in the MSys build of Arrow. The threading library issue may be the last one to resolve, but there could be many more dependencies to deal with. Additionally, this investigation only explored compiling for the x64 architecture; supporting i386 might require additional work.
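One way to check which runtime libraries an MSys-built DLL still loads dynamically is to look at its import table. The following is a minimal sketch, not something used in this investigation: it parses the text printed by `objdump -p` (binutils) for `DLL Name:` entries and flags those that look like GCC runtime libraries. The function names, the list of runtime prefixes, and the sample text are all assumptions made for illustration.

```python
import re

# Assumed name prefixes of MinGW-w64 GCC runtime DLLs; adjust for the toolchain in use.
GCC_RUNTIME_PREFIXES = ("libstdc++", "libgcc_s", "libwinpthread")

# `objdump -p` lists each imported DLL of a PE file on a "DLL Name: ..." line.
_DLL_NAME = re.compile(r"^[ \t]*DLL Name:[ \t]*(\S+)", re.MULTILINE)


def imported_dlls(objdump_text):
    """Return the DLL names found in `objdump -p` output, in order of appearance."""
    return _DLL_NAME.findall(objdump_text)


def dynamic_gcc_runtime(objdump_text):
    """Return imported DLLs that look like GCC runtime libraries.

    Anything returned here is loaded at run time, i.e. it was not linked statically.
    """
    return [
        name
        for name in imported_dlls(objdump_text)
        if name.lower().startswith(GCC_RUNTIME_PREFIXES)
    ]


# Hypothetical excerpt of `objdump -p` output, for illustration only.
SAMPLE = """
	DLL Name: KERNEL32.dll
	DLL Name: libstdc++-6.dll
	DLL Name: libwinpthread-1.dll
	DLL Name: msvcrt.dll
"""

if __name__ == "__main__":
    print(dynamic_gcc_runtime(SAMPLE))
```

In practice the text could come from something like `subprocess.run(["objdump", "-p", dll_path], capture_output=True, text=True).stdout`, where `dll_path` is a placeholder for the library being inspected. An empty result would suggest, but not prove, that the GCC runtime was linked statically.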
This investigation was able to compile the current bindings with RTools by patching the toolchain RTools uses (rwinlib-arrow and feature/msys2). That is, with a GCC 8+ compiler included in RTools, the bindings link properly against the Arrow library and most tests succeed.
However, it is not clear that this approach would work on winbuilder or CRAN, since it was not possible to trivially override the default compiler in winbuilder by downloading GCC 8+ on demand.
If a newer toolchain could be provided in RTools and made available on CRAN, this would be a feasible and straightforward approach.
One approach would be to build with GCC 4.9 in MSys to match the GCC in RTools. However, the MSys package manager (pacman) only supports the latest versions of the toolchain, so only GCC 8 can be installed with ease. Some older packages are available for manual installation, but they only go back to GCC 5. For this option, we are therefore left with:
- Using an old VM prepared with GCC 8, which is not easy to replicate.
- Building GCC 4.9 and all dependent libraries from source.
We could potentially compromise by building Arrow in MSys with the oldest GCC version available (GCC 5.3), which is close to the GCC 4.9 in RTools. However, this investigation shows that the errors from compiling Arrow in MSys with GCC 5.3 are similar to the ones with GCC 8+. This might be explained by ABI compatibility changes between 4.9 and 5.3. It is worth mentioning that the -D_GLIBCXX_USE_CXX11_ABI=0 flag was used across all builds but proved not to be enough.
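Since GCC 5, libstdc++ ships two ABIs for types such as std::string, and symbols that use the newer one carry `__cxx11` (or the `cxx11` ABI tag) in their mangled names. A rough way to check whether a library really was built with -D_GLIBCXX_USE_CXX11_ABI=0 is therefore to scan its symbol table for those markers. The sketch below does this over the text output of `nm`; the function name and the sample symbols are illustrative assumptions, and a clean result would only rule out this one source of incompatibility, not ABI problems in general.

```python
# Markers of libstdc++'s newer ABI in mangled names:
#   "7__cxx11" is the inline namespace std::__cxx11,
#   "B5cxx11"  is the [abi:cxx11] tag on functions that return such types.
NEW_ABI_MARKERS = ("7__cxx11", "B5cxx11")


def new_abi_symbols(nm_text):
    """Return symbols from `nm` output that reference the newer libstdc++ ABI."""
    hits = []
    for line in nm_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        symbol = fields[-1]  # nm prints the symbol name in the last column
        if any(marker in symbol for marker in NEW_ABI_MARKERS):
            hits.append(symbol)
    return hits


# Hypothetical `nm` excerpt, for illustration only.
SAMPLE = """\
0000000000001000 T _ZNK5arrow6Status8ToStringEv
                 U _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1EPKcRKS3_
                 U _ZNSsC1EPKcRKSaIcE
"""

if __name__ == "__main__":
    for symbol in new_abi_symbols(SAMPLE):
        print(symbol)
```

If a library built with the flag still shows such symbols, one of its statically linked dependencies was probably compiled without it.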
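The second option, building both Arrow and the bindings in RTools, was explored through the following approaches:
- Running CMake from RTools.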
- Patching Make files for RTools.
- Creating a CMake Generator for RTools.
This investigation attempted to run CMake with the MSys generator from RTools. However, since RTools is not a full distribution of MSys but rather a custom subset, many assumptions (e.g. the hardcoded path to ‘/bin/sh’) break when running CMake.
This investigation also attempted to pre-run CMake in MSys and then patch the generated makefiles. While some progress was made along this path, the large number of makefiles makes it time-consuming and not very reliable.
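To give an idea of what such patching involves, here is a minimal sketch that walks a build tree and rewrites one hardcoded string in every makefile it finds. It is not the procedure used in this investigation: the file-name heuristic, the function names, and both paths in the demo (including the RTools location) are hypothetical.

```python
import tempfile
from pathlib import Path


def is_makefile(name):
    """Heuristic: file names that generated makefiles are assumed to use."""
    return name == "Makefile" or name.endswith((".make", ".mk"))


def patch_makefiles(build_dir, old, new):
    """Replace `old` with `new` in every makefile under build_dir.

    Works on raw bytes with plain substring replacement, so line endings
    are preserved. Returns the sorted list of files that were changed.
    """
    old_b, new_b = old.encode(), new.encode()
    patched = []
    for path in sorted(Path(build_dir).rglob("*")):
        if not path.is_file() or not is_makefile(path.name):
            continue
        data = path.read_bytes()
        if old_b in data:
            path.write_bytes(data.replace(old_b, new_b))
            patched.append(path)
    return patched


def demo():
    """Patch a throwaway tree; both paths below are hypothetical examples."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        (src / "Makefile").write_bytes(b"SHELL = /bin/sh\n")
        (src / "notes.txt").write_bytes(b"SHELL = /bin/sh\n")  # not a makefile
        changed = patch_makefiles(tmp, "/bin/sh", "C:/Rtools/bin/sh.exe")
        return (
            [p.name for p in changed],
            (src / "Makefile").read_bytes(),
            (src / "notes.txt").read_bytes(),
        )


if __name__ == "__main__":
    print(demo())
```

Plain substring replacement is deliberately naive: it would also rewrite a longer path that merely contains the old string, which is one reason this route is hard to make reliable across many generated files.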
The remaining option is to write a custom CMake generator that is compatible with RTools. This seems like a reasonable approach, but it is non-trivial: the generated makefiles include support for downloading protobuf and other dependencies, which might need to be customized to work properly under RTools. Even so, this option is the most reliable one, since it would allow us to build from RTools without additional dependencies and, potentially, from Travis as well. One unknown is how to build the non-header-only Boost libraries with the RTools toolchain.