Lately I've been working on rInstallFriendly v2.0, my simple and easy-to-use tool to create fancy software installers.
rInstallFriendly v2.0, showing some of its multiple available visual styles. Style is fully customizable!
Unlike other installer-creation tools, rInstallFriendly is designed to make the installation process enjoyable for users and useful for developers, so it supports a special banner with multiple unique features. The banner could be used as an advertising panel to showcase the product, but it could also display a playable game or just show some shiny interactive visuals.
Many users still keep staring at the installer progress bar while waiting for the software to be installed; rInstallFriendly is intended for those users, to make their first impression of the software an unforgettable experience.
One of the main features of rInstallFriendly is an interactive banner displayed while the software is being installed. This banner can be a still image or an animated GIF, but also an interactive shader or even a small game.
The first version of the tool was single-threaded, so, to keep the game running while the files were decompressed, the implemented solution was to split the decompression process across frames: just decompress a number of files every frame and hopefully leave enough frame time to execute the game logic and draw the frame. Although this approach could seem inappropriate, it actually worked really well, running at 60 frames per second.
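The per-frame approach can be sketched roughly as follows. This is a minimal sketch, not the actual rInstallFriendly v1.0 code: `InstallState` is a hypothetical struct and `ExtractOneFile()` is a stub standing in for the real per-file extraction (e.g. a miniz call).

```c
#include <stdbool.h>

// Hypothetical installer state (not the actual rInstallFriendly structure)
typedef struct InstallState {
    int fileCount;      // Total number of entries in the package
    int nextFile;       // Index of the next entry to extract
} InstallState;

// Stub standing in for the real per-file extraction (decompress + write to disk)
static bool ExtractOneFile(InstallState *state, int index)
{
    (void)state; (void)index;
    return true;
}

// Extract up to filesPerFrame entries; meant to be called once per frame,
// before updating and drawing the game banner
// Returns true once every entry has been processed
bool InstallStepFrame(InstallState *state, int filesPerFrame)
{
    for (int i = 0; (i < filesPerFrame) && (state->nextFile < state->fileCount); i++)
    {
        ExtractOneFile(state, state->nextFile);
        state->nextFile++;
    }

    return (state->nextFile >= state->fileCount);
}
```

Every frame pays for its batch of files before drawing, so a single big file inside a batch directly eats into that frame's time budget.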
But there were some problems with this approach:
- Game stuttering: if the files to decompress were slightly big (>50MB), some stuttering could be noticed in the playable game.
- Installation time: installation time was actually bounded by the framerate and the number of files to be installed.
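The second problem is plain arithmetic: with a fixed number of files per frame, the installation can never finish in fewer frames than ceil(files / filesPerFrame), however fast the disk is. A tiny helper makes the bound explicit; the example values are hypothetical, not rInstallFriendly v1.0 settings.

```c
// Lower bound for installation time (in seconds) when extracting a fixed
// number of files per frame; filesPerFrame and fps must be > 0
double MinInstallSeconds(int fileCount, int filesPerFrame, int fps)
{
    int frames = (fileCount + filesPerFrame - 1)/filesPerFrame;   // Integer ceil(fileCount/filesPerFrame)
    return (double)frames/fps;
}
```

For example, with a hypothetical budget of 10 files per frame, a package of ~7000 files needs 700 frames, about 11.7 seconds at 60 fps, even if every file were written instantly.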
So, for the next version, rInstallFriendly v2.0, I decided to address those issues.
The first solution to address the framerate drop was multi-threading: moving the installation process to a second thread while the main thread keeps drawing the game banner at a stable 60 fps.
I used the excellent thread.h library for that task, which worked like a charm.
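The threaded setup can be sketched like this. Note the sketch uses C11 `<threads.h>` and `<stdatomic.h>` as a stand-in for thread.h, so none of these calls are thread.h's API; `InstallProgress`, `InstallThread()` and `StartInstall()` are hypothetical names, and the actual extraction is left as a comment.

```c
#include <threads.h>
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical state shared between the installer thread and the main (render) thread
typedef struct InstallProgress {
    int fileCount;                  // Total entries to install (read-only once the thread starts)
    atomic_int filesDone;           // Written by the installer thread, read by the main thread
    atomic_bool cancelRequested;    // Set by the main thread if the user cancels
} InstallProgress;

// Installer thread entry point: extracts and writes files, publishing progress
static int InstallThread(void *arg)
{
    InstallProgress *progress = (InstallProgress *)arg;

    for (int i = 0; i < progress->fileCount; i++)
    {
        if (atomic_load(&progress->cancelRequested)) return 1;   // Rollback would start here

        // ExtractAndWriteFile(i);   // Hypothetical: decompress entry i and write it to disk

        atomic_store(&progress->filesDone, i + 1);
    }

    return 0;
}

// Starts the installer thread; the caller keeps running its game loop meanwhile
bool StartInstall(InstallProgress *progress, thrd_t *thread)
{
    return (thrd_create(thread, InstallThread, progress) == thrd_success);
}
```

Each frame, the main thread only reads `filesDone` to draw the progress bar, so the game loop never waits on the disk; once all files are done it calls `thrd_join()`.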
But after moving the decompression of files to a second thread, I got a surprise:
Installation time was not reduced as I was expecting; actually, it was almost the same as the previous implementation! 😕
My testbed is the raylib Windows Installer package, including w64devkit, Notepad++ and raylib (library with sources and examples): a total of ~7000 files, mostly small ones, packaged into a ~120MB zip file using level 8 deflate compression.
My test hardware: MSI laptop, CPU with 8 logical processors, 16GB RAM, 1TB NVMe SSD.
rInstallFriendly basically decompresses the provided .zip file using the miniz library, so I started trying other decompression options for comparison:
- Windows extractor: 4min 02sec
- rInstallFriendly: 1min 31sec
- 7-Zip: 6sec !!!
I was not surprised by the Windows extractor, but I was very surprised by the 7-Zip results. WHY SO FAST?!?! I needed to understand the reason because it looked like magic to me, and I like to know magic tricks!
I tried a quick Google search but unfortunately couldn't find a clear answer, so I asked on Twitter/X; I know many great programmers follow me, so I expected someone could provide some answers.
Multiple answers pointed to multi-threading, using multiple threads for the decompression process, but I had my doubts about it, simply because I had already moved decompression to a second thread and the numbers were almost the same as with the original implementation. Still, I tried to find out if 7-Zip was effectively using multi-threading for decompression and if I could force it to run on a single thread to compare times. Unfortunately, I couldn't find that info, and the 7z.exe command line doesn't seem to support a multi-core decompression parameter either (only for compression).
Other replies on Twitter mentioned the cost of writing files to disk, so I did a quick test: decompressing the files in memory without writing them to disk... and VOILÀ! Decompressing the 7000+ files in memory on a single thread only required 3.6 seconds! So the bottleneck was clearly writing the files to disk!
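A similar "same pipeline, with and without disk writes" measurement can be reproduced with a small harness like the one below. It's an illustrative sketch, not the author's test: `memset()` stands in for decompression, files are written with plain `fopen()`/`fwrite()`, `TimeInstallPass()` and the `prefix_N.bin` naming scheme are hypothetical, and timing uses C11 `timespec_get()`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static volatile unsigned char benchSink;    // Keeps the "decompressed" data observable

// Wall-clock time in seconds (C11 timespec_get)
static double GetSeconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec/1e9;
}

// Simulates an install pass over `count` files of `size` bytes; if writeToDisk is true,
// every file is also written to disk. Returns elapsed wall-clock seconds, or -1.0 on error
double TimeInstallPass(int count, size_t size, bool writeToDisk, const char *prefix)
{
    char *buffer = malloc((size > 0)? size : 1);
    if (buffer == NULL) return -1.0;

    char path[256];
    double start = GetSeconds();

    for (int i = 0; i < count; i++)
    {
        memset(buffer, i & 0xff, size);     // Stand-in for decompressing entry i
        if (size > 0) benchSink = (unsigned char)buffer[size - 1];

        if (writeToDisk)
        {
            snprintf(path, sizeof(path), "%s_%d.bin", prefix, i);   // Hypothetical naming scheme
            FILE *file = fopen(path, "wb");
            if (file == NULL) { free(buffer); return -1.0; }

            fwrite(buffer, 1, size, file);
            fclose(file);
        }
    }

    double elapsed = GetSeconds() - start;
    free(buffer);
    return elapsed;
}
```

Running it twice with the same count and size (once with `writeToDisk = false`, once with `true`) separates the cost of producing the data from the cost of the per-file create/write/close calls.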
After further replies and some investigation, I found that on Windows there are WriteFile() and WriteFileEx(), which, as per my understanding, operate at kernel level and are faster than the libc-provided fwrite(), so I decided to try that route. WriteFile() is intended for synchronous writes, while WriteFileEx() is intended for asynchronous file writing! That seemed to be the solution!
As usual with the Win32 API, documentation is quite dense and sometimes confusing, and it's difficult to find examples for specific use cases. Still, I managed to code a quick implementation with WriteFileEx() for my use case, despite not clearly understanding some of the provided parameters... BIG MISTAKE! Never do that; every parameter provided to a function should be clearly understood!
This was the solution I quickly implemented:
```c
#include <windows.h>
#include <stdio.h>

// Write file result callback?
// NOTE: Completion routines only run when the calling thread enters an alertable wait
// (e.g. SleepEx() or WaitForSingleObjectEx() with bAlertable = TRUE), which this code never does
VOID WINAPI WinWriteFileCallback(DWORD dwErrorCode, DWORD dwBytesTransferred, LPOVERLAPPED lpOverlapped)
{
    if (dwErrorCode != 0) printf("WARNING: CompletionRoutine: Unable to write to file! Error: %lu, AddrOverlapped: %p\n", dwErrorCode, (void *)lpOverlapped);
    else printf("CompletionRoutine: Transferred: %lu Bytes, AddrOverlapped: %p\n", dwBytesTransferred, (void *)lpOverlapped);
}

// Write file to disk (first quick attempt, kept as written: it has lifetime issues)
int WinWriteFile(char *filePath, char *buffer, int bufferSize)
{
    BOOL errorFlag = FALSE;
    OVERLAPPED overlap = { 0 };     // WARNING: Local variable, destroyed on return while the write may still be pending

    HANDLE fileHandle = CreateFileA(filePath, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_FLAG_OVERLAPPED, NULL);

    if (fileHandle == INVALID_HANDLE_VALUE)
    {
        printf("WARNING: Could not create file: %s! ERROR: %lu\n", filePath, GetLastError());
        return -1;
    }

    errorFlag = WriteFileEx(fileHandle, buffer, bufferSize, &overlap, (LPOVERLAPPED_COMPLETION_ROUTINE)WinWriteFileCallback);
    if (errorFlag == 0) printf("WARNING: Unable to write to file! Error %lu\n", GetLastError());

    CloseHandle(fileHandle);        // WARNING: Handle closed without waiting for the pending write to complete

    return 0;
}
```
On my first test, installation time was reduced to 5.8 seconds! Wow! That was fast! And running on a single thread (although the file writing could be happening on multiple underlying threads, due to its asynchronous nature)!
After the initial excitement about the huge time improvement, I was about to close Visual Studio and call it a day... when I noticed a "small" issue: comparing the original package with the installed one, I saw that some files (1-2 files) were not correctly installed; they had been created with a size of 0 bytes.
My first guess came quickly: it must be related to reusing the same memory buffer for the decompression of every file. Considering the async nature of WriteFileEx(), I was modifying the data to be written with every newly decompressed file before the actual writing happened (or so I thought)! So I did a quick test to verify it, using a separate buffer for every decompressed file... and at that point the program started crashing randomly!
I hate these situations where you seem to be so close to a great solution but some apparently "simple" thing stops working as expected and everything crashes... Long story short, after several hours of investigation and debugging, I realized that some compressed files in the .zip could actually be 0 bytes in size; I needed to consider that case when allocating the multiple buffers, not only the possibility of file entries that are actually directories. (Bonus info: whether directory entries are created depends on the .zip creator software; different programs treat them in different ways!)
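Both special cases can be checked explicitly before allocating any per-file buffer. A minimal sketch, assuming a hypothetical `ZipEntryInfo` struct filled from the archive's central directory (with miniz, similar information is available through `mz_zip_reader_file_stat()`), and assuming directory entries are recognized by a trailing `/` in their name, which is the usual convention but, as noted above, depends on the tool that created the archive:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical entry info, e.g. copied from the archive's central directory
typedef struct ZipEntryInfo {
    const char *name;           // Entry path inside the .zip
    uint64_t uncompressedSize;  // Size in bytes once extracted
} ZipEntryInfo;

typedef enum { ENTRY_DIRECTORY, ENTRY_EMPTY_FILE, ENTRY_FILE } EntryKind;

// Classify an entry; a trailing '/' is the usual directory convention, not a guarantee
EntryKind GetEntryKind(const ZipEntryInfo *entry)
{
    size_t len = strlen(entry->name);

    if ((len > 0) && (entry->name[len - 1] == '/')) return ENTRY_DIRECTORY;
    if (entry->uncompressedSize == 0) return ENTRY_EMPTY_FILE;

    return ENTRY_FILE;
}

// Allocate a per-file buffer only when there is data to hold
// Returns NULL for directories and empty files (malloc(0) behaviour is implementation-defined)
unsigned char *AllocEntryBuffer(const ZipEntryInfo *entry)
{
    if (GetEntryKind(entry) != ENTRY_FILE) return NULL;

    return malloc((size_t)entry->uncompressedSize);
}
```

With this split, empty files are simply created and closed, directories are created, and only real file data gets a buffer. Archives without explicit directory entries still need parent directories created from each file's path.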
So, after addressing those issues, I tested again: no more random crashes... but it was still failing randomly, with some files created as 0 bytes (obviously, files that were not supposed to be 0 bytes). And again, some hours trying to find out the problem...
I was convinced the issue was in the async process, but after some hours trying to find a solution (and getting a bit muddled), I randomly tried to replace WriteFileEx() with its synchronous alternative, WriteFile(). To my surprise, installation took 5.6 seconds!!! And all the files were installed successfully!
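For reference, a synchronous write helper along these lines might look like the sketch below. It is an assumption, not the actual rInstallFriendly code: `InstallWriteFile()` is a hypothetical name, the Windows path drops `FILE_FLAG_OVERLAPPED` and loops on `WriteFile()` until every byte is written, and a `fopen()`/`fwrite()` fallback is included only so the sketch also compiles on other platforms.

```c
#include <stdbool.h>
#include <stdio.h>
#if defined(_WIN32)
    #include <windows.h>
#endif

// Synchronously write a whole buffer to a new file (overwriting any existing one)
// Zero-size data is valid and produces an empty file; returns true on success
bool InstallWriteFile(const char *filePath, const void *data, size_t dataSize)
{
#if defined(_WIN32)
    HANDLE fileHandle = CreateFileA(filePath, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (fileHandle == INVALID_HANDLE_VALUE) return false;

    const char *ptr = (const char *)data;
    bool success = true;

    while (dataSize > 0)
    {
        // WriteFile() takes a DWORD byte count, so write in chunks of at most 1 GB
        DWORD chunk = (DWORD)((dataSize > 0x40000000u)? 0x40000000u : dataSize);
        DWORD written = 0;

        if (!WriteFile(fileHandle, ptr, chunk, &written, NULL) || (written == 0)) { success = false; break; }

        ptr += written;
        dataSize -= written;
    }

    CloseHandle(fileHandle);    // The call has returned, so no write is pending here
    return success;
#else
    FILE *file = fopen(filePath, "wb");
    if (file == NULL) return false;

    bool success = (dataSize == 0) || (fwrite(data, 1, dataSize, file) == dataSize);
    if (fclose(file) != 0) success = false;

    return success;
#endif
}
```

Because each call returns only after the data has been handed to the OS, the decompression buffer can be reused for the next file right away, which removes the lifetime questions of the async version.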
After multiple tests, everything worked, but I noticed the installation time could vary between 5 and 8 seconds depending on multiple factors (freshly restarted computer, open programs, build mode...). Still, it's quite impressive for single-threaded decompression and synchronous disk file writing!
Comparison of installation speed for rInstallFriendly v1.0 vs rInstallFriendly v2.0. 10x optimization! Notice the stuttering happening in the first image while playing the game. No time to play in the second image!
I've been further investigating the possible issues with WriteFileEx() and I found this remark in the docs:
"A common mistake is to reuse an OVERLAPPED structure before the previous asynchronous operation has been completed. You should use a separate structure for each request. You should also create an event object for each thread that processes data. If you store the event handles in an array, you could easily wait for all events to be signaled using the WaitForMultipleObjects function."
That was probably my issue with WriteFileEx(): I was creating an OVERLAPPED variable per file, but it was local to my custom WinWriteFile() function, so it was discarded when it went out of scope, probably before the async operation had completed.
Some ideas for further improvements:
- Use multiple threads for decompression: for my test cases, decompression was not a bottleneck; actually, decompression is really fast! Still, when processing so many files, the work could probably be divided among several threads. My concern is the .zip file access and thread synchronization, especially for the rollback case (when the user cancels the installation and the installed files are removed).
- Use WriteFileEx() properly: every call to the function should use its own OVERLAPPED structure, and the async operations should be carefully synchronized, detecting when each file write has finished and only at that moment freeing that file's memory. It seems to require a more complex implementation than the current one.
- Use memory-mapped files: a highly experienced developer recommended using CreateFileMapping() instead of WriteFileEx(). Knowing the required file size, just create the memory mapping and write to memory as usual; the OS should automatically take care of writing that memory to disk asynchronously, usually in a very efficient way. It's definitely worth a try!
One of the issues I detected with rInstallFriendly v1.0 was that, for small software packages, installation was too fast and the banner/ad/game was not displayed long enough for users to notice/enjoy it.
Solution: adding a configurable minimum install time, so developers can set a minimum time to display the banner or game, regardless of the installation speed.
What an irony! I improved installation time by an order of magnitude, but I also added an option to slow installation down when it's too fast... the fun of programming!
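How rInstallFriendly implements the option isn't shown in this post, so here is only a sketch of the idea with hypothetical names: the installer leaves the progress screen only once the files are installed and the configured minimum time has elapsed, and the displayed progress is throttled so the bar doesn't reach 100% early.

```c
#include <stdbool.h>

// Leave the progress screen only when files are done AND minInstallTime (seconds) has elapsed
bool InstallerCanFinish(bool filesInstalled, double elapsedSeconds, double minInstallTime)
{
    return filesInstalled && (elapsedSeconds >= minInstallTime);
}

// Progress shown on screen: the slower of real file progress and elapsed/minInstallTime,
// so the bar never hits 100% before the minimum time
float GetDisplayedProgress(int filesDone, int fileCount, double elapsedSeconds, double minInstallTime)
{
    float fileProgress = (fileCount > 0)? (float)filesDone/(float)fileCount : 1.0f;
    float timeProgress = (minInstallTime > 0.0)? (float)(elapsedSeconds/minInstallTime) : 1.0f;

    if (timeProgress > 1.0f) timeProgress = 1.0f;

    return (fileProgress < timeProgress)? fileProgress : timeProgress;
}
```

The files themselves are still written as fast as possible; only the presentation waits, so a cancelled installation has nothing extra to roll back.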
rInstallFriendly v2.0 minimum installation time option, and the result of installing the package that previously took ~6 seconds. Now users can enjoy the installer game... and without stuttering!
Thanks for reading! Feel free to comment or ask me in this gist thread!
Beware of using CreateFileMapping for memory-mapped file IO. The issue is when there are errors: these will be raised as SEH exceptions for the entire thread. Mapped files are great when they work, but it's an all-or-nothing approach if there is any file IO issue. (If I'm wrong, I would be very happy to find out how to manage errors when doing memory-mapped file IO.)