
@baldurk
Last active October 24, 2024 05:09
Source indexing for github projects

Symbol Servers

I'm assuming you are familiar with symbol servers - you might not have one set up yourself for your own projects, but you probably use Microsoft's public symbol server for downloading symbols for system DLLs.

For your own projects it might be useful to set up a symbol server. I won't go into how you do that here since it's well documented elsewhere, but basically you just set up a folder somewhere, say X:\symbols\ or \\servername\symbols or even http://servername.foo/symbols/, which has a defined tree structure:

symbols/
symbols/mymodule.pdb/
symbols/mymodule.pdb/123456789012345678901234567890122/
symbols/mymodule.pdb/123456789012345678901234567890122/mymodule.pd_

You can use symstore.exe from the debugging tools (installed as an optional component of the Windows SDK; if you selected it, you might have it here: C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\symstore.exe) to generate this tree for you. It will also create some other metadata that isn't needed for the symbol server to work:

symstore.exe add /s "X:\symbols" /compress /r /f *.pdb /t ProjectName

This creates the store in X:\symbols if it doesn't already exist, and adds all pdbs under the current directory recursively. /compress stores each file compressed, which is why the file ends in .pd_ instead of .pdb. You don't have to use symstore.exe though; any way of getting the module's GUID and Age and constructing the tree manually is fine. NOTE: If you are serving the store over HTTP, your server must be configured to return a 404 for modules that don't exist.
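If you do build the tree yourself, the directory name for a PDB is its GUID as 32 uppercase hex digits (no dashes) followed by its Age in hex. Here's a minimal Python sketch, assuming you've already pulled the 'RSDS' CodeView record out of the binary's debug directory (that's where the linker records the PDB's GUID, Age and path). parse_rsds and pdb_store_path are just illustrative helper names, not part of any tool:

```python
import struct
import uuid


def parse_rsds(record: bytes) -> tuple[uuid.UUID, int, str]:
    """Parse a CodeView 'RSDS' record from a PE file's debug directory.

    Layout: b'RSDS', a 16-byte GUID, a little-endian 32-bit Age, then the
    NUL-terminated path of the PDB as the linker wrote it.
    """
    if record[:4] != b"RSDS":
        raise ValueError("not an RSDS CodeView record")
    # The GUID is stored as a Windows GUID struct, i.e. with little-endian fields.
    guid = uuid.UUID(bytes_le=record[4:20])
    (age,) = struct.unpack_from("<I", record, 20)
    pdb_path = record[24:].split(b"\0", 1)[0].decode("utf-8", errors="replace")
    return guid, age, pdb_path


def pdb_store_path(pdb_name: str, guid: uuid.UUID, age: int, compressed: bool = False) -> str:
    """Path of a PDB inside the store: <name>/<GUID><Age>/<stored name>."""
    # 32 uppercase hex digits with no dashes, followed by the Age in hex.
    index = guid.hex.upper() + format(age, "X")
    # symstore /compress stores e.g. mymodule.pdb as mymodule.pd_.
    stored_name = pdb_name[:-1] + "_" if compressed else pdb_name
    return f"{pdb_name}/{index}/{stored_name}"
```

With a GUID of 12345678-9012-3456-7890-123456789012 and an Age of 2, pdb_store_path("mymodule.pdb", guid, 2, compressed=True) gives the mymodule.pdb/123456789012345678901234567890122/mymodule.pd_ path from the tree above. Note this only computes the path; it doesn't do the compression that /compress does.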

Another point: to get Visual Studio to load the symbols for modules from a crash dump, I also had to symstore.exe the .dll or .exe files. I don't know if this is strictly required, but you can just replace *.pdb with *.dll (or *.exe) in the above command.
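Binaries are filed differently from PDBs. As far as I know, symstore indexes a .dll or .exe by the TimeDateStamp from its COFF header (8 uppercase hex digits) followed by its SizeOfImage (lowercase hex), rather than by GUID and Age. Here's a rough Python sketch of that, assuming the usual %08X%x formatting; pe_store_path is a hypothetical helper:

```python
import struct


def pe_store_path(file_name: str, image: bytes) -> str:
    """Store path for a .dll/.exe: <name>/<TimeDateStamp><SizeOfImage>/<name>.

    Assumes the index is TimeDateStamp as 8 uppercase hex digits followed by
    SizeOfImage in lowercase hex (the '%08X%x' convention).
    """
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header follows the 4-byte signature: Machine (2), NumberOfSections (2),
    # then TimeDateStamp (4).
    (timestamp,) = struct.unpack_from("<I", image, pe_offset + 8)
    # Optional header starts after the 20-byte COFF header; SizeOfImage is at
    # offset 56 in both PE32 and PE32+ optional headers.
    (size_of_image,) = struct.unpack_from("<I", image, pe_offset + 24 + 56)
    return f"{file_name}/{timestamp:08X}{size_of_image:x}/{file_name}"
```

You'd call it with the bytes of the built binary, e.g. pe_store_path("mymodule.dll", open("mymodule.dll", "rb").read()).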

Once it's set up, you just add it to your _NT_SYMBOL_PATH like so:

_NT_SYMBOL_PATH=SRV*C:\symbol_cache*X:\symbols*https://msdl.microsoft.com/download/symbols

Make sure you have a cache folder configured (C:\symbol_cache above). This is where downloaded source files will be stored, as well as cached pdbs.

For the rest of this gist I'll just assume your symbol server is in X:\symbols.

Source Indexing

Symbol servers are nice: they mean you always have the symbols available for every build you make and/or distribute, and you can just open a minidump.dmp in VS and let it figure out where the symbols are to resolve the callstack for you.

Source indexing goes one step further and embeds information in the .pdb on how to retrieve the source from version control, or from anywhere really. That way you don't need the exact version of the source locally to debug, and you no longer get warnings from VS about mismatched source when loading a dump.

For GitHub projects this is really easy, since every file is available online at a simple URL (raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>). Instead of embedding a command to run (as the default scripts do, to fetch the source from VSS or P4 or similar), all we need to do is specify the URL format and let the debugger fetch that page.

The core tool is 'pdbstr.exe' which is under the srcsrv folder in the debugging tools. E.g. C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\srcsrv\pdbstr.exe. This tool inserts a little script into each PDB to tell the debugger how to find the source.

For GitHub it will look something like this:

SRCSRV: ini ------------------------------------------------
VERSION=2
VERCTRL=http
SRCSRV: variables ------------------------------------------
HTTP_ALIAS=https://raw.githubusercontent.com/baldurk/renderdoc/v0.27/
HTTP_EXTRACT_TARGET=%HTTP_ALIAS%%var2%
SRCSRVTRG=%HTTP_EXTRACT_TARGET%
SRCSRV: source files ---------------------------------------
... more files ...
t:\renderdoc\build\renderdoc\renderdoc\serialise\serialiser.h*renderdoc/serialise/serialiser.h
... more files ...
SRCSRV: end ------------------------------------------------

Notice that HTTP_ALIAS specifies the base URL where your files can be found. Each entry in the source files section then has the form:

X:\build\path\source\file\subfolder\source.cpp*source/file/subfolder/source.cpp

This maps the local path embedded in the PDB at build time (var1) to the file's path relative to HTTP_ALIAS (var2). I don't know if the local path is case sensitive. Chances are it isn't, since the filesystem isn't, but it might be strcmp()'d against the path used in the build process, so it may have to match exactly how your build system specifies the paths. I do know that you must use backslashes (\) as path separators, not forward slashes (/), so it could just be a string compare.
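To make the var1/var2 split concrete, here's a simplified Python model of how an entry and the variables combine into a URL, using the renderdoc example above. It only does plain %NAME% substitution; real srcsrv also has functions like %fnbksl%() that this ignores, and resolve_source_url is a hypothetical name:

```python
import re


def resolve_source_url(variables: dict[str, str], entry: str) -> str:
    """Simplified model of srcsrv expanding SRCSRVTRG for one source-files entry.

    Handles only plain %NAME% substitution; the real srcsrv also supports
    functions such as %fnbksl%(...) which this sketch ignores.
    """
    values = dict(variables)
    # The entry's '*'-separated fields become %var1%, %var2%, ...
    for i, field in enumerate(entry.split("*"), start=1):
        values[f"var{i}"] = field

    def expand(text: str, depth: int = 0) -> str:
        if depth > 16:
            raise ValueError("recursive variable definition")
        # Replace each %NAME% with its (recursively expanded) value.
        return re.sub(r"%(\w+)%", lambda m: expand(values[m.group(1)], depth + 1), text)

    return expand(values["SRCSRVTRG"])
```

For the serialiser.h entry this yields https://raw.githubusercontent.com/baldurk/renderdoc/v0.27/renderdoc/serialise/serialiser.h, i.e. HTTP_ALIAS with var2 appended. var1 never appears in the URL; it's only what the debugger matches against the path in the PDB.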

I'll leave the generation of this script file up to you. When you're done, you can insert it into the pdb like so:

pdbstr.exe -w -p:mymodule.pdb -s:srcsrv -i:pdbstr.txt

This takes the existing .pdb and adds a new stream to it called srcsrv, containing the contents of the .txt file. It doesn't modify the GUID/Age of the .pdb, so you can do it safely at any time after you've built the executable. Once you've done that, store the pdb as normal, and whenever it is used the debugger will look up the source server information (provided you've enabled it in Visual Studio: Tools -> Options -> Debugging -> General -> "Enable source server support").
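If you want a starting point for generating the file, here's a minimal Python sketch. It assumes you know the directory the build ran from and have a list of repository-relative paths (e.g. from git ls-files); build_srcsrv_stream and pdbstr_command are hypothetical helpers, and the stream layout just mirrors the example above:

```python
from pathlib import PureWindowsPath


def build_srcsrv_stream(raw_base_url: str, build_root: str, repo_files: list[str]) -> str:
    """Build a srcsrv stream that maps build-time paths to raw GitHub URLs.

    raw_base_url: e.g. https://raw.githubusercontent.com/<owner>/<repo>/<ref>/
    build_root:   the checkout directory as the compiler saw it.
    repo_files:   repository-relative paths using forward slashes.
    """
    if not raw_base_url.endswith("/"):
        raw_base_url += "/"  # %HTTP_ALIAS%%var2% is a plain concatenation
    lines = [
        "SRCSRV: ini ------------------------------------------------",
        "VERSION=2",
        "VERCTRL=http",
        "SRCSRV: variables ------------------------------------------",
        f"HTTP_ALIAS={raw_base_url}",
        "HTTP_EXTRACT_TARGET=%HTTP_ALIAS%%var2%",
        "SRCSRVTRG=%HTTP_EXTRACT_TARGET%",
        "SRCSRV: source files ---------------------------------------",
    ]
    for rel in repo_files:
        # var1: the local path with backslashes, as recorded in the PDB.
        local = str(PureWindowsPath(build_root, *rel.split("/")))
        # var2: the same file relative to HTTP_ALIAS, with forward slashes.
        lines.append(f"{local}*{rel}")
    lines.append("SRCSRV: end ------------------------------------------------")
    return "\n".join(lines) + "\n"


def pdbstr_command(pdbstr_exe: str, pdb: str, stream_file: str) -> list[str]:
    """Argument list equivalent to: pdbstr.exe -w -p:<pdb> -s:srcsrv -i:<file>."""
    return [pdbstr_exe, "-w", f"-p:{pdb}", "-s:srcsrv", f"-i:{stream_file}"]
```

After writing the stream to pdbstr.txt, something like subprocess.run(pdbstr_command(...), check=True) would do the insertion. Pointing raw_base_url at a commit hash or tag (like v0.27 above) rather than a branch keeps old PDBs fetching the exact source they were built from.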

If you'd rather have source indexing run a command against your own source control, you can do that too. Take a look at the srcsrv.doc documentation and the example scripts for VSS/P4/SVN/CVS in the srcsrv/ folder to see more. Essentially you define a command, with variable substitution, that tells the debugger how to fetch each file.

The source gets cached in whichever symbol cache folder you specified in your _NT_SYMBOL_PATH, under a src/ folder, if you ever want to clear it out.

IMPORTANT NOTE: Only enable source server lookups if you are using pdbs you trust. The source server information embedded in a pdb can contain an arbitrary command for fetching the source, which could do something nefarious. There are security prompts if the .exe is unrecognised, but you might have p4.exe whitelisted or similar, and the command could then do something like "revert all my local changes".
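You can read a pdb's stream back with pdbstr.exe -r -p:mymodule.pdb -s:srcsrv and look at it before trusting it. A command-based stream defines SRCSRVCMD, while a pure HTTP one like the GitHub example above doesn't need to. Here's a rough Python heuristic for that check; srcsrv_variables and runs_a_command are hypothetical helpers, and this is a quick sanity check rather than a security guarantee:

```python
def srcsrv_variables(stream: str) -> dict[str, str]:
    """Collect NAME=value pairs from the ini and variables sections of a srcsrv stream."""
    variables: dict[str, str] = {}
    section = None
    for line in stream.splitlines():
        if line.startswith("SRCSRV:"):
            # e.g. "SRCSRV: variables -----" -> "variables"
            section = line[len("SRCSRV:"):].split("-", 1)[0].strip()
        elif section in ("ini", "variables") and "=" in line:
            name, value = line.split("=", 1)
            variables[name.strip()] = value
    return variables


def runs_a_command(stream: str) -> bool:
    """True if the stream defines a non-empty SRCSRVCMD, i.e. asks the debugger to run something."""
    return bool(srcsrv_variables(stream).get("SRCSRVCMD", "").strip())
```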

See Also

@BrunoJuchli

The link to "Powershell script that might automate all this" is dead. But it might be this one: https://github.com/Haemoglobin/GitHub-Source-Indexer/blob/master/github-sourceindexer.ps1

@shenxiaolong-code

Here is the complete solution and a test example:
https://github.com/ShenXiaolong1976/sourceIndex_forGit

@juangburgos

This can also work for GitLab if we use a proxy to convert URLs, because GitLab only gives raw file access to files with query strings, but WinDbg and Visual Studio do not support HTTP requests with query strings. So I made a small app to do the conversion:
