
@Arsenic-ATG
Created August 26, 2023 14:26
GSoC 2023: Enabling LLVM clang ExtractAPI while building

Overview

This project aims to give LLVM clang the ability to generate ExtractAPI symbol graph files as a side effect of a regular compilation job. This makes it possible to use the symbol graph format as a lightweight alternative for performing code intelligence offline, outside of an interactive context.
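For readers unfamiliar with the format, the following Python sketch builds a heavily simplified symbol graph by hand. It assumes the general shape of the JSON format (top-level `metadata`, `module`, `symbols` and `relationships`, with each symbol keyed by a precise identifier such as a clang USR). The specific `kind` identifiers and the format version number are illustrative assumptions, and real files emitted by clang carry many more fields.

```python
import json

# Hand-written, heavily simplified symbol graph for a C file containing:
#   struct Point { int x; };
#   int norm(struct Point p);
# The "kind" identifiers and formatVersion are illustrative assumptions;
# real clang output also includes declaration fragments, locations, etc.
symbol_graph = {
    "metadata": {
        "formatVersion": {"major": 0, "minor": 5, "patch": 3},
        "generator": "hand-written example",
    },
    "module": {"name": "example", "platform": {}},
    "symbols": [
        {
            "identifier": {"precise": "c:@S@Point", "interfaceLanguage": "c"},
            "kind": {"identifier": "c.struct", "displayName": "Structure"},
            "names": {"title": "Point"},
            "pathComponents": ["Point"],
        },
        {
            "identifier": {"precise": "c:@S@Point@FI@x", "interfaceLanguage": "c"},
            "kind": {"identifier": "c.property", "displayName": "Instance Property"},
            "names": {"title": "x"},
            "pathComponents": ["Point", "x"],
        },
        {
            "identifier": {"precise": "c:@F@norm", "interfaceLanguage": "c"},
            "kind": {"identifier": "c.func", "displayName": "Function"},
            "names": {"title": "norm"},
            "pathComponents": ["norm"],
        },
    ],
    "relationships": [
        # Edges refer to symbols by precise identifier: source -> target.
        {"kind": "memberOf", "source": "c:@S@Point@FI@x", "target": "c:@S@Point"},
    ],
}

print(json.dumps(symbol_graph, indent=2))
```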

Status of ExtractAPI before this project

Before this project, there were two ways to generate a symbol graph file:

  1. Using the clang -extract-api command-line option: This method is geared towards generating symbol graphs that are used to generate documentation for a project. To achieve this effectively, only the symbols declared in the library's headers are kept and all others are ignored. The ignored symbols include symbols from the language's standard library (like std::vector in C++) and symbols introduced by the library's other dependencies, which are not relevant to the target library's documentation.

  2. Using the libclang interface: This is mainly intended for interactive, IDE-like scenarios and is geared towards generating the symbol graph of a single symbol at a time.

However, neither method helps much when the generated symbol graph is used to obtain semantic code information for code intelligence services, where one is interested in exact information about all the symbols. For example, one might want to know which functions or symbols the current function depends on, and/or how two different symbols in the source are related to each other. The existing methods do not provide this information, and altering either of them to do so would conflict with its purpose.

Project Goals

The project aims to provide a third way to generate symbol graph information: as a side effect of regular compilation. The project should be able to work on complete sources and generate a symbol graph file. Processing full sources makes it possible to identify relationships between symbols, which is useful to anyone who is analyzing code or building a tool to do so. More precisely, the project involves:

  • Adding a new option to clang, “--emit-symbol-graph”, which generates symbol graphs for .m/.c files as a side effect of building the .o file.

  • Creating a tool to merge symbol graph files into a single unified symbol graph file in the same way a static linker links individual object files.

  • Identifying and implementing support for newly discovered relationships between symbols in implementation code, so that the result is a full graph of which symbols depend on which other symbols.
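To illustrate why a full relationship graph is useful, here is a minimal Python sketch that answers "which symbols does this symbol depend on, directly or transitively?" by following relationship edges from source to target with a breadth-first search. The relationship kind `calls`, the function name `dependency_closure`, and the example USRs are hypothetical; the report does not list the relationship kinds the project actually emits.

```python
from __future__ import annotations

from collections import defaultdict, deque


def dependency_closure(graph: dict, start: str) -> set[str]:
    """Return every symbol reachable from `start` via relationship edges.

    `graph` is a symbol-graph-like dict; only its "relationships" list is
    used. Edges are followed from "source" to "target" (breadth-first),
    and cycles are handled by tracking visited identifiers.
    """
    edges = defaultdict(list)
    for rel in graph.get("relationships", []):
        edges[rel["source"]].append(rel["target"])

    seen = {start}
    queue = deque([start])
    while queue:
        for target in edges[queue.popleft()]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)  # a symbol is not its own dependency
    return seen


# Hypothetical graph: main calls parse and report; parse calls read_token.
graph = {
    "relationships": [
        {"kind": "calls", "source": "c:@F@main", "target": "c:@F@parse"},
        {"kind": "calls", "source": "c:@F@parse", "target": "c:@F@read_token"},
        {"kind": "calls", "source": "c:@F@main", "target": "c:@F@report"},
    ],
}

print(sorted(dependency_closure(graph, "c:@F@main")))
# ['c:@F@parse', 'c:@F@read_token', 'c:@F@report']
```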

Challenges faced / Useful learnings

  1. Because of the way clang handles command-line actions, making ExtractAPI run alongside the codegen action required creating a new wrapper action. This wrapper is multiplexed on top of another action and creates and uses the ExtractAPI consumer alongside it to traverse the AST and generate the symbol graph.

  2. Creating clang-symbolgraph-merger was a bigger challenge, because we wanted the merger to remain useful even if, in the future, symbol graphs are generated in a format other than JSON. This led to an approach where the merger first parses all the individual symbol graph files and creates a unified APISet from them, which is then serialized back to a single symbol graph file using the pre-existing ExtractAPI serializer. The challenge here was that APISet was not designed to be constructed this way, so extracting all the information from the JSON files and adding it to the APISet was tricky.
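The multiplexing idea in the first challenge can be pictured with a small Python analogue: a single parse visits each declaration once and notifies a wrapper consumer, which forwards every callback to both a code generation consumer and a symbol-extraction consumer. All class and method names below are hypothetical stand-ins, not clang's C++ API (clang itself provides `WrapperFrontendAction` and `MultiplexConsumer` for this kind of composition).

```python
# Python analogue of multiplexing two AST consumers over a single parse.
# All names are hypothetical stand-ins, not clang's C++ API.


class CodeGenConsumer:
    """Stand-in for the consumer that produces the .o file."""

    def __init__(self):
        self.emitted = []

    def handle_top_level_decl(self, decl):
        self.emitted.append(f"code for {decl}")

    def handle_translation_unit(self):
        pass  # a real backend would finish writing the object file here


class SymbolGraphConsumer:
    """Stand-in for the ExtractAPI consumer that records symbols."""

    def __init__(self):
        self.symbols = []
        self.finished = False

    def handle_top_level_decl(self, decl):
        self.symbols.append(decl)

    def handle_translation_unit(self):
        self.finished = True  # a real consumer would serialize the symbol graph here


class MultiplexingConsumer:
    """Forwards every callback, in order, to each wrapped consumer."""

    def __init__(self, consumers):
        self.consumers = list(consumers)

    def handle_top_level_decl(self, decl):
        for consumer in self.consumers:
            consumer.handle_top_level_decl(decl)

    def handle_translation_unit(self):
        for consumer in self.consumers:
            consumer.handle_translation_unit()


def parse(decls, consumer):
    """Stand-in for the parser: visits each declaration once, then ends the TU."""
    for decl in decls:
        consumer.handle_top_level_decl(decl)
    consumer.handle_translation_unit()


codegen, extract = CodeGenConsumer(), SymbolGraphConsumer()
parse(["struct Point", "int norm(struct Point)"], MultiplexingConsumer([codegen, extract]))
print(extract.symbols)  # ['struct Point', 'int norm(struct Point)']
```

The point of the wrapper is that neither consumer needs to know about the other: the original action keeps producing its output unchanged, and symbol extraction rides along on the same traversal.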

List of Commits/Patches

Current status of the project

Now, apart from the two existing options, ExtractAPI can also be invoked to generate a symbol graph as a side effect of a regular compilation job, using the new --emit-symbol-graph=<output_dir> option, which can be passed to the driver when building the project.

This new option generates a symbol graph file for each source file and places it in the specified directory. Unlike the other two options, it emits symbol information for all discovered symbols, not just the ones relevant to documentation generation or to one particular symbol.

The project also includes a merging tool called "clang-symbolgraph-merger", which can be used to merge multiple symbol graph files into one unified symbol graph file. The patch containing this tool has not yet been merged into the main LLVM repository: while it is known to work for C projects, it is still under review and testing (especially for Objective-C projects). It can be accessed via D158646 on LLVM Phabricator.
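The merging step can be sketched in Python as follows. Note that this uses a different, simpler technique than the actual tool: it merges the JSON documents directly, de-duplicating symbols by precise identifier and relationships by their (kind, source, target) triple, whereas clang-symbolgraph-merger rebuilds an APISet and re-serializes it with the ExtractAPI serializer. The function name `merge_symbol_graphs`, the file names, and the `calls` relationship kind are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def merge_symbol_graphs(paths, module_name):
    """Merge several symbol graph JSON files into one document (hypothetical helper).

    Symbols are de-duplicated by precise identifier (USR), keeping the first
    copy seen, much as a linker keeps one definition per symbol.
    Relationships are de-duplicated by their (kind, source, target) triple.
    """
    symbols = {}
    relationships = {}
    metadata = None
    for path in paths:
        graph = json.loads(Path(path).read_text())
        metadata = metadata or graph.get("metadata")
        for sym in graph.get("symbols", []):
            symbols.setdefault(sym["identifier"]["precise"], sym)
        for rel in graph.get("relationships", []):
            key = (rel["kind"], rel["source"], rel["target"])
            relationships.setdefault(key, rel)
    return {
        "metadata": metadata or {},
        "module": {"name": module_name},
        "symbols": list(symbols.values()),
        "relationships": list(relationships.values()),
    }


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical per-file outputs that both saw the same header declaration.
    shared = {"identifier": {"precise": "c:@F@helper", "interfaceLanguage": "c"}}
    main_sym = {"identifier": {"precise": "c:@F@main", "interfaceLanguage": "c"}}
    a = {"symbols": [shared, main_sym],
         "relationships": [{"kind": "calls", "source": "c:@F@main", "target": "c:@F@helper"}]}
    b = {"symbols": [shared], "relationships": []}
    paths = []
    for name, graph in (("a.json", a), ("b.json", b)):
        path = Path(tmp) / name
        path.write_text(json.dumps(graph))
        paths.append(path)
    merged = merge_symbol_graphs(paths, "example")

print(len(merged["symbols"]), len(merged["relationships"]))  # 2 1
```

Keeping the first copy of each symbol is the simplest policy; a real merger would also have to reconcile copies that disagree (for example, a declaration in one file and a definition in another), which this sketch ignores.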
