Created October 2, 2023 16:19
Script to find missing std #includes in C++ headers
#! /usr/bin/env ruby
#
# missing_includes.rb
# By Jens Alfke <[email protected]>
# Version 2.0 -- 2 Oct 2023
# Copyright 2021-Present Couchbase, Inc.
#
# This script scans C++ header files looking for usage of common standard library classes, like
# `std::vector`, without including their corresponding headers, like `<vector>`. It similarly looks
# for standard C functions like `strlen` that are used without including their header (`<cstring>`).
#
# Such files may build successfully with one compiler or standard library implementation but
# fail with another, due to differences in which other headers the standard library headers include.
#
# **This script is, unapologetically, a hack.** It's not software engineering, it's a quick way to
# alleviate a problem I keep having when I submit my (Xcode-with-Clang built) local branch to
# upstream CI and get unknown-identifier errors from GCC and/or MSVC.
#
# Examples of output:
#   - Default mode (the `// for ...` comment is printed only with `-v`/`--verbose`):
#         *** include/foo.hh
#             #include <functional>    // for std::function, line 154
#   - Compiler-warning mode (`--warn`):
#         include/foo.hh:154: warning: Use of 'std::function' without including <functional> [missing_includes.rb]
#
# Disclaimers & Limitations:
#
# * This script does not use a real parser, just a bunch of regexes. [Obligatory jwz link]
# * It does not know about every symbol in every library header, just the ones I've added to the
#   tables below. You are most welcome to add more.
# * It assumes the `std::` namespace is used explicitly, i.e. it ignores `vector` by itself.
# * Some functions, like `std::swap`, are defined in multiple headers with different parameter
#   types. A simple hack like this can't possibly understand that.
# * It doesn't know about the original C headers like `<string.h>`, just their C++ adapters.
# * **It does not follow `#include`s.** It doesn't look at local headers #include'd by a header,
#   so it will complain about `std::vector` even if the current header includes another
#   header that includes `<vector>`. This is partly laziness, but mostly intentional. In such a
#   situation you might alter the include'd header to not use vectors any more and remove the
#   `#include <vector>`, causing a bunch of other header files to break. Or you might copy
#   the downstream header to another project, and then it won't compile until you figure out what
#   includes to add.
# * **It only looks at header files, not source files.** Due to the above limitations, it's a lot
#   less useful in source files: they commonly don't repeat library includes from their
#   matching header, and they often do `using namespace std` (at least, mine do).
#
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

require "optparse"
require "ostruct"
require "pathname"
require "set"

# ANSI terminal color escapes
PLAIN     = "\033[0m"
BOLD      = "\033[1m"
DIM       = "\033[2m"
ITALIC    = "\033[3m"
UNDERLINE = "\033[4m"
CYAN      = "\033[96m"
YELLOW    = "\033[93m"

# The set of filename extensions I look for; the --ext flag adds to this.
HeaderFileExtensions = Set.new([".hh", ".hpp"])

# `look_for()` adds this prefix to identifiers.
StdPrefix = "std::"

# Maps from system header name to an array of other headers it canonically includes.
# (I got this from looking at the header documentation at cppreference.com.)
Includes = {
  "algorithm"     => ["initializer_list"],
  "array"         => ["compare", "initializer_list"],
  "chrono"        => ["compare"],
  "iostream"      => ["ios", "streambuf", "istream", "ostream"],
  "map"           => ["compare", "initializer_list"],
  "memory"        => ["compare"],
  "optional"      => ["compare"],
  "set"           => ["compare", "initializer_list"],
  "stdexcept"     => ["exception"],
  "string"        => ["compare", "initializer_list"],
  "string_view"   => ["compare"],
  "tuple"         => ["compare"],
  "unordered_map" => ["compare", "initializer_list"],
  "utility"       => ["compare", "initializer_list"],
  "variant"       => ["compare"],
  "vector"        => ["compare", "initializer_list"],
}

# Adds `name`, and any other headers known to be included by it, to the set `headers`.
def addIncludes(headers, name)
  unless headers.include?(name) then
    headers.add(name)
    included = Includes[name]
    if included then
      included.each {|h| addIncludes(headers, h)}
    end
  end
end

# Maps a regex for a class/fn name to the name of the header it's defined in.
Classes = Hash.new()

# Adds a bunch of `std::` identifiers and the header they're defined in.
def look_for(header, identifiers)
  identifiers.each do |cls|
    Classes[Regexp.new("\\b" + StdPrefix + cls + "\\b")] = header
  end
end

# Like `look_for`, but for C identifiers, which have no `std::` prefix.
def look_for_c(header, identifiers)
  identifiers.each do |cls|
    Classes[Regexp.new("\\b" + cls + "\\b")] = header
  end
end

# First parameter is header name, second is a list of symbols (without the `std::`) that require that header.
# If an identifier is listed under two headers, the later call wins (e.g. `char_traits` maps to <string>).
look_for("algorithm", ["binary_search", "clamp", "lower_bound", "max", "min", "minmax", "sort", "upper_bound"])
look_for("any", ["any", "make_any", "any_cast"])
look_for("array", ["array"])
look_for("atomic", ["atomic", "atomic_\\w+", "memory_order"])
look_for("chrono", ["chrono"])
look_for("compare", ["strong_order", "weak_order", "partial_ordering", "weak_ordering", "three_way_comparable", "three_way_comparable_with"])
look_for("deque", ["deque"])
look_for("exception", ["exception", "current_exception", "exception_ptr", "make_exception_ptr", "rethrow_exception", "terminate"])
look_for("fstream", ["filebuf", "ifstream", "ofstream", "fstream"])
look_for("functional", ["function", "bind", "ref", "invoke", "invoke_r", "mem_fn", "reference_wrapper", "unwrap_reference"])
look_for("initializer_list", ["initializer_list"])
look_for("iosfwd", ["char_traits", "istream", "ostream", "fstream", "stringstream", "fpos"])
look_for("iostream", ["cerr", "cin", "cout", "clog"])
look_for("map", ["map", "multimap"])
look_for("memory", ["make_unique", "make_shared", "shared_ptr", "unique_ptr", "weak_ptr", "allocator", "allocator_traits", "pointer_traits"])
look_for("mutex", ["mutex", "timed_mutex", "recursive_mutex", "lock_guard", "unique_lock", "scoped_lock", "once_flag", "call_once"])
look_for("optional", ["make_optional", "optional", "nullopt"])
look_for("regex", ["regex", "sub_match", "match_results"])
look_for("set", ["set"])
look_for("sstream", ["stringstream", "istringstream", "ostringstream", "stringbuf"])
look_for("string", ["string", "basic_string", "char_traits", "stoi", "stol", "stoll", "stoul", "stoull", "stof", "stod", "to_string", "npos"])
look_for("stdexcept", ["logic_error", "runtime_error", "invalid_argument", "domain_error", "length_error", "range_error", "overflow_error", "underflow_error"])
look_for("string_view", ["string_view"])
look_for("tuple", ["tie", "tuple"])
look_for("typeinfo", ["type_info", "bad_typeid", "bad_cast"])
look_for("unordered_map", ["unordered_map", "unordered_multimap"])
look_for("unordered_set", ["unordered_set", "unordered_multiset"])
look_for("utility", ["forward", "move", "pair", "get", "swap"])
look_for("variant", ["variant", "visit", "get_if", "monostate"])
look_for("vector", ["vector"])

# TODO: This is obviously incomplete. I've just been adding the most common stuff I find.
look_for_c("cassert", ["assert"])
look_for_c("cmath", ["abs", "ceil", "floor"])
look_for_c("cstring", ["memcmp", "memcpy", "memmove", "strlen", "strcpy", "strchr", "strrchr"])
look_for_c("cstdio", ["printf", "sprintf", "fprintf"])

##### TOOL CODE

# Command-line options
Options = OpenStruct.new
Options.verbose = false
Options.prefix = nil
Options.commonHeaders = Set.new()
Options.humanReadable = true
Options.diagnostic = "warning"

# Process result
$finalResult = 0

# Processes a file. Returns the set of headers it includes (directly or canonically).
def scan_file(pathname)
  headers = Options.commonHeaders.clone()
  first = true
  lineno = 0
  # Block form of File.open closes the file when done.
  File.open(pathname.to_s, "r:UTF-8") do |file|
    file.each_line do |line|
      lineno += 1
      # Strip `//` comments.
      # TODO: Remove C-style comments, even multiline
      line = line.sub(%r{//.*}, "")
      if line =~ /^\s*#\s*include\s*<(\w+(\.h)?)>/ then
        # Found an #include<...>:
        addIncludes(headers, $1)
      else
        Classes.each do |classRegexp, headerName|
          if not headers.include?(headerName) and line =~ classRegexp then
            # Found a symbol without a prior #include of its header:
            name = classRegexp.source[2..-3]    # strip the leading and trailing "\b"
            if Options.humanReadable then
              if first then
                first = false
                puts "#{BOLD}*** #{pathname.parent}/#{YELLOW}#{pathname.basename}#{PLAIN}"
              end
              $stdout.write "\t\#include #{BOLD}#{CYAN}<#{headerName}>#{PLAIN}"
              if Options.verbose then
                $stdout.write "\t#{ITALIC}#{DIM}// for #{name}, line #{lineno}#{PLAIN}"
              end
              puts ""
              $finalResult = 1
            else
              # Machine-readable (compiler output) form:
              $stderr.write "#{pathname.parent}/#{pathname.basename}:#{lineno}: #{Options.diagnostic}: Use of '#{name}' without including <#{headerName}> [missing_includes.rb]\n"
              $finalResult = 1 if Options.diagnostic == "error"
            end
            headers.add(headerName)    # So I don't complain about the same header again
            # TODO: Would be nice to alphabetize by header name
          end
        end
      end
    end
  end
  return headers
end

# Processes a directory tree
def scan_tree(dir)
  dir.find do |file|
    if HeaderFileExtensions.include?(file.extname) then
      begin
        scan_file(file)
      rescue => detail
        $stderr.write "Exception scanning #{file}: #{detail}\n\t"
        $stderr.write detail.backtrace.join("\n\t"), "\n\n"
        $finalResult = -1
      end
    end
  end
end

OptionParser.new do |opts|
  opts.banner = "#{BOLD}Usage: missing_includes.rb DIR...#{PLAIN}"
  opts.on("--prefix HEADER", "--base HEADER", "Assume every header includes this file") do |p|
    Options.commonHeaders.merge(scan_file(Pathname.new(p)))
  end
  opts.on("--ignore HEADER", "Ignore missing <HEADER>. May give multiple headers separated by commas.") do |h|
    Options.commonHeaders.merge(h.split(","))
  end
  opts.on("--ext EXT", "Scan filenames ending with EXT too.") do |ext|
    ext = "." + ext unless ext.start_with?(".")
    HeaderFileExtensions.add(ext)
  end
  opts.on("--warn", "--warning", "-w", "Write compiler-style warnings to stderr") do
    Options.humanReadable = false
    Options.diagnostic = "warning"
  end
  opts.on("--error", "-e", "Write compiler-style errors to stderr") do
    Options.humanReadable = false
    Options.diagnostic = "error"
  end
  opts.on_tail("-v", "--[no-]verbose", "Verbose: show why each #include is needed") do |v|
    Options.verbose = v
  end
  opts.on_tail("-h", "--help", "Show this message") do
    puts opts
    puts ""
    puts "Finds C++ and C standard library headers you should probably \#include."
    puts "Looks at all '.hh' and '.hpp' files (plus any --ext extensions) in each given directory tree."
    puts "When it finds a standard library identifier it knows about, like `std::vector` or"
    puts "`strlen`, it checks if the corresponding header was included; if not, it prints a warning."
    puts
    puts "It works from a hardcoded list of common identifiers; this list is not comprehensive."
    puts "Nor does it scan local headers that are included transitively."
    puts "Hopefully you'll find it useful anyway! I do."
    exit
  end
end.parse!

if ARGV.empty? then
  puts "Please give at least one directory to scan. (Use --help for help.)"
  exit 1
end

ARGV.each do |arg|
  scan_tree(Pathname.new(arg))
end

exit $finalResult
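
The detection loop in `scan_file` boils down to three ideas: each known identifier becomes a word-bounded regex such as `\bstd::vector\b`; an `#include <...>` line adds that header, plus the headers it canonically pulls in, to a "seen" set; and the first use of an identifier whose header hasn't been seen is reported once. Below is a simplified, self-contained sketch of just those ideas. It is not the script itself: the tables `CANONICAL_INCLUDES` and `RULES` and the method `missing_headers` are hypothetical stand-ins, and it scans a string rather than files, with no colors or options.

```ruby
require "set"

# Hypothetical, heavily trimmed stand-ins for the script's Includes and Classes tables.
CANONICAL_INCLUDES = { "vector" => ["compare", "initializer_list"] }
RULES = {
  /\bstd::vector\b/           => "vector",
  /\bstd::string\b/           => "string",
  /\bstd::string_view\b/      => "string_view",
  /\bstd::initializer_list\b/ => "initializer_list",
}

# Adds `name` and every header it is known to include (recursively) to `seen`.
def add_with_includes(seen, name)
  return if seen.include?(name)
  seen << name
  CANONICAL_INCLUDES.fetch(name, []).each { |h| add_with_includes(seen, h) }
end

# Returns [header, line_number] for the first use of each identifier whose
# header was not #included (directly or canonically) earlier in `source`.
def missing_headers(source)
  seen = Set.new
  report = []
  source.each_line.with_index(1) do |line, lineno|
    line = line.sub(%r{//.*}, "")   # crude: drops everything after "//"
    if line =~ /^\s*#\s*include\s*<(\w+(\.h)?)>/
      add_with_includes(seen, $1)
      next
    end
    RULES.each do |regex, header|
      next if seen.include?(header) || line !~ regex
      report << [header, lineno]
      seen << header                # complain about each header only once
    end
  end
  report
end

src = <<~CPP
  #include <vector>
  std::vector<int> v;               // OK: <vector> was included
  std::initializer_list<int> il;    // OK: <vector> canonically includes it
  void f(std::string_view s);       // missing <string_view>
  // std::string mentioned only in a comment
CPP

p missing_headers(src)   # prints [["string_view", 4]]
```

The `\b` on both ends is what keeps `std::string` from matching inside `std::string_view`: `_` is a word character, so there is no word boundary after `string`.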
Does it work for C as well?

Yes, with the flag `--ext .h`. But to make it useful you'd need to add a lot more `look_for_c(...)` rules, and append `.h` to the header names in the first parameter.
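
To illustrate the answer about C: the script's `#include` regex captures an optional `.h`, so `#include <string.h>` is recorded as `string.h`, and only a rule whose header name is literally `string.h` is satisfied by it. The following is a minimal standalone sketch of that interaction, assuming a few hypothetical C rules; `C_RULES` and `missing_c_headers` are illustrative names, and `look_for_c` is re-implemented here rather than taken from the script.

```ruby
require "set"

# Standalone stand-ins for the script's Classes table and look_for_c();
# re-implemented here so the example runs on its own.
C_RULES = {}
def look_for_c(header, identifiers)
  identifiers.each { |id| C_RULES[Regexp.new("\\b" + id + "\\b")] = header }
end

# Hypothetical C rules: the header names carry the ".h" suffix.
look_for_c("string.h", ["memcpy", "strlen"])
look_for_c("stdio.h",  ["printf", "fprintf"])
look_for_c("stdlib.h", ["malloc", "free"])

# Returns the headers whose identifiers `source` uses without #including them.
def missing_c_headers(source)
  seen = Set.new
  missing = []
  source.each_line do |line|
    if line =~ /^\s*#\s*include\s*<(\w+(\.h)?)>/
      seen << $1                 # e.g. "string.h", suffix included
      next
    end
    C_RULES.each do |regex, header|
      next if seen.include?(header) || line !~ regex
      missing << header
      seen << header
    end
  end
  missing
end

c_src = <<~SRC
  #include <string.h>
  size_t n = strlen(s);
  char *buf = malloc(n + 1);
  printf("%zu bytes", n);
SRC

p missing_c_headers(c_src)   # prints ["stdlib.h", "stdio.h"]
```

Because matching is purely by name, including `<cstring>` would not satisfy a `string.h` rule here, which is why the header names in the rules have to match the spelling your C files actually use.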