TOC
- Detecting postfix operators in for loops
- Detecting unused functions
- Detecting wrong first include (you are here)
Introduction
There is a well known (or at least I hope that it's well known) good practice to include header that corresponds to the current source file before including any other headers. It's recommended by multiple C++ coding style guides (for example by Google C++ Style Guide) and indeed makes sense to follow.
To put it short: it makes finding incomplete headers really easy. If you forget to include required header, corresponding source file just won't compile. This way there will be no hidden (implicit) dependencies among modules of a project as well as dependencies on system headers. Below is a simple example that demonstrates such situation.
Let's look at three files:
// util.hpp
#ifndef BAD_FIRST_INCLUDE__UTIL_HPP__
#define BAD_FIRST_INCLUDE__UTIL_HPP__
std::string truncatePath(const std::string &path);
#endif // BAD_FIRST_INCLUDE__UTIL_HPP__
// util.cpp
#include <string>
#include "util.hpp"
// ...
// main.cpp
#include <string>
#include "util.hpp"
// ...
Although both cpp-files should compile without errors, it's easy to see that
util.hpp
file lacks include of <string>
system header. There is no errors
because both cpp-files include <string>
before including util.hpp
, which
hides the fact that util.hpp
is not standalone as it depends on source files
it's included in. Simple swapping two includes in util.cpp
file like this:
// util.cpp
#include "util.hpp"
#include <string>
// ...
Makes the issue pop up almost immediately, precisely when util.cpp
file is
compiled. As std::string
is unknown at point of inclusion of util.hpp
,
compiler will issue errors on its usages in util.hpp
.
This discussion is meant to outline usefulness of the approach when "self include" is placed at the top of the file above all other includes. Hope everyone are convinced now. Even though this usage of the approach clearly provides some benefits, not everyone use it and some don't even know about it. Here is where a need for tool that checks whether programmer followed described approach or not arises.
Such a tool could also check that corresponding header file is included at all. Since implementing this check doesn't add much interesting details to the code and requires some heuristics to check whether the header exists in the first place, it's left out of this memo. Nevertheless, implementing it later requires only additions to the code.
New type of FrontendAction
Comparing to previous articles, here we don't need AST at all. Parsing performed by preprocessor at lexical analysis stage of source file processing is enough to extract include directives. That is the reason why we need to get more familiar with FrontendAction. There are plenty of them and it's important to pick one that fits better.
So far we've indirectly met with the ASTFrontendAction
at the very end of our main()
functions. Here is the code:
return tool.run(newFrontendActionFactory(&finder.getMatchFinder()));
finder.getMatchFinder()
returns MatchFinder &. So where is
the FrontendAction
? In this case it's implicitly created by one of several
overloads of newFrontendActionFactory() function. If
you look at MatchFinder
closer, you'll see that it doesn't inherit from any
class. It works thanks to duck typing, overloaded used in this
case assumes that its argument implements method with the following signature
(or the similar one that can be used without compilation errors):
clang::ASTConsumer *newASTConsumer();
Then value returned by this function is wrapped within a local class derived
from ASTFrontendAction
.
To implement our current task another overload of
newFrontendActionFactory() function is needed. In this
case it takes type of FrontendAction
derived class as its only template
parameter and instantiates it when required.
If you can't understand/recollect why do we need to use any factory at all,
here is fast answer: each file is processed by new instance of
FrontendAction
, hence the need for a factory of such objects.
Now that we know how to create new FrontendAction
s, let's pick concrete type
of action we want to use. There are already several actions related to
preprocessor stage, which can be viewed at documentation page for
PreprocessorFrontendAction. It shouldn't be hard to guess that we'd
like to use PreprocessOnlyAction class, it successfully
prevents creation of AST which should make processing of really huge sources
fast and smooth for our tool.
For curious ones, here is how PreprocessOnlyAction
prevents AST generation in
code:
virtual bool usesPreprocessorOnly() const { return true; }
Here is documentation on usesPreprocessorOnly(). Yes, it's that easy, still not really obvious when you don't know it.
Implementing IncludeFinderAction
It should be clear now that IncludeFinderAction
needs to derive from
PreprocessOnlyAction
, but what methods should it implement or override?
Documentation mentions only
PreprocessOnlyAction::ExecuteAction, so the choice is
somewhat limited here and we need to overwrite exactly this method:
void
IncludeFinderAction::ExecuteAction()
{
IncludeFinder includeFinder(getCompilerInstance());
getCompilerInstance().getPreprocessor().addPPCallbacks(
includeFinder.createPreprocessorCallbacks()
);
clang::PreprocessOnlyAction::ExecuteAction();
includeFinder.diagnoseAndReport();
}
Version of ExecuteAction()
from parent class is invoked in the middle of our
code as we want this to happen:
- Create our auxiliary class that does the most interesting part of job (IncludeFinder).
- Subscribe it to preprocessor actions.
- Let preprocessor process input file as it would do normally.
- Analyze preprocessing results and print out results to a user.
Note usage of getCompilerInstance() in the code above.
It comes from FrontendAction
class and returns reference to
CompilerInstance class which holds root objects that allow
retrieval of all necessary information about source file processing.
Preprocessor callbacks
Some questions might arise when one is looking at the following lines:
getCompilerInstance().getPreprocessor().addPPCallbacks(
includeFinder.createPreprocessorCallbacks()
);
Particularly the following ones:
- Why don't we "remove" any callbacks later?
- Why callbacks are "created"?
There is one answer for both questions (well, almost): Preprocessor
takes
ownership over PPCallbacks object(s) passed to it. Actually,
that was the first part of the answer, there would be no need to create
anything if we could unsubscribe listener. So the second part is: callbacks
can't be removed after they are added. This means that callbacks handler must
be allocated on heap.
It seems to be a design decision, because internally PPChainedCallbacks class is used and there are other similar classes, e.g. ChainedASTReaderListener. Such chained structures doesn't provide means for removing chains, only to add them and to call corresponding methods.
Implementing callbacks
Due to callback details described in the previous section callbacks need to be implemented with a proxy class. Here is how it works in general:
And here is more detailed description:
ExecuteAction()
method createsCallbacksProxy
class with a reference to correspondingIncludeFinder
instance and hands it over to an instance ofPreprocessor
class (solid arrows).- When interesting callbacks occur
Preprocessor
callsCallbacksProxy
, which dispatches the call toIncludeFinder
(dashed arrows).
As for the list of callbacks, PPCallbacks
currently has 30 of them and we
need only one: PPCallbacks::InclusionDirective. It is
called once for every #include
(and alike) directive in the source code. The
callback is called even when the file inclusion directive refers to doesn't
exist. Seems to be exactly what we need.
Processing inclusions
There are two phases of processing here:
- Collecting all
#include
directives in a file. - Deciding whether self-include exists and is at correct place in the file.
The first phase is quite straightforward:
clang::SourceManager &sm = compiler.getSourceManager();
if (sm.isInMainFile(hashLoc)) {
const unsigned int lineNum = sm.getSpellingLineNumber(hashLoc);
includes.push_back(std::make_pair(lineNum, fileName));
}
Note SourceManager::isInMainFile() method, you might want to use it one day too. It's an easy way to filter out only those elements that are located in the main file that is processed (one of source files passed to the tool). This way we do not print files referenced by our includes, includes of our includes and so on.
Apart from that we also extract line number in the source file using
SourceManager::getSpellingLineNumber() method, which
is also quite handy and simpler than another way of obtaining same information
using FullSourceLoc
as was shown in previous articles.
The second phase doesn't have anything new about clang, but for completeness here are main points how check for include file is performed:
- Backward slashes (
\
) in full path to the source file are replaced with forward slashes (/
). - Trailing component of the path is extracted, which should be filename.
- Extension of the file is removed to get its name only (part of file name without extension is usually called "root").
- For each of found inclusion directives:
- Root of file is obtained and checked to match root of the source file. Skip this header if name doesn't match.
- Extension is extracted and checked against list of known extensions of header files. Headers with unknown extensions are skipped after issuing corresponding warning.
- We get to this step only if match is found. If we found more than one match, print a warning.
The last check that is left is the primitive one: just check that the first element of our collection of includes is the one that corresponds to the source file.
Testing
Repository of the tool contains some [test files]. Let's try the tool on a
couple of them. First with tests/good.cpp
file:
#include "good.hpp"
#include <utility>
> self-inc-first tests/good.cpp --
No output, as expected.
Now take a look at tests/bad.cpp
file:
#include <string>
#include "bad.hpp"
Output with truncated paths:
> self-inc-first tests/bad.cpp --
.../tests/bad.cpp:3:should be the first include in the file
Which is correct as well.
Conclusion
We've wrote another tool that enforces following good programming practices in C++ code. Although as of now it doesn't automatically guesses cases when header inclusion is completely missing in the file, the tool is already useful.
Expected workflow with it can be as follows:
- Run
self-inc-first
over all files of the project. - Move includes according to obtained output.
- Fix compilation errors.
After doing this actions, all source files should become self-contained. A simple rule that can significantly simplify building in the future.