TOC
- Detecting postfix operators in for loops (you are here)
- Detecting unused functions
- Detecting wrong first include
Introduction
Foreword
The LLVM project has been started back in 2000, but became widely known and used quite recently, about 4 years as of now (2014). It's used by hardware manufactures to implement programming languages for their products as well as by software engineering companies to develop their own programs. There are two main reasons why LLVM gained such popularity:
- Extensible compiler infrastructure built from scratch in C++.
- License that allows for proprietary usage.
We're not here to debate whether second reason is good or bad, but the first one is definitely a good one. It's the first reason which GCC is missing and which is hard to get after tenth years of development. Architecture of LLVM is unique among existing compilers and is its most significant advantage.
Probably the most well known component (tool) of LLVM is Clang, which provides not only compiles languages of C-family, but also exposes its internal data structures for dealing with program being compiled and provides various means to deal with them. Later Clang has been updated with specialized facilities to process source files and even to modify them.
As it's now, Clang can be used to implement tools for processing code written in C/C++/Objective-C/Objective-C++ relatively easily. With it anyone can write tools for code processing with quite little amount of effort. The hard part is to start writing such tools as it's easy to get lost in lots of headers and libraries supplied with LLVM and Clang. Hence the main idea behind these series of articles: give one enough knowledge to be able to use Clang for writing tools. To achieve the goal, we'll look at examples of building Clang-tools from scratch. There will also be referenced materials that are recommended to read/watch/consult as using Clang and LLVM is already documented in many sources. Although there are plenty of resources on Clang, most of them have two issues:
- there are more or less out of date (because of quite high pace of development and breaking changes from time to time);
- some of covered examples are too artificial.
That's why we're going to build a small set of tools some of which might not be self contained, but nevertheless should be usable in real life. Process of building such tools will shed light on different parts of Clang, which will help construct a better image of what it is and in what ways it can be used.
Environment
Here are some details of my working environment to help solve possible issues with initial setup (see below).
GNU/Linux operating system.
GCC version 4.8.1
LLVM from 11.03.2014 built with GCC:
llvm/trunk@203604 (git mirror hash: 819af77aa3a9da84f666dc252815aec9f1cf18f5)
- Clang from 11.03.2014 built with GCC:
cfe/trunk@203603 (git mirror hash: 640884e00a911d9a599dd8fd5dd26cdd96dfc9ea)
cmake version 2.8.8
ninja version 1.4.0
everything is built with GCC
Your first tool
To get started one needs to have a working build of LLVM and Clang. This is not a subject of this writing and there is already a good tutorial on this (read notes below before following instructions in the tutorial):
Tutorial for building tools using LibTooling and LibASTMatchers
It seems to be a bit out of date, so here are some additional moments to consider:
- Modern versions of cmake are built with ninja support, so good chances that there is no need to recompile it, check it by running:
cmake | grep Ninja
- I suggest enabling generation of documentation with Doxygen. For this one
needs to set the following
ccmake
options toYES
before starting the build (see a small hint on usingccmake
below):
- `CLANG_INCLUDE_DOCS`
- `LLVM_BUILD_DOCS`
- `LLVM_INCLUDE_DOCS`
This way you'll be sure that you're referring to the correct version of documentation while writing your tools.
You might want to set LLVM_BUILD_DOCS
to OFF
once everything is built, it
won't delete generated documentation, but will reduce time of compilation
when build is started by ninja
command with no arguments.
It's said that all tests should pass. Current version of tests display messages about expected failed tests, that seems to be OK.
If you decide to rebuilt Clang with itself don't forget to setup
CMAKE_CXX_COMPILER
as well asCMAKE_C_COMPILER
(use full path toclang
executable for it). Actually recompiling with Clang is optional, no need to waste time on that.In step 1 there is a command:
cat "int main() { return 0; }" > test.cpp
Which should be:
echo "int main() { return 0; }" > test.cpp
ccmake
hint: use arrow keys to navigate in terminal interface, j
/k
doesn't work there, but searching with /
works.
Once you're done with that tutorial, you should feel like knowing a lot of new stuff about it, still not being 100% sure that you able to write your own tool. It's advised to watch Introduction to the Clang AST now. Actually the order can be reversed (watch the video first), but as Manuel Klimek (the guy in the video) describes more advanced topics, it makes sense to do basic tutorial first.
Now, having some background, we can start creating our small, but useful tool.
Detecting postfix operators in for loops
Our goal is to analyse source files to find postfix increment and decrement operators in the last part of for-loop statements. The drawback of using postfix operator is that it can lead to performance penalty in some cases. As you probably know, postfix operator differs from prefix one in creating one extra object, which normally is not optimized. It's almost free for primitive types and can be free when it goes about iterators from library provided along with a compiler, but in general it's a good practice to use postfix increment/decrement operators only when you have a reason to do so.
There is already a plenty of tools that can detect misuse of postfix operators, but such warnings can get lost in output of mature static analyzers and be just skipped by a developer. This tool we'll do just that so skipping such warnings would be a stupid thing to do. This is just a good programming style to use postfix operators, so let's write a tool that ensures correctness of for-loops and learn more about Clang's AST at the same time.
By the way, we'll name the tool "for-postfix" (not to be confused with mail transfer agent). See "Additional resources" section below for a link to repository containing ready to use tool. History of commits is quite clear, so one can examine commits one by one while reading the article.
Boilerplate
Note, I go in detail in this section just to repeat what you have learnt from the tutorial, skip this part if you feel like you don't need to repeat anything.
Just as in "Tutorial for building tools using LibTooling and LibASTMatchers" referenced above, we're going to use LibTooling and ASTMatchers. There is also RecursiveASTVisitor, but it's now superseded by ASTMatchers.
To get started go to llvm/tools/clang/tools/extra/
directory and
create a subdirectory called for-postfix
. Navigate to new directory and
create CMakeLists.txt
containing:
set(LLVM_LINK_COMPONENTS support)
set(LLVM_USED_LIBS clangTooling clangBasic clangAST)
add_clang_executable(for-postfix for-postfix.cpp)
target_link_libraries(for-postfix clangTooling clangBasic clangASTMatchers)
Another file that needs to be created is for-postfix.cpp
. For now let's put a
dummy code there:
#include <cstdlib>
int
main(int argc, const char *argv[])
{
return EXIT_SUCCESS;
}
Add add_subdirectory(for-postfix)
line at the end of the
llvm/tools/clang/tools/extra/CMakeLists.txt
, so that our tool is built with
LLVM/Clang.
Now I suggest you to go into your build/
directory and run ninja
for-postfix
just to make sure the configuration is correct. If it is, you'll
find an executable build/bin/for-postfix
that does nothing when it's run.
Now let's write a skeleton of the tool (I give this as it is because you should already know basics of this kind of stuff):
#include <llvm/Support/CommandLine.h>
#include <clang/ASTMatchers/ASTMatchFinder.h>
#include <clang/Tooling/CommonOptionsParser.h>
#include <clang/Tooling/Tooling.h>
using namespace clang::ast_matchers;
using namespace clang::tooling;
static llvm::cl::OptionCategory toolCategory("for-postfix options");
static llvm::cl::extrahelp commonHelp(CommonOptionsParser::HelpMessage);
int
main(int argc, const char *argv[])
{
CommonOptionsParser optionsParser(argc, argv, toolCategory);
ClangTool tool(optionsParser.getCompilations(),
optionsParser.getSourcePathList());
MatchFinder finder;
return tool.run(newFrontendActionFactory(&finder));
}
This should compile and link fine. By the way, notice unusual const
in front
of argv
parameter declaration, this is needed to satisfy prototype of
CommonOptionsParser constructor.
Matching builtin types
Finally we're ready to start matching things we're interested in. Our first matcher will match increment operators, both prefix and postfix:
static StatementMatcher incMatcher =
forStmt( // for ([init]; [condition]; [increment])
hasIncrement( // "increment" part of for-loop
unaryOperator( // any unary op, e.g. *, &, --
hasOperatorName("++") // exact unary op: ++
).bind("op") // bind matched unary op to "op" name
)
);
Above piece of code should be self-explaining, but notice where bind
method
is called. It can be called on objects that correspond to real AST nodes, like
for-statements (result of forStmt
) and
operators (result of calling unaryOperator
), but can't be called on something
that just examines properties of AST nodes (like hasIncrement
or
hasOperatorName
).
To proceed further we need one more entity, MatchFinder
's callback handler:
class MatchHelper : public MatchFinder::MatchCallback
{
public:
virtual void run(const MatchFinder::MatchResult &result)
{
using namespace clang;
typedef UnaryOperator UnOp;
if (const UnOp *op = result.Nodes.getNodeAs<UnOp>("op")) {
op->dump();
std::cout << '\n';
}
}
};
Also don't forget to include <iostream>
before compiling.
You should be familiar with the idea behind this code: it tries to extract
unary operator node named op
and dumps it on the screen when succeed.
Let's create our helper and use it to add our matcher in the main
function:
MatchHelper helper;
finder.addMatcher(incMatcher, &helper);
At this point running our tool over this example file:
#include <cstdlib>
inline void doNothing()
{
// do nothing on purpose
}
int
main(void)
{
for (int i = 0; i < 10; i++) {
doNothing();
}
for (int i = 0; i < 10; ++i) {
doNothing();
}
return EXIT_SUCCESS;
}
Should output both i++
and ++i
like this (ignore addresses):
UnaryOperator 0x2727ed0 'int' postfix '++'
`-DeclRefExpr 0x2727ea8 'int' lvalue Var 0x2727d90 'i' 'int'
UnaryOperator 0x2728160 'int' lvalue prefix '++'
`-DeclRefExpr 0x2728138 'int' lvalue Var 0x2728020 'i' 'int'
Lets filter out the second match in our helper. Just surround dumping code with:
if (op->isPostfix()) {
// print nodes here
}
What we have now is not capable of handling decrement operator. To fix that, simplify the matcher to remove operator name check (note name change):
static StatementMatcher builtinMatcher =
forStmt( // for ([init]; [condition]; [increment])
hasIncrement( // "increment" part of for-loop
unaryOperator( // any unary op, e.g. *, &, --
).bind("op") // bind matched unary op to "op" name
)
);
Note that there was one condition, which is now removed, while actually one
can use many conditions separated by commas. We're not removing the check
completely, just moving it to the MatchHelper
, where op->isPostfix()
condition needs to be replaced with:
if (op->isIncrementDecrementOp() && op->isPostfix()) {
// print nodes here
}
That's it for builtin operators.
Matching overloaded operators
This example uses postfix increment operator, but our for-postfix
is not able
to find it yet:
#include <cstdlib>
#include <vector>
inline void
doNothing()
{
// do nothing on purpose
}
int
main(void)
{
std::vector<int> v;
typedef std::vector<int> IntVector;
for (IntVector::const_iterator cit = v.begin(); cit != v.end(); cit++) {
doNothing();
}
return EXIT_SUCCESS;
}
Our matcher could look like this:
static StatementMatcher incMatcher =
forStmt( // for ([init]; [condition]; [increment])
hasIncrement( // "increment" part of for-loop
operatorCallExpr( // call of overloaded operator
hasOverloadedOperatorName("++"), // named "++"
argumentCountIs(2) // that requires two arguments
).bind("op") // bind matched unary op to "op" name
)
);
This one is too specific and requires its clone to match decrement operator (couldn't find a way to implement match chooser, something like OR operator). It also has two interesting properties:
operatorCallExpr
got two conditions that areAND
ed with each other;argumentCountIs
is used to match postfix operator, here first argument isthis
and the second one is uselessint
to make overloading of postfix operators possible.
Let's throw away matching of operator name and do this in MatchHelper
later:
static StatementMatcher opMatcher =
forStmt( // for ([init]; [condition]; [increment])
hasIncrement( // "increment" part of for-loop
operatorCallExpr( // call of overloaded operator
argumentCountIs(2) // that requires two arguments
).bind("op") // bind matched unary op to "op" name
)
);
Now we need to update MatchHelper
a bit to handle overloaded operators, which
it won't match now as CallExpr
doesn't subclass UnaryOperator
:
typedef CXXOperatorCallExpr Call;
// ...
} else if (const Call *op = result.Nodes.getNodeAs<Call>("op")) {
const OverloadedOperatorKind opKind = op->getOperator();
if (opKind == OO_PlusPlus || opKind == OO_MinusMinus) {
printOut(result, op);
}
}
Where printOut
contains two lines from above that dump operator nodes. As
you can see matching specific operators is quite easy. The crucial part in
this case is to use right type of node. Regular CallExpr
won't let us to
check for operator type, CXXOperatorCallExpr
is needed instead.
Now let's register new matcher in MatchFinder
:
finder.addMatcher(opMatcher, &helper);
Note that by adding multiple matchers to match finder one can match different things in one traverse through AST.
Formatting output
We're almost done. Current way of printing out nodes is not readable, it would be better to print path to source file with match and number of line. First of all we need to include more headers and add one more using directive:
#include <clang/Basic/SourceLocation.h>
#include <clang/Basic/SourceManager.h>
using namespace clang;
Then update printOut
method:
FullSourceLoc fullLoc(expr->getLocStart(), *result.SourceManager);
const std::string &fileName = result.SourceManager->getFilename(fullLoc);
const unsigned int lineNum = fullLoc.getSpellingLineNumber();
std::cout << fileName
<< ":"
<< lineNum
<< ":"
<< "dangerous use of postfix operator"
<< '\n';
This snippet outputs match in the following format:
<path to source file>:<line number>:<description>
Example output:
/home/xaizek/repos/clang-llvm/build/tst4.cpp:16:dangerous use of postfix operator
/home/xaizek/repos/clang-llvm/build/tst4.cpp:24:dangerous use of postfix operator
It's alike output of Unix grep command (with -F and -n switches) and can be accepted by many other tools accepted by many other tools out there, e.g. one can store output of the tool to a file and open it in Vim to navigate through errors and edit them like this:
bin/for-postfix test.cpp -- > errors-list
vim -q errors-list
This tool can already be used in real projects to find misuse of postfix operators.
Additional resources (not referenced above)
The Design of LLVM by Chris Lattner, the primary author of the LLVM and related projects, such as the Clang compiler.