Some notes on computer stuff

Clang by example: Detecting postfix operators in for loops

May 2, 2014
[c++] [clang] [llvm] [tutorial]

TOC

  1. Detecting postfix operators in for loops (you are here)
  2. Detecting unused functions
  3. Detecting wrong first include

Introduction

Foreword

The LLVM project has been started back in 2000, but became widely known and used quite recently, about 4 years as of now (2014). It's used by hardware manufactures to implement programming languages for their products as well as by software engineering companies to develop their own programs. There are two main reasons why LLVM gained such popularity:

  1. Extensible compiler infrastructure built from scratch in C++.
  2. License that allows for proprietary usage.

We're not here to debate whether second reason is good or bad, but the first one is definitely a good one. It's the first reason which GCC is missing and which is hard to get after tenth years of development. Architecture of LLVM is unique among existing compilers and is its most significant advantage.

Probably the most well known component (tool) of LLVM is Clang, which provides not only compiles languages of C-family, but also exposes its internal data structures for dealing with program being compiled and provides various means to deal with them. Later Clang has been updated with specialized facilities to process source files and even to modify them.

As it's now, Clang can be used to implement tools for processing code written in C/C++/Objective-C/Objective-C++ relatively easily. With it anyone can write tools for code processing with quite little amount of effort. The hard part is to start writing such tools as it's easy to get lost in lots of headers and libraries supplied with LLVM and Clang. Hence the main idea behind these series of articles: give one enough knowledge to be able to use Clang for writing tools. To achieve the goal, we'll look at examples of building Clang-tools from scratch. There will also be referenced materials that are recommended to read/watch/consult as using Clang and LLVM is already documented in many sources. Although there are plenty of resources on Clang, most of them have two issues:

  • there are more or less out of date (because of quite high pace of development and breaking changes from time to time);
  • some of covered examples are too artificial.

That's why we're going to build a small set of tools some of which might not be self contained, but nevertheless should be usable in real life. Process of building such tools will shed light on different parts of Clang, which will help construct a better image of what it is and in what ways it can be used.

Environment

Here are some details of my working environment to help solve possible issues with initial setup (see below).

  • GNU/Linux operating system.

  • GCC version 4.8.1

  • LLVM from 11.03.2014 built with GCC:

llvm/trunk@203604 (git mirror hash: 819af77aa3a9da84f666dc252815aec9f1cf18f5)

  • Clang from 11.03.2014 built with GCC:

cfe/trunk@203603 (git mirror hash: 640884e00a911d9a599dd8fd5dd26cdd96dfc9ea)

  • cmake version 2.8.8

  • ninja version 1.4.0

  • everything is built with GCC

Your first tool

To get started one needs to have a working build of LLVM and Clang. This is not a subject of this writing and there is already a good tutorial on this (read notes below before following instructions in the tutorial):

Tutorial for building tools using LibTooling and LibASTMatchers

It seems to be a bit out of date, so here are some additional moments to consider:

  1. Modern versions of cmake are built with ninja support, so good chances that there is no need to recompile it, check it by running:

cmake | grep Ninja

  1. I suggest enabling generation of documentation with Doxygen. For this one needs to set the following ccmake options to YES before starting the build (see a small hint on using ccmake below):
- `CLANG_INCLUDE_DOCS`
- `LLVM_BUILD_DOCS`
- `LLVM_INCLUDE_DOCS`

This way you'll be sure that you're referring to the correct version of documentation while writing your tools.

You might want to set LLVM_BUILD_DOCS to OFF once everything is built, it won't delete generated documentation, but will reduce time of compilation when build is started by ninja command with no arguments.

  1. It's said that all tests should pass. Current version of tests display messages about expected failed tests, that seems to be OK.

  2. If you decide to rebuilt Clang with itself don't forget to setup CMAKE_CXX_COMPILER as well as CMAKE_C_COMPILER (use full path to clang executable for it). Actually recompiling with Clang is optional, no need to waste time on that.

  3. In step 1 there is a command:

cat "int main() { return 0; }" > test.cpp

Which should be:

echo "int main() { return 0; }" > test.cpp

ccmake hint: use arrow keys to navigate in terminal interface, j/k doesn't work there, but searching with / works.

Once you're done with that tutorial, you should feel like knowing a lot of new stuff about it, still not being 100% sure that you able to write your own tool. It's advised to watch Introduction to the Clang AST now. Actually the order can be reversed (watch the video first), but as Manuel Klimek (the guy in the video) describes more advanced topics, it makes sense to do basic tutorial first.

Now, having some background, we can start creating our small, but useful tool.

Detecting postfix operators in for loops

Our goal is to analyse source files to find postfix increment and decrement operators in the last part of for-loop statements. The drawback of using postfix operator is that it can lead to performance penalty in some cases. As you probably know, postfix operator differs from prefix one in creating one extra object, which normally is not optimized. It's almost free for primitive types and can be free when it goes about iterators from library provided along with a compiler, but in general it's a good practice to use postfix increment/decrement operators only when you have a reason to do so.

There is already a plenty of tools that can detect misuse of postfix operators, but such warnings can get lost in output of mature static analyzers and be just skipped by a developer. This tool we'll do just that so skipping such warnings would be a stupid thing to do. This is just a good programming style to use postfix operators, so let's write a tool that ensures correctness of for-loops and learn more about Clang's AST at the same time.

By the way, we'll name the tool "for-postfix" (not to be confused with mail transfer agent). See "Additional resources" section below for a link to repository containing ready to use tool. History of commits is quite clear, so one can examine commits one by one while reading the article.

Boilerplate

Note, I go in detail in this section just to repeat what you have learnt from the tutorial, skip this part if you feel like you don't need to repeat anything.

Just as in "Tutorial for building tools using LibTooling and LibASTMatchers" referenced above, we're going to use LibTooling and ASTMatchers. There is also RecursiveASTVisitor, but it's now superseded by ASTMatchers.

To get started go to llvm/tools/clang/tools/extra/ directory and create a subdirectory called for-postfix. Navigate to new directory and create CMakeLists.txt containing:

set(LLVM_LINK_COMPONENTS support)
set(LLVM_USED_LIBS clangTooling clangBasic clangAST)

add_clang_executable(for-postfix for-postfix.cpp)
target_link_libraries(for-postfix clangTooling clangBasic clangASTMatchers)

Another file that needs to be created is for-postfix.cpp. For now let's put a dummy code there:

#include <cstdlib>

int
main(int argc, const char *argv[])
{
    return EXIT_SUCCESS;
}

Add add_subdirectory(for-postfix) line at the end of the llvm/tools/clang/tools/extra/CMakeLists.txt, so that our tool is built with LLVM/Clang.

Now I suggest you to go into your build/ directory and run ninja for-postfix just to make sure the configuration is correct. If it is, you'll find an executable build/bin/for-postfix that does nothing when it's run.

Now let's write a skeleton of the tool (I give this as it is because you should already know basics of this kind of stuff):

#include <llvm/Support/CommandLine.h>

#include <clang/ASTMatchers/ASTMatchFinder.h>

#include <clang/Tooling/CommonOptionsParser.h>
#include <clang/Tooling/Tooling.h>

using namespace clang::ast_matchers;
using namespace clang::tooling;

static llvm::cl::OptionCategory toolCategory("for-postfix options");

static llvm::cl::extrahelp commonHelp(CommonOptionsParser::HelpMessage);

int
main(int argc, const char *argv[])
{
    CommonOptionsParser optionsParser(argc, argv, toolCategory);
    ClangTool tool(optionsParser.getCompilations(),
                    optionsParser.getSourcePathList());

    MatchFinder finder;

    return tool.run(newFrontendActionFactory(&finder));
}

This should compile and link fine. By the way, notice unusual const in front of argv parameter declaration, this is needed to satisfy prototype of CommonOptionsParser constructor.

Matching builtin types

Finally we're ready to start matching things we're interested in. Our first matcher will match increment operators, both prefix and postfix:

static StatementMatcher incMatcher =
    forStmt(                          // for ([init]; [condition]; [increment])
        hasIncrement(                 // "increment" part of for-loop
            unaryOperator(            // any unary op, e.g. *, &, --
                hasOperatorName("++") // exact unary op: ++
            ).bind("op")              // bind matched unary op to "op" name
        )
    );

Above piece of code should be self-explaining, but notice where bind method is called. It can be called on objects that correspond to real AST nodes, like for-statements (result of forStmt) and operators (result of calling unaryOperator), but can't be called on something that just examines properties of AST nodes (like hasIncrement or hasOperatorName).

To proceed further we need one more entity, MatchFinder's callback handler:

class MatchHelper : public MatchFinder::MatchCallback
{
public:
    virtual void run(const MatchFinder::MatchResult &result)
    {
        using namespace clang;

        typedef UnaryOperator UnOp;

        if (const UnOp *op = result.Nodes.getNodeAs<UnOp>("op")) {
            op->dump();
            std::cout << '\n';
        }
    }
};

Also don't forget to include <iostream> before compiling.

You should be familiar with the idea behind this code: it tries to extract unary operator node named op and dumps it on the screen when succeed.

Let's create our helper and use it to add our matcher in the main function:

MatchHelper helper;

finder.addMatcher(incMatcher, &helper);

At this point running our tool over this example file:

#include <cstdlib>

inline void doNothing()
{
    // do nothing on purpose
}

int
main(void)
{
    for (int i = 0; i < 10; i++) {
        doNothing();
    }

    for (int i = 0; i < 10; ++i) {
        doNothing();
    }

    return EXIT_SUCCESS;
}

Should output both i++ and ++i like this (ignore addresses):

UnaryOperator 0x2727ed0 'int' postfix '++'
`-DeclRefExpr 0x2727ea8 'int' lvalue Var 0x2727d90 'i' 'int'

UnaryOperator 0x2728160 'int' lvalue prefix '++'
`-DeclRefExpr 0x2728138 'int' lvalue Var 0x2728020 'i' 'int'

Lets filter out the second match in our helper. Just surround dumping code with:

if (op->isPostfix()) {
    // print nodes here
}

What we have now is not capable of handling decrement operator. To fix that, simplify the matcher to remove operator name check (note name change):

static StatementMatcher builtinMatcher =
    forStmt(                          // for ([init]; [condition]; [increment])
        hasIncrement(                 // "increment" part of for-loop
            unaryOperator(            // any unary op, e.g. *, &, --

            ).bind("op")              // bind matched unary op to "op" name
        )
    );

Note that there was one condition, which is now removed, while actually one can use many conditions separated by commas. We're not removing the check completely, just moving it to the MatchHelper, where op->isPostfix() condition needs to be replaced with:

if (op->isIncrementDecrementOp() && op->isPostfix()) {
    // print nodes here
}

That's it for builtin operators.

Matching overloaded operators

This example uses postfix increment operator, but our for-postfix is not able to find it yet:

#include <cstdlib>

#include <vector>

inline void
doNothing()
{
    // do nothing on purpose
}

int
main(void)
{
    std::vector<int> v;
    typedef std::vector<int> IntVector;

    for (IntVector::const_iterator cit = v.begin(); cit != v.end(); cit++) {
        doNothing();
    }

    return EXIT_SUCCESS;
}

Our matcher could look like this:

static StatementMatcher incMatcher =
    forStmt(                       // for ([init]; [condition]; [increment])
        hasIncrement(              // "increment" part of for-loop
            operatorCallExpr(      // call of overloaded operator
                hasOverloadedOperatorName("++"), // named "++"
                argumentCountIs(2) // that requires two arguments
            ).bind("op")           // bind matched unary op to "op" name
        )
    );

This one is too specific and requires its clone to match decrement operator (couldn't find a way to implement match chooser, something like OR operator). It also has two interesting properties:

  • operatorCallExpr got two conditions that are ANDed with each other;
  • argumentCountIs is used to match postfix operator, here first argument is this and the second one is useless int to make overloading of postfix operators possible.

Let's throw away matching of operator name and do this in MatchHelper later:

static StatementMatcher opMatcher =
    forStmt(                       // for ([init]; [condition]; [increment])
        hasIncrement(              // "increment" part of for-loop
            operatorCallExpr(      // call of overloaded operator
                argumentCountIs(2) // that requires two arguments
            ).bind("op")           // bind matched unary op to "op" name
        )
    );

Now we need to update MatchHelper a bit to handle overloaded operators, which it won't match now as CallExpr doesn't subclass UnaryOperator:

typedef CXXOperatorCallExpr Call;

// ...

} else if (const Call *op = result.Nodes.getNodeAs<Call>("op")) {
    const OverloadedOperatorKind opKind = op->getOperator();
    if (opKind == OO_PlusPlus || opKind == OO_MinusMinus) {
        printOut(result, op);
    }
}

Where printOut contains two lines from above that dump operator nodes. As you can see matching specific operators is quite easy. The crucial part in this case is to use right type of node. Regular CallExpr won't let us to check for operator type, CXXOperatorCallExpr is needed instead.

Now let's register new matcher in MatchFinder:

finder.addMatcher(opMatcher, &helper);

Note that by adding multiple matchers to match finder one can match different things in one traverse through AST.

Formatting output

We're almost done. Current way of printing out nodes is not readable, it would be better to print path to source file with match and number of line. First of all we need to include more headers and add one more using directive:

#include <clang/Basic/SourceLocation.h>
#include <clang/Basic/SourceManager.h>

using namespace clang;

Then update printOut method:

FullSourceLoc fullLoc(expr->getLocStart(), *result.SourceManager);

const std::string &fileName = result.SourceManager->getFilename(fullLoc);
const unsigned int lineNum = fullLoc.getSpellingLineNumber();

std::cout << fileName
          << ":"
          << lineNum
          << ":"
          << "dangerous use of postfix operator"
          << '\n';

This snippet outputs match in the following format:

<path to source file>:<line number>:<description>

Example output:

/home/xaizek/repos/clang-llvm/build/tst4.cpp:16:dangerous use of postfix operator
/home/xaizek/repos/clang-llvm/build/tst4.cpp:24:dangerous use of postfix operator

It's alike output of Unix grep command (with -F and -n switches) and can be accepted by many other tools accepted by many other tools out there, e.g. one can store output of the tool to a file and open it in Vim to navigate through errors and edit them like this:

bin/for-postfix test.cpp -- > errors-list
vim -q errors-list

This tool can already be used in real projects to find misuse of postfix operators.

Additional resources (not referenced above)