Introduction
This memo describes a tool that scans a set of source files looking for two types of functions declared with external linkage:
- Those that are defined but never used.
- Those that are used only inside the translation unit they are defined in.
As a result we want to get:
- Diagnostics that functions of the first type are unused, which means that one
  of the following holds:
  - the function can and should be removed;
  - there are duplicated pieces of code;
  - another function is used by mistake.
- Diagnostics that functions of the second type can be marked as static (unless
  the code is part of a library; we ignore this case for now, as handling it can
  be added later), as they probably:
  - were just never marked as static, which can be fixed harmlessly;
  - are no longer used outside the translation unit they reside in;
  - have a wrong name in the public header.
Not all static analyzers provide such diagnostics, and those that do don't advise marking functions that are used only inside one translation unit as static. As existing tools don't do exactly this, the "programmer's way" of fixing it is to write our own tool. Actually, a much better option is to implement this diagnostic in cppcheck and send the patch upstream, but we're mostly interested in learning more about Clang's AST and want to use it to solve the task. Still, nothing stops one from contributing to cppcheck.
For simplicity, we're going to do this for C rather than C++ to avoid dealing with namespaces and methods. This allows us to concentrate on new details of how Clang represents source code, leaving extension of the tool to more use cases out of scope.
This article also contains fewer sources: only the most interesting excerpts from the code are presented.
Matching
To do something useful we first need to find the AST elements we're interested in. This time they are:
- Function declarations.
- Function calls.
- Taking the address of a function.
The matcher for the first item is very simple, and its name is easy to find in the documentation or in the headers with AST matchers, or even just to guess:
static DeclarationMatcher funcDecl = functionDecl().bind("func");
The last two items require additional investigation. Let's make it simple by
asking Clang to dump the AST of the following simple code to the screen (file
named func-ptr.c):
void
func(void)
{
}

int
main(void)
{
    void (*f)(void) = &func;
    f();
    return 0;
}
Using this command:
clang -Xclang -ast-dump -fsyntax-only func-ptr.c
Here is the full output:
TranslationUnitDecl 0x2e86560 <<invalid sloc>>
|-TypedefDecl 0x2e86a60 <<invalid sloc>> __int128_t '__int128'
|-TypedefDecl 0x2e86ac0 <<invalid sloc>> __uint128_t 'unsigned __int128'
|-TypedefDecl 0x2e86e10 <<invalid sloc>> __builtin_va_list '__va_list_tag [1]'
|-FunctionDecl 0x2e86f20 <func-ptr.c:1:1, line:4:1> func 'void (void)'
| `-CompoundStmt 0x2e86fc0 <line:3:1, line:4:1>
`-FunctionDecl 0x2e870a0 <line:6:1, line:12:1> main 'int (void)'
  `-CompoundStmt 0x2ece4a0 <line:8:1, line:12:1>
    |-DeclStmt 0x2ece3e0 <line:9:5, col:28>
    | `-VarDecl 0x2ece340 <col:5, col:24> f 'void (*)(void)'
    |   `-UnaryOperator 0x2ece3c0 <col:23, col:24> 'void (*)(void)' prefix '&'
    |     `-DeclRefExpr 0x2ece398 <col:24> 'void (void)' Function 0x2e86f20 'func' 'void (void)'
    |-CallExpr 0x2ece438 <line:10:5, col:7> 'void'
    | `-ImplicitCastExpr 0x2ece420 <col:5> 'void (*)(void)' <LValueToRValue>
    |   `-DeclRefExpr 0x2ece3f8 <col:5> 'void (*)(void)' lvalue Var 0x2ece340 'f' 'void (*)(void)'
    `-ReturnStmt 0x2ece480 <line:11:5, col:12>
      `-IntegerLiteral 0x2ece460 <col:12> 'int' 0
Look at this part:
    |-DeclStmt 0x2ece3e0 <line:9:5, col:28>
    | `-VarDecl 0x2ece340 <col:5, col:24> f 'void (*)(void)'
    |   `-UnaryOperator 0x2ece3c0 <col:23, col:24> 'void (*)(void)' prefix '&'
    |     `-DeclRefExpr 0x2ece398 <col:24> 'void (void)' Function 0x2e86f20 'func' 'void (void)'
which corresponds to obtaining the address of the function:
void (*f)(void) = &func;
Let's construct an AST matcher for it:
static StatementMatcher funcAddrOp =
    unaryOperator(                   // any unary operator, e.g. *, &, --
        hasOperatorName("&"),        // exact unary operator: &
        hasUnaryOperand(             // whose operand is ...
            declRefExpr(             // ... a reference to a declaration
                to(                  // of something that is ...
                    functionDecl(    // ... a function
                    ).bind("ref")    // bind matched function to "ref" name
                )
            )
        )
    ).bind("op");                    // bind matched unary op to "op" name
Looks like a nice matcher, but we're not going to use it. The reason is that
the address of a function can also be taken through an implicit conversion: if
one drops the & in front of the function name, the function still decays to a
pointer. That's why it makes sense to use a simpler and more general matcher,
which is just the inner part of the one listed above:
static StatementMatcher funcRef =
    declRefExpr(             // referencing a variable/declaration
        to(                  // something that is ...
            functionDecl(    // ... a function
            )
        )
    ).bind("ref");           // bind matched function ref to "ref" name
This effectively matches the leaf node:
| `-DeclRefExpr 0x2ece398 <col:24> 'void (void)' Function 0x2e86f20 'func' 'void (void)'
A direct call expression has the same kind of leaf node for its callee, so this single matcher covers all the referencing cases we want.
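To see why both spellings have to be covered, here is a minimal sketch, written in C++ for consistency with the tool's own code (the same rule holds in C). The names Callback, viaAddressOf, viaDecay and sameAddress are hypothetical and exist only for this illustration.

```cpp
// A function whose address is taken in two different ways.
void func() {}

using Callback = void (*)();

// Explicit address-of: the AST has a UnaryOperator '&' above the DeclRefExpr.
Callback viaAddressOf() { return &func; }

// No '&': the function name decays to a pointer implicitly, so the AST has an
// ImplicitCastExpr <FunctionToPointerDecay> above the DeclRefExpr instead.
Callback viaDecay() { return func; }

// Both spellings produce the same pointer value.
bool sameAddress() { return viaAddressOf() == viaDecay(); }
```

In both cases the innermost node is a DeclRefExpr referring to func, which is exactly what the simpler matcher looks for.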
Note that funcDecl is of type DeclarationMatcher rather than the usual
StatementMatcher. This is because each of the core components of the AST
(declarations, statements, types) has its own hierarchy with a different root
class, which means that such elements must be matched using different types of
matchers.
Filtering
One might ask: how do we get function definitions if we're only looking for function declarations? It's easy to understand if you recall that every definition is also a declaration. So there is no separate node for a function definition in Clang's AST; there are declarations with bodies instead. To check whether a particular declaration carries the body, use the isThisDeclarationADefinition() method. There are also methods that check whether the function has a body anywhere among its redeclarations (such as hasBody()); don't confuse them with the method we actually need.
On each match of a function declaration we want to make sure that the function is visible outside the current translation unit, as we're not interested in static functions. This can be done with the help of the isExternallyVisible() method.
If you think of checking complete programs, the first external function that
comes to mind is probably main(). We don't want to report it as unused, so
filter it out by invoking the handy isMain() method.
A match of the funcRef matcher gives us a result of type DeclRefExpr, which we
need to resolve to the function declaration it refers to. This is done by the
following code:
if (const FunctionDecl *func = ref->getDecl()->getAsFunction()) {
    // ...
}
Here getDecl() returns a ValueDecl, which corresponds to a variable, function
or enumeration constant declaration. Then we ask the obtained ValueDecl object
whether it can be converted to a function and get it as a function if the
answer is yes. The check of the return value is needed even if "it's definitely
a function", because a null pointer can be returned in case of parsing errors
(say, the code is correct, but some headers are missing).
Counting
Counting functions and references to them is trickier than finding them, for the following reasons:
- A function can be declared in any number of modules and can be declared multiple times in one translation unit.
- A function can be referred to before it's defined.
- Third-party and system functions are matched as well.
As we want to get the same results while scanning one file at a time in any order, the points listed above should be treated carefully.
The implementation addresses them in the following way:
- Functions are stored in a map indexed by their names. There will be no name conflicts as we match only external functions and there are no overloaded functions in C.
- Each function information object stores a list of references.
- Each function declaration and reference is associated with the name of the file it resides in, which is used to check whether the function is ever referenced outside the translation unit it is defined in.
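The bookkeeping described above can be sketched without any Clang types. This is not the tool's actual code: FuncInfo, FuncRegistry and Verdict are hypothetical names, and the sketch keeps only the set of referencing file names where the real tool keeps a list of references.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical per-function record (names are illustrative, not the tool's).
struct FuncInfo {
    std::string definedIn;         // file holding the definition; empty if none seen
    std::set<std::string> usedIn;  // files that reference the function
};

enum class Verdict { NoDefinition, Unused, CanBeStatic, UsedElsewhere };

class FuncRegistry {
public:
    // Record that `name` is defined in `file`.
    void addDefinition(const std::string &name, const std::string &file) {
        funcs[name].definedIn = file;
    }

    // Record a reference; operator[] creates the entry on demand, so a
    // reference may arrive before the definition is seen.
    void addReference(const std::string &name, const std::string &file) {
        funcs[name].usedIn.insert(file);
    }

    Verdict classify(const std::string &name) const {
        const auto it = funcs.find(name);
        if (it == funcs.end() || it->second.definedIn.empty())
            return Verdict::NoDefinition;  // e.g. third-party or system function
        const FuncInfo &info = it->second;
        if (info.usedIn.empty())
            return Verdict::Unused;
        for (const std::string &file : info.usedIn)
            if (file != info.definedIn)
                return Verdict::UsedElsewhere;
        return Verdict::CanBeStatic;
    }

private:
    std::map<std::string, FuncInfo> funcs;
};
```

Because entries are keyed by name and created lazily, the verdict for a function does not depend on whether its definition or its references were recorded first, nor on the order in which files are scanned.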
Printing
The SourceLocation class alone is not enough to print the exact position of
something, as it lacks a connection with the actual source code. That's why a
FullSourceLoc needs to be constructed from an instance of SourceLocation and a
reference to SourceManager. Here's what retrieval of the source file name and
line number can look like:
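// Combine the location of the function name with the source manager.
FullSourceLoc fullLoc(func->getNameInfo().getBeginLoc(), *sm);
// getFilename() returns a StringRef; copy it into an owning string.
const std::string fileName = sm->getFilename(fullLoc).str();
const unsigned int lineNum = fullLoc.getSpellingLineNumber();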
Note the getNameInfo().getBeginLoc() part. The start of the function's source
range, as returned by calling getBeginLoc() (getLocStart() in older Clang
releases) directly on an object of type FunctionDecl, is the beginning of the
whole declaration, which is normally the return type. To be more
programmer-friendly we want to guide one directly to the function name, which
is more convenient in my opinion. If it's still unclear why, here are two
samples:
void func1(void) // <- getBeginLoc() <- getNameInfo().getBeginLoc()
...
void // <- getBeginLoc()
func2(void) // <- getNameInfo().getBeginLoc()
...
Blocking diagnostics output
One thing about Clang that is somewhat annoying in our use case is that by
default it prints diagnostics about the code being analyzed. We want to
suppress such diagnostic messages and leave only our own. The correct way of
doing this is to call DiagnosticsEngine::setSuppressAllDiagnostics(), but it's
not clear how to get the instance of DiagnosticsEngine used by the tool while
it builds ASTs. So we go another way and subclass DiagnosticConsumer to
override its IncludeInDiagnosticCounts() method and make it return false:
class : public DiagnosticConsumer
{
public:
    virtual bool
    IncludeInDiagnosticCounts() const
    {
        return false;
    }
} diagConsumer;

tool.setDiagnosticConsumer(&diagConsumer);
This way such diagnostics are not counted as relevant when Clang tries to present parsing results to a user.
Testing
Assuming that you have successfully built the tool from the repository, let's
run it over a couple of simple test files. The first file (main.c) looks like
this:
static void firstStatic(void);
static void secondStatic(void);
void firstExtern(void);
extern void secondExtern(void);
static void firstStatic(void) { }
static void secondStatic(void) { }
void firstExtern(void) { }
void secondExtern(void) { }
int
main(void)
{
    firstExtern();
    secondStatic();
    return 0;
}
Let's check the output when the first file is analyzed alone (paths are truncated):
> unused-funcs main.c --
.../main.c:20:firstExtern:can be made static
.../main.c:25:secondExtern:unused
Let's go over the file manually to check whether the obtained output is correct:
- main() is treated separately.
- firstStatic() and secondStatic() are both ignored because they are marked as static.
- firstExtern() is declared without static and is used only within the same translation unit.
- secondExtern() is declared as extern and isn't used.
Looks good.
Now add the second file (util.c):
extern void firstExtern(void);
void secondExtern(void);
void
thirdExtern(void)
{
    firstExtern();
    secondExtern();
}
And see what's changed in the output (paths are truncated):
> unused-funcs main.c util.c --
.../util.c:5:thirdExtern:unused
Expected changes are as follows:
- Both firstExtern() and secondExtern() are now used outside their home module, so no diagnostics should mention them.
- A new unused extern function (thirdExtern()) was introduced.
Looks correct too.
By the way, here's the output for the func-ptr.c test file from the "Matching"
section above:
> unused-funcs func-ptr.c --
.../func-ptr.c:2:func:can be made static
Conclusion
As you've been warned, this implementation is geared towards C rather than C++, but this limitation allowed for a concise description and a ready-to-use tool without putting that much effort into the implementation.
The resulting tool can be adjusted in multiple ways by changing the matching/counting/output parts independently:
- getting a list of all references to external functions in the code;
- building a graph description of cross-module dependencies to be rendered by Graphviz (a fine-grained version could list the exact functions used);
- collecting statistics like the ratio of provided extern functions to used extern functions, or the number of external usages for each function marked as extern;
- the previous bullet combined with some thresholds can be used to detect translation units with low cohesion/high coupling;
- etc.
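As an illustration of the Graphviz idea, here is a minimal sketch of an alternative output stage. It assumes the counting stage can hand over pairs of file names (the file containing a reference and the file containing the definition). Edge and toDot are hypothetical names, not part of the tool.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <utility>

// (from, to): file "from" references a function defined in file "to".
using Edge = std::pair<std::string, std::string>;

// Render the dependency edges as a Graphviz digraph in DOT syntax.
std::string toDot(const std::set<Edge> &edges) {
    std::ostringstream out;
    out << "digraph modules {\n";
    for (const Edge &e : edges)
        out << "    \"" << e.first << "\" -> \"" << e.second << "\";\n";
    out << "}\n";
    return out.str();
}
```

For the two test files above the only edge would be from util.c to main.c, and the resulting text could be fed to the dot utility for rendering.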
Note that because Clang takes macros into account, the tool can produce inaccurate results if conditional compilation is used. To be precise, it analyzes one particular combination of defines and ignores all others. That's why it's better to check updated code against several combinations of macro defines, or at least to keep them in mind. This is especially important for cross-platform applications and for programs that allow some of their features to be disabled at compile time.
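A small sketch of the kind of code where this matters. FEATURE_LOGGING, logMessage and run are hypothetical names chosen for the illustration.

```cpp
// Externally visible helper that is referenced only when the hypothetical
// FEATURE_LOGGING macro is defined.
void logMessage(const char *) {}

int run() {
#ifdef FEATURE_LOGGING
    logMessage("starting");  // the only reference to logMessage()
    return 1;
#else
    return 0;                // no reference survives preprocessing
#endif
}
```

Analyzed without the macro, such a file would presumably yield an "unused" diagnostic for logMessage(), because the call never reaches the AST. Since arguments after -- on the tool's command line are compiler flags, running it a second time with -DFEATURE_LOGGING there should cover the other configuration.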