Out of bounds check in C with clang-tidy

291 Views Asked by At

Why I can easily detect this trivial case of out of bounds array accessing (marked as OK), using clang-tidy, but I can't detect the one marked as KO?

I understand pointer decay in C, but I would suppose that if I'm passing the array as a fixed size type, it could be considered inside the function.

clang-tidy -checks=clang-analyzer main.c -- -std=c99
/* main.c code snipet */

#define CAPACITY 3

typedef float f32_t;

typedef f32_t f32v_t [ CAPACITY ];

static void out_of_bounds_not_detected ( f32v_t v );

int main ( void )
{
  f32v_t vec01 = { 0.0f, 0.0f, 0.0f };
  f32_t out_of_bounds = vec01 [ CAPACITY ]; /* detected OK */

  out_of_bounds_not_detected ( vec01 );

  return (int) out_of_bounds;
}

static void out_of_bounds_not_detected ( f32v_t v )
{
  v [ CAPACITY ] = 0.0f; /* non-detected KO */
}

I would expect to clang-tidy emit a warning like this inside the function out_of_bounds_not_detected:

warning: array index 3 is past the end of the array
2

There are 2 best solutions below

0
Scott McPeak On BEST ANSWER

As noted in the answer by alagner, the C language standard (e.g., C99 6.7.5.3p7) specifies that a parameter specified with array type is "adjusted" to a pointer type. However, the standard only requires that the argument supply "at least as many elements as specified by the size expression" (emphasis mine).

Consequently, the code:

void not_detected(int arr[3])
{
  arr[3] = 0;
}

does not necessarily violate the language requirements, since a caller could pass an array with four elements. That said, I think it would still be nice for a static analysis tool to report this case, since doing that deliberately would be very poor form at best.

A clear violation would be something like:

void what_about(void)
{
  int arr[2];
  not_detected(arr);   // same 'not_detected' as above
}

but clang-tidy also does not report this case.

The reason is that clang does not record the array parameter size information in its AST, so clang-tidy cannot see it. To observe this, we can dump the AST like so:

$ cat test.c
// test.c
// https://stackoverflow.com/questions/76642174/out-of-bounds-check-in-c-with-clang-tidy

void detected(void)
{
  int arr[3];
  arr[3] = 0;
}

void not_detected(int arr[3])
{
  arr[3] = 0;
}

void what_about(void)
{
  int arr[2];
  not_detected(arr);
}

// EOF

$ clang -Xclang -ast-dump -fsyntax-only test.c
test.c:7:3: warning: array index 3 is past the end of the array (that has type 'int[3]') [-Warray-bounds]
  arr[3] = 0;
  ^   ~
test.c:6:3: note: array 'arr' declared here
  int arr[3];
  ^
TranslationUnitDecl 0x555ab1e132b8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x555ab1e13ae0 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x555ab1e13880 '__int128'
|-TypedefDecl 0x555ab1e13b50 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x555ab1e138a0 'unsigned __int128'
|-TypedefDecl 0x555ab1e13e58 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x555ab1e13c30 'struct __NSConstantString_tag'
|   `-Record 0x555ab1e13ba8 '__NSConstantString_tag'
|-TypedefDecl 0x555ab1e13ef0 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x555ab1e13eb0 'char *'
|   `-BuiltinType 0x555ab1e13360 'char'
|-TypedefDecl 0x555ab1e141e8 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag[1]'
| `-ConstantArrayType 0x555ab1e14190 'struct __va_list_tag[1]' 1 
|   `-RecordType 0x555ab1e13fd0 'struct __va_list_tag'
|     `-Record 0x555ab1e13f48 '__va_list_tag'
|-FunctionDecl 0x555ab1e6f668 <test.c:4:1, line:8:1> line:4:6 detected 'void (void)'
| `-CompoundStmt 0x555ab1e6f960 <line:5:1, line:8:1>
|   |-DeclStmt 0x555ab1e6f868 <line:6:3, col:13>
|   | `-VarDecl 0x555ab1e6f800 <col:3, col:12> col:7 used arr 'int[3]'
|   `-BinaryOperator 0x555ab1e6f940 <line:7:3, col:12> 'int' '='
|     |-ArraySubscriptExpr 0x555ab1e6f900 <col:3, col:8> 'int' lvalue
|     | |-ImplicitCastExpr 0x555ab1e6f8e8 <col:3> 'int *' <ArrayToPointerDecay>
|     | | `-DeclRefExpr 0x555ab1e6f880 <col:3> 'int[3]' lvalue Var 0x555ab1e6f800 'arr' 'int[3]'
|     | `-IntegerLiteral 0x555ab1e6f8a0 <col:7> 'int' 3
|     `-IntegerLiteral 0x555ab1e6f920 <col:12> 'int' 0
|-FunctionDecl 0x555ab1e6ff08 <line:10:1, line:13:1> line:10:6 used not_detected 'void (int *)'
| |-ParmVarDecl 0x555ab1e6fe10 <col:19, col:28> col:23 used arr 'int *':'int *'
| `-CompoundStmt 0x555ab1e70070 <line:11:1, line:13:1>
|   `-BinaryOperator 0x555ab1e70050 <line:12:3, col:12> 'int' '='
|     |-ArraySubscriptExpr 0x555ab1e70010 <col:3, col:8> 'int' lvalue
|     | |-ImplicitCastExpr 0x555ab1e6fff8 <col:3> 'int *':'int *' <LValueToRValue>
|     | | `-DeclRefExpr 0x555ab1e6ffb8 <col:3> 'int *':'int *' lvalue ParmVar 0x555ab1e6fe10 'arr' 'int *':'int *'
|     | `-IntegerLiteral 0x555ab1e6ffd8 <col:7> 'int' 3
|     `-IntegerLiteral 0x555ab1e70030 <col:12> 'int' 0
`-FunctionDecl 0x555ab1e70120 <line:15:1, line:19:1> line:15:6 what_about 'void (void)'
  `-CompoundStmt 0x555ab1e703e0 <line:16:1, line:19:1>
    |-DeclStmt 0x555ab1e702d8 <line:17:3, col:13>
    | `-VarDecl 0x555ab1e70270 <col:3, col:12> col:7 used arr 'int[2]'
    `-CallExpr 0x555ab1e703a0 <line:18:3, col:19> 'void'
      |-ImplicitCastExpr 0x555ab1e70388 <col:3> 'void (*)(int *)' <FunctionToPointerDecay>
      | `-DeclRefExpr 0x555ab1e702f0 <col:3> 'void (int *)' Function 0x555ab1e6ff08 'not_detected' 'void (int *)'
      `-ImplicitCastExpr 0x555ab1e703c8 <col:16> 'int *' <ArrayToPointerDecay>
        `-DeclRefExpr 0x555ab1e70310 <col:16> 'int[2]' lvalue Var 0x555ab1e70270 'arr' 'int[2]'
1 warning generated.

The key line of the dump is:

|-FunctionDecl 0x555ab1e6ff08 <line:10:1, line:13:1> line:10:6 used not_detected 'void (int *)'

which shows the recorded function type as void (int *). This is all clang-tidy can see, so it cannot report either case.

(Well, in this case it can see the definitions of what_about and not_detected in the same file, which together comprise a clear violation, but evidently its interprocedural analysis is insufficient to notice the problem that way, which is a little disappointing.)

Out of curiosity, and to confirm it isn't stored but omitted in the dump, I decided to look for where in clang this information is dropped. The parser itself, in Parser::ParseBracketDeclarator(), keeps the array and its size. But the semantic analyzer, in Sema::CheckParameter(), adjusts the type (changing the array to a pointer), retaining only the adjusted type in the resulting ParmVarDecl AST object:

  ParmVarDecl *New = ParmVarDecl::Create(Context, DC, StartLoc, NameLoc, Name,
                                         Context.getAdjustedParameterType(T),
                                         TSInfo, SC, nullptr);

The variable T holds the type as computed by the parser (an array, for the case of an array parameter), but getAdjustedParameterType() returns a pointer type, and the original T and its associated size are not retained anywhere.

Consequently, for clang-tidy to report the original case, some changes would be required to the clang parser.

4
alagner On

From the standard:

A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to type’’, where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression.

void f(int x[]); and void f(int* x); are totally the same from language standard's standpoint. Therefore the function taking array is just taking the pointer. Why would a linter care about pointer bounds, if technically there are none?