strtok() sometimes(??) causing stack smashing?


Using Kubuntu 22.04 LTS, Kate v22.04.3, and gcc v11.3.0, I have developed a small program to investigate the use of strtok() for tokenising strings, which is shown below.

#include <stdio.h>
#include <string.h>

int main(void)
{
   char inString[] = "";         // string read in from keyboard.
   char * token    = "";         // A word (token) from the input string.
   char delimiters[] = " ,";     // Items that separate words (tokens).
   
   // explain nature of program.
   printf("This program reads in a string from the keyboard"
          "\nand breaks it into separate words (tokens) which"
          "\nare then output one token per line.\n");
   printf("\nEnter a string: ");
   scanf("%s", inString);
   
   /* get the first token */
   token = strtok(inString, delimiters);
   
   /* Walk through other tokens. */
   while (token != NULL)
   {
      printf("%s", token);
      printf("\n");
      
      // Get next token.
      token = strtok(NULL, delimiters);
   }
   return 0;
}

From the various web pages that I have viewed, it would seem that I have formatted the strtok() function call correctly. On the first run, the program produces the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and breaks it into separate words (tokens) which
are then output one token per line.

Enter a string: fred ,  steve ,   nick
f
ed

On the second run, it produced the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and breaks it into separate words (tokens) which
are then output one token per line.

Enter a string: steve ,  barney ,   nick
s
eve
*** stack smashing detected ***: terminated
Aborted (core dumped)

Subsequent runs showed that the program sort of ran, as in the first case above, if the first word/token contained only four characters. However, if the first word/token contained five or more characters then stack smashing occurred.

Given that "char *" is used to access the tokens, why:

a) is the first token (in each case) split at the second character?

b) are the subsequent tokens (in each case) not output?

c) does a first word/token of greater than four characters cause stack smashing?

Stuart

Andreas Wenzel (best answer)

The declaration

char inString[] = "";

is equivalent to:

char inString[1] = "";

This means that you are allocating an array of only a single element, so it only has space for storing a single character.

The function call

scanf("%s", inString);

requires that the function argument inString points to a memory buffer that is sufficiently large to store the matched input. Your program is violating this requirement, as the memory buffer has only space for a single character (the terminating null character). It can therefore only store strings with a length of zero.

By violating the requirement, your program is invoking undefined behavior, which means that anything can happen, including the strange behavior that you observed. The function scanf is probably overflowing the buffer inString, overwriting other important data on your program's stack, causing it to misbehave. This is called "stack smashing".

To fix this, you should give the array inString more space, for example by changing the line

char inString[] = "";

to:

char inString[200] = "";

However, in that case, if the user enters 200 or more characters of input as a single word, the buffer will overflow again and your program may crash. Therefore, you may want to additionally limit the number of characters matched by scanf to 199 characters (200 including the terminating null character). That way, the user cannot overflow the buffer, no matter how long the input is.

You can add such a limit like this:

scanf("%199s", inString);

Note, however, that the %s specifier will only match a single word. If you want to read an entire line of input, you may want to use the function fgets instead of scanf.

Vlad from Moscow

This declaration of a character array

char inString[] = "";   

is equivalent to

char inString[1] = { '\0' };

That is, it declares an array with only one element, which can store only an empty string. So any attempt to read a string into this character array using this call of scanf

scanf("%s", inString);

invokes undefined behavior.

You need to specify a much larger number of elements. For example:

enum { N = 100 };
char inString[N] = "";   

This initialization of a pointer

char * token    = "";

does not make much sense. It is better to write, for example,

char * token = NULL;

This call of scanf

scanf("%s", inString);

can read only one word, that is, a sequence of characters delimited by white-space characters.

Instead, you can write, for example,

scanf( " %99[^\n]", inString);

It also makes sense to include the tab character '\t' in the list of delimiters:

const char *delimiters = " \t,";

Instead of these calls of printf

  printf("%s", token);
  printf("\n");

it will be simpler to write

puts( token );