Tokenization is essential in determining what a programme does.
Bjarne's point, as it applies to C++ code, is that the tokenization rules can change the meaning of a programme.
We therefore need to know what the tokens are and how they are determined.
Specifically, how do we recognise a single token when it appears among other characters, and how should tokens be delimited when the split is ambiguous?
Consider the prefix operators ++ and +, for example, and assume for the moment that the language has only the single token +.
What does the following excerpt mean?
int i = 1;
++i;
Does this apply unary + to i twice, using only the + token, or does it increment i once? As it stands, it is ambiguous.
To resolve this we need an additional token, so ++ is introduced as a "word" of its own in the language.
But there is now another (though minor) issue.
What if the programmer just wants to use unary + twice without incrementing?
We need rules that say how characters are grouped into tokens.
If we adopt the rule that white space always separates tokens, our programmer can write:
int i = 1;
+ +i;
A C++ implementation starts with a file full of characters, converts them into a sequence of tokens ("words" that have meaning in the C++ language), and then checks whether those tokens form a "sentence" with a valid meaning.