What is a token, and what is a lexeme?

I've been watching a video on how compilers break code down and convert it into tokens. I came across a part which seems to say that the lexer actually identifies lexemes, which are closer to the words of a sentence, while tokens are more like verbs. Can anybody explain the hierarchy and what sits where? If a token is like a verb, are there types of tokens, with lexemes being just examples of those token types, the way "run" is an example of a verb?
the video

mochaccino On

Here we have our input:

The quick brown fox jumps over the lazy dog.

Let's get the lexemes as a list now:

["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
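As a rough illustration of this splitting step, here is a minimal Python sketch. The function name `split_lexemes` and the splitting rule (a run of word characters is one lexeme, any other non-space character is a lexeme on its own) are assumptions made for this example; a real lexer would follow the rules of its language.

```python
import re

def split_lexemes(text: str) -> list[str]:
    # Assumed rule: a run of word characters is one lexeme,
    # and any other non-whitespace character is a lexeme by itself.
    return re.findall(r"\w+|[^\w\s]", text)

lexemes = split_lexemes("The quick brown fox jumps over the lazy dog.")
print(lexemes)
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
```

Note that nothing here knows what a "word" means; it only cuts the input into pieces.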

Please remember that these are lexemes: strings of characters with no meaning attached yet. Although "The" has a meaning in English, in our example it currently conveys no meaning or information. It is simply "T", then "h", followed by "e". Nothing else.

The next step is to look at these lexemes and give each of them a meaning, creating tokens. Often, the meaning changes depending on the given context.

So for "quick", we could say it is:

Token
    type: "ADJECTIVE"
    lexeme: "quick"

Notice that we have now associated a meaning with this lexeme: its type, which is an adjective. Here is what a "fox" could be:

Token
    type: "NOUN"
    lexeme: "fox"
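The step from lexeme to token could be sketched in Python roughly like this. The `Token` class, the `WORD_TYPES` table, and the type names other than "ADJECTIVE" and "NOUN" are hypothetical, invented for this example. A plain lookup table also ignores context, so it is a simplification of what a real lexer does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    type: str    # the meaning we assign, e.g. "NOUN"
    lexeme: str  # the raw characters from the input

# Hypothetical table mapping lowercase lexemes to token types.
WORD_TYPES = {
    "the": "DETERMINER",
    "quick": "ADJECTIVE",
    "brown": "ADJECTIVE",
    "lazy": "ADJECTIVE",
    "fox": "NOUN",
    "dog": "NOUN",
    "jumps": "VERB",
    "over": "PREPOSITION",
    ".": "PERIOD",
}

def to_token(lexeme: str) -> Token:
    # Look the lexeme up case-insensitively; anything unlisted is "UNKNOWN".
    return Token(WORD_TYPES.get(lexeme.lower(), "UNKNOWN"), lexeme)

print(to_token("quick"))  # Token(type='ADJECTIVE', lexeme='quick')
print(to_token("fox"))    # Token(type='NOUN', lexeme='fox')
```

So the token type is the category (like "verb"), and the lexeme is one concrete piece of text that falls into that category (like "run").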

However, tokens are not useful by themselves. We need a meaningful structure from the list of tokens. That's what the next step is usually for: parsing.

In our example, a parser might turn our input into the following:

Sentence
    type: "SIMPLE"
    subject: Subject
        adjectives: [
            Token
                type: "ADJECTIVE"
                lexeme: "quick"
            Token
                type: "ADJECTIVE"
                lexeme: "brown"
        ]
        noun: [ ... ]
    ...
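To make the parsing step a little more concrete, here is a small Python sketch that builds only the subject part of such a structure. The names `Subject` and `parse_subject`, the "DETERMINER" type, and the assumed grammar (an optional determiner, any number of adjectives, then one noun) are all illustrative choices, not a definitive way to parse English.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Token:
    type: str
    lexeme: str

@dataclass
class Subject:
    adjectives: list[Token] = field(default_factory=list)
    noun: Optional[Token] = None

def parse_subject(tokens: list[Token]) -> tuple[Subject, list[Token]]:
    """Consume [DETERMINER] ADJECTIVE* NOUN from the front of the token list.

    Returns the Subject and the tokens that were not consumed.
    """
    pos = 0
    # Skip an optional leading determiner such as "The".
    if pos < len(tokens) and tokens[pos].type == "DETERMINER":
        pos += 1
    subject = Subject()
    # Collect every adjective that precedes the noun.
    while pos < len(tokens) and tokens[pos].type == "ADJECTIVE":
        subject.adjectives.append(tokens[pos])
        pos += 1
    # The subject must end with a noun.
    if pos >= len(tokens) or tokens[pos].type != "NOUN":
        raise ValueError("expected a noun")
    subject.noun = tokens[pos]
    return subject, tokens[pos + 1:]

tokens = [
    Token("DETERMINER", "The"),
    Token("ADJECTIVE", "quick"),
    Token("ADJECTIVE", "brown"),
    Token("NOUN", "fox"),
    Token("VERB", "jumps"),
]
subject, rest = parse_subject(tokens)
print([t.lexeme for t in subject.adjectives])  # ['quick', 'brown']
print(subject.noun.lexeme)                     # fox
```

The parser only looks at token types to decide the structure; the lexemes are carried along inside the tokens.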

Now we've got a useful data structure from a one-dimensional input! Fascinating! It's important to note that tokens can contain more information than just a type and the lexeme. They could, for example, also contain the line and column at which the lexeme was found.
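As a sketch of how position information might be recorded, the following Python example attaches a 1-based line and column to each lexeme. The `Token` fields and the function `lex_with_positions` are assumptions for illustration, and the token type is left out here to keep the focus on positions.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    lexeme: str
    line: int    # 1-based line number
    column: int  # 1-based column of the lexeme's first character

def lex_with_positions(text: str) -> list[Token]:
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Same assumed splitting rule as before: words, or single symbols.
        for match in re.finditer(r"\w+|[^\w\s]", line):
            tokens.append(Token(match.group(), line_no, match.start() + 1))
    return tokens

for token in lex_with_positions("The quick\nbrown fox."):
    print(token)
```

Compilers typically keep this kind of information so that error messages can point at the exact place in the source.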


You may not realize it, but this is also how you read! You are always reading in a list of lexemes, giving them meanings, then building a structure from them and interpreting the meaning of the entire sentence.