The Lexer
The script generated is passed through a Lexer . This is a script that breaks down into logical Tokens . Your programming language will have a dictionary of tokens, which will have associated metadata. Basically, you are describing “what is this element, and what does it mean?” You are quite literally writing a dictionary for the entire language here.
Here is an example. Say your script was just literally hello = 'Hello World'
. The lexer would generate the following tokens:
IDENT, hello
ASSIGN, =
STRING, 'Hello World!'
So here we can see that we have a naked string “hello”, then an assignment “=”, and a string “Hello World!” at the bottom. Also, note that each token can have one or more characters.
Logically since some tokens have values they must mean something. This is where I opted to represent basic types. There is a different type of token for each of Boolean, Lambda, Nil, Number, String, and Table. Some of these imply recursion, we might come back to this.
Don’t be thinking too far ahead now, we still don’t know what to do with these tokens. All we have done is turn this complicated series of characters into a horribly verbose mess. We are working backward!
Well, not really. What we have done is define the language we can work with. The language itself does not mean anything, but the important thing is now that we have a language we can express it.
The Abstract Syntax Tree
Let’s just call it AST for short, ok?
Think back on what we are trying to do. We want a programming language where we can describe instructions on how to perform a task. If you missed the importance of the middle part in that last sentence I won’t hold it against you, it went by quickly.
"describe instructions"
As in, multiple instructions. What do we do now? Well, we start the process of writing instructions.
What is an instruction? Well logically if you want to give an instruction, you might want a certain value to be return ed. You might also want the instruction to be performed, but the response could be irrelevant and void. Hey, look, instructions!
The job of the AST compiler is to take our dictionary and form an idea from a sequence of tokens. We now get to describe the possible ideas. It seems to me that in a broad stroke there are two types of AST-Tokens: Expressions and Statements.
An Expression is something that means something, it represents “something.” A statement will act in control flow, allowing for different actions to be taken depending on an expression’s value, but themselves are void.
For example, things like a function definition, a function application, variables, variable offsets, etc are all expressions. Things like conditionals and loops would be statements.
So now we have a system that will use an application of a dictionary to input words, a system that takes those words and turns them into ideas, it feels like we are close!
The Parser
The parser is where all the real magic happens. To extend the metaphors above, the parser turns ideas into actions. It does its work by iterating over the AST nodes and “executing” them in order.
There are a few intermediate steps that I am going to gloss over. In short, while the AST compiler is stateful recursively within (and not between) nodes, the Parser needs to be stateful between nodes. Managing this state is delegated to an external Environment class, that holds all the defined variables during execution flow.
During node iteration, there is a check to see what type of node we are working with. If it is a “Return Expression” then return the result of the expression, if it is an assignment then write a variable to the environment, and so on.
The one special case is if it is a statement. Remember that statements are void, but direct execution flow. The job of a statement is that when parsed, it yields additional AST nodes to be executed in the current context. For example, a “while” statement will continue to yield its body until its condition turns false.
Do this enough and you now have the basics of a programming language. At this point everything is pretty mathematically pure, there is no way to really get information in or out and interact with the real world. That will all have to come in a future update!