Return to

Lua Lexer/Compiler/Parser Preview App

Hello L1T!

I have a project that I thought I might as well show off. It was started long before Devember so I won’t add it as a submission, but since we are all in the programming mood I thought I might join in on the fun.

Although I am not submitting my little script parser for Devember, I do intend to submit a little web app. This app will allow you to write and share little Lua scripts, and also display (in as close to real-time as possible) the generated AST and Lexer tokens.

In short, because I want the judging to be fair:

  • Script Parser === Out Of Scope because of the head start.
  • Web App === In Scope.

With this in mind, my main project is still the actual parser so a lot of my updates will likely center around that. With that said, let’s talk more about that project and y’all can judge me later.


My motivation is three-fold.

  1. I wanted to get a much deeper understanding of how programming languages actually worked at a much lower level. Not quite as low level as Ben Eaters 8-bit Computer (seriously go check it out), but I wanted to see how one would go about developing a programming language itself.
  2. I am working on a different project where I would like the end-users to do some automation scripting in a super-sandboxed way. Lua is pretty straightforward and seemed like it should be easy enough for me to tackle. I realize there are existing Lua parsers but I wanted to learn dangit.
  3. I was very, very bored at my previous job.

Brief Overview

All right let’s get into it. This blog post already feels like a recipe with too much backstory, just give me that code darn it!

Patience; The journey is the point. I have a lot more of this blog written, but I figured I should start with posting the introduction and go from there. In the meantime, you are welcome to check out the git repository here.

The Lexer

The script generated is passed through a Lexer . This is a script that breaks down into logical Tokens . Your programming language will have a dictionary of tokens, which will have associated metadata. Basically, you are describing “what is this element, and what does it mean?” You are quite literally writing a dictionary for the entire language here.

Here is an example. Say your script was just literally hello = 'Hello World'. The lexer would generate the following tokens:

IDENT, hello
STRING, 'Hello World!'

So here we can see that we have a naked string “hello”, then an assignment “=”, and a string “Hello World!” at the bottom. Also, note that each token can have one or more characters.

Logically since some tokens have values they must mean something. This is where I opted to represent basic types. There is a different type of token for each of Boolean, Lambda, Nil, Number, String, and Table. Some of these imply recursion, we might come back to this.

Don’t be thinking too far ahead now, we still don’t know what to do with these tokens. All we have done is turn this complicated series of characters into a horribly verbose mess. We are working backward!
Well, not really. What we have done is define the language we can work with. The language itself does not mean anything, but the important thing is now that we have a language we can express it.

The Abstract Syntax Tree

Let’s just call it AST for short, ok?

Think back on what we are trying to do. We want a programming language where we can describe instructions on how to perform a task. If you missed the importance of the middle part in that last sentence I won’t hold it against you, it went by quickly.

"describe instructions"

As in, multiple instructions. What do we do now? Well, we start the process of writing instructions.

What is an instruction? Well logically if you want to give an instruction, you might want a certain value to be return ed. You might also want the instruction to be performed, but the response could be irrelevant and void. Hey, look, instructions!

The job of the AST compiler is to take our dictionary and form an idea from a sequence of tokens. We now get to describe the possible ideas. It seems to me that in a broad stroke there are two types of AST-Tokens: Expressions and Statements.
An Expression is something that means something, it represents “something.” A statement will act in control flow, allowing for different actions to be taken depending on an expression’s value, but themselves are void.

For example, things like a function definition, a function application, variables, variable offsets, etc are all expressions. Things like conditionals and loops would be statements.

So now we have a system that will use an application of a dictionary to input words, a system that takes those words and turns them into ideas, it feels like we are close!

The Parser

The parser is where all the real magic happens. To extend the metaphors above, the parser turns ideas into actions. It does its work by iterating over the AST nodes and “executing” them in order.

There are a few intermediate steps that I am going to gloss over. In short, while the AST compiler is stateful recursively within (and not between) nodes, the Parser needs to be stateful between nodes. Managing this state is delegated to an external Environment class, that holds all the defined variables during execution flow.

During node iteration, there is a check to see what type of node we are working with. If it is a “Return Expression” then return the result of the expression, if it is an assignment then write a variable to the environment, and so on.
The one special case is if it is a statement. Remember that statements are void, but direct execution flow. The job of a statement is that when parsed, it yields additional AST nodes to be executed in the current context. For example, a “while” statement will continue to yield its body until its condition turns false.

Do this enough and you now have the basics of a programming language. At this point everything is pretty mathematically pure, there is no way to really get information in or out and interact with the real world. That will all have to come in a future update!

Web application status update:

Pretty bare-bones. The backend for this is open-sourced so you can see the submission.

I am not hosting this anywhere publicly yet just because I don’t have much of anything useful done yet. I want to get the database layer finished before publishing.

Web application status update:

You can now preview the runtime, result, and stdout from your input.

maybe … or Lex LuaTHere ?

(sorry could not help myself - nice project BTW)

The site is up and running.

Feel free to poke at it. If you can break it in any creative way definitely report back with your findings. Unfortunately, the available API for languages is pretty slim for now.

Available expression types:

  • Function Applications
  • Variable Assignment
  • Function Expression
  • Literal Expression
  • Offsets (for tables)
  • Operators (math, equality, comparison)
  • Parenthesis
  • Returns
  • Variables

Available Statement Types

  • Conditionals (ifs)
  • For Loops for i = 10, 1, -1 do ... end
  • While Loops

“Standard Library”

  • abs() → returns absolute value
  • max() → returns the maximum of two inputs
  • min() → The opposite of max()
  • print() → All inputs are sent to script output