Implementation of a Programming Language

I've been thinking about how cool it would be to implement a programming language from scratch for about a year! After a long time, I decided to leave all other projects at least for a while (even a startup company)...

Specifying the Desired Syntax

First of all, maybe I should talk about the name of the language! This is the timeline of the chosen names in order:

Ammepasand (in Farsi, it means something that everyone likes)
Amme (in Farsi, it means all people)
Farr (in Farsi, it means glory)

My initial idea was to create a language heavily inspired by Bash; a language with understandable and simple codes (I'm still not sure if Bash is a simple language?!) and can be used in the terminal for many tasks (such as manipulating strings and files, automating processes, etc.)...

And this was the chosen grammar for the Ammepsand language:

In the middle of the night, I said to myself, maybe I should make the syntax more similar to the C family... But after about 15 minutes, I saw that we were far away from Bash! After that, I decided to change the name of the language to Amme along with the syntax.

This was the calculator program I wrote in the fictional Amme language:

It didn't take long for me to decide that we should have some rare features in the language we were going to implement:

Polish prefix notation for mathematical expressions
Starting indexes from one
Member-based structs

Implementation

All parts cannot be explained in this post, I will only mention the challenges and solutions that came to my mind during that period of time.

Chopping the Code

For many people, the first solution that comes to mind for tokenization and even parsing is to use ANTLR, Bison, and other similar tools... I am not saying that it is a bad idea, but we are supposed to understand the process of creating a programming language from scratch; not just the implementation of several node visitors!

Instead of reading character by character and basic labeling, I went to use regular expressions... I really liked to implement the project as much as possible so that others can easily manipulate it; and this was the main reason for using regular expressions in this phase!

A little further, you will understand what GroupedTokens is for!? And that I know that for a data class that all fields have a specific parameter, it can be defined in the decorator itself (e.g., @dataclass(frozen=True, kw_only=True))...

Our scanner logic:

In the constructor, we create a pattern based on the groups defined by the developer(s) to break the code into smaller parts. The tokenize method is probably not that difficult to understand; but I must mention that it is much better to use generators instead of returning a list at the end of the function...

Here we define our own language tokens beautifully:

Generating Syntax Trees

This part is very, very long, so I will only take small parts of it...

This is the base parser we use with a very simple logic:

We have over 65 nodes in Farr, but I'll only include 30% of them here to give you an idea of how they're structured and inherited:

A not so interesting solution that can be used to create the nodes needed to build a syntax tree is to reduce them to one; with fields such as node type, and its children... There are definitely different ideas for different phases; for example, another simple solution that can be used to provide nodes is to create general groups for them, such as literals, loops, etc.

Let's go to parse three literals (_parse_integer, _parse_string and _parse_identifier), a term (_parse_call) and a statement (_parse_function):

As a programmer and maintainer of this language, I have to admit that Farr could have been smarter in parsing! I mean more in the part of combining terms (_process_expression)...

Code Execution

Oh, we have to go to the back-end part of our project! This part is both interesting and scary! Very interesting and very scary!

Preparing the environment for managing variables, functions, structs and methods; and the not so good base of the interpreter:

This section can be very difficult to understand without getting into the code, especially for chained handle expressions (_process_chain_target) and calls (_populate_params)...

A half of the main code of the Farr language interpreter is as follows:

I can say that my biggest problem in implementing the interpreter was my inability to understand the logic of matching arguments with parameters! I was involved in the implementation of the interpreter for about a week...

Some objects that the language provides as its native:

Of course, it is not necessary in this section to not divide the objects into two groups of expression and statement (it wasn't even necessary in the parser phase, but it was done for simplicity)...

Conclusion

If I want to design and implement a programming language again (seriously, not just to gain experience); with this project, I realized that I should spend more time on the design phase so that I don't get into ideological conflicts on the way...

See the project on GitHub, test it and also give feedback (if you like)!

Implementation of a Programming Language

April 16, 2024

Specifying the Desired Syntax

Implementation

Chopping the Code

Generating Syntax Trees

Code Execution

Conclusion

Comments on this post

Be the first to comment!

Leave a comment