Creating Your Own Programming Language Using Python

Ohidur Rahman Bappy

MAR 22, 2025

Introduction

In this guide, you'll learn how to create your own programming language using Python. We'll use SLY (Sly Lex-Yacc) to simplify the process of lexical analysis and parsing.

Install SLY

Start by installing SLY for Python:

pip install sly

Building a Lexer

The first phase of a compiler is to convert character streams into token streams through lexical analysis. SLY simplifies this process.

First, import the necessary module:

from sly import Lexer

Create a BasicLexer class that extends the Lexer class. This lexer will handle simple arithmetic operations, so it needs the tokens NAME, NUMBER, and STRING, plus single-character literals for operators and punctuation. The ignore string skips spaces and tabs, and a separate rule discards // line comments:

class BasicLexer(Lexer): 
    tokens = { NAME, NUMBER, STRING } 
    ignore = '\t '
    literals = { '=', '+', '-', '/', '*', '(', ')', ',', ';' } 

    NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
    STRING = r'".*?"'

    @_(r'\d+') 
    def NUMBER(self, t): 
        t.value = int(t.value) 
        return t 

    @_(r'//.*') 
    def COMMENT(self, t): 
        pass

    @_(r'\n+') 
    def newline(self, t): 
        # Advance the line counter instead of overwriting it
        self.lineno += t.value.count('\n')
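
To make the idea of a token stream concrete, here is a minimal standard-library sketch of the same rules, written without SLY. It is only an illustration of what lexical analysis produces, not a replacement for the class above and not a description of SLY's internals; the names TOKEN_SPEC, MASTER, and sketch_tokenize are hypothetical and introduced just for this example.

```python
import re

# Hypothetical token table mirroring the BasicLexer rules above.
# Order matters: COMMENT comes before LITERAL so '//' is not read as two '/'.
TOKEN_SPEC = [
    ("COMMENT", r"//.*"),
    ("NAME",    r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("STRING",  r'".*?"'),
    ("NUMBER",  r"\d+"),
    ("LITERAL", r"[=+\-/*(),;]"),
    ("SKIP",    r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def sketch_tokenize(text):
    """Return a list of (type, value) pairs for one line of input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"Illegal character {text[pos]!r} at index {pos}")
        kind, value = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("SKIP", "COMMENT"):
            continue  # whitespace and comments produce no token
        if kind == "NUMBER":
            value = int(value)  # same conversion as the NUMBER rule above
        elif kind == "LITERAL":
            kind = value  # a literal uses the character itself as its type
        tokens.append((kind, value))
    return tokens


print(sketch_tokenize('total = 10 + 2 // add'))
# [('NAME', 'total'), ('=', '='), ('NUMBER', 10), ('+', '+'), ('NUMBER', 2)]
```

The comment and the whitespace disappear, the digits arrive as a Python int, and each operator becomes a token whose type is the character itself. That is the shape of input the parser in the next section consumes.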

Building a Parser

Import the Parser module:

from sly import Parser

Create a BasicParser class extending the Parser class. Share the token set from BasicLexer and set the precedence rules, listed from lowest to highest priority:

class BasicParser(Parser): 
    tokens = BasicLexer.tokens 
    precedence = ( 
        ('left', '+', '-'), 
        ('left', '*', '/'), 
        ('right', 'UMINUS'), 
    ) 

    def __init__(self): 
        self.env = {}

    @_("")
    def statement(self, p): 
        pass

    @_("var_assign")
    def statement(self, p): 
        return p.var_assign

    @_("NAME '=' expr")
    def var_assign(self, p): 
        return ('var_assign', p.NAME, p.expr)

    @_("NAME '=' STRING")
    def var_assign(self, p): 
        return ('var_assign', p.NAME, p.STRING)

    @_("expr")
    def statement(self, p): 
        return p.expr

    @_("expr '+' expr")
    def expr(self, p): 
        return ('add', p.expr0, p.expr1)

    @_("expr '-' expr")
    def expr(self, p): 
        return ('sub', p.expr0, p.expr1)

    @_("expr '*' expr")
    def expr(self, p): 
        return ('mul', p.expr0, p.expr1)

    @_("expr '/' expr")
    def expr(self, p): 
        return ('div', p.expr0, p.expr1)

    @_("'-' expr %prec UMINUS")
    def expr(self, p): 
        # Negate by rewriting -x as 0 - x, which the interpreter already handles
        return ('sub', ('num', 0), p.expr)

    @_("NAME")
    def expr(self, p): 
        return ('var', p.NAME)

    @_("NUMBER")
    def expr(self, p): 
        return ('num', p.NUMBER)

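Each grammar rule returns a tuple, so parsing an arithmetic expression produces a nested parse tree. Once that tree is passed to the interpreter described in the Execution section, a session looks like this: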

GFG Language > a = 10
GFG Language > b = 20
GFG Language > a + b
30
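
To see how the precedence table shapes the tree, here is a small standard-library sketch. The tuple below is the tree the grammar above would be expected to return for a + b * 2, written out by hand rather than produced by running the parser; render is a hypothetical helper introduced only to print it.

```python
# Tree the grammar above is expected to build for:  a + b * 2
# '*' binds tighter than '+', so the 'mul' node sits below the 'add' node.
tree = ('add', ('var', 'a'), ('mul', ('var', 'b'), ('num', 2)))


def render(node, depth=0):
    """Return an indented, one-item-per-line view of a tuple-based tree."""
    indent = '  ' * depth
    if not isinstance(node, tuple):
        return [f"{indent}{node!r}"]  # leaf: a name or a number
    lines = [f"{indent}{node[0]}"]    # first element is the node tag
    for child in node[1:]:
        lines.extend(render(child, depth + 1))
    return lines


print('\n'.join(render(tree)))
```

The first element of every tuple is the node type and the remaining elements are its operands, which is the convention the interpreter relies on when it walks the tree.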

Execution

The interpreter takes the parse tree, walks it recursively from the root, evaluates each node from the values of its children, and prints the final result:

class BasicExecute: 
    def __init__(self, tree, env): 
        self.env = env 
        result = self.walkTree(tree) 
        # Print numeric results (division yields a float) and quoted strings
        if isinstance(result, (int, float)): 
            print(result) 
        if isinstance(result, str) and result.startswith('"'): 
            print(result) 

    def walkTree(self, node): 
        if isinstance(node, int): 
            return node 
        if isinstance(node, str): 
            return node 

        if node is None: 
            return None

        if node[0] == 'program': 
            if node[1] is None: 
                self.walkTree(node[2]) 
            else: 
                self.walkTree(node[1]) 
                self.walkTree(node[2]) 

        if node[0] == 'num': 
            return node[1] 

        if node[0] == 'str': 
            return node[1] 

        if node[0] == 'add': 
            return self.walkTree(node[1]) + self.walkTree(node[2]) 
        elif node[0] == 'sub': 
            return self.walkTree(node[1]) - self.walkTree(node[2]) 
        elif node[0] == 'mul': 
            return self.walkTree(node[1]) * self.walkTree(node[2]) 
        elif node[0] == 'div': 
            return self.walkTree(node[1]) / self.walkTree(node[2]) 

        if node[0] == 'var_assign': 
            self.env[node[1]] = self.walkTree(node[2]) 
            return node[1] 

        if node[0] == 'var': 
            try: 
                return self.env[node[1]] 
            except LookupError: 
                print("Undefined variable '"+node[1]+"' found!") 
                return 0
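
As a design alternative, the chain of if/elif branches can be replaced with a lookup table from node type to function. The sketch below assumes the tuple shapes returned by the parser above and handles only numeric nodes; evaluate and BINARY_OPS are hypothetical names, and unlike walkTree this version returns the assigned value from an assignment and raises KeyError for an undefined variable.

```python
import operator

# Hypothetical dispatch table: node tag -> binary function.
BINARY_OPS = {
    'add': operator.add,
    'sub': operator.sub,
    'mul': operator.mul,
    'div': operator.truediv,  # true division, so 7 / 2 gives 3.5
}


def evaluate(node, env):
    """Recursively evaluate a tuple tree of the shape the parser returns."""
    tag = node[0]
    if tag == 'num':
        return node[1]
    if tag == 'var':
        return env[node[1]]  # raises KeyError for an undefined name
    if tag == 'var_assign':
        env[node[1]] = evaluate(node[2], env)
        return env[node[1]]
    if tag in BINARY_OPS:
        return BINARY_OPS[tag](evaluate(node[1], env), evaluate(node[2], env))
    raise ValueError(f"Unknown node type: {tag!r}")


env = {}
evaluate(('var_assign', 'a', ('num', 10)), env)
evaluate(('var_assign', 'b', ('num', 20)), env)
print(evaluate(('add', ('var', 'a'), ('var', 'b')), env))  # prints 30
```

Adding an operator then means adding one entry to the table instead of another branch, at the cost of a little indirection.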

Displaying the Output

To display the interpreter's output, tie the lexer, parser, and interpreter together in a read-eval-print loop that shares one environment across inputs:

if __name__ == '__main__': 
    lexer = BasicLexer() 
    parser = BasicParser() 
    print('GFG Language') 
    env = {} 

    while True: 
        try: 
            text = input('GFG Language > ') 
        except EOFError: 
            break
        if text: 
            tree = parser.parse(lexer.tokenize(text)) 
            BasicExecute(tree, env)

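When input doesn't match the defined rules, SLY's default handlers take over: the parser reports a syntax error on standard error, and the lexer raises an exception on a character it cannot match. You can override the error method on either class to customize this behavior.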

To run your program, use:

python your_program_name.py

This setup provides a foundational understanding of creating a basic programming language using Python and SLY.

Source: GeeksforGeeks