Writing a Compiler: Parsing Expressions Writing a Compiler: Parsing Expressionspratt parser syntactic-analysis

In this post we will build a simple expression parser. We will be using the Pratt Parsing technique. The source code for the complete parser can be found here

Demo

Parser

We will define our parser as a function that processes the token at the current position and generates a new syntax tree node. Our compiler will have at least 3 parsers eventually - one for expressions, one for statements, and one for types. To ensure all parsers can share the program state, we'll keep this state and related general parsing methods and helper functions within an object. All of our parsers will accept this object as a parameter.

AST Nodes

We will use a type map to define our AST nodes. Using a map will give us some additional type safety once we start manipulating and transforming nodes. More properties will be added to the nodes as we improve our syntactic analysis. Non-terminal nodes have a children property to allow for easy traversal of the AST.

type MakeNodeMap<Base> = {
	[K in keyof Base]: Token<K> & Base[K];
};
type Infix = { children: [Node, Node] };
type MakeInfix<T extends string> = { [K in T]: Infix };

export type BaseNodeMap = {
	root: { children: Node[] };
	ident: {};
	string: {};
	number: {};
	comment: {};
	'?': { children: [Node, Node, Node | undefined] };
	'~': { children: [Node] };
	'!': { children: [Node] };
	'+': { children: [Node] };
	'-': { children: [Node] };
	call: { children: [Node, Node | undefined] };
} & MakeInfix<
	| '='
	| '.'
	| '>>'
	| ','
	| '||'
	| '&&'
	| '+'
	| '-'
	| '*'
	| '/'
	| '|'
	| '&'
	| '=='
	| '!='
	| '<'
	| '>'
	| '<='
	| '>='
	| '^'
>;
export type NodeMap = MakeNodeMap<BaseNodeMap>;
export type Node = NodeMap[keyof NodeMap];
export type NodeKind = keyof NodeMap;

Parsing Expressions

This is the core of our Pratt parser. As you can see, it is a very simple function. First it checks if the current token has a prefix function, and if it does, it calls it. The prefix function is used to parse prefix operators, literals and identifiers. The resulting node becomes our left operand.

If a left node is found, we proceed to check if the following token has an infix function and if its precedence is higher than the precedence parameter. If a valid operator is found, it is called. The result of the infix function becomes our new expression node. We do this last step in a loop until no more operators with the required precedence are found.

function expression(precedence = 0) {
	const left = current();
	const prefixOp = (table[left.kind] as Operator<string>)?.prefix;
	let result = prefixOp ? (next(), prefixOp(left)) : undefined;

	while (result) {
		const n = current();
		const op = table[n.kind] as Operator<string>;
		if (op?.infix && precedence < op.precedence) {
			next();
			result = op.infix(n, result);
		} else break;
	}
	return result;
}

Prefix Operators

To parse prefix operators, such as as ! and ~, We will use the prefix helper function. Prefix operators are right associative. The rbp (Right Binding Power) parameter, determines the precedence level of the token on its right.

function prefix<K extends UnaryNode['kind']>(rbp = 0) {
	return (tk: Token<K>) => {
		const right = expression(rbp);
		if (!right) throw error('Expected expression', tk);
		return {
			...tk,
			children: [right],
			end: right.end,
		} as unknown as NodeMap[K];
	};
}

Infix Operators

Like prefix, the infix function is used to create infix operators. The resulting node will have two children, the left and right operands.

Most infix operators are left-associative, meaning the operations are grouped from the left. For example, the expression a OP1 b OP2 c is interpreted as (a OP1 b) OP2 c. The rbp value of a left-associative operator is the same as its precedence value.

To define a right-associative operator a OP1 (b OP2 c, like the = assignment operator, we pass an rbp value of 0 instead.

function infix<N extends InfixNode>(rbp: number) {
	return (tk: Token<string>, left: Node) => {
		const right = expectNode(expression(rbp), 'Expected expression');
		const node = tk as N;
		node.start = left.start;
		node.children = [left, right];
		node.end = right.end;
		return node;
	};
}

Ternary Operators

The following ternary function parses the conditional operator ? .. : ...

function ternary<N extends TernaryNode<false>>(
	precedence: number,
	operator2: Kind,
) {
	const _infix = infix(precedence);
	return (node: Token<N['kind']>, left: Node) => {
		const result = _infix(node, left) as unknown as N;
		expect(operator2);
		const child3 = expectNode(
			expression(precedence),
			'Expected expression',
		);
		result.end = child3.end;
		result.children.push(child3);
		return result;
	};
}

In our language, however, I want our conditional operator to have an optional third operand. To achieve this, we will modify the previous function slightly.

function ternaryOptional<N extends TernaryNode<true>>(
	precedence: number,
	operator2: Kind,
) {
	const _infix = infix(precedence);
	return (node: Token<N['kind']>, left: Node) => {
		const result = _infix(node, left) as unknown as N;
		if (optional(operator2)) {
			const child3 = expectNode(
				expression(precedence),
				'Expected expression',
			);
			result.end = child3.end;
			result.children.push(child3);
		}
		return result;
	};
}

Handling Special Cases

Operators such as + and - can be both a prefix and an infix operator, depending on the position of their tokens. To handle this, we can specify both methods in their definition. The precedence value only applies to the infix operator.

...
	'+': {
		precedence: 11,
		infix: infix(11),
		prefix: prefix(14),
	},

The ( token can be either a function call or a group. To parse a group, we define a prefix function. As a prefix, the operator does not create a new node, but uses the node of its inner expression. ( is left associative, so we do not pass a precedence value to expression. As an infix operator, the right operand is optional, i.e. a function call with no arguments.

...
	'(': {
		precedence: 20,
		prefix() {
			const node = expectNode(expression(), 'Expected expression');
			expect(')');
			return node;
		},
		infix(tk, left) {
			return {
				...tk,
				kind: 'call',
				children: [left, expression()],
				end: expect(')').end,
			};
		},
	},

The Operator Table

Now that we have gone over all the different types of parsing methods, we are ready to define our operator table.

export type OperatorTable = {
	[K in Kind]?: Operator<K>;
};
export type Operator<K extends string> =
	| {
			precedence: number;
			infix(token: Token<K>, left: Node): Node;
			prefix?(token: Token<K>): Node;
	  }
	| {
			prefix(token: Token<K>): Node;
			infix?: never;
	  };

const table: OperatorTable = {
	'>>': infixOperator(2),
	'||': infixOperator(3),
	'&&': infixOperator(4),
	'|': infixOperator(5),
	'^': infixOperator(6),
	'&': infixOperator(7),
	'==': infixOperator(8),
	'!=': infixOperator(8),
	'<': infixOperator(9),
	'>': infixOperator(9),
	'<=': infixOperator(9),
	'>=': infixOperator(9),
	'+': {
		precedence: 11,
		prefix: prefix(14),
		infix: infix(11),
	},
	'-': {
		precedence: 11,
		prefix: prefix(14),
		infix: infix(11),
	},
	'~': {
		prefix: prefix(14),
	},
	'!': {
		prefix: prefix(14),
	},
	'/': infixOperator(12),
	'*': infixOperator(12),
	'.': {
		precedence: 17,
		infix(tk, left) {
			const right = expect('ident') as NodeMap['ident'];
			return {
				...tk,
				start: left.start,
				children: [left, right],
				end: right.end,
			};
		},
	},
	',': infixOperator(3),
	'=': infixOperator(2, 0),
	'(': {
		precedence: 20,
		prefix() {
			const node = expectNode(expression(), 'Expected expression');
			expect(')');
			return node;
		},
		infix(tk, left) {
			return {
				...tk,
				kind: 'call',
				children: [left, expression()],
				end: expect(')').end,
			};
		},
	},
	'?': {
		precedence: 2,
		infix: ternaryOptional(2, ':'),
	},
	number: { prefix: n => n },
	string: { prefix: n => n },
	ident: { prefix: n => n },
};

What's Next

The Pratt parsing method offers a simple way to parse expressions. By defining an operator table, prefix and infix functions, and setting rules for precedence and associativity, we can easily parse expressions and generate an abstract syntax tree (AST).

Stay tuned as we build towards a fully functional compiler that not only understands the code's syntax but also its semantics.

Back to Main Page