Home

Writing a Compiler: Functions

compiler typescript type-checker parser

In this post, I'll go over the details of implementing functions in the programming language I'm developing. We'll break down the process of how functions are parsed and compiled.

For details on how functions work in this language, check out the language reference.

In this post, we'll focus on implementing both the parser and the code generation step for the recursive factorial function shown below.

factorial = fn(n: int): int {
    next (n <= 1) ? 1 : n * factorial(n - 1)
}

The Parser

Syntax

The EBNF syntax for a function in this language is defined as follows:

function       ::= "fn" [parameters] block ;
parameters     ::= "(" param_list ")" [":" return_type];
param_list     ::= [param {"," param}] ;
param          ::= identifier [":" type] ;
block          ::= "{" {statement} "}" ;
return_type    ::= type ;

The function definition starts with the keyword "fn", followed by optional parameters enclosed in parentheses, an optional return type (denoted by ":" return_type), and a block of statements within curly braces.

The parameters consist of an optional list of param, where each parameter can have an optional type annotation (e.g., identifier: type). The block represents the function body, containing zero or more statements.

Creating the Scope

The parseBlock function sets up the process of parsing a block of code. It takes a token marking the start of the block and a callback function to customize how the block is parsed. Within this function, a new scope for symbols is created using symbolTable.withScope.

A function symbol is created and stored in the special property ScopeOwner, enabling future parsing functions to access it easily.

This function also creates the $ variable symbol, which points to the arguments passed to the function.

function parseBlock(
	tk: ScannerToken,
	cb: (node: NodeMap['fn']) => Node[],
): NodeMap['fn'] {
	return symbolTable.withScope(() => {
		const node: NodeMap['fn'] = {
			...tk,
			kind: 'fn',
			children: [],
			flags: 0,
			symbol: EmptyFunction,
		};
		const symbol = symbolTable.set(ScopeOwner, {
			kind: 'function',
			definition: node,
			flags: 0,
		});
		node.symbol = symbol;
		symbolTable.set('__BLOG_CONTENT__#x27;, {
			name: '__BLOG_CONTENT__#x27;,
			kind: 'variable',
			flags: 0,
		});
		node.statements = cb(node);
		node.children.push(...node.statements);
		return node;
	});
}

Parsing The Function

To parse a function definition, we'll add a handler for the fn token to our parser. For more information about how the parser works, refer to my parser post.

...
fn: {
	prefix: tk =>
		parseBlock(tk, node => {
			if (optional('(')) {
				blockParameters(node);
				if (optional(':'))
					node.children.push(
						(node.returnType = expectType()),
					);
			}
			expect('{');
			const result = parseUntilKind(statement, '}');
			node.end = expect('}').end;
			return result;
		}),
},
...

When the parser encounters an fn token, it will refer to the parser table and use the prefix function to process it.

If the next token is an opening parenthesis (, it indicates the function has parameters. The function then calls blockParameters(node) to parse and store those parameters into the node object.

After that, if a colon (:) is found, it implies the presence of a type annotation for the function. The type is expected to follow the colon and is stored in node.type.

The parser expects an opening curly brace {, which signals the start of the function's body. The parseUntilKind function parses statements inside the function body until it encounters a closing curly brace (}).

After parsing the statements, the function expects a closing curly brace } and marks the end of the node in the AST by storing the end position.

Parsing Parameters

The blockParameters function is responsible for parsing parameters. It uses the parseList helper function to process parameters separated by commas.

The parameter function parses each parameter and creates a corresponding variable symbol. It checks for an identifier, optionally followed by a type annotation, and registers each parameter in the symbol table as a variable.

function parameter(): NodeMap['parameter'] | undefined {
	const ident = optional('ident');
	if (!ident) return;
	let type: Node | undefined;
	const name = text(ident);

	if (optional(':')) type = expectNode(typeParser(), 'Expected type');

	const symbol: SymbolMap['variable'] = symbolTable.set(name, {
		name,
		kind: 'variable',
		flags: 0,
	});
	const nameNode = { ...ident, symbol };
	return (symbol.definition = {
		...ident,
		kind: 'parameter',
		symbol,
		name: nameNode,
		type,
		children: [nameNode, type],
	});
}

function blockParameters(node: NodeMap['fn']) {
	node.parameters = parseList(parameter, ',', n => !!n);
	if (node.symbol)
		node.symbol.parameters = node.parameters.map(p => p.symbol);
	expect(')');
	node.children.push(...node.parameters);
}

AST

The Abstract Syntax Tree (AST) produced by the parser for the factorial function has the following structure:

(root (def :factorial (fn (parameter :n typeident) typeident (? (next (<= :n 1)) 1 (* :n (call :factorial (- :n 1)))))))

Type Checking

This is where things get interesting. We'll revise our existing type checker to ensure it appropriately enforces the type constraints for the factorial function.

The type checker must validate the following constraints:

The operands for the <= and * operators are numbers.
The recursive call factorial(n - 1) passes an integer as the argument (n - 1 should be a valid integer).
The return value of the recursive call is of the correct type (an integer).

To achieve this, we'll divide the type checker into two main components: the checker and the resolver.

The resolver is responsible for determining the expected type of a given node in the AST. It tracks variable types, function return types, and ensures that expressions are well-typed.

const resolveMap: {
	[K in NodeKind]?: (node: NodeMap[K]) => Type | undefined;
} = {
	def: node => {
		let type = node.left.symbol.type;
		if (type) return type;
		type = (node.type && resolver(node.type)) || resolver(node.right);
		if (type) node.left.symbol.type = type;
		return type;
	},
	ident: n => n.symbol.type,
	typeident: n => n.symbol,
	call: n => resolveReturnType(n.children[0]),
	number: n => BaseTypes[Number.isInteger(n.value) ? 'int' : 'float'],
	parameter: node => {
		if (node.symbol.type) return node.symbol.type;
		if (node.type) {
			node.symbol.type = resolver(node.type);
			return node.symbol.type;
		}
	},
	fn: node => {
		const sym = node.symbol;

		if (!sym.returnType) {
			if (node.returnType) sym.returnType = resolver(node.returnType);
		}
		if (node.parameters?.length) node.parameters.forEach(resolver);

		return sym;
	},
};

function resolveReturnType(node: Node) {
	if (node.kind === 'ident') {
		const type = resolver(node);
		if (type?.kind === 'function' && type.returnType)
			return type.returnType;
	}
}

/**
 * Determines the type of a node based on its kind and associated type declarations.
 */
function resolver(node: CheckedNode) {
	return (node[typeSymbol] ||= resolveMap[node.kind]?.(node as never));
}

The checker component uses information from the resolver to enforce the type rules throughout the code. It verifies that every node in the AST adheres to the expected types, ensuring that operators like <= and * receive numeric operands and that the recursive call factorial(n - 1) returns and accepts appropriate types, like integers.

The checker structure is essentially a large switch statement that traverses the AST and verifies the types of each node. For a complete view of the source code, visit this link.

function check(node: Node): void {
	switch (node.kind) {
		case 'root':
			return checkEach(node.children);
		case 'fn':
			resolver(node);
			return node.statements && checkEach(node.statements);
		case 'main':
			return checkEach(node.statements);
		case 'next': {
			const fn = node.owner;
			const val = node.children?.[0];
			const type = val ? resolver(val) : BaseTypes.void;

			if (!fn.returnType) fn.returnType = type;
			else if (type && !canAssign(fn.returnType, type))
				error(
					`Type "${typeToStr(
						type,
					)}" is not assignable to type "${typeToStr(
						fn.returnType,
					)}".`,
					node,
				);
			return;
		}
		...
	}
}

The type checker is in its early stages and currently only performs the checks required to compile the factorial function. As we progress in this series, we'll implement additional features.

Code Generation

Now for the code generation step. Once the AST nodes are created and validated, we can proceed to generate functional JavaScript code.

The generated code by our compiler looks like this:

const factorial = n => {
	return n <= 1 ? 1 : n * factorial(n - 1);
};

The compiler step will walk the AST, similarly to our type checker, and generate output based on node and symbol information. The full source code for this version is available here

export function compile(node: Node): string {
	switch (node.kind) {
		case 'data':
			return `[${compileEach(node.children)}]`;
		case 'done':
			return 'return;';
		case 'loop':
			return `(function*(){while(${compile(node.children[0])})yield})()`;
		case 'root':
			return compileEach(node.children);
		case 'main':
			return compileEach(node.statements);
		case 'number':
			return node.value.toString();
		case '++':
		case '--':
			return `${compile(node.children[0])}${node.kind}`;
		case 'next':
			return node.generator
				? nextGenerator(node.children?.[0])
				: next(node.children?.[0]);
		case 'parameter':
		case 'ident':
		case 'string':
		case '~':
		case '!':
			return text(node);
		...
	}
}

Conclusion

In this post, we've covered parsing, type checking, and code generation for the recursive factorial function. In upcoming posts, I'll dive into more complex function features to enhance the language and compiler. These features will include handling parameter types, implementing type inference, and more. Stay tuned.

Back to Main Page