Homework 6
- Due Oct 25, 2016 by 11:59pm
- Points 12
- Submitting a file upload
Below is the definition of an extremely simple programming language. Your assignment is to (for Thurs Oct 20) write a program in it, and submit it using a github pull request to the calc-compiler project linked below. Also (for Tues Oct 25) write a compiler for it to LLVM.
The lexical elements of the language are as follows:
- comment lines: ^#.*$ (a hash at the start of a line through the newline)
- any number of whitespace characters: space, tab, newline
- parentheses
- integer constants: -?[0-9]+
- arithmetic operators: + - * / %
- arguments: a0-a5
- comparison operators: > >= < <= == !=
- Boolean constants: true, false
- the "if" keyword
A program in this language is an int expression. An int expression E can be any of:
- optional: (E)
- an integer constant
- an argument
- (op E E)
- (if B E E)
A Boolean expression B can be any of:
- optional: (B)
- (op E E)
- a Boolean constant
Your compiler must reject (with a reasonable diagnostic) any program not matching this grammar. There are not typing rules outside of the grammar. Integer overflow in expression evaluation results in two's complement wraparound. Integer overflow while converting a token into a 64-bit signed integer is a compile-time error. Division or mod by zero is a true undefined behavior, meaning that (as a compiler implementor) you do not need to worry about it at all! But you must avoid these behaviors while programming in the calculator language.
In your translation to LLVM each Boolean expression should have type i1 and each int expression should have type i64. The semantics of arithmetic and comparison operators are derived from the equivalent signed LLVM instructions.
You should feel free to implement if-expressions either by computing both sides and using a select instruction to choose the appropriate value, or by branching, computing the values, and then choosing the appropriate result using a phi node.
Please let me know about bugs in this spec.
A skeleton compiler can be found here:
https://github.com/regehr/calc-compiler Links to an external site. Links to an external site.
Please use LLVM 3.9 for this assignment, that way we can avoid tricky bugs due to version skew.
Please read the skeleton and the LLVM tutorial and then ask questions here! You are not expected to just find everything to be obvious! This stuff isn't conceptually difficult but there are lots of details and the interfaces are sometimes opaque.
Implement your compiler in C++. Or, if you wish, use the LLVM bindings from a different language such as C or OCaml, but keep in mind that I will give you very little help if you do this and things go badly. Regardless of implementation language, your compiler must return status code 0 to the OS upon success and status code 1 upon detecting an error in the input file -- this makes automated testing possible.
You may reuse any code you find on the web but you must acknowledge your sources. For example, I have been stealing snippets of code from a lexer and recursive descent parser found in one of my other projects:
https://github.com/google/souper/blob/master/lib/Parser/Parser.cpp Links to an external site.
Your compiler should read its program from STDIN.
I strongly suggest a simple hand-rolled lexer and a recursive-descent parser, but you may feel free to use any lexer/parser generator tool you like. Recursive descent parsing is the simplest parsing technique of all, there is much information on the web but the Wikipedia article is quite nice and includes code:
https://en.wikipedia.org/wiki/Recursive_descent_parser Links to an external site.
The next part of this assignment will be to support mutable variables and while loops. Feel free to lay any groundwork you like to make your future work easier. After that we will be detecting integer overflows and then working to make LLVM smarter about eliminating provably-useless overflow checks.