NCC
NCC is a compiler I wrote from scratch for a compiler design course. It takes source files written in a small custom language, compiles them directly to x86 machine code, and executes that code from memory.
The compiler is written in C. The language supports arithmetic, strings, integer variables, basic I/O, relational and logical expressions, if/else, and while loops.
Pipeline
Each source file passes through four stages before anything runs.
- Buffer The source file is read entirely into memory. The buffer tracks line and column position and supports ungetting characters, which gives the lexer the lookahead it needs.
- Lexer The lexer reads the buffer character by character and produces the token stream. It handles identifiers, integer and float constants, string literals with escape sequences, operators, and both block and line comments.
- Parser A recursive descent parser produces an AST. Arithmetic, relational, and logical expressions are handled according to precedence and associativity rules. Statements are parsed into sequences.
- Codegen The AST is walked post-order. x86 machine code is written directly into a byte array using a stack machine model. The finished program is cast to a function pointer and called.
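The codegen stage can be illustrated with a minimal sketch. This is not NCC's actual source: the `Node` and `Code` types and the function names are hypothetical, only integer constants with `+` and `*` are covered, and the encodings assume 32-bit operand size. Each node's children are generated first (post-order), the left result is parked on the hardware stack, and the operator combines the two values into EAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AST node: a constant leaf or a binary operator. */
typedef enum { N_CONST, N_ADD, N_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int32_t value;                 /* used by N_CONST only */
    const struct Node *lhs, *rhs;  /* used by N_ADD and N_MUL */
} Node;

/* Byte array that receives the raw machine code. */
typedef struct {
    uint8_t bytes[256];
    size_t len;
} Code;

static void emit8(Code *c, uint8_t b) {
    assert(c->len < sizeof c->bytes);
    c->bytes[c->len++] = b;
}

/* x86 immediates are little-endian. */
static void emit32(Code *c, uint32_t v) {
    for (int i = 0; i < 4; i++)
        emit8(c, (uint8_t)(v >> (8 * i)));
}

/* Post-order walk: left child, right child, then the operator. */
static void gen_expr(Code *c, const Node *n) {
    if (n->kind == N_CONST) {
        emit8(c, 0xB8);                      /* mov eax, imm32 */
        emit32(c, (uint32_t)n->value);
        return;
    }
    gen_expr(c, n->lhs);                     /* left value ends up in eax */
    emit8(c, 0x50);                          /* push eax */
    gen_expr(c, n->rhs);                     /* right value ends up in eax */
    emit8(c, 0x59);                          /* pop ecx (left value) */
    if (n->kind == N_ADD) {
        emit8(c, 0x01); emit8(c, 0xC8);      /* add eax, ecx */
    } else {
        emit8(c, 0x0F); emit8(c, 0xAF);      /* imul eax, ecx */
        emit8(c, 0xC1);
    }
}

/* Generate code for one expression followed by a return. */
static size_t gen_program(Code *c, const Node *root) {
    c->len = 0;
    gen_expr(c, root);
    emit8(c, 0xC3);                          /* ret */
    return c->len;
}
```

To actually run the result, the bytes have to sit in memory the OS allows to execute (on POSIX systems, for example, a region obtained from `mmap` with `PROT_EXEC`) before the pointer is cast to a function pointer and called. That step is left out here because it is platform-specific.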
Other Notes
The stack machine keeps the top of stack in the accumulator register. Jump offsets for if/else and while are emitted as placeholders and patched once the target addresses are known.
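A rough sketch of jump patching for a while loop follows. The helper names are hypothetical, and NOP bytes stand in for the condition and body code so the layout stays easy to follow. The idea is to emit the jump with a zeroed 32-bit offset, remember where that offset lives, and fill it in once the target is known. On x86 a relative offset is measured from the end of the jump instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t bytes[256];
    size_t len;
} Code;

static void emit8(Code *c, uint8_t b) {
    assert(c->len < sizeof c->bytes);
    c->bytes[c->len++] = b;
}

/* Emit jz (conditional) or jmp with a zeroed rel32 field.
   Returns the offset of that field so it can be patched later. */
static size_t emit_jump_placeholder(Code *c, int conditional) {
    if (conditional) {
        emit8(c, 0x0F); emit8(c, 0x84);      /* jz rel32 */
    } else {
        emit8(c, 0xE9);                      /* jmp rel32 */
    }
    size_t hole = c->len;
    for (int i = 0; i < 4; i++)
        emit8(c, 0);
    return hole;
}

/* Fill in a rel32 field; the offset is relative to the byte after it. */
static void patch_jump(Code *c, size_t hole, size_t target) {
    uint32_t rel = (uint32_t)((int32_t)target - (int32_t)(hole + 4));
    for (int i = 0; i < 4; i++)
        c->bytes[hole + i] = (uint8_t)(rel >> (8 * i));
}

/* Layout: top: <cond> test al,al; jz exit; <body> jmp top; exit: */
static void gen_while(Code *c, size_t cond_len, size_t body_len) {
    size_t top = c->len;
    for (size_t i = 0; i < cond_len; i++)
        emit8(c, 0x90);                      /* stand-in for condition code */
    emit8(c, 0x84); emit8(c, 0xC0);          /* test al, al */
    size_t exit_hole = emit_jump_placeholder(c, 1);
    for (size_t i = 0; i < body_len; i++)
        emit8(c, 0x90);                      /* stand-in for loop body */
    size_t back_hole = emit_jump_placeholder(c, 0);
    patch_jump(c, back_hole, top);           /* backward jump to the test */
    patch_jump(c, exit_hole, c->len);        /* forward jump past the loop */
}
```

The backward jump could be emitted with its final offset right away, since `top` is already known at that point. Routing both jumps through `patch_jump` just keeps the sketch uniform; an if/else would use the same two helpers with two forward holes.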
Short-circuit evaluation for & and | is implemented with conditional jumps. The result of every relational and logical expression lives in AL as a 0 or 1.
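One way short-circuiting can fall out of that 0-or-1 convention is sketched below. The names are hypothetical and the operands are simplified to constant truth values loaded with `mov al, imm8`. After the left operand is evaluated, AL already holds the final answer whenever the right operand can be skipped, so a single conditional jump over the right operand's code is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t bytes[256];
    size_t len;
} Code;

typedef enum { OP_AND, OP_OR } LogicOp;

static void emit8(Code *c, uint8_t b) {
    assert(c->len < sizeof c->bytes);
    c->bytes[c->len++] = b;
}

/* Stand-in for a real operand: load a constant 0 or 1 into AL. */
static void gen_bool_const(Code *c, int v) {
    emit8(c, 0xB0);                          /* mov al, imm8 */
    emit8(c, v ? 1 : 0);
}

/* lhs & rhs: if lhs is 0, skip rhs and keep AL = 0.
   lhs | rhs: if lhs is 1, skip rhs and keep AL = 1. */
static void gen_short_circuit(Code *c, LogicOp op, int lhs, int rhs) {
    gen_bool_const(c, lhs);
    emit8(c, 0x84); emit8(c, 0xC0);          /* test al, al */
    emit8(c, op == OP_AND ? 0x74 : 0x75);    /* jz rel8 or jnz rel8 */
    size_t hole = c->len;
    emit8(c, 0);                             /* placeholder offset */
    gen_bool_const(c, rhs);                  /* runs only when needed */
    size_t rel = c->len - (hole + 1);        /* distance to the end label */
    assert(rel <= 127);                      /* must fit a short jump */
    c->bytes[hole] = (uint8_t)rel;
}
```

The short `rel8` jump form is used here only because the stand-in operands are tiny. Real operand code can be longer than 127 bytes, in which case the 32-bit jump form would be needed.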
Variables are stored at fixed heap addresses tracked in a symbol table. String constants are interned at compile time into a string table that persists through execution.
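The two tables might look something like the sketch below. The type names, function names, and fixed capacities are assumptions made for illustration, and lookups are plain linear scans. Interning returns the same pointer for equal string contents, and each variable gets a heap slot allocated once and never moved, so both kinds of address stay valid while the generated code runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 64

/* String table: one heap copy per distinct string constant. */
typedef struct {
    char *strings[MAX_ENTRIES];
    size_t count;
} StringTable;

/* Return a stable pointer for s, reusing an earlier copy if one matches. */
static const char *intern(StringTable *t, const char *s) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->strings[i], s) == 0)
            return t->strings[i];
    assert(t->count < MAX_ENTRIES);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    t->strings[t->count++] = copy;
    return copy;
}

/* Symbol table: variable name -> fixed heap address of its value. */
typedef struct {
    char name[32];
    int32_t *slot;
} Symbol;

typedef struct {
    Symbol syms[MAX_ENTRIES];
    size_t count;
} SymbolTable;

/* Look a variable up, creating a zeroed slot on first use. */
static int32_t *var_address(SymbolTable *st, const char *name) {
    for (size_t i = 0; i < st->count; i++)
        if (strcmp(st->syms[i].name, name) == 0)
            return st->syms[i].slot;
    assert(st->count < MAX_ENTRIES);
    assert(strlen(name) < sizeof st->syms[0].name);
    Symbol *s = &st->syms[st->count++];
    strcpy(s->name, name);
    s->slot = calloc(1, sizeof *s->slot);
    assert(s->slot != NULL);
    return s->slot;
}
```

Nothing is freed in this sketch on purpose: the pointers handed out here are the kind of values a code generator would bake into emitted instructions, so the tables have to outlive the call into the generated code.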
This class was difficult, but worth every lecture. Kirk Duffin is a fantastic professor.