CS 445 Compiler and Translator Design
Total Credits: 4 cr
Course Coordinator: Robert Heckendorn
URL: http://marvin.cs.uidaho.edu/~heckendo/CS445
Current Catalog Description: Algorithms used by the following system
software: assemblers, macro-processors, interpreters, and compilers; compiler
design options and code optimization; all concepts implemented in major programming
assignments. Prereq: CS 210 and CS 385.
Textbook: Kenneth Louden, Compiler Construction: Principles and Practice,
Brooks and Cole or similar text.
References: Numerous handouts on the web page.
Course Goals:
- Understand the basic theory of programming languages and techniques
for simple language translation.
- Understand the features of compilers and interpreters.
- Understand the basic components and layered design of a compiler and
the rationale for their use.
- Each student will be responsible for independently building a simple
compiler and augment it with new features. The instructor will help the
student to achieve this goal over a series of assignments, but it is up
to the student to keep up with the assignments and complete each on time.
Prerequisites by Topic:
CS 210 Programming Languages, CS 385 Theory of Computation, and it is expected that:
- Students are skilled at programming in C or C++, including scope and
recursion, and understand the process of edit/compile/debug cycle.
- Understand the use, design and implementation of many common data structures.
- Understand principles of good program design and be able to apply that
to a large incremental programming project.
- Understand the basics of computer architecture including machine organization,
assembly language, instruction sets, and memory organization.
- Understand operating systems concepts including details of program loading,
system calls, libraries, memory allocation and initializing for execution.
- Understand systems concepts such as dependency driven build, linking,
tarring, d ifferences in operating systems and system tools.
- Understand the basics of formal grammar and language theory including
FSA, CFGs, REs, lex, yacc.
The course will take all of these basic skills to the next level.
Major Topics Covered in the Course: (duration) (CC 2001 reference)
- Types of language translation (1 hour) (PL3)
- Language standards (1 hour) (SP4)
- The role of a compiler in the world of software (1 hour) (SE3 , PL11)
- Organization of a compiler (3 hours) (PL8)
- Bison (2 hours) (PL3)
- Flex (2 hours) (PL3)
- Formal language theory incl. RE, BNF, EBNF, grammars and their organization
(3 hours) (AL7,PL11)
- How to control derivation of sentences (1 hour) (PL8)
- FSA (1 hours) (AL7)
- Some UNIX tools (make, tar, etc) (1 hours) (SE3)
- Tokens (1 hours) (PL3)
- LL and LR parsers including LALR and SLR parsers and how they work (6
hours) (PL8)
- First and follow sets (2 hours) (PL8)
- Syntax trees (2 hours) (PL8)
- Tree annotation (4 hours) (PL8)
- Nice error reporting in a lookahead parser, (1 hour) (PL8)
- Attribute grammars (1 hour) (PL8 , PL10, PL11)
- Ambiguity and how to avoid it (1 hour) (PL8)
- Ambiguity and how to use it in Bison (1 hour) (PL8)
- Controlling precedence (1 hour) (PL8)
- Dangling else (2 hours) (PL8, PL3)
- Shift reduce conflicts (1 hour) (PL8)
- Reduce reduce conficts (2 hours) (PL8)
- Error handling via the error token mechanism (2 hours) (PL8)
- Memory allocation for variables and procedures (4 hours) (PL8, PL4,
OS5 , PL11)
- Scoping of variables (2 hours) (PL4, PL5 ,PL10, PL11)
- Procedure call implementation and frames (2 hours) (PL10)
- Global variables (1 hour) (PL10)
- Type checking (2 hours) (PL4)
- Expression compilation (1 hour) (PL3)
- Code generation (4 hours) (PL8)
- Code optimizations including intermediate representation optimization,
and backend optimizations such as instruction selection and scheduling and
register allocation, peephole optimization, variable length data allocation
(6 hours) (AR3, AR4, PL8)
- Linkers and loaders and how linking works (1 hour) (OS2)
Laboratory projects (specify number of weeks on each): Over the length
of the whole course we build a compiler for a C like language and target a virtual
machine for execution which provides great student feedback on execution and
debugging. The command line interface to the compiler is improved with each
assignment.
- Build a scanner (2 weeks)
- Build a parser (2 weeks)
- Build a scope-type checker with memory allocation (3 weeks)
- Build in error recovery (2 weeks)
- Build a code generator (4 weeks)
Estimated Curriculum Category Content:
| Area |
Core |
Advanced |
Area |
Core |
Advanced |
| Algorithms |
|
1 cr |
Data Structures |
0.5 cr |
0.5 cr |
| Software Design |
0.5 cr |
|
Prog. Languages |
0.5 cr |
0.5 cr |
| Computer Arch |
0.5 cr |
|
Other |
|
|
Oral and Written Communications: None.
Social and Ethical Issues: I discuss the obligation to the compiler
developers to provided reliability and backward compatibility and what that
means, standards and how compilers set standards.
Theoretical Content: Languages, FSAs, grammars, parser types and how
they work. The theory of grammar is tied directly to their tasks in modifying
the language specification and grammar to make a practical grammar for Bison
that includes reasonable error processing.
Problem Analysis: The students are not told how to implement a compiler
line by line but rather all that they need to know to apply the knowledge they
have to implement a compiler. This requires a solid base in programming and
computer science. I give them as much freedom in implementation as I can give
them while still grading them using automated scripts for testing their code
in development and then grading their results.
Solution Design: They must design the compiler based on my basic layout.
I test for functionality and not how they did it but success is dependent on
a good design. Every exercise requires that they complete the outline of the
design I supply.
Course Outcomes: The following list documents the course outcomes
and crossreferences them to the BSCS program
outcomes. The letter at the beginning of each reference identifies the program
outcome supported. The numbers sequentially identify the course outcome for
this course. After completing CS 445 a student should:
- Be able to apply knowledge of context free grammars to language translation
and file parsing problems. (a-1-445)
- Be able to write and integrate a scanner, parser, semantic analyzer,
and code generator into a simple working compiler. (c-1-445)
- Be able to use a lexical analyzer and parser generator to write and
compile a front-end for a compiler. (i-1-445)
- Be knowledgeable about scanners and how they are defined; demonstrate
proficiency with regular expressions and lexical analysis techniques. (k-1-445)
- Be knowledgeable about parsers and how they are defined; understand
recursive descent and related LL parsing algorithms; know and use LR parsing
theory and methods. (k-2-445)
- Be familiar with semantic analysis principles; understand and use synthesized
and inherited grammar attributes; be able to employ symbol tables and implement
type checking algorithms. (k-3-445)
- Know principles of intermediate and final code generation; be aware
of instruction selection and register allocation; be familiar with common
optimizations. (k-4-445)
CS faculty approval on 4/3/2007
Course outcomes revision approved by CS faculty on 3/26/2008