Wolmer’s Trust High School for Girls
Grades: Upper and Lower Six Teacher: Mrs. McCallum-Rodney
THE COMPILER
A compiler is a program that reads a program in one language, the source language, and translates it into an equivalent program in another language, the target language.
The translation process should also report the presence of errors in the source program.
[Diagram: Source Program → Compiler → Target Program, with the Compiler also producing Error Messages]
There are two parts of compilation.
The analysis part breaks up the source program into its constituent pieces and creates an intermediate representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.
What Does a Compiler Do?
A compiler translates programs from one language to another, as shown in the figure below, which is why it is also known as a translator. Traditionally, a compiler translates programs written in a high-level language into assembly language. Assemblers convert programs written in assembly language to machine code. Interpreters such as the Java virtual machine run programs by interpreting intermediate code such as Java bytecode. Broadly, any program that translates input in one form to output in another form can be called a compiler. A compiler should have the following basic properties.
- A compiler must be error-free.
- A compiler must always terminate, no matter what the input looks like.
- A compiler should attempt to find as many errors as possible during a single compilation pass.
Figure: Basic Compiler Concept
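The third property in the list above, reporting as many errors as possible in one pass, can be sketched in Python. This is only an illustration under stated assumptions: the allowed character set and the function name check_source are made up for this example and do not come from any real compiler. The loop visits each character exactly once, so it always terminates, and it collects every problem instead of stopping at the first one.

```python
# Hypothetical toy checker: the allowed character set below is an assumption
# made for this example, not the rule of any real language.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789+-*/=() ")

def check_source(source):
    """Return a list of error messages, one for each unexpected character."""
    errors = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch not in ALLOWED:
                # Record the error and keep going instead of stopping here.
                errors.append(
                    f"line {line_no}, column {col_no}: unexpected character {ch!r}"
                )
    return errors

errors = check_source("x = 1 $ 2\ny = x # 3\n")
for message in errors:
    print(message)
```

Both bad characters are reported in a single run, so the programmer can fix them together rather than recompiling after each fix.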
PHASES OF A COMPILER
The compiler has a number of phases, plus a symbol table manager and an error handler.
The cousins of the compiler are:
- Preprocessor.
- Assembler.
- Loader and Link-editor.
· The component that performs lexical analysis is called the lexical analyzer, scanner, or tokenizer. Its purpose is to break a sequence of characters into subsequences called tokens.
· The syntax analysis phase, called the parser, reads tokens and validates them against a grammar. The vocabulary, i.e. the set of predefined tokens, is composed of word symbols (reserved words), names (identifiers), numerals (constants), and special symbols (operators).
· A program can contain lexical, syntax, semantic, and logical errors. A compiler can detect the first three kinds during compilation; logical errors normally show up only when the program is run.
o If a token does not belong to the vocabulary, it is a lexical error. A grammar dictates the syntax of a language.
o If a sentence does not follow the syntax, it is a syntax error.
o Semantic errors break the meaning rules of the language, such as a type mismatch (for example, assigning a string to an integer variable).
o Logical errors mean that the program logic is not correct, even though the program is syntactically and semantically correct.
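The idea of a vocabulary and a lexical error can be illustrated with a small tokenizer in Python. This is a minimal sketch, not a real compiler's scanner: the reserved-word set, the token patterns, and the names tokenize and LexicalError are all assumptions chosen for the example. Each token is classified as a reserved word, name, numeral, or special symbol, and any character outside the vocabulary raises a lexical error.

```python
import re

# Assumed toy vocabulary; a real language defines its own.
RESERVED_WORDS = {"if", "else", "while", "int", "double"}

TOKEN_PATTERN = re.compile(r"""
    (?P<numeral>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<symbol>==|<=|>=|[-+*/=<>();{}])
  | (?P<space>\s+)
  | (?P<unknown>.)
""", re.VERBOSE)

class LexicalError(Exception):
    """Raised when a character does not belong to the vocabulary."""

def tokenize(source):
    """Break a string of characters into (kind, text) tokens."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "space":
            continue  # white space separates tokens but is not a token
        if kind == "unknown":
            raise LexicalError(
                f"unexpected character {text!r} at position {match.start()}"
            )
        if kind == "name" and text in RESERVED_WORDS:
            kind = "reserved"  # reserved words look like names but are special
        tokens.append((kind, text))
    return tokens

tokens = tokenize("while count <= 10")
print(tokens)
```

Calling tokenize("x = 3 @ 4") raises LexicalError, because "@" is not part of this toy vocabulary.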
Front End vs Back End of a Compiler
· The phases of a compiler are collected into front end and back end.
o The front end includes all analysis phases and the intermediate code generator.
o The back end includes the code optimization phase and final code generation phase.
· The front end analyzes the source program and produces intermediate code while the back end synthesizes the target program from the intermediate code.
A naive (brute-force) approach to the front end might run the phases serially:
1. The lexical analyzer takes the source program as input and produces a long string of tokens. It scans the characters of the source program from left to right and builds the actual symbols of the program: integers, identifiers, reserved words, etc. The lexical analyzer also determines what constitutes a token. For example, some lexical analyzers may return numbers one digit at a time, whereas others collect each number in its entirety before passing it on.
2. The syntax analyzer takes the output of the lexical analyzer and produces a large tree.
3. The semantic analyzer takes the output of the syntax analyzer and produces another tree. It checks constructs for semantic correctness and stores the necessary information about them in the symbol table.
4. The intermediate code generator takes the tree produced by the semantic analyzer as input and produces intermediate code.
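The serial front end described above can be sketched in Python for a single assignment statement. Everything here is an assumption made for illustration: the function names (lex, parse, analyze, generate_ir), the tuple-based tree, the use of a Python set as the symbol table, and the three-address form of the intermediate code. Real compilers are far more elaborate.

```python
import re

def lex(source):
    """Phase 1: characters -> tokens. Unknown characters are skipped in this sketch."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()=]", source)

def is_name(token):
    return token is not None and (token[0].isalpha() or token[0] == "_")

def parse(tokens):
    """Phase 2: tokens -> tree for one statement of the form `name = expression`."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expect(expected):
        token = advance()
        if token != expected:
            raise SyntaxError(f"expected {expected!r}, found {token!r}")

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            expect(")")
            return node
        if token is not None and token.isdigit():
            return ("num", int(token))
        if is_name(token):
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    target = advance()
    if not is_name(target):
        raise SyntaxError("a statement must start with a name")
    expect("=")
    tree = ("assign", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def analyze(tree, symbol_table):
    """Phase 3: check that every variable used is declared, then record the target."""
    def check(node):
        if node[0] == "var" and node[1] not in symbol_table:
            raise NameError(f"undeclared variable {node[1]!r}")
        if node[0] in ("+", "-", "*", "/"):
            check(node[1])
            check(node[2])
    _, target, expression = tree
    check(expression)
    symbol_table.add(target)
    return tree

def generate_ir(tree):
    """Phase 4: tree -> three-address intermediate code."""
    code = []
    def emit(node):
        if node[0] in ("num", "var"):
            return str(node[1])
        left, right = emit(node[1]), emit(node[2])
        temp = f"t{len(code) + 1}"
        code.append(f"{temp} = {left} {node[0]} {right}")
        return temp
    _, target, expression = tree
    code.append(f"{target} = {emit(expression)}")
    return code

symbols = {"width", "height"}  # variables assumed to be declared already
ir = generate_ir(analyze(parse(lex("area = width * (height + 2)")), symbols))
print("\n".join(ir))
```

Each phase consumes only the output of the phase before it, which is what running the phases serially means. Using an undeclared variable, as in "y = z + 1" with an empty symbol table, is rejected by the semantic phase even though the statement is syntactically correct.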
The back end includes:
- Code optimization reduces the execution time of the program by finding ways for the code to execute more efficiently.
- Code generation is the actual translation of the internal representation of the source program into assembly language or machine language. It is the most detailed part of compilation.
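The two back-end steps can be sketched in Python on a small expression tree. This is an illustration under assumptions: the tree is a nested tuple invented for the example, the optimization shown is constant folding (only one of many possible optimizations), and the PUSH, LOAD, ADD, SUB, and MUL instructions belong to a hypothetical stack machine, not to any real assembly language.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical instruction names

def fold_constants(node):
    """Optimization: evaluate sub-expressions whose operands are both constants."""
    if node[0] in ("num", "var"):
        return node
    op = node[0]
    left = fold_constants(node[1])
    right = fold_constants(node[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[op](left[1], right[1]))
    return (op, left, right)

def generate(node, out=None):
    """Code generation: emit instructions for a hypothetical stack machine."""
    if out is None:
        out = []
    if node[0] == "num":
        out.append(f"PUSH {node[1]}")
    elif node[0] == "var":
        out.append(f"LOAD {node[1]}")
    else:
        generate(node[1], out)   # code for the left operand
        generate(node[2], out)   # code for the right operand
        out.append(MNEMONICS[node[0]])
    return out

# Tree for the expression x + 60 * 24
tree = ("+", ("var", "x"), ("*", ("num", 60), ("num", 24)))
unoptimized = generate(tree)
optimized = generate(fold_constants(tree))
print(unoptimized)
print(optimized)
```

Without optimization the machine multiplies 60 by 24 every time the code runs. After constant folding the multiplication is done once, at compile time, and the generated code is shorter.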