This post is a follow up from my previous post. Here I will talk about parts of a compiler. A Compiler usually contains following parts:
- Lexical Analyser (Lexer)
- Syntax Analyser (Parser)
- Semantic Analyser
- Symbol Table
- Intermediate Code Generator
- Code Generator
- Code Optimizer
The Source Code goes through the these phases and is finally converted to Target Language (which can be Machine Language or Byte Code).
Let us have a short talk about each one of them.
Lexical Analysis: This is the phase when the source code is converted into a stream of Tokens. These tokens are meaningful character strings. They would have some predefined meaning in the syntax of a language. For example, in C “int” could be converted to a Token called “Integer Type”. This token will then be used in the subsequent phases of the compiler which tells them that “int” describes “Integer Type”.
Syntax Analysis: This is the phase when the Source Code is analysed and converted into its Parse Tree or Syntax Tree. If any error is found in this phase then it is reported to the user. This phase takes the stream of Tokens from Lexical Analyser. Syntax Analyser is actually an Automata (which is a type of Turing Machine). It is created on the basis of the Syntax of Language. Syntax Analyser for C and of C++ are different. Most of the Syntax Analyser creates an Abstract Syntax Tree which actually depicts the structure of the source code.
Semantic Analysis: In this phase Compiler checks for Semantic errors of a language. Semantic errors mainly consists of type errors and whether variable has been declared or not. It makes a heavy use of Symbol Table. For every variable it encounters it gets variable’s type in Symbol Table and check with the language rules. It also performs Type Conversions based on the language rules. For example, the expression “1.5 + 2” will give a Float but “2” is an Integer. In this phase compiler will convert “2” to Float and apply float addition operator.
Symbol Table: This part of compiler is responsible for storing all the Type Information about all variables or functions. For example, if you have declared your variable named “foo” as an integer then Symbol Table will store it. This part is used by all the stages of compilers. It is mainly created at the Syntax Analysis stage.
Intermediate Code: Intermediate Code may or may not be present in a Compiler structure. It depends on the Compiler Designers whether they want the Intermediate Code to be generated or not. Intermediate Code can be of many forms like Abstract Syntax Tree, Three Address Code, Static Single Assignment etc.
Code Optimizer: Code Optimizers mainly work on Intermediate Code. They apply many algorithms for optimizing the code. Similar to Intermediate Code, Code Optimization is not a necessary part of compiler. Even many compilers like GCC or LLVM implements a command line switch to turn off/on and set level of Code Optimization.
Code Generation: Code Generation is the final step of Compiler. At this step the target language code is generated. It can be Byte Code or x86 Assembly Code.
In the next post I will start with Lexical Analysis phase of compiler.