Compiler Structure

This post is a follow up from my previous post. Here I will talk about parts of a compiler. A Compiler usually contains following parts:

  • Lexical Analyser (Lexer)
  • Syntax Analyser (Parser)
  • Semantic Analyser
  • Symbol Table
  • Intermediate Code Generator
  • Code Generator
  • Code Optimizer

The Source Code goes through the these phases and is finally converted to Target Language (which can be Machine Language or Byte Code).

flowchart2

Let us have a short talk about each one of them.

Lexical Analysis: This is the phase when the source code is converted into a stream of Tokens. These tokens are meaningful character strings. They would have some predefined meaning in the syntax of a language. For example, in C “int” could be converted to a Token called “Integer Type”. This token will then be used in the subsequent phases of the compiler which tells them that “int” describes “Integer Type”.

Syntax Analysis: This is the phase when the Source Code is analysed and converted into its Parse Tree or Syntax Tree. If any error is found in this phase then it is reported to the user. This phase takes the stream of Tokens from Lexical Analyser. Syntax Analyser is actually an Automata (which is a type of Turing Machine). It is created on the basis of the Syntax of Language. Syntax Analyser for C and of C++ are different. Most of the Syntax Analyser creates an Abstract Syntax Tree which actually depicts the structure of the source code.

Semantic Analysis: In this phase Compiler checks for Semantic errors of a language. Semantic errors mainly consists of type errors and whether variable has been declared or not. It makes a heavy use of Symbol Table. For every variable it encounters it gets variable’s type in Symbol Table and check with the language rules. It also performs Type Conversions based on the language rules. For example, the expression “1.5 + 2” will give a Float but “2” is an Integer. In this phase compiler will convert “2” to Float and apply float addition operator.

Symbol Table: This part of compiler is responsible for storing all the Type Information about all variables or functions. For example, if you have declared your variable named “foo” as an integer then Symbol Table will store it. This part is used by all the stages of compilers. It is mainly created at the Syntax Analysis stage.

Intermediate Code: Intermediate Code may or may not be present in a Compiler structure. It depends on the Compiler Designers whether they want the Intermediate Code to be generated or not. Intermediate Code can be of many forms like Abstract Syntax Tree, Three Address Code, Static Single Assignment etc.

Code Optimizer: Code Optimizers mainly work on Intermediate Code. They apply many algorithms for optimizing the code. Similar to Intermediate Code, Code Optimization is not a necessary part of compiler. Even many compilers like GCC or LLVM implements a command line switch to turn off/on and set level of Code Optimization.

Code Generation: Code Generation is the final step of Compiler. At this step the target language code is generated. It can be Byte Code or x86 Assembly Code.

In the next post I will start with Lexical Analysis phase of compiler.

Cheers 😀

Advertisements

2 thoughts on “Compiler Structure

  1. Sorry to say that, but from your last blog post and this one, it seems that you don’t have any experience with formal languages nor compilers. You focus very much on “trivialities” like lexing and parsing and nearly skip the complicated parts like type analysis and optimizations. I really don’t mean to insult you, but it really seems like you just started into the topic and are impressed by the simple stuff. So to prevent you from a hard crash, I suggest to get some first hands experience with a high-level language like OCaml or Haskell first, before going further. This also gives you a tool much better suited for formal languages (an AST ina functional language is really concisely represented as an algebraic datatype). Then you should decide which aspect of a compiler (frontend/backend) you really like the most and ideally start helping out in the existing communities (they always need volunteers ;)).

    • Hi Ted,
      I dont think you read my posts completely. It is a post for those who want to get started and need some push for it. I already said I will directly be going into the details of implementation with some theory part just enough to develop that implementation. This blog is mainly related to how you could implement and whatever the problems you could face and their solution. I never intend this post to go to more advanced stuff of compiler. Morever, I already wrote that I would go into the trivial stuff and how to make them. Also, if you would have gone through the index of my first post you could see that I haven’t even reach there. I have just completed the parts of compiler which itself is a trivial stuff. It will take some posts until I reach there.
      BTW, I have studied Haskell and it has some very good features. I will certainly blog about them also.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s