Part 1: Lexical Analyzer

July 26, 2023

I need help with a simple project in Compiler Construction. The details are given below.

Please help me, or explain to me in detail how to do it.

Part 1: Lexical Analyzer

Lex is a program that generates scanners, also known as tokenizers, which recognize lexical patterns in text that are specified as regular expressions. The name Lex is short for “lexical analyzer generator”. Lex can perform simple transformations by itself, but its main purpose is to facilitate lexical analysis: processing a character sequence, such as source code, to produce a sequence of symbols called tokens, which serve as input to other programs such as parsers. The input notation for the Lex tool is referred to as the “Lex language”, and the tool itself is the “Lex compiler”. Behind the scenes, the Lex compiler transforms the input patterns into a transition diagram and generates code, in a file called lex.yy.c, that simulates this transition diagram.
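To make "code that simulates a transition diagram" concrete, here is a minimal hand-written sketch in C. It is not what Lex actually emits (the generated lex.yy.c is table-driven and considerably more involved); it only illustrates the idea for two assumed patterns, an identifier and an integer literal. The names token, state, step and scan are hypothetical and chosen for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch; Lex does not define these. */
enum token { TOK_NONE, TOK_ID, TOK_INT };

/* States of a tiny transition diagram: start, inside an identifier,
   inside an integer, and a dead state with no way forward. */
enum state { S_START, S_ID, S_INT, S_DEAD };

/* One edge of the diagram: the next state for state s on character c.
   Assumption: identifiers are a letter followed by letters, digits or '_'. */
static enum state step(enum state s, int c)
{
    switch (s) {
    case S_START:
        if (isalpha(c)) return S_ID;
        if (isdigit(c)) return S_INT;
        return S_DEAD;
    case S_ID:
        return (isalnum(c) || c == '_') ? S_ID : S_DEAD;
    case S_INT:
        return isdigit(c) ? S_INT : S_DEAD;
    default:
        return S_DEAD;
    }
}

/* Follow edges for as long as possible (longest match), then report the
   token of the state we stopped in. The lexeme length is stored in *len. */
enum token scan(const char *text, size_t *len)
{
    enum state s = S_START;
    size_t i = 0;

    for (;;) {
        enum state next = step(s, (unsigned char)text[i]);
        if (next == S_DEAD)
            break;              /* also stops at the terminating '\0' */
        s = next;
        i++;
    }
    *len = i;
    if (s == S_ID)  return TOK_ID;
    if (s == S_INT) return TOK_INT;
    return TOK_NONE;
}
```

Calling scan on the text "count1 := 42" would report an identifier of length 6; the caller then advances past the lexeme and calls scan again. In this small diagram every state other than the start state is accepting, which is why stopping at the first dead end already gives the longest match.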

Task 1: Search, read, and try to learn what Lex is. How does it work? Is there a specific syntax for Lex code? Can we use an IDE to write Lex code?

Task 2: Once you have learned enough about Lex, go ahead and implement a lexical analyzer for a Pascal-like language using the Lex tool. The lexical analyzer should recognize identifiers, integer literals, string literals, keywords, and predefined symbols. Their definitions are below.

1- Lexical Classes: (note that '|' and the outermost '(' and ')' are meta-symbols and not part of the alphabet)

ID = letter (letter | digit | _)*
INT = digit+
STR = "<char>*"
WS = ( <space> | <tab> | <newline> )+
SYM = ( + | - | * | = | < | <= | > | >= | <> | . | , | : | ; | .. | := | ( | ) | [ | ] )

In the above definitions, <char> can be any printing character, capital or small. Of course, WS (white space) only serves as a delimiter, and no corresponding token should be returned by the lexical analyzer.

2- Keywords and reserved symbols: The symbols and names that the lexer should recognize are:

and, begin, forward, div, do, else, end, for, function, if, array, mod, not, of, or, procedure, program, record, then, to, type, var, while

+, -, *, =, <, <=, >, >=, <>, ., ,, :, ;, :=, .., (, ), [, ]

The language is case sensitive. Each keyword should be uniquely identified by its own token. You may choose to return the same token (e.g. RELOP) for every relational operator (with different lexemes, of course) or a different token for each relational operator. The same is true for arithmetic operators.

3- Comments: In the programming language you are building a compiler for, comments consist of text enclosed in matching curly braces, namely { and }. No token should be produced for comments. The lexer, upon encountering a comment, should produce the first token following the end of the comment.
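Two of the requirements above, case-sensitive keywords (item 2) and comments that produce no token (item 3), can be sketched as plain C helpers. In an actual Lex specification these would normally be expressed as rules; the C below is only meant to show the logic. The function names keyword_index and skip_blanks_and_comments are hypothetical. It also assumes that comments do not nest and that an unterminated comment runs to the end of the input, neither of which the assignment specifies.

```c
#include <ctype.h>
#include <string.h>

/* The keywords listed in the assignment, in the order given there. */
static const char *const keywords[] = {
    "and", "begin", "forward", "div", "do", "else", "end", "for",
    "function", "if", "array", "mod", "not", "of", "or", "procedure",
    "program", "record", "then", "to", "type", "var", "while"
};

/* Return the keyword's index in the table above, or -1 if the lexeme
   (len characters starting at lexeme) is an ordinary identifier.
   The comparison is case sensitive, as the language requires.
   A lexer could map each index to its own token code. */
int keyword_index(const char *lexeme, size_t len)
{
    size_t n = sizeof keywords / sizeof keywords[0];

    for (size_t i = 0; i < n; i++) {
        if (strlen(keywords[i]) == len &&
            memcmp(keywords[i], lexeme, len) == 0)
            return (int)i;
    }
    return -1;
}

/* Skip white space and { ... } comments without producing a token.
   Returns a pointer to the first character of the next token, or to
   the terminating '\0' if the input is exhausted.
   Assumptions: comments do not nest; an unterminated comment consumes
   the rest of the input. */
const char *skip_blanks_and_comments(const char *p)
{
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (*p == '{') {
            while (*p != '\0' && *p != '}')
                p++;
            if (*p == '}')
                p++;            /* step over the closing brace */
        } else {
            return p;
        }
    }
}
```

A scanner loop would call skip_blanks_and_comments first, match the next lexeme, and, if the lexeme has the shape of an ID, call keyword_index to decide whether to return a keyword token or the ID token. A linear search is adequate for a table this small; a sorted table with bsearch would be one alternative.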

Task 3: A symbol table is an important data structure created and maintained by compilers to store information about the occurrences of various entities such as variable names, function names, objects, classes, interfaces, etc. The symbol table is used by both the analysis and the synthesis parts of a compiler. Include this functionality in your lexer for the identifiers and keywords that appear in the source code.
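One possible shape for such a symbol table is a small chained hash table, sketched below in C. This is an illustration under assumptions, not a prescribed design: the struct layout, the bucket count, the occurrence counter, and the names symtab_lookup and symtab_install are all hypothetical, and the assignment does not say which attributes must be stored.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211            /* arbitrary prime chosen for this sketch */

enum kind { KIND_KEYWORD, KIND_IDENTIFIER };

struct symbol {
    char *name;                 /* owned copy of the lexeme */
    enum kind kind;             /* keyword or identifier */
    int occurrences;            /* how many times the lexer has seen it */
    struct symbol *next;        /* next entry in the same bucket */
};

static struct symbol *buckets[NBUCKETS];

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned symtab_hash(const char *s)
{
    unsigned h = 0;

    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Find an entry by exact (case-sensitive) name; NULL if absent. */
struct symbol *symtab_lookup(const char *name)
{
    struct symbol *p;

    for (p = buckets[symtab_hash(name)]; p != NULL; p = p->next) {
        if (strcmp(p->name, name) == 0)
            return p;
    }
    return NULL;
}

/* Record one occurrence of name, creating the entry on first sight.
   Returns the entry, or NULL if memory could not be allocated. */
struct symbol *symtab_install(const char *name, enum kind kind)
{
    struct symbol *p = symtab_lookup(name);
    unsigned h;

    if (p != NULL) {
        p->occurrences++;
        return p;
    }
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    p->kind = kind;
    p->occurrences = 1;
    h = symtab_hash(name);
    p->next = buckets[h];       /* push onto the front of the bucket chain */
    buckets[h] = p;
    return p;
}
```

In a Lex specification, the action attached to the identifier rule could call symtab_install with yytext (the matched lexeme) before returning the token, and the keyword rules could do the same with KIND_KEYWORD. Later compiler phases would then extend the entry with further attributes such as type or scope.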
