Welcome

Our website is made possible by displaying online advertisements to our visitors.
Please disable your ad blocker to continue.

Computer Engineering - Formal Languages and Compilers

Full exam

FORMAL LANGUAGES AND COMPILERS prof.s Giovanni Agosta, Luca Breveglieri and Angelo Morzenti Exam of THURSDAY8JUNE2023– Theory Part WITH SOLUTIONS– FOR TEACHING PURPOSES THE SOLUTIONS ARE WIDELY COMMENTED LAST NAME (SURNAME) + FIRST NAME: (capital letters please) MATRICOLA: SIGNATURE: (or PERSON CODE) TEACHER:Prof. G. AGOSTA –Prof. L. BREVEGLIERI –Prof. A. MORZENTI INSTRUCTIONS - READ CAREFULLY: • The exam is in written form and consists of two parts:1. Theory (80%): syntax and semantics of languages, dividedin four sections: –regular expressions and ﬁnite automata –free grammars and pushdown automata –syntax analysis and parsing methodologies –language translation and semantic analysis 2. Lab (20%): compiler design by Flex and Bison • To pass the exam, the candidate must succeed in both parts (theory and lab), in one call or more calls separately, but within one year (12 months) between the two parts. • For the theory part to be valid, the candidate must achieve a grade of at least 10/20 on each of the four sections. • Correctly answering all the questions that are not marked optional, allows the candi- date to achieve a high grade (although not the full mark). • For the lab part to be valid, the candidate must achieve a grade of at least 15/30. • The ﬁnal grade is the weighted average of the theory part (80 %) and of the lab part (20 %), and must be≥18/30. • The exam is open book: textbooks and personal notes are permitted. • Please write in the free space left and if necessary continueon the back side of the sheet; do not attach new sheets and do not replace the existing ones. • Time: lab part 1 h – theory part 2 h. 1 Regular Expressions and Finite Automata20% 1. Consider the nondeterministic ﬁnite-state automatonA 1below, with spontaneous transitions (ε-transitions), over the two-letter alphabet Σ ={a, b}: 1 2 3 5 4 A 1→ → a ε b a b ε Answer the following questions:(a) List all the short strings of languageL(A 1) that are of length ≤3, i.e., from length 0 to length 3 included. (b) Cut all the spontaneous transitions of automatonA 1and so obtain an equivalent automatonA 2, still possibly nondeterministic. Test the correctness of A 2with the short valid strings found at point (a), i.e., write theircomputations. (c) Through the Berry-Sethi (BS) method, from automatonA 1obtain an equivalent deterministic automatonA 3. Test the correctness of A 3with the short valid strings found at point (a), i.e., write their computations. (d) Check if automatonA 3is minimal, and if necessary minimize it. (e) Through the Brzozowski-McCluskey (BMC) method, from automatonA 3obtain an equivalent regular expressionR 1. Test the correctness of R 1with the short valid strings found at point (a), i.e., write their derivations. (f ) (optional) Write a regular expressionRR 1that generates the language L(R 1)R , i.e., the mirror image of languageL(R 1). 2 Solution(a) Here are the valid strings ofL(A 1) with length ≤3:a,a b aanda b b. (b) Cut of theε-transitions of automatonA 1through backward propagation. Cut transition 2ε −→3 (the red elements are the back-propagated ones): 1 2 3 5 4 → ↑ → a ε b a b b ε State 3 becomes unreachable and the automaton can be cleanedbe removing the dashed elements: 1 2 5 4 → ↑→ a a b b ε Cut transition 5ε −→2 (the red elements are the back-propagated ones): 1 2 5 4 → ↑→ ← a a b b b ε Finally (by removing the dashed elements and reshaping the graph): 1 2 4 5 A 2→↑→ → a a b b b AutomatonA 2is (incidentally) deterministic. It is correct as it accept s all the short strings of point (a) and no other strings of length≤3. Equivalent automata may be obtained, deterministic or not, and even with fewer states. Writing the computations of the short strings is left to the reader. 3 (c) Berry-Sethi method for a det. automatonA 3equivalent to automaton A 1: 1 2 3 5 4 A #→ → ⊣ a1 ε b2 a3 b4 ε initialsa 1 terminal followers a1b 2⊣ b 2a 3b 4 a3b 2⊣ b 4b 2⊣ a 1 b2⊣ a 3b 4 A 3→↑ a b a, b With a change of state names, automatonA 3becomes: A B C A 3→↑ a b a, b The correctness test of automatonA 3(computations) is left to the reader. (d) The deterministic automatonA 3is already in clean form. The two non-ﬁnal statesAandCofA 3are distinguishable, as state Clacks theb-transition. The ﬁnal stateBis unique. Therefore automatonA 3is even already minimal. Note that automatonA 2is deterministic, too (thus minimization theory is appli- cable to it), and that its states 2 and 5 are undistinguishable. By merging such states, automatonA 2becomes minimal and of course identical to automaton A 3 (except for the state names). 4 (e) AutomatonA 3has to be normalized with a ﬁnal state with no outgoing arc (while the initial state is already non-reentering), as follows: A B C ⊣ A 3→ a b a, b Eliminate node C: A B ⊣ A 3→ a b (a|b) Eliminate node B: A ⊣ A 3→ a b(a|b) ∗ Finally:R1= a b(a|b) ∗ As for the derivations of the valid short strings:R1choose ∗= 0 ========⇒a ε=a R1choose ∗= 1 ========⇒a b(a|b)choose left member of | ==============⇒a b a R1choose ∗= 1 ========⇒a b(a|b)choose right member of | ===============⇒a b b Thus expressionR 1successfully passes the correctness test. (f ) Here is the mirror regular expressionRR 1: RR 1= (b|a)b ∗ a Of course, writingb|arather thana|bis equivalent, as union is commutative (choosing which form should be used is only a matter of style). 5 2 Free Grammars and Pushdown Automata20% 1. A series of BNF grammars is given below. Answer the following questions: (a) Consider the following grammarG 1(axiom S): G1: S→a S b|S b|ε Argue that grammarG 1is ambiguous by providing the shortest ambiguous string and drawing all its syntax trees. (b) Write a non-ambiguous BNF grammarG 2equivalent to grammar G 1. (c) Consider next the following grammarG 3(axiom S): G3       S →A B A→a A b|A b|ε B→b B c|b B|ε Argue that grammarG 3is ambiguous by drawing all the (three) syntax trees for the valid string: b b (d) Write a non-ambiguous BNF grammarG 4equivalent to grammar G 3. (e) (optional) Draw the syntax tree for the valid string: a b b b c according to grammarG 4. 6 Solution(a) GrammarG 1is ambiguous, as the order of use of the non-null alternative rules for the axiomSis free. Here are the shortest ambiguous string and all its syntax trees (two): S a S S ε b b S S a S ε b b (b) Here is a non-ambiguous grammarG 2(axiom S) equivalent to grammarG 1: G2( S→a S b|X X→X b|ε In grammarG 2, the order of use of the non-null alternative rules is ﬁxed, b y introducing an additional nonterminalX. (c) GrammarG 3is ambiguous. Here are all the syntax trees for string b b(three): S A A ε b B b B ε S A A A ε b b B ε S A ε B b B b B ε GrammarG 3suﬀers from concatenation ambiguity, due to the unbalanced l etters b, which can be generated by both nonterminalsAandB. 7 (d) Here is a non-ambiguous grammarG 4(axiom S) equivalent to grammarG 3: G4          S →A B A→a A b|ε B→b B|C C→b C c|ε In grammarG 4, ambiguity is eliminated by generating all the unbalanced l etters bonly through nonterminalB. (e) Here is the syntax tree for the stringa b b b caccording to grammarG 4: S A a A ε b B b C b C ε c The unique unbalanced letterbcan be generated only by nonterminalB. 8 2. Consider (a simpliﬁed version of ) the LA T EX language for the deﬁnition of text docu- ments. Every document contains a preamble that speciﬁes thedocument class, e.g., article, book, ... (the class name is a text string), and thatincludes zero or more packages. The preamble is followed by a document block. Every package is speciﬁed with a command\ usepackage [options ]{package _name }. The op- tions can be omitted, along with the surrounding square brackets. The square brackets can contain one or more options, separated by commas. An option can be an identiﬁer or a pair identiﬁer/value, where the identiﬁer and the valueare separated by a sign “=”, and the value is a text string. The package name is an identiﬁer. The document block is introduced by the clause\ begin {document }and is concluded by the clause\ end {document }. The block (possibly empty) contains text strings possibly interspersed with environments. Every environment is delimited by the clauses\ begin {env _name }and\ end {env _name }, where the environment name is an identiﬁer. An environment (possibly empty) contains text strings as well as other environments. Note that since the number of environments is arbitrary, the equality of the environment names in a pair ofclauses\ begin {env _name } and\ end {env _name }cannot be modeled syntactically. \ d o c u m e n t c l a s s {a r t i c l e } \ u s e p a c k a g e [a4p a p e r , m a r g i n = 1 c m ] { g e o m e t r y } \ u s e p a c k a g e [d i s a b l e ] { t o d o n o t e s } \ u s e p a c k a g e {c o m m e n t } \ b e g i n {d o c u m e n t } \ b e g i n {c o m m e n t } c o m m e n t e d t e x t i n c l u d i n g a l s o a n o t h e r e n v i r o n m e n t\b e g i n {c o m m e n t } a n e s t e d c o m m e n t \e n d {c o m m e n t } \ e n d {c o m m e n t } m o r e a r b i t r a r y t e x t a n d c o m m a n d s c o u l d f o l l o w \e n d {d o c u m e n t } Answer the following questions:(a) Write an EBNF grammar for the LA T EX subset described above, considering the nonterminalSTRINGas already deﬁned to generate any text string, and the terminalidas representing any identiﬁer. Spaces can be safely ignored. (b) Draw the abstract syntax tree of the sample document below: \ d o c u m e n t c l a s s {a r t i c l e } \ u s e p a c k a g e [d i s a b l e ] { t o d o n o t e s } \ u s e p a c k a g e {c o m m e n t } \ b e g i n {d o c u m e n t } t e x t \e n d {d o c u m e n t } 9 Solution(a) Here is a possible EBNF grammar (axiomDOCUMENT), with layered rules:                                                               h DOCUMENTi → hPREAMBLEi hBLOCKi hPREAMBLEi → \documentclass{ hSTRINGi } hPACKAGEi∗ hBLOCKi → \begin{document} hTEXTi \end{document} hPACKAGEi → \usepackage “[”hOPTIONi ,hOPTIONi ∗ “]” {id} hTEXTi → hSTRINGi | hENVIRONi ∗ hOPTIONi →id =hSTRINGi hENVIRONi → \begin{id} hTEXTi \end{id} . . . . . . As usual, the square brackets indicate optionality. The square brackets enclosed between apices are instead used as grammar terminals. NonterminalSTRINGis expanded by rules not shown here. 10 (b) Syntax tree of the sample document (drawn according to the grammar): DOCUMENT PREAMBLE \documentclass {STRING article } PACKAGE \usepackage [OPTION disable ]{todonotes } PACKAGE \usepackage {comment } BLOCK \begin{document} TEXT STRING text \end{document} Note that nonterminalSTRINGis here expanded to the text string it generates. The subtree ofSTRINGis just represented as a dashed edge. Terminalidinstead is just replaced by the actual identiﬁer that occurs in the sample text. 11 3 Syntax Analysis and Parsing Methodologies20% 1. Consider the grammarGbelow, represented as machine net, over the two-letter ter- minal alphabet Σ ={a, b}and the two-symbol nonterminal alphabetV={S, X} (axiomS): 0 S 1S 2S 3S S → ↓→ a S b X 0 X 1X 2X X →→ b S Answer the following questions (use the ﬁgures / tables / spaces on the next pages): (a) List the four shortest strings that are generated by grammarG, and show a syntax tree for each of them. (b) Draw the complete ELR pilot of grammarG, examine all the possible conﬂicts (if any), and say if grammarGis of type ELR (1) or not. (c) Find all the guide sets for the machine net of grammarG, withk= 1, examine all the possible conﬂicts (if any), and say if grammarGis of type ELL (1) or not. (d) Is grammarGambiguous or not ? Justify your answer. (e) (optional) Is languageL(G) inherently ambiguous or not ? Justify your answer. 12 space for question (a) – short strings and their trees13 space for question (b) – complete ELR pilot and conﬂicts 0S ⊣ 1S ⊣ 0S b 1S b 0S b I0 a a 14 space for question (c) – guide sets and conﬂicts 0S 1S 2S 3S S → ↓→ a S b X 0 X 1X 2X X →→ b S 15 space for question (d) – ambiguity of the grammar space for question (e) – inherent ambiguity of the language 16 Solution(a) Here are the four shortest strings:ε,a b,a a b banda b a b. Their trees: Sε S a S ε b S a S a S ε b b S a S ε X b S a S ε b (b) Here is the complete ELR pilot of grammarG, with eleven m-states: 0 S ⊣ 3S ⊣ 1S ⊣ 0S b 2 S ⊣ 0X ⊣ 3S ⊣ 1X 0S ⊣ 2X ⊣ 1S b 0S b 2S b 0X b 3S b 1X 0S b 2X b 3S b I 0 I1 I2I 3 I4 I5I 6 I7I 8 I 9 I10 a S b S a a X S b S a X a The pilot has two RR conﬂicts on⊣andbin the m-statesI 3and I 7, respectively. There are two multiple (double) transitions, namelyI 2b −→I 3and I 6b −→I 7, but neither is convergent, so there are no Convergence conﬂicts. Anyway, grammar Gha conﬂicts, therefore it is not of type ELR (1). Note that thepilot does not have the STP either. 17 (c) Here is the PCFG of grammarG, with all the call arcs and guide sets: 0 S 1S 2S 3S S → ↓ {b,⊣ }→ { b,⊣ } a S b X 0 X 1X 2X X →→ { b,⊣ } b S {a, b} {b} {a, b,⊣ } The PCFG has one conﬂict onbin the state 2 S. Therefore grammar Gis not of type ELL (1). This was expected, as grammarGis not of type ELR (1) either. (d) Yes, grammarGis ambiguous. The valid stringa bhas two syntax trees. One (the simpler) is already shown at point (a), and the other oneis as follows: S a S ε X b S ε Also the valid stringsa a b banda b a bare ambiguous, of course. (e) From the four shortest strings of point (a), which are exactly the Dyck strings of length≤4 withaandbopen and closed brackets, respectively, one might conjecture that languageL(G) is Dyck. This turns out to be true. In fact, the net can be rewritten as two rules: S→a S(b|X)|ε X→b S then by substitution of nonterminalX: S→a S(b|b S)|ε that is, by factoring letterb: S→a S b(ε|S)|ε and since the axiomSis nullable (henceε|Sis equivalent toS): S→a S b S|ε which is a well-known (non-ambiguous) grammar of Dyck. Therefore language L(G) coincides with Dyck and is not inherently ambiguous. 18 4 Language Translation and Semantic Analysis20% 1. Consider the following source grammarGs (axiomS): Gs          S →a S b S→T T→c T d T→ε which generates the source languageLs below: Ls = ah ck dk bh |h, k≥0 Answer the following questions:(a) Deﬁne a translation grammarGτ 1 for the translation τ 1deﬁned as follows on the source languageLs generated by grammarGs : τ1( ah ck dk bh ) =bh dk ck ah (b) Say if the inverse translationτ− 1 1is a function. In the positive case, provide a translation grammarGτ − 1 1for τ− 1 1. (c) Consider now the translationτ 2deﬁned as follows on the source language Ls generated by grammarGs : τ2( ah ck dk bh ) =( bh ck dk ah ifkis even bh dk ck ah ifkis odd By suitably modifying the source grammarGs , provide a translation grammar Gτ 2 that deﬁnes translation τ 2. (d) Draw the translation syntax tree for the sample translationτ 2below, according to grammarGτ 2 : τ2( a c d b) =b d c a 19 Solution(a) Here is the translation grammarGτ 1 for τ 1(axiom S): Gτ 1          S →a b Sb a S→T T→c d Td c T→ε ε (b) Translationτ 1simply computes the mirror image of the source string, i.e., τ1( x) =xR , and clearly is a function. The inverse function of mirroring is still mirroring, since xR R =x. Thus the inverse translation isτ− 1 1( x) =xR , and is a function as well. Therefore, a translation grammarGτ − 1 1for τ− 1 1is the one where the source and target parts of grammarGτ 1 are switched: Gτ − 1 1          S →b a Sa b S→T T→d c Tc d T→ε ε (c) Here is the (modiﬁed) translation grammarGτ 2 for τ 2(axiom S): Gτ 2                          S →a b Sb a S→E S→O E→c cc c Ed dd d E→ε ε O→c dc d Od cd c O→c dd ccompacted Gτ 2                          S →a b Sb a S→E S→O E→c c c c Ed d d d E→ε ε O→c c d d Od d c c O→c d d c (d) Here is the translation tree of the translation sample forτ 2: S a b S O cd dc ba The tree is drawn according to the non-compacted form of grammarGτ 2 . 20 2. Consider the grammar below, which represents a (non-empty) list of names separated by commas (axiomS):       1 : S→L 2 :L→n c L 3 :L→n where symbolnrepresents an alphabetical string terminal and symbolcis a comma. A valid string generated by the grammar could be: Bianchi, Rossi, Verdi We want to deﬁne an attribute grammar that computes whether the names are listed in alphabetical order. Answer the following questions (use the tables / trees / spaces on the next pages): (a) We want to check if the names are listed in alphabetical order. To this end, we deﬁne two attributesfando, as follows: • Attributefis synthesized (left) and of type string. It represents the string corresponding to the ﬁrst name in a list. To initialize this attribute, consider the availability of a functionstringthat, when applied to a terminal, returns the associated text in the parsed string. • Attributeois synthesized (left) and of type boolean. It represents whether the names in a list are ordered (true) or not (false). Write the semantic functions for computing the two attributes. You can assume that strings can be compared alphabetically by means of the usual relational operator “