Forward Reference: - A forward reference of a program entity is a reference to the entity which precedes its definition in the program.
Language processor pass: - A language processor pass is the processing of every statement in a source program to perform a language processing function.
Intermediate representation:- An intermediate representation is a representation of a source program which reflects the effect of some, but not all, analysis and synthesis tasks performed during language processing.
The first pass performs analysis of the source program and records its results in the IR. The second pass reads and analyses the IR instead of the source program. This avoids repeated processing of the source program. Desirable properties of an IR are:-
1. Ease of use:- It should be easy to construct and analyse.
2. Memory efficiency:- The IR should be compact.
3. Processing efficiency:- Efficient algorithms should exist for constructing and analysing the IR.
A simple assembly language:-
In this language, each statement has two operands. The first operand is always a register, which can be any one of AREG, BREG, CREG and DREG. The second operand refers to a memory word using a symbolic name and an optional displacement. The figure lists the mnemonic opcodes for machine instructions.
• A MOVE instruction moves a value between a memory word and a register. In the MOVER instruction, the second operand is the source operand and the first operand is the target operand. The converse is true for the MOVEM instruction.
• All arithmetic is performed in a register, i.e. the result replaces the contents of a register, and sets a condition code.
• A comparison instruction sets a condition code analogous to a subtract instruction, without affecting the values of its operands.
• The condition code can be tested by a Branch on Condition (BC) instruction. The assembly language instruction corresponding to BC has the format:-
            BC [condition code spec],[memory address]
It transfers control to the memory word with the address [memory address] if the current value of the condition code matches [condition code spec]. For simplicity, we assume
• In a machine language program, we show all addresses and constants in decimal form.
The figure shows the machine instruction format corresponding to an assembly language instruction. The opcode, register operand and memory operand occupy 2, 1 and 3 digits respectively. The sign is not a part of the instruction.
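The 2-1-3 digit layout described above can be sketched in Python. This is only an illustration: the opcode numbers in `OPCODES` are assumed values (the figure with the actual opcodes is not reproduced in these notes), and the register codes 1-4 for AREG-DREG follow the convention used for intermediate code later in the notes.

```python
# Hypothetical opcode values, for illustration only.
OPCODES = {"ADD": 1, "MOVER": 4, "MOVEM": 5}
# Register codes 1-4 for AREG-DREG.
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}

def encode(mnemonic, register, address):
    """Return a machine instruction as a 6-digit decimal string:
    2 digits of opcode, 1 digit of register, 3 digits of memory operand."""
    if not 0 <= address <= 999:
        raise ValueError("memory operand must fit in 3 digits")
    return f"{OPCODES[mnemonic]:02d}{REGISTERS[register]}{address:03d}"

print(encode("MOVER", "BREG", 115))  # prints 042115
```

The address 115 is an arbitrary example value.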
Assembly language Statements:- An assembly language contains three kinds of statements:-
1. Imperative statements
2. Declaration statements
3. Assembler directives
1. Imperative statements:- An imperative statement indicates an action to be performed during the execution of the program. Each imperative statement translates into one machine instruction.
2. Declaration statements:- The syntax of declaration statements is as follows:
            [Label] DS [constant]
            [Label] DC ‘[value]’
DS is short for declare storage. A DS statement reserves areas of memory and associates names with them.
For e.g.:-
            A DS 1
            G DS 200
The first statement reserves a memory area of 1 word and associates the name A with it.
The second statement reserves a block of 200 memory words. The name G is associated with the first word of the block. Other words can be accessed through offsets from G,
e.g.:- G+5 is the 6th word of the memory block.
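As a small illustration of how a symbolic name with an optional displacement maps to an address, the sketch below resolves operands such as G+5. The helper `resolve` and the addresses given to A and G are assumptions made for the example, not part of the assembly language.

```python
def resolve(operand, symtab):
    """Resolve 'NAME' or 'NAME+d' to a memory address."""
    name, _, disp = operand.partition("+")
    return symtab[name.strip()] + (int(disp) if disp else 0)

# Assumed addresses: A occupies word 100, the block G starts at word 101.
symtab = {"A": 100, "G": 101}
print(resolve("G+5", symtab))  # prints 106, the 6th word of block G
```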
DC is short for declare constant. A DC statement constructs memory words containing constants.
The statement:-
           ONE DC ‘1’
       associates the name ONE with a memory word containing the value ‘1’. Constants can be declared by the programmer in different forms: decimal, binary, hexadecimal, etc.
Use of Constants: - The DC statement does not really implement constants; it merely initializes memory words to the given values. These values may be changed by moving a new value into the memory word. An assembly program can use constants in two ways: as immediate operands and as literals.
Immediate operands can be used in an assembly statement only if the architecture of the target machine includes the necessary features.
Consider the assembly statement
             ADD AREG,5
This statement is translated into an instruction with two operands: AREG and the value ‘5’ as an immediate operand.
• A literal is an operand with the syntax =‘[value]’.
It differs from a constant because its location cannot be specified in the assembly program. Because of this, its value is not changed during the execution of the program. It differs from an immediate operand because no special architectural feature is needed for its use. An assembler handles a literal by mapping its use into other features of the assembly language.
3. Assembler Directives:- Assembler directives instruct the assembler to perform certain activities during the assembly of a program. Some assembler directives are:-
             1. START [constant]
This directive indicates that the first word of the target program should be placed in the memory word with address [constant].
             2. END [operand spec]
This directive indicates the end of the source program. The [operand spec] indicates the address of the instruction where execution of the program should begin.
Advantages of Assembly Language: - The main advantages of assembly language programming arise from the use of symbolic operand specifications. If a machine language program is changed, e.g. a statement is inserted, the addresses of constants and memory areas change. As a result, the addresses used in most instructions of the program have to be changed. Such changes need not be made in an assembly language program, as operand specifications are symbolic in nature.
Also, assembly language programming holds an advantage over high level language programming in situations where it is necessary to use specific architectural features of a computer,
e.g. special instructions supported by the CPU.
DESIGN SPECIFICATION OF AN ASSEMBLER
To develop a design specification for an assembler, a four step approach is used:-
1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain & maintain the information.
4. Determine the processing necessary to perform a task.
Here mainly two phases are involved:-
1. Synthesis phase
2. Analysis phase
The fundamental information requirements arise in the synthesis phase. Once these are identified, it is decided whether each item of information should be collected during analysis or derived during synthesis.
Synthesis phase:- Consider the assembly statement
             MOVER BREG, ONE
To synthesize machine instruction corresponding to this statement, we must have the following information:-
1. Address of the memory word with which the name ONE is associated.
2. Machine operation code corresponding to the mnemonic MOVER.
The first item of information depends on the source program, so it must be made available by the analysis phase.
The second item does not depend on the source program; it depends only on the assembly language. Hence the synthesis phase can determine this information itself.
During synthesis phase, two data structures are used:-
1.) Symbol Table
2.) Mnemonics Table
Each entry of the symbol table has two fields:- name and address. This table is built by the analysis phase. An entry in the mnemonics table has two fields:- mnemonic and opcode.
By using the symbol table, the synthesis phase obtains the machine address with which a name is associated.
By using the mnemonics table, the synthesis phase obtains the machine opcode corresponding to a mnemonic.
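The two lookups can be sketched as follows for the statement MOVER BREG, ONE. This is a minimal sketch under stated assumptions: the opcode values, the register codes and the address of ONE are made up for illustration, and `synthesize` is a hypothetical helper name.

```python
# Mnemonics table: mnemonic -> opcode (hypothetical values).
MNEMONICS = {"MOVER": 4, "MOVEM": 5}
# Register codes (assumed: 1-4 for AREG-DREG).
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
# Symbol table: name -> address. Normally built by the analysis phase;
# the address 115 is assumed here.
symtab = {"ONE": 115}

def synthesize(statement, symtab):
    """Translate 'MNEMONIC REG, SYMBOL' into (opcode, register, address)."""
    mnemonic, operands = statement.split(None, 1)
    reg, name = (field.strip() for field in operands.split(","))
    return MNEMONICS[mnemonic], REGISTERS[reg], symtab[name]

print(synthesize("MOVER BREG, ONE", symtab))  # prints (4, 2, 115)
```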
Analysis phase:- The primary function performed by the analysis phase is the building of the symbol table. For this purpose, it must determine the addresses with which the symbolic names used in a program are associated. Some addresses can be determined directly while others must be inferred.
Now, to determine the address of an element, we must fix the addresses of all elements preceding it. This is called memory allocation. To implement memory allocation, a data structure called the Location Counter (LC) is introduced. The location counter contains the address of the next memory word in the target program. LC is initialized to the constant specified in the START statement.
To update the contents of LC, the analysis phase needs to know the lengths of different instructions. This information depends on the assembly language. To hold it, the mnemonics table can be extended with a new field called length.
The processing involved in maintaining the LC is called LC processing. The mnemonics table is a fixed table which is merely accessed by the analysis and synthesis phases. The symbol table is constructed during analysis and used during synthesis.
Tasks performed by Analysis Phases:-
(1)    Isolate the label, mnemonic opcode and operand fields of a statement.
(2)    If a label is present, enter the pair (symbol, LC contents) in a new entry of the symbol table.
(3)    Check validity of mnemonic opcode by look-up in Mnemonics table.
(4)    Perform LC processing i.e. update value contained in LC.
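The analysis tasks listed above can be sketched in a few lines of Python. The sketch makes several assumptions that are not part of the notes: every statement occupies one word, a leading field that is not a known mnemonic is taken to be a label, and the sample program and start address are invented for the example.

```python
# Mnemonics table reduced to the length field (assumed: one word each).
LENGTHS = {"MOVER": 1, "ADD": 1, "MOVEM": 1, "DC": 1}

def analyse(lines, start):
    """Build the symbol table for `lines`, with LC initialised to `start`."""
    symtab, lc = {}, start
    for line in lines:
        fields = line.replace(",", " ").split()
        # (1) Isolate the label: assume a leading field that is not a
        #     mnemonic is a label.
        label = fields.pop(0) if fields[0] not in LENGTHS else None
        mnemonic = fields[0]
        # (2) Enter the pair (symbol, LC contents) in the symbol table.
        if label is not None:
            symtab[label] = lc
        # (3) Check validity of the mnemonic by table look-up.
        if mnemonic not in LENGTHS:
            raise ValueError(f"invalid mnemonic: {mnemonic}")
        # (4) LC processing.
        lc += LENGTHS[mnemonic]
    return symtab

program = [
    "MOVER AREG, ONE",
    "LOOP ADD AREG, ONE",
    "MOVEM AREG, RESULT",
    "ONE DC '1'",
    "RESULT DC '0'",
]
print(analyse(program, 100))  # prints {'LOOP': 101, 'ONE': 103, 'RESULT': 104}
```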
Tasks performed by Synthesis Phase:-
(1)    Obtain the machine opcode corresponding to the mnemonic.
(2)    Obtain the address of the memory operand from the symbol table.
(3)    Synthesize the machine form of a constant, if any.
Pass Structure of Assemblers:- A pass of a language processor is one complete scan of the source program. There are mainly two assembly schemes:-
(1)    Two pass assembly scheme.
(2)    One pass assembly scheme.
(1) Two Pass Translator / Two Pass Assembler:- A two pass translator of an assembly language program can handle forward references easily. LC processing is performed in the first pass, and the symbols defined in the program are entered into the symbol table in this pass. The second pass synthesizes the target form using the address information found in the symbol table. Thus the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass. This representation consists of two main components: data structures, e.g. the symbol table, and a processed form of the source program. The processed form of the source program is also called Intermediate Code (IC).
(2) Single Pass Translation / Single Pass Assembler:- In a single pass assembler, LC processing and construction of the symbol table proceed as in a two pass assembler. The problem of forward references is tackled using a process called back patching.
Initially, the operand field of an instruction containing a forward reference is left blank. When the definition of the forward referenced symbol is encountered, its address is put into this field.
Consider the following statement:-
            MOVER   BREG,   ONE
The instruction corresponding to this statement can be only partially synthesized, as ONE is a forward reference. So the instruction opcode and the code of BREG are assembled to reside in location 101, and the need to insert the second operand's address at a later stage is indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair ([instruction address], [symbol]), e.g. (101, ONE) in this case.
By the time the END statement is processed, the symbol table contains the addresses of all symbols defined in the source program and the TII contains information describing all forward references. The assembler can now process each entry in the TII to complete the concerned instruction.
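Back patching with a TII can be sketched as below. This is an illustrative sketch, not a full single pass assembler: it assumes one-word statements, handles only two-operand instructions and DC, and uses hypothetical opcode values and a made-up three-line program starting at location 101.

```python
OPCODES = {"MOVER": 4, "ADD": 1}          # hypothetical opcodes
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}

def assemble_one_pass(lines, start):
    """Assemble in one pass, resolving forward references via a TII."""
    symtab, tii, code, lc = {}, [], {}, start
    for line in lines:
        fields = line.replace(",", " ").split()
        if fields[0] not in OPCODES and fields[0] != "DC":
            symtab[fields.pop(0)] = lc     # label definition
        if fields[0] == "DC":
            code[lc] = int(fields[1].strip("'"))
        else:
            mnemonic, reg, name = fields
            if name not in symtab:
                tii.append((lc, name))     # operand field left blank for now
            code[lc] = [OPCODES[mnemonic], REGISTERS[reg], symtab.get(name)]
        lc += 1                            # assume one word per statement
    # On reaching the end: complete every instruction recorded in the TII.
    for address, name in tii:
        code[address][2] = symtab[name]
    return code, tii

program = ["MOVER BREG, ONE", "ADD BREG, ONE", "ONE DC '1'"]
code, tii = assemble_one_pass(program, 101)
print(tii)        # prints [(101, 'ONE'), (102, 'ONE')]
print(code[101])  # prints [4, 2, 103]
```

The operand address is `None` until the symbol is defined, which plays the role of the blank operand field.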
Design of a Two Pass Assembler:- Tasks performed by the passes of a two pass assembler are:-
PASS 1:-
(1)    Separate the symbol, mnemonic opcode and operand fields.
(2)    Build the symbol table.
(3)    Perform LC processing.
(4)    Construct intermediate representation.
PASS 2:-
(1)    Synthesize the target program.
Pass 1 performs analysis of the source program and synthesis of the intermediate representation, while Pass 2 processes the intermediate representation (IR) to synthesize the target program. Before going into the details of the design of the assembler passes, we should know about the advanced assembler directives.
Advanced Assembler Directives:-
(i) ORIGIN:- The syntax of this directive is     ORIGIN      [address spec]
where [address spec] is an [operand spec] or [constant]. This directive indicates that LC should be set to the address given by [address spec]. The ‘ORIGIN’ statement is useful when the target program does not consist of consecutive memory words. The ability to use an
(ii) EQU:- The syntax of this directive is
   [symbol]   EQU   [address spec]
where [address spec] is an [operand spec] or [constant]. The EQU statement defines the symbol to represent [address spec]. This differs from a DC/DS statement in that no memory is allocated; EQU simply associates the name [symbol] with [address spec].
(iii) LTORG:-
The LTORG statement permits a programmer to specify where literals should be placed. By default, the assembler places the literals after the END statement. At every LTORG statement, as also at the END statement, the assembler allocates memory to the literals of a literal pool. The pool contains all the literals used in the program since the start of the program or since the last LTORG statement. The LTORG directive has little relevance (applicability) for the simple assembly language described here, since allocating literals at intermediate points in the program rather than at the end brings little benefit for it.
Pass 1 of the Assembler:- Pass 1 uses the following data structures:-
OPTAB: - A table of mnemonic opcodes and related info.
SYMTAB: - Symbol Table.
LITTAB: - A table of literals used in the program.
OPTAB contains the fields mnemonic opcode, class and mnemonic info. The ‘class’ field indicates whether the opcode corresponds to an imperative statement (IS), a declaration statement (DL) or an assembler directive (AD). For an imperative statement, the mnemonic info field contains the pair (machine opcode, instruction length); otherwise it contains the id of a routine that handles the declaration or directive statement.
SYMTAB contains the fields symbol, address and length.
LITTAB contains the fields literal and address.
- The processing of an assembly statement begins with the processing of its label field. If it contains a symbol, the symbol and the value in LC are copied into a new entry of SYMTAB.
- After this, the class field is examined to determine whether the mnemonic belongs to the class of imperative, declaration or assembler directive statements.
- If it is an imperative statement, the length of the machine instruction is simply added to the LC.
- If it is a declaration or assembler directive statement, the routine mentioned in the mnemonic info field is called to perform appropriate processing of the statement.
- The LITTAB is used to collect all literals used in the program. Awareness of the different literal pools is maintained by an auxiliary table, POOLTAB. This table contains the literal number of the starting literal of each literal pool.
- At any stage, the current literal pool is the last pool in LITTAB. On encountering an LTORG statement (or the END statement), literals in the current pool are allocated addresses starting with the current value in LC, and LC is incremented appropriately.
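The interplay of LITTAB and POOLTAB can be sketched as follows. The helper names `use_literal` and `process_ltorg`, the addresses, and the assumption that every literal occupies one word are all illustrative. Note that the sketch uses 0-based list indices, whereas the notes number table entries from 1.

```python
littab = []    # entries [literal, address]; address is None until allocated
pooltab = [0]  # index in littab of the first literal of each pool

def use_literal(literal):
    """Record a literal in the current pool unless it is already there."""
    current_pool = littab[pooltab[-1]:]
    if all(entry[0] != literal for entry in current_pool):
        littab.append([literal, None])

def process_ltorg(lc):
    """Allocate addresses to the current pool, then start a new pool."""
    for entry in littab[pooltab[-1]:]:
        entry[1] = lc
        lc += 1                  # assume each literal occupies one word
    pooltab.append(len(littab))
    return lc

use_literal("='5'")
use_literal("='1'")
lc = process_ltorg(211)          # first pool allocated at 211 and 212
use_literal("='1'")              # same literal, but in a new pool
lc = process_ltorg(lc + 4)       # +4 stands in for four more statements
print(littab)   # prints [["='5'", 211], ["='1'", 212], ["='1'", 217]]
```

The same literal appears twice in LITTAB here because each pool is allocated independently.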
Pass 1 Algorithm: -
(1) Loc ctr = ‘0’ (default value)
       Littab ptr = ‘1’
       Pool tab ptr = ‘1’
(2) While the next statement is not an END statement
        (a) If label is present then
             This label = get the name of label
              Store [this label, loc counter] in SYMTAB
        (b) If an LTORG statement then
            (i) Process literals in LITTAB and allocate memory.
           (ii) Pooltab ptr = Pool tab ptr + 1;
           (iii) Littab ptr = Littab ptr +1;
         (c) If instruction is START or ORIGIN then
               Loc counter = value specified in operand field;
         (d) If an EQU statement then
           (i) this . address = value of [address spec];
           (ii) Store [this . label, this . address] in Symbol Table.
         (e) If a declaration statement then (DC/DS)
           (i) Code = code of declaration statement;
           (ii) Size = size of memory area required by DC/DS.
           (iii) Loc counter = Loc counter + Size;
         (f) If imperative statement then
           (i) Code = machine opcode from OPTAB
           (ii) Loc ctr = loc ctr + instruction length from OPTAB
           (iii) Generate Intermediate Code (IS,code);
(3) (Processing of END statement)
      (a) Perform step 2(b)
      (b) Generate IC (AD,02)
      (c) Go to Pass 2
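The Pass 1 algorithm above can be sketched in Python roughly as follows. This is a simplified sketch under several assumptions: the OPTAB codes and lengths are invented, an address spec is either a constant or an already defined symbol (no displacements), literals and DC constants occupy one word each, list indices are 0-based, and IC units are also recorded for DC/DS statements so that a later pass could process them.

```python
# OPTAB: mnemonic -> (class, code, length). Codes and lengths are assumed.
OPTAB = {
    "ADD": ("IS", 1, 1), "MOVER": ("IS", 4, 1), "MOVEM": ("IS", 5, 1),
    "DC": ("DL", 1, None), "DS": ("DL", 2, None),
    "START": ("AD", 1, None), "END": ("AD", 2, None),
    "ORIGIN": ("AD", 3, None), "EQU": ("AD", 4, None),
    "LTORG": ("AD", 5, None),
}

def pass1(lines):
    symtab, littab, pooltab, ic, lc = {}, [], [0], [], 0

    def allocate_pool():                 # step 2(b): process current pool
        nonlocal lc
        for entry in littab[pooltab[-1]:]:
            entry[1] = lc
            lc += 1
        pooltab.append(len(littab))

    def value(spec):                     # constant or defined symbol only
        return int(spec) if spec.isdigit() else symtab[spec]

    for line in lines:
        fields = line.replace(",", " ").split()
        label = fields.pop(0) if fields[0] not in OPTAB else None
        mnemonic, operands = fields[0], fields[1:]
        cls, code, length = OPTAB[mnemonic]
        if mnemonic == "END":
            break
        if mnemonic == "EQU":            # step 2(d)
            symtab[label] = value(operands[0])
            continue
        if label is not None:            # step 2(a)
            symtab[label] = lc
        if mnemonic == "LTORG":
            allocate_pool()
        elif mnemonic in ("START", "ORIGIN"):   # step 2(c)
            lc = value(operands[0])
        elif cls == "DL":                # step 2(e)
            size = int(operands[0]) if mnemonic == "DS" else 1
            ic.append((lc, (cls, code), operands))
            lc += size
        elif cls == "IS":                # step 2(f)
            pool = littab[pooltab[-1]:]
            for op in operands:
                if op.startswith("=") and all(e[0] != op for e in pool):
                    littab.append([op, None])
            ic.append((lc, (cls, code), operands))
            lc += length
    allocate_pool()                      # step 3(a)
    ic.append((lc, ("AD", 2), []))       # step 3(b)
    return symtab, littab, ic

program = ["START 200", "MOVER AREG, ='5'", "LOOP ADD AREG, ONE",
           "MOVEM AREG, RESULT", "ONE DC '1'", "RESULT DS 2",
           "BACK EQU LOOP", "END"]
symtab, littab, ic = pass1(program)
print(symtab)  # prints {'LOOP': 201, 'ONE': 203, 'RESULT': 204, 'BACK': 201}
```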
Intermediate Code forms: - The intermediate code consists of a set of IC units, each IC unit consisting of the following three fields: -
(1) Address.
(2) Representation of mnemonic op code.
(3) Representation of operands.
There are generally two criteria for the choice of intermediate code (IC), viz. processing efficiency and memory economy. The variant forms of intermediate code arise mainly in the operand and address fields, due to the trade-off between processing efficiency and memory economy. The information in the mnemonic field is the same in all the variants.
- The mnemonic field contains a pair of the form
               (statement class, code)
Here, statement class can be one of IS, DL and AD, standing for imperative statement, declaration statement and assembler directive resp. For an imperative statement, code is the instruction opcode in machine language.
For DL and AD, code is an ordinal number within the class.
Variant 1: - The first operand is represented by a single digit number which is a code for a register (i.e. 1-4 for AREG-DREG) or for a condition code.
The second operand, which is memory operand, is represented by a pair of form
               (operand class, code)
where operand class is one of C, S and L standing for constant, symbol and literal resp.
- For a constant, code field contains internal representation of constant itself.
- For symbol or literal, code field contains the ordinal no. of operand’s entry in SYMTAB or LITTAB.
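A variant 1 IC unit for an imperative statement could be built as in the sketch below. The helper `ic_unit`, the opcode 4 for MOVER and the table contents are assumptions for illustration; the address field of the IC unit is omitted to keep the example short, and ordinal numbers are counted from 1.

```python
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}

def ic_unit(stmt_class, code, register, operand, symtab_names, littab_names):
    """Build the mnemonic and operand fields of a variant 1 IC unit."""
    if operand.startswith("="):
        second = ("L", littab_names.index(operand) + 1)   # literal
    elif operand.isdigit():
        second = ("C", int(operand))                      # constant
    else:
        second = ("S", symtab_names.index(operand) + 1)   # symbol
    return ((stmt_class, code), REGISTERS[register], second)

symtab_names = ["LOOP", "ONE"]   # assumed order of SYMTAB entries
littab_names = ["='5'"]          # assumed order of LITTAB entries

unit = ic_unit("IS", 4, "BREG", "ONE", symtab_names, littab_names)
print(unit)  # prints (('IS', 4), 2, ('S', 2))
```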
Variant 2: - This variant differs from variant 1 in that the operand fields of source statements are selectively replaced by their processed forms.
For declaration statements and assembler directives, the operand fields contain processed forms. For imperative statements, the operand field is processed only to identify literal references. Variant 2 reduces the work of Pass 1 by transferring the burden of operand processing from Pass 1 to Pass 2.
Pass 2 Algorithm: -
(1) Code Area Address = address of code area (where target code is to be assembled)
       Pool tab ptr = ‘1’
       Loc ctr = ‘0’ (default value)
(2) While the next statement is not an END statement.
      (a) Clear memory buffer area.
      (b) If an LTORG statement.
          (i) Process the literals and assemble the literals in memory buffer.
          (ii) Size = size of memory area reqd. for literals.
          (iii) Pooltab ptr = Pool tab ptr + 1
      (c) If a START or ORIGIN statement then
          (i) Loc ctr = value specified in operand field.
          (ii) Size = 0
      (d) If a declaration statement then
          (i) If a DC statement then
               Assemble the constant in memory buffer.
          (ii) If a DS statement then
               No machine code is generated; the memory area is only reserved.
          (iii) Size = size of memory reqd. by DC/DS
      (e) If an imperative statement then
          (i) Assemble instruction in memory buffer.
          (ii) Size = size reqd. to store instruction
      (f) If size != 0 then
          (i) Store the contents of memory buffer in code area at address (Code Area Address + Loc ctr).
          (ii) Loc ctr = Loc ctr + size
(3) END statement
         (a) Perform steps 2(b) and 2(f)
         (b) Write code area into o/p file.
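The synthesis done by Pass 2 can be sketched as below. The sketch departs from the algorithm above in a few ways that should be read as assumptions: it takes the address of each statement from the IC unit instead of maintaining its own Loc ctr and memory buffer, it assembles all literals at the end instead of per pool, it uses the 2-1-3 digit instruction format with invented opcodes, and it expects IC units of the form (address, (class, code), register, operand) with ordinal numbers counted from 1.

```python
def pass2(ic, symtab, littab):
    """Turn IC units into a {address: machine word} code area."""
    code_area = {}
    for address, (cls, code), register, operand in ic:
        if cls == "IS":
            kind, value = operand
            if kind == "S":
                memory = symtab[value - 1][1]    # address from SYMTAB
            elif kind == "L":
                memory = littab[value - 1][1]    # address from LITTAB
            else:
                memory = value                   # constant used as is
            code_area[address] = f"{code:02d}{register}{memory:03d}"
        elif cls == "DL" and code == 1:          # DC: assemble the constant
            code_area[address] = f"{operand[1]:06d}"
        # DS and assembler directives produce no machine code here.
    for literal, address in littab:              # assemble the literals
        constant = int(literal.strip("='"))
        code_area[address] = f"{constant:06d}"
    return code_area

# Assumed IC and tables for a four-word program starting at 200.
ic = [
    (200, ("IS", 4), 1, ("L", 1)),       # MOVER AREG, ='5'
    (201, ("IS", 1), 1, ("S", 2)),       # ADD AREG, ONE
    (202, ("DL", 1), None, ("C", 1)),    # ONE DC '1'
]
symtab = [["LOOP", 201], ["ONE", 202]]
littab = [["='5'", 203]]

code_area = pass2(ic, symtab, littab)
print(code_area)
```

With these inputs the code area holds the words 041203, 011202, 000001 and 000005 at addresses 200 to 203.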