An assembly language compiler for the lab #3 CPU.
Author: Dima Shchelokov.
CONTENTS
I. Overview
II. Usage
III. Design and Implementation
I. Overview
The compiler, written in Perl, translates a program written in assembly language into binary instructions for the lab #3 CPU. Besides simplifying the system developer's task by providing text names for opcodes and registers, the compiler supports labels, macros, and comments, and checks for certain errors. As a two-pass assembler, it handles both forward references and the duplication of labels in repeated macro expansions.
II. Usage
The program is run by typing "perl assmblr.pl" on the command line. The assmblr.pl file (hereafter called the assembler) reads the assembly code from prog_lab3.asm and writes the results to maxgram.inf, which can be used for lab #3 without any modifications to the code or format.
The following language constructs are available to the
programmer of prog_lab3.asm:
1. Specifying a binary command in text format.
Example:
LOAD R1 11100011
2. Marking a line with a label and jumping to it with GOTO.
Example:
Load_the_Value:
LOAD R1 11100011
GOTO Load_the_Value
Note: BRN and BRZ instructions require an address offset
as an argument, and therefore have not been replaced by "branch label"-like
commands.
3. Defining and calling a macro.
A macro definition has the following form:
MACRO <name> <param1> <param2> ... <paramN>
<command 1>
.....
<command M>
END <optional qualifier>
The optional qualifier can be any string of text serving as a comment, such as the name of the macro, to help the programmer distinguish between several ENDs more easily. Macro parameters can be either register names or 8-bit values. The body of a macro can contain labels. However, because of implementation limitations, goto statements inside a macro can point only to labels in the same macro body. Additionally, macros cannot be called or defined inside other macros.
The macro call syntax is intuitive as well. It consists of the macro name followed by the actual parameters:
<name> <param1> <param2> ... <paramN>
III. Design and Implementation
To describe the design and implementation process more clearly, it is useful to break it down into four familiar parts:
1. Translating instructions into binary.
The first phase of the design and implementation process was to make a single-pass compiler that reads the input line by line, translates the ASCII instructions into binary, and saves the results. The compiler processes a line by looking up the binary translations of the opcode and registers in the hard-coded opcode and register tables. If a field of a command line has no entry in the register table, the register field in the binary instruction becomes "000".
2. Labels and goto.
To support labels, the compiler must make two passes because of the forward reference problem: a label's name and the address it points to must be stored in the label table before the compiler translates a statement that branches to that label. Since some goto statements appear earlier in the code than the corresponding labels, the compiler must first go through the file and read in all the labels before it can translate goto statements. During the first pass, the translator keeps track of the address of the current instruction in a variable called ILC (Instruction Location Counter), so that when a label is encountered, the compiler knows the address of the instruction it labels.
During the second pass, the compiler translates the label parameter of a goto statement into the address that label marks. The label statement is a pseudo-instruction and is ignored in the second pass.
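The two passes described above can be sketched in a few lines of Perl. This is a minimal illustration, not the code of assmblr.pl: the opcode and register bit patterns, the 16-bit instruction layout, the assumption that every instruction occupies one address, and the name assemble are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical tables: the real encodings of the lab #3 CPU are not
# given in this document.
my %opcodes   = (LOAD => '00001', GOTO => '00010');
my %registers = (R1 => '001', R2 => '010');

sub assemble {
    my @lines = @_;

    # Pass 1: record the address of every label.
    my %labels;
    my $ilc = 0;                       # Instruction Location Counter
    for my $line (@lines) {
        if ($line =~ /^\s*(\w+):\s*$/) {
            $labels{$1} = $ilc;        # label marks the next instruction
        } elsif ($line =~ /\S/) {
            $ilc++;                    # assumes one address per instruction
        }
    }

    # Pass 2: translate instructions; label lines produce no output.
    my @out;
    for my $line (@lines) {
        next if $line =~ /^\s*\w+:\s*$/ || $line !~ /\S/;
        my ($op, @args) = split ' ', $line;
        die "unknown opcode '$op'\n" unless exists $opcodes{$op};

        my $reg   = '000';             # used when no register is named
        my $value = '00000000';
        for my $arg (@args) {
            if (exists $registers{$arg}) {
                $reg = $registers{$arg};
            } elsif (exists $labels{$arg}) {
                $value = sprintf '%08b', $labels{$arg};
            } elsif ($arg =~ /^[01]{8}$/) {
                $value = $arg;
            } else {
                die "cannot translate argument '$arg'\n";
            }
        }
        push @out, $opcodes{$op} . $reg . $value;
    }
    return @out;
}

my @binary = assemble(
    'GOTO Load_the_Value',             # forward reference to the label
    'Load_the_Value:',
    'LOAD R1 11100011',
);
print "$_\n" for @binary;
```

Because the label table is complete before pass 2 starts, the GOTO on the first line can be translated even though its label appears later in the input.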
3. Macro definition, call, and expansion.
In the first pass, the compiler records macro definitions in a table. When it comes across a macro call, it looks up the macro in the table, creates a "macro expansion", and stores it in a queue. (For this to work in a single pass, the definitions must precede the calls.) A macro expansion, for this assembler, is simply the macro body with formal parameters replaced by actual parameters and label names modified to be unique to this expansion. The compiler makes the labels inside an expansion unique by appending that expansion's order number to the label name; the order number is easy to obtain by counting how many expansions of that macro have occurred.
During pass 2, the translator skips macro definitions. For every macro call, it shifts one expansion off the queue and translates each line of that expansion into a CPU instruction.
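The expansion step described above might look roughly like the following Perl sketch. The data layout, the helper names define_macro and expand_call, and the sample macro DEMO with its parameters REG and VAL are assumptions made for illustration; they are not taken from assmblr.pl.

```perl
use strict;
use warnings;

my (%macros, %expansion_count, @queue);

# Record a macro definition: formal parameter names plus body lines.
sub define_macro {
    my ($name, $params, $body) = @_;
    $macros{$name} = { params => $params, body => $body };
}

# Build one expansion for a macro call and append it to the queue.
sub expand_call {
    my ($name, @actuals) = @_;
    my $macro = $macros{$name} or die "undefined macro '$name'\n";
    my $n = ++$expansion_count{$name};   # order number of this expansion

    # Map formal parameters to the actual parameters of this call.
    my %rename;
    @rename{ @{ $macro->{params} } } = @actuals;

    # Labels defined in the body get the order number appended.
    # (Two different macros are assumed not to share label names.)
    for my $line (@{ $macro->{body} }) {
        $rename{$1} = "$1$n" if $line =~ /^\s*(\w+):/;
    }

    my @expansion;
    for my $line (@{ $macro->{body} }) {
        my $text = $line;                # leave the stored body untouched
        $text =~ s{(\w+)}{ exists $rename{$1} ? $rename{$1} : $1 }ge;
        push @expansion, $text;
    }
    push @queue, \@expansion;
}

define_macro('DEMO', ['REG', 'VAL'], [
    'LOAD REG VAL',
    'Wait:',
    'GOTO Wait',
]);
expand_call('DEMO', 'R1', '00000011');
expand_call('DEMO', 'R2', '00000001');

# Pass 2 would shift one expansion off the queue per macro call.
my $first  = shift @queue;
my $second = shift @queue;
print join("\n", @$first, @$second), "\n";
```

Since expansions enter and leave the queue in the order of the calls, pass 2 needs no lookup: the next call it meets always corresponds to the expansion at the front of the queue.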
4. Comments.
Whenever input lines are processed, comments are stripped off and later concatenated back for printing. Separating comments from code is very easy to do in Perl:
($code, $comments) = split (/;/, $line, 2);
The parameter '2' indicates that the line is split into at most two parts, at the first semicolon, which is exactly what is needed to separate the code from the comments.
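As a small illustration of the limit argument, the sketch below wraps the split in a helper and applies it to a line whose comment itself contains a semicolon. The helper name split_comment and the handling of lines without a comment are assumptions added here, not part of the original program.

```perl
use strict;
use warnings;

# Split one source line into its code part and its comment part.
sub split_comment {
    my ($line) = @_;
    my ($code, $comments) = split /;/, $line, 2;
    $code     = '' unless defined $code;       # empty input line
    $comments = '' unless defined $comments;   # line had no comment
    return ($code, $comments);
}

for my $line ('LOAD R1 11100011 ; set flags; see note', 'GOTO Load_the_Value') {
    my ($code, $comments) = split_comment($line);
    print "code=[$code] comments=[$comments]\n";
}
```

With the limit of 2, the second semicolon stays inside the comment text instead of producing a third field.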
E22/CS23 Final project.
Dima Shchelokov, Spring 2000.