#### Mini-MIPS

CS/EE 3710 Fall 2019

From Weste/Harris CMOS VLSI Design

# Based on MIPS

- In fact, it's based on the multi-cycle MIPS from Hennessy and Patterson
  - Your CS/EE 3810 book...
- 8-bit version
  - 8-bit data and address
  - 32-bit instruction format
  - 8 registers numbered \$0-\$7
    - \$0 is hardwired to the value 0

# Instruction Set

Table 1.7 MIPS instruction set (subset supported)

| Instruction        | Function         | Operation                                   | Encoding | op     | funct  |
|--------------------|------------------|---------------------------------------------|----------|--------|--------|
| add \$1, \$2, \$3  | addition         | \$1 <- \$2 + \$3                            | R        | 000000 | 100000 |
| sub \$1, \$2, \$3  | subtraction      | \$1 <- \$2 - \$3                            | R        | 000000 | 100010 |
| and \$1, \$2, \$3  | bitwise and      | \$1 <- \$2 and \$3                          | R        | 000000 | 100100 |
| or \$1, \$2, \$3   | bitwise or       | \$1 <- \$2 or \$3                           | R        | 000000 | 100101 |
| slt \$1, \$2, \$3  | set less than    | \$1 <- 1 if \$2 < \$3<br>\$1 <- 0 otherwise | R        | 000000 | 101010 |
| addi \$1, \$2, imm | add immediate    | \$1 <- \$2 + imm                            | I        | 001000 | n/a    |
| beq \$1, \$2, imm  | branch if equal  | PC <- PC + imm <sup>a</sup>                 | I        | 000100 | n/a    |
| j destination      | jump             | PC <- destination <sup>a</sup>              | J        | 000010 | n/a    |
| lb \$1, imm(\$2)   | load byte        | \$1 <- mem[\$2 + imm]                       | I        | 100000 | n/a    |
| sb \$1, imm(\$2)   | store byte       | mem[\$2 + imm] <- \$1                       | I        | 101000 | n/a    |

a. Technically, MIPS addresses specify bytes. Instructions require a four-byte word and must begin at addresses that are a multiple of four. To most effectively use instruction bits in the full 32-bit MIPS architecture, branch and jump constants are specified in words and must be multiplied by four (shifted left two bits) to be converted to byte addresses.

# Instruction Encoding

| Format | Example              | Fields, most significant first (width in bits)   |
|--------|----------------------|--------------------------------------------------|
| R      | add \$rd, \$ra, \$rb | 0 (6), ra (5), rb (5), rd (5), 0 (5), funct (6)  |
| I      | beq \$ra, \$rb, imm  | op (6), ra (5), rb (5), imm (16)                 |
| J      | j dest               | op (6), dest (26)                                |

#### FIG 1.49 Instruction encoding formats
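The three formats can be checked against the hexadecimal column of the Fibonacci machine code with a few lines of Python. This is only a sketch: the function and constant names (`encode_r`, `encode_i`, `encode_j`, `ADDI`, ...) are made up for illustration, and the field order is read off FIG 1.49.

```python
# Illustrative encoders for the R, I and J formats of FIG 1.49.
# All names here are hypothetical helpers, not part of the course code.

def encode_r(funct, rd, ra, rb):
    """R-type: 0 | ra | rb | rd | 0 | funct."""
    return (ra << 21) | (rb << 16) | (rd << 11) | funct

def encode_i(op, ra, rb, imm):
    """I-type: op | ra | rb | 16-bit two's-complement immediate."""
    return (op << 26) | (ra << 21) | (rb << 16) | (imm & 0xFFFF)

def encode_j(op, dest):
    """J-type: op | 26-bit destination."""
    return (op << 26) | (dest & 0x3FFFFFF)

# Opcodes and funct codes copied from Table 1.7
ADDI, BEQ, J, SB = 0b001000, 0b000100, 0b000010, 0b101000
ADD_FUNCT, SUB_FUNCT = 0b100000, 0b100010

# addi $3, $0, 8: the source $0 sits in the ra field, the destination $3 in rb
print(f"{encode_i(ADDI, 0, 3, 8):08x}")        # 20030008
print(f"{encode_i(ADDI, 0, 5, -1):08x}")       # 2005ffff
print(f"{encode_r(ADD_FUNCT, 4, 4, 5):08x}")   # 00852020
print(f"{encode_j(J, 3):08x}")                 # 08000003
```

Note that for I-type instructions the assembly operand order and the field order differ: the register written by `addi` or `lb` is the second register field.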

# Fibonacci C-Code

```
int fib(void)
{
  int n = 8;             /* compute nth Fibonacci number */
  int f1 = 1, f2 = -1;   /* last two Fibonacci numbers */

  while (n != 0) {       /* count down to n = 0 */
    f1 = f1 + f2;
    f2 = f1 - f2;
    n = n - 1;
  }
  return f1;
}
```

#### **FIG 1.50** C code for Fibonacci program

| Cycle | f1 = f1 + f2     | f2 = f1 - f2     |
|-------|------------------|------------------|
| 1     | f1 = 1 + (-1) = 0 | f2 = 0 - (-1) = 1 |
| 2     | f1 = 0 + 1 = 1   | f2 = 1 - 1 = 0   |
| 3     | f1 = 1 + 0 = 1   | f2 = 1 - 0 = 1   |
| 4     | f1 = 1 + 1 = 2   | f2 = 2 - 1 = 1   |
| 5     | f1 = 2 + 1 = 3   | f2 = 3 - 1 = 2   |
| 6     | f1 = 3 + 2 = 5   | f2 = 5 - 2 = 3   |
## Fibonacci Assembly Code

FIG 1.51 Assembly language code for Fibonacci program

Compute the 8<sup>th</sup> Fibonacci number (8'd13 or 8'h0D) and store it in memory location 255.

# Fibonacci Machine Code

| Instruction         | Binary Encoding                         | Hexadecimal Encoding |
|---------------------|-----------------------------------------|----------------------|
| addi \$3, \$0, 8    | 001000 00000 00011 0000000000001000     | 20030008             |
| addi \$4, \$0, 1    | 001000 00000 00100 0000000000000001     | 20040001             |
| addi \$5, \$0, -1   | 001000 00000 00101 1111111111111111     | 2005ffff             |
| beq \$3, \$0, end   | 000100 00011 00000 0000000000000100     | 10600004             |
| add \$4, \$4, \$5   | 000000 00100 00101 00100 00000 100000   | 00852020             |
| sub \$5, \$4, \$5   | 000000 00100 00101 00101 00000 100010   | 00852822             |
| addi \$3, \$3, -1   | 001000 00011 00011 1111111111111111     | 2063ffff             |
| j loop              | 000010 00000000000000000000000011       | 08000003             |
| sb \$4, 255(\$0)    | 101000 00000 00100 0000000011111111     | a00400ff             |

FIG 1.52 Machine language code for Fibonacci program

## Architecture










## Control FSM




FIG 1.54 Multicycle MIPS control FSM. Reprinted from [Patterson04] with permission from Elsevier.

# **Connection to External Memory**

Table 1.9 Top-level inputs and outputs

| Inputs       | Outputs        |
|--------------|----------------|
| ph1          | adr[7:0]       |
| ph2          | writedata[7:0] |
| reset        | memread        |
| memdata[7:0] | memwrite       |



# External Memory from Book

```
// external memory accessed by MIPS
module exmemory #(parameter WIDTH = 8)
                 (input                  clk,
                  input                  memwrite,
                  input      [WIDTH-1:0] adr, writedata,
                  output reg [WIDTH-1:0] memdata);

   reg  [31:0] RAM [(1<<WIDTH-2)-1:0];
   wire [31:0] word;

   // Initialize memory with program
   initial $readmemh("memfile.dat", RAM);

   // read and write bytes from 32-bit word
   always @(posedge clk)
      if (memwrite)
         case (adr[1:0])
            2'b00: RAM[adr>>2][7:0]   <= writedata;
            2'b01: RAM[adr>>2][15:8]  <= writedata;
            2'b10: RAM[adr>>2][23:16] <= writedata;
            2'b11: RAM[adr>>2][31:24] <= writedata;
         endcase

   assign word = RAM[adr>>2];

   always @(*)
      case (adr[1:0])
         2'b00: memdata <= word[7:0];
         2'b01: memdata <= word[15:8];
         2'b10: memdata <= word[23:16];
         2'b11: memdata <= word[31:24];
      endcase
endmodule
```

#### Notes:

- Endianness is fixed here
- Writes are on posedge clk
- Reads are asynchronous
- This is a 32-bit wide RAM
- With 64 locations
- But with an 8-bit interface...
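The byte-lane selection in the notes above can be modeled in a few lines of Python. This is a behavioral sketch only: `ExMemoryModel` and its methods are hypothetical names, and the clocking is ignored so that just the address arithmetic is visible.

```python
class ExMemoryModel:
    """Model of the book's exmemory: 64 words of 32 bits behind a byte-wide port."""

    def __init__(self, width=8):
        self.ram = [0] * (1 << (width - 2))    # 64 words when width == 8

    def write(self, adr, byte):
        word_index, lane = adr >> 2, adr & 0b11
        shift = 8 * lane                        # lane 0 -> bits 7:0, lane 3 -> bits 31:24
        mask = 0xFF << shift
        self.ram[word_index] = (self.ram[word_index] & ~mask) | ((byte & 0xFF) << shift)

    def read(self, adr):
        word_index, lane = adr >> 2, adr & 0b11
        return (self.ram[word_index] >> (8 * lane)) & 0xFF

mem = ExMemoryModel()
mem.ram[0] = 0x20030008                         # first word of the Fibonacci program
print([hex(mem.read(a)) for a in range(4)])     # ['0x8', '0x0', '0x3', '0x20']
mem.write(255, 0x0D)                            # the final sb $4, 255($0)
print(hex(mem.ram[63]))                         # 0xd000000
```

Address bits [1:0] pick the byte and bits [7:2] pick the word, so the byte at the lowest address of each word is its least significant byte.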

### Exmem.v

```
module exmem #(parameter WIDTH = 8, RAM_ADDR_BITS = 8)
              (input                          clk, en,
               input                          memwrite,
               input      [RAM_ADDR_BITS-1:0] adr,
               input      [WIDTH-1:0]         writedata,
               output reg [WIDTH-1:0]         memdata);

   reg [WIDTH-1:0] mips_ram [(2**RAM_ADDR_BITS)-1:0];

   initial $readmemb("fib.dat", mips_ram);

   always @(posedge clk)
      if (en) begin
         if (memwrite)
            mips_ram[adr] <= writedata;
         memdata <= mips_ram[adr];
      end
endmodule
```

- This is synthesized to a Block RAM on the Altera FPGA
- It's 8 bits wide
- With 256 locations
- Both writes and reads are clocked

Note the clock! Both the write and the read sit inside `always @(posedge clk)`.

# Recall – Overall System




So, what are the implications of using a RAM that has both clocked reads and clocked writes instead of clocked writes and async reads? (We'll come back to this question...)

# mips Block Diagram





```
// simplified MIPS processor
module mips #(parameter WIDTH = 8, REGBITS = 3)
             (input              clk, reset,
              input  [WIDTH-1:0] memdata,
              output             memread, memwrite,
              output [WIDTH-1:0] adr, writedata);

   wire [31:0] instr;
   wire        zero, alusrca, memtoreg, iord, pcen, regwrite, regdst;
   wire [1:0]  aluop, pcsource, alusrcb;
   wire [3:0]  irwrite;
   wire [2:0]  alucont;
```

### Controller

```
module controller(input            clk, reset,
                  input      [5:0] op,
                  input            zero,
                  output reg       memread, memwrite, alusrca, memtoreg, iord,
                  output           pcen,
                  output reg       regwrite, regdst,
                  output reg [1:0] pcsource, alusrcb, aluop,
                  output reg [3:0] irwrite);

   // state codes
   parameter FETCH1  = 4'b0001;
   parameter FETCH2  = 4'b0010;
   parameter FETCH3  = 4'b0011;
   parameter FETCH4  = 4'b0100;
   parameter DECODE  = 4'b0101;
   parameter MEMADR  = 4'b0110;
   parameter LBRD    = 4'b0111;
   parameter LBWR    = 4'b1000;
   parameter SBWR    = 4'b1001;
   parameter RTYPEEX = 4'b1010;
   parameter RTYPEWR = 4'b1011;
   parameter BEQEX   = 4'b1100;
   parameter JEX     = 4'b1101;

   // useful constants to compare against (opcodes)
   parameter LB      = 6'b100000;
   parameter SB      = 6'b101000;
   parameter RTYPE   = 6'b0;
   parameter BEQ     = 6'b000100;
   parameter J       = 6'b000010;

   reg [3:0] state, nextstate;
   reg       pcwrite, pcwritecond;

   // state register
   always @(posedge clk)
      if (reset) state <= FETCH1;
      else       state <= nextstate;
```


### Next State Logic

```
   // next state logic
   always @(*)
      begin
         case(state)
            FETCH1:  nextstate <= FETCH2;
            FETCH2:  nextstate <= FETCH3;
            FETCH3:  nextstate <= FETCH4;
            FETCH4:  nextstate <= DECODE;
            DECODE:  case(op)
                        LB:      nextstate <= MEMADR;
                        SB:      nextstate <= MEMADR;
                        RTYPE:   nextstate <= RTYPEEX;
                        BEQ:     nextstate <= BEQEX;
                        J:       nextstate <= JEX;
                        default: nextstate <= FETCH1; // should never happen
                     endcase
            MEMADR:  case(op)
                        LB:      nextstate <= LBRD;
                        SB:      nextstate <= SBWR;
                        default: nextstate <= FETCH1; // should never happen
                     endcase
            LBRD:    nextstate <= LBWR;
            LBWR:    nextstate <= FETCH1;
            SBWR:    nextstate <= FETCH1;
            RTYPEEX: nextstate <= RTYPEWR;
            RTYPEWR: nextstate <= FETCH1;
            BEQEX:   nextstate <= FETCH1;
            JEX:     nextstate <= FETCH1;
            default: nextstate <= FETCH1; // should never happen
         endcase
      end
```
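One way to sanity-check the next-state logic is to transcribe it into a table and walk it for each opcode. The Python below is a sketch under that assumption: `NEXT` and `states_for` are hypothetical names, and the table is copied from the `case` statement above rather than generated from the Verilog.

```python
# Next-state table transcribed from the Verilog case statement.
# DECODE and MEMADR branch on the opcode; every other state has one successor.
NEXT = {
    "FETCH1": "FETCH2", "FETCH2": "FETCH3", "FETCH3": "FETCH4", "FETCH4": "DECODE",
    "DECODE": {"LB": "MEMADR", "SB": "MEMADR", "RTYPE": "RTYPEEX",
               "BEQ": "BEQEX", "J": "JEX"},
    "MEMADR": {"LB": "LBRD", "SB": "SBWR"},
    "LBRD": "LBWR", "LBWR": "FETCH1", "SBWR": "FETCH1",
    "RTYPEEX": "RTYPEWR", "RTYPEWR": "FETCH1",
    "BEQEX": "FETCH1", "JEX": "FETCH1",
}

def states_for(op):
    """Return the sequence of states one instruction passes through."""
    state, path = "FETCH1", []
    while True:
        path.append(state)
        nxt = NEXT[state]
        if isinstance(nxt, dict):
            nxt = nxt.get(op, "FETCH1")   # default branch: "should never happen"
        if nxt == "FETCH1":
            return path
        state = nxt

for op in ("RTYPE", "LB", "SB", "BEQ", "J"):
    print(op, len(states_for(op)))        # cycles per instruction
```

The walk gives 7 cycles for R-type and `sb`, 8 for `lb`, and 6 for `beq` and `j`. An opcode the FSM does not know (such as ADDI before it is added) falls through the `default` branch after DECODE.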

### Output Logic

Setting every output to zero and then conditionally asserting just the appropriate ones is a very common way to deal with default values in combinational always blocks.

```
   // output logic
   always @(*)
      begin
         // set all outputs to zero, then conditionally assert
         // just the appropriate ones
         irwrite <= 4'b0000;
         pcwrite <= 0; pcwritecond <= 0;
         regwrite <= 0; regdst <= 0;
         memread <= 0; memwrite <= 0;
         alusrca <= 0; alusrcb <= 2'b00; aluop <= 2'b00;
         pcsource <= 2'b00;
         iord <= 0; memtoreg <= 0;
         case(state)
            FETCH1:
               begin
                  memread <= 1;
                  irwrite <= 4'b0001; // changed to reflect new memory and
                  alusrcb <= 2'b01;   // get the IR bits in the right spots
                  pcwrite <= 1;       // FETCH 2,3,4 also changed...
               end
            FETCH2:
               begin
                  memread <= 1;
                  irwrite <= 4'b0010;
                  alusrcb <= 2'b01;
                  pcwrite <= 1;
               end
            FETCH3:
               begin
                  memread <= 1;
                  irwrite <= 4'b0100;
                  alusrcb <= 2'b01;
                  pcwrite <= 1;
               end
            // continued for the other states...
```

### Output Logic continued

```
            SBWR:
               begin
                  memwrite <= 1;
                  iord     <= 1;
               end
            RTYPEEX:
               begin
                  alusrca <= 1;
                  aluop   <= 2'b10;
               end
            RTYPEWR:
               begin
                  regdst   <= 1;
                  regwrite <= 1;
               end
            BEQEX:
               begin
                  alusrca     <= 1;
                  aluop       <= 2'b01;
                  pcwritecond <= 1;
                  pcsource    <= 2'b01;
               end
            JEX:
               begin
                  pcwrite  <= 1;
                  pcsource <= 2'b10;
               end
         endcase
      end
   assign pcen = pcwrite | (pcwritecond & zero); // program counter enable
endmodule
```

- Two places to update the PC: `pcwrite` on jump, `pcwritecond` on BEQ
- Why AND these two (`pcwritecond & zero`)?



| aluop | funct  | alucontrol | Meaning   |
|-------|--------|------------|-----------|
| 00    | x      | 010        | ADD       |
| 01    | x      | 110        | SUB       |
| 10    | 100000 | 010        | ADD       |
| 10    | 100010 | 110        | SUB       |
| 10    | 100100 | 000        | AND       |
| 10    | 100101 | 001        | OR        |
| 10    | 101010 | 111        | SLT       |
| 11    | x      | x          | undefined |
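The table reads as a two-level decode: `aluop` alone decides for 00 and 01, and only `aluop` = 10 looks at `funct`. A minimal Python rendering of that decode follows; `alu_control` and `ALU_BY_FUNCT` are hypothetical names, and returning `None` for the undefined row is a choice made here, not something the table specifies.

```python
# funct -> alucontrol, used only when aluop == 10 (rows 3-7 of the table)
ALU_BY_FUNCT = {
    0b100000: 0b010,   # ADD
    0b100010: 0b110,   # SUB
    0b100100: 0b000,   # AND
    0b100101: 0b001,   # OR
    0b101010: 0b111,   # SLT
}

def alu_control(aluop, funct):
    """Return the 3-bit alucontrol code, or None where the table is undefined."""
    if aluop == 0b00:
        return 0b010                      # ADD regardless of funct
    if aluop == 0b01:
        return 0b110                      # SUB regardless of funct
    if aluop == 0b10:
        return ALU_BY_FUNCT.get(funct)    # decode the R-type funct field
    return None                           # aluop == 11

print(format(alu_control(0b10, 0b101010), "03b"))   # 111
```

Against the controller's output logic, BEQEX sets `aluop` to 01 (a subtract feeding the zero detector) and RTYPEEX sets it to 10.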



#### zerodetect

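```
// header assumed to follow the WIDTH-parameterized pattern of the other modules
module zerodetect #(parameter WIDTH = 8)
                   (input [WIDTH-1:0] a,
                    output            y);

   assign y = (a==0);
endmodule
```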

# Register File

```
module regfile #(parameter WIDTH = 8, REGBITS = 3)
                (input                clk,
                 input                regwrite,
                 input  [REGBITS-1:0] ra1, ra2, wa,
                 input  [WIDTH-1:0]   wd,
                 output [WIDTH-1:0]   rd1, rd2);

   reg [WIDTH-1:0] RAM [(1<<REGBITS)-1:0];

   // three ported register file
   // read two ports combinationally
   // write third port on rising edge of clock
   // register 0 hardwired to 0
   always @(posedge clk)
      if (regwrite) RAM[wa] <= wd;

   assign rd1 = ra1 ? RAM[ra1] : 0;
   assign rd2 = ra2 ? RAM[ra2] : 0;
endmodule
```

What is this synthesized into?
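The "register 0 hardwired to 0" behavior comes entirely from the read ports, which a small Python model makes visible. `RegFileModel` and its method names are hypothetical; `clock()` stands in for one rising clock edge.

```python
class RegFileModel:
    """Behavioral model of regfile: two combinational reads, one clocked write."""

    def __init__(self, width=8, regbits=3):
        self.mask = (1 << width) - 1
        self.ram = [0] * (1 << regbits)

    def clock(self, regwrite, wa, wd):
        """One rising clock edge: write wd to register wa if regwrite is set."""
        if regwrite:
            self.ram[wa] = wd & self.mask

    def read(self, ra):
        """Combinational read: address 0 always returns 0."""
        return self.ram[ra] if ra else 0

rf = RegFileModel()
rf.clock(True, 0, 5)      # a write to $0 is stored but can never be read back
rf.clock(True, 5, -1)     # -1 wraps to 8 bits
print(rf.read(0), rf.read(5))   # 0 255
```

As in the Verilog, nothing blocks a write to location 0; the `ra ? RAM[ra] : 0` selection on each read port is what makes \$0 read as zero.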



# Datapath continued



# Flops and MUXes

```
module flop #(parameter WIDTH = 8)
             (input                  clk,
              input      [WIDTH-1:0] d,
              output reg [WIDTH-1:0] q);

   always @(posedge clk)
      q <= d;
endmodule

module flopen #(parameter WIDTH = 8)
               (input                  clk, en,
                input      [WIDTH-1:0] d,
                output reg [WIDTH-1:0] q);

   always @(posedge clk)
      if (en) q <= d;
endmodule

module flopenr #(parameter WIDTH = 8)
                (input                  clk, reset, en,
                 input      [WIDTH-1:0] d,
                 output reg [WIDTH-1:0] q);

   always @(posedge clk)
      if      (reset) q <= 0;
      else if (en)    q <= d;
endmodule

module mux2 #(parameter WIDTH = 8)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);

   assign y = s ? d1 : d0;
endmodule

module mux4 #(parameter WIDTH = 8)
             (input      [WIDTH-1:0] d0, d1, d2, d3,
              input      [1:0]       s,
              output reg [WIDTH-1:0] y);

   always @(*)
      case(s)
         2'b00: y <= d0;
         2'b01: y <= d1;
         2'b10: y <= d2;
         2'b11: y <= d3;
      endcase
endmodule
```

# Back to the Memory Question

- What are the implications of using RAM that is clocked on both write and read?
  - Book version was async read
  - So, let's look at the sequence of events that happen to read the instruction
  - Four steps read four bytes and put them in four slots in the 32-bit instruction register (IR)

# Instruction Fetch

<pre>
// independent of bit width, load instruction into four
// 8-bit registers over four cycles
flopen #(8) ir0(clk, irwrite[0], memdata[7:0], instr[7:0]);
flopen #(8) ir1(clk, irwrite[1], memdata[7:0], instr[15:8]);
flopen #(8) ir2(clk, irwrite[2], memdata[7:0], instr[23:16]);
flopen #(8) ir3(clk, irwrite[3], memdata[7:0], instr[31:24]);

module flopen #(parameter WIDTH = 8)
               (input                  clk, en,
                input      [WIDTH-1:0] d,
                output reg [WIDTH-1:0] q);

   always @(posedge clk)
      if (en) q <= d;
endmodule
</pre>

- memread, irwrite, adr, etc. are set up just after the clk edge
- Data comes back sometime after that (async)
- Data is captured in ir0-ir3 on the next rising clk edge
- How does this change if reads are clocked?
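The four-step fetch can be modeled in Python to see how the one-hot `irwrite` steers each byte into its slot of the instruction register. This is a sketch with stated assumptions: `fetch_instruction` is a hypothetical helper, the PC is assumed to advance by one byte in each fetch state, FETCH4 is assumed to use `irwrite` = 4'b1000 (that state is not listed above), and memory is treated as an ideal async read.

```python
def fetch_instruction(memory, pc):
    """Model FETCH1..FETCH4: one byte per cycle, steered by a one-hot irwrite."""
    ir = [0, 0, 0, 0]                       # ir0..ir3
    for step in range(4):
        irwrite = 1 << step                 # 0001, 0010, 0100, then (assumed) 1000
        memdata = memory[pc]                # byte presented by the memory
        for slot in range(4):
            if (irwrite >> slot) & 1:       # flopen: capture only when enabled
                ir[slot] = memdata
        pc = (pc + 1) & 0xFF                # assumed: pcwrite loads PC + 1
    instr = ir[0] | (ir[1] << 8) | (ir[2] << 16) | (ir[3] << 24)
    return instr, pc

# First instruction of the Fibonacci program, lowest-addressed byte first
program = [0x08, 0x00, 0x03, 0x20]
instr, pc = fetch_instruction(program, 0)
print(f"{instr:08x}", pc)                   # 20030008 4
```

The byte fetched first lands in `instr[7:0]`, which matches the byte ordering of the book's external memory.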

#### mips + exmem



mips is expecting async reads; exmem has clocked reads.

One of those rare cases where using both edges of the clock is useful!

# Memory Mapped I/O

- Break memory space into pieces (ranges)
  - For some of those pieces: regular memory
  - For some of those pieces: I/O
    - That is, reading from an address in that range results in getting data from an I/O device
    - Writing to an address in that range results in data going to an I/O device

## Mini-MIPS Memory Map



# **Enabled Devices**




Only write to that device (i.e. enable it) if you're in the appropriate memory range.

#### Check top two address bits!

#### **MUXes for Return Data**

- Use a MUX to decide whether the data returned to the processor is coming from memory or from I/O
- Check address bits!
- The same `mux2` as in the datapath does the job: `module mux2 #(parameter WIDTH = 8) (input [WIDTH-1:0] d0, d1, input s, output [WIDTH-1:0] y); assign y = s ? d1 : d0; endmodule`
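Both ideas, enabling a device only in its address range and muxing the return data, reduce to one decode of the top two address bits. The Python below is a sketch only. The choice of region `11` (addresses 0xC0 to 0xFF) as I/O is an assumption made for illustration, since the actual memory map figure is not reproduced here, and all names are hypothetical.

```python
IO_REGION = 0b11   # ASSUMPTION: the top quarter of the address space is I/O

def decode(adr):
    """Classify an 8-bit address by its top two bits."""
    region = (adr >> 6) & 0b11
    return "io" if region == IO_REGION else "ram"

def write_enables(adr, memwrite):
    """Only the device that owns the address sees the write."""
    is_io = decode(adr) == "io"
    return {"ram_we": memwrite and not is_io, "io_we": memwrite and is_io}

def return_data(adr, ram_data, io_data):
    """The read-data mux: select by the same address decode."""
    return io_data if decode(adr) == "io" else ram_data

print(decode(0x3F), decode(0xC0))    # ram io
print(write_enables(0xC0, True))     # {'ram_we': False, 'io_we': True}
```

In hardware the decode is just two wires off the address bus driving the device enables and the mux select.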

# Lab2 in a Nutshell

- Understand and simulate mips/exmem
  - Add ADDI instruction
  - Fibonacci program correct if 8'h0d is written to memory location 255
- Augment the system
  - Add memory mapped I/O to switches/LEDs
  - Write new Fibonacci program
  - Simulate in Quartus
  - Demonstrate on your board

# My Initial Testbench...

```
`timescale 1ns / 1ps

module mips_mem_mips_mem_sch_tb();

   // Inputs
   reg clk;
   reg reset;

   // Output
   wire memread;

   // Bidirs

   // Instantiate the UUT
   mips_mem UUT (
      .clk(clk),
      .memread(memread),
      .reset(reset)
   );

   // Initialize Inputs
   initial begin
      reset <= 1;
      #22 reset <= 0;
   end

   // Generate clock to sequence tests
   always
   begin
      clk <= 1; # 5; clk <= 0; # 5;
   end

   // check the data on the memory interface between mips and exmem
   // If you're writing, and the address is 255, then the data should
   // be 8'h0d if you've computed the correct 8th Fibonacci number
   always @(negedge clk)
      if (UUT.memwrite)
         if (UUT.addr == 8'd255 & UUT.wdata == 8'h0d)
            $display("Yay - Fibonacci completed successfully!");
         else $display("Oops - wrong value written to addr 255: %h", UUT.wdata);

endmodule
```