1. Two’s Complement Arithmetic Compute the following additions assuming the numbers are 8-bit 2’s complement numbers. Show the result, and indicate which, if any, cause overflow. Also give the decimal equivalent of the answer in each case.

\[
\begin{array}{ccc}
\text{A)} & 01001010 & + & 11110110 \\
\text{B)} & 10010110 & + & 11001101 \\
\text{C)} & 10101101 & + & 11011110 \\
\end{array}
\]

2. Number Conversions Fill in the blanks in the following table that shows numbers in binary, octal, and hexadecimal forms:

<table>
<thead>
<tr>
<th></th>
<th>Binary</th>
<th>Octal</th>
<th>Hexadecimal</th>
</tr>
</thead>
<tbody>
<tr>
<td>A</td>
<td>1010100101111000010</td>
<td></td>
<td></td>
</tr>
<tr>
<td>B</td>
<td>7343420</td>
<td></td>
<td>6A9CE</td>
</tr>
<tr>
<td>C</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

3. RISC vs. CISC The 80x86 architecture has many instructions that are quite complex. The modern vogue in computer architecture says that simpler instructions are better, but the 80x86 continues to thrive. List at least two advantages and two disadvantages of a CISC architecture like 80x86 relative to a RISC machine like MIPS.
4. State Machine Implementation
Given the following state diagram, and state encoding, implement the state machine using D flip flops.

Inputs are I1, I0
Output is Z
State bits are P1, P0

(a) Fill in the state table including present state, next state, and output information.
(b) Fill in Karnaugh maps for the next-state input values for each flip flop, and the output. Circle the terms in the K-map and write the optimized boolean equations for each of the state variables and output values.
(c) Write Verilog code to implement this state machine.
5. Circuit Timing

Consider a generic sequential circuit with combinational logic connected to an edge-triggered register, and the output of that register fed back to the inputs of the combinational logic.

The specs are:

- Combinational logic propagation delay $T_{pd} = 100\text{ns}$
- Combinational logic contamination delay $T_{cd} = 30\text{ns}$ *Contamination delay is the minimum interval following an input change during which the previous outputs are guaranteed to remain valid*
- Register propagation delay $T_{pd} = 13\text{ns}$
- Register setup time $T_{su} = 20\text{ns}$
- Register hold time $T_{h} = 5\text{ns}$

Answer the following questions:

(a) What is the minimum clock period that will ensure correct operation of the circuit?

(b) By how long must any change in inputs precede the next clock edge?

(c) There is a window in time with respect to the clock edge when the inputs are allowed to change without violating timing restrictions. Define this window in time with respect to the clock edge. That is, when can the inputs change with respect to the clock?

(d) What is the smallest time after the clock edge that outputs can be expected to be valid?

(e) What is the smallest time after a clock edge that you expect the outputs to start changing?
6. Assembly Code I

For scientific computing applications, floating point performance is often a major factor that decides the purchase of one machine over another. To measure performance, the Linpack benchmark is often used to compare machines. This benchmark involves a lot of matrix operations, such as matrix multiply and Gaussian elimination. Below is an assembler routine for multiplying two \( n \times n \) matrices together; it is the basis for the next questions. Note that it was not written for any particular processor and it has not been tested; don’t worry about details of the code. It implements a triply nested loop, each of which executes \( n \) times.

Matrix multiply

\[
\text{Calculates } C = A \times B \text{ where } A \text{ and } B \text{ are } n \times n \text{ matrices}
\]

\[
\text{Recall: } C[i,j] = \sum_{k=0}^{n-1} A[i,k] \times B[k,j]
\]

Input: R1 = Address of matrix A
R2 = Address of matrix B
R3 = Address of matrix C
R4 = Size of matrix N
Output: C = A \times B

```
matrix_multiply :
    ADD R10,R3,R0 ; R10 = Address of C[0,0]
    ADD R11,R1,R0 ; R11 = Address of A[0,0]
    SLL R19,R4,#2 ; R19 = 4 \times N (row address increment)
    ADD R5,R4,R0 ; R5 = Loop Counter for i
    loopi: ; for (i = 0; i < N; i++) {
        ADD R6,R4,R0 ; R6 = Loop counter for j
        ADD R12,R2,R0 ; R12 = Address of B[0,0]
        loopj: ; for (j = 0; j < N; j++) {
            ADD R7,R4,R0 ; R7 = Loop counter for k
            ADD R20,R11,R0 ; Move address of A[i,0] to R20
            ADD R21,R12,R0 ; Move address of B[0,j] to R21
            LW F4,0(R20) ; F4 = A[i,k]
            LW F5,0(R21) ; F6 = B[k,j]
            FMUL F3,F4,F5 ; F3 = A[i,k] \times B[k,j]
            FADD F2,F2,F3 ; F2 += F3 (add to sum)
            ADDI R20,R20,#4 ; R20 = &A[i,k+1]
            ADDI R21,R21,R19 ; R19 = &B[k+1,j]
            SUBI R7,R7,#1 ; Decrement k loop counter
            BNE loopk ; Loop if R7 != 0
            endk: ; }
        SW 0(R10), F2 ; Save C[i,j]
        ADD R10,R10,#4 ; Move to next C[i,j]
        ADD R11,R11,#4 ; Move to next A[i,j]
        ADD R12,R12,#4 ; Move to next B[i,j]
        SUBI R6,R6,#1 ; Decrement j loop counter
        BNE loopj ; Loop if R6 != 0
        endj: ; }
    ADD R11,R11,R19 ; R11 = &A[i+1,0]
    SUBI R5,R5,#1 ; Decrement i loop counter
    BNE loopi ; Loop if R5 != 0
    endi: ; }
    RET
```

Evaluation

Suppose that matrix multiply is to be run on a machine which has the following CPI (Cycles per Instruction) for different types of instructions:

<table>
<thead>
<tr>
<th>Instruction Type</th>
<th>CPI</th>
</tr>
</thead>
<tbody>
<tr>
<td>All integer operations (ADD, BNZ, LW)</td>
<td>1</td>
</tr>
<tr>
<td>Floating point add/subtract (FADD, FSUB)</td>
<td>5</td>
</tr>
<tr>
<td>Floating point multiply (FMUL)</td>
<td>10</td>
</tr>
</tbody>
</table>

(a) Suppose that a matrix multiply is performed with the above code for \( N=100 \). Calculate the total number of clock cycles spent executing the program.

(b) Suppose the clock rate is 100 Mhz. What is the execution time in seconds of the matrix multiply for \( N=100 \)?

(c) What is the Mflops rating of the program?
(d) The vendor of the machine claims that the peak floating point performance is 20 Mflops. How does your answer in part (iii) compare to this and how do you explain any discrepancy?

7. Assembly Code II

(a) Load/Store Code Write an assembly language instruction sequence that is equivalent to the following fragment of C code. Assume the variables are initially in memory, but not in registers. (i and j have been declared as word variables and have been initialized) Also, your code must update the variables in memory after the loop has been executed. You may use any registers that you need, and you may assume that no overflow results from the multiplication. You may use any type of load/store assembly language that you like. For example, you might use MIPS assembly code, or the same style of assembly code as in the previous question. Make sure to comment your code though!

```c
while ( i > j + 2 )
    { i = i - 1;
      j = j * 2;
    }
```