6. Micro-architecture Speedup Notes

These notes elaborate on the speedups to the Mic-1 micro-architecture in Chapter 4, section 4.4. These new, improved architectures are:

Mic-2: improvements based on adding hardware, figure 4-29
Mic-3: a 4-stage pipeline, figure 4-31
Mic-4: a 7-stage pipeline, figure 4-35

General speed-ups

Reduce the number of clock cycles per instruction
Simplify so that the clock cycle is shorter -> faster clock
Execute more than one instruction at a time using parallelism and pipelining
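
The first two speed-ups can be read off the usual relation: execution time = instruction count x cycles per instruction x cycle time. The sketch below only illustrates that arithmetic; the instruction count, cycle counts, and cycle times are made-up numbers, not measurements of any Mic design.

```python
def execution_time_ns(instructions, cycles_per_instruction, cycle_time_ns):
    """Total run time in nanoseconds for a simple, unpipelined machine."""
    return instructions * cycles_per_instruction * cycle_time_ns

# Hypothetical numbers, chosen only to show the effect of each speed-up.
baseline = execution_time_ns(1000, 4, 10)      # 4 cycles per instruction, 10 ns clock
fewer_cycles = execution_time_ns(1000, 3, 10)  # one cycle saved per instruction
faster_clock = execution_time_ns(1000, 4, 8)   # same cycle count, shorter cycle

print(baseline, fewer_cycles, faster_clock)    # 40000 30000 32000
```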

An easy speed-up is to eliminate the Main1 micro-instruction. The work of this "interpreter loop", in which Main1 increments PC and starts the fetch of the next byte, can be folded into the end (or near the end) of most instructions.
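
As a rough illustration of the saving, the sketch below counts cycles for the IJVM POP instruction with and without a separate Main1 step. The micro-instruction text is recalled from the textbook's POP example and should be checked against the figures there; the list names are made up for this sketch.

```python
# Micro-instruction text is an approximation of the textbook listing.
MAIN1 = ["PC = PC + 1; fetch; goto (MBR)"]

POP_ORIGINAL = [
    "MAR = SP = SP - 1; rd",      # pop1: start reading the new top of stack
    "(wait for memory)",          # pop2: idle cycle while the read completes
    "TOS = MDR; goto Main1",      # pop3: then return to the interpreter loop
]

POP_MERGED = [
    "MAR = SP = SP - 1; rd",      # pop1: unchanged
    "PC = PC + 1; fetch",         # Main1's work fills the idle cycle
    "TOS = MDR; goto (MBR)",      # pop3: dispatch directly to the next opcode
]

original_cycles = len(POP_ORIGINAL) + len(MAIN1)  # Main1 runs once per instruction
merged_cycles = len(POP_MERGED)

print(original_cycles, merged_cycles)  # 4 3
```

The saving comes from the idle wait cycle: the datapath was doing nothing there, so Main1's work fits in for free.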

Mic-2 speed-up: adding hardware

The Mic-2 design adds the following components:

A third bus is added so that we don't have to shuffle operands through the H register, which can cost an extra cycle.
An Instruction Fetch Unit (IFU) is added so that we aren't waiting for instructions and operands to be fetched from memory.
- Concept: get the next few bytes before you need them
- Done in parallel so there's no penalty for doing this
- PC incremented outside of datapath, speeding it up
- See Figure 4-27
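
The prefetching idea can be sketched as a small queue that is topped up in the background. This is a simplified model, not the IFU of the figure: the class name, the one-byte-per-tick refill, and the buffer capacity are all assumptions made for illustration.

```python
from collections import deque

class FetchUnit:
    """Toy model of an instruction fetch unit with a prefetch buffer."""

    def __init__(self, memory, capacity=4):
        self.memory = memory        # program bytes
        self.capacity = capacity    # assumed buffer size
        self.buffer = deque()       # bytes fetched ahead of need
        self.fetch_addr = 0         # next address to prefetch
        self.pc = 0                 # advanced here, not by the ALU

    def tick(self):
        """Background work for one cycle: prefetch a byte if there is room."""
        if len(self.buffer) < self.capacity and self.fetch_addr < len(self.memory):
            self.buffer.append(self.memory[self.fetch_addr])
            self.fetch_addr += 1

    def next_byte(self):
        """Hand the next byte to the datapath, or None if it must stall."""
        if not self.buffer:
            return None
        self.pc += 1
        return self.buffer.popleft()

ifu = FetchUnit(bytes([0x10, 0x05, 0x60]))  # arbitrary example bytes
stalled = ifu.next_byte()                   # nothing prefetched yet
ifu.tick()
ifu.tick()
first = ifu.next_byte()
second = ifu.next_byte()
print(stalled, first, second, ifu.pc)       # None 16 5 2
```

Once the buffer has been filled, the datapath gets each byte immediately and never spends an ALU cycle on PC = PC + 1.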

Mic-3 speed-up: a 4-stage pipeline

We can introduce parallelism by adding a latch controlling the data for each bus: A, B, C. The impact of this is:

Slices the datapath into 3 micro-steps:
1. Load busses A, B
2. Perform ALU, shift operations
3. Write registers from bus C
Each micro-step is faster, so we can increase the clock speed
Each micro-step is isolated (by latches and registers), so we can execute them independently

The 4th stage of the pipeline is the already-present IFU.

Mic-4 speed-up: A 7-stage pipeline

The change in Mic-4 is introducing parallelism to micro-instruction fetching. This is not very different conceptually from the IFU and its fetching of IJVM instructions. The important points are:

Micro-instructions in the ROM must be in order... no more jumping around unless you're done with all the micro-instructions for an IJVM instruction.
All the micro-instructions for the current IJVM instruction are loaded into a queue immediately when the IJVM instruction begins.
The control words for the micro-instructions are loaded into MIR1, then MIR2, then MIR3, then MIR4 as the micro-instruction works its way through the datapath.
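
The movement of control words through MIR1 to MIR4 can be sketched as a four-entry shift register fed from the queue. This is only a toy model: it assumes one micro-instruction leaves the queue per cycle and ignores stalls, and the function name and the string labels are made up.

```python
from collections import deque

def run_mirs(micro_ops):
    """Shift queued micro-instructions through MIR1..MIR4, one step per cycle.
    Returns a list of (MIR1, MIR2, MIR3, MIR4) snapshots, one per cycle."""
    queue = deque(micro_ops)
    mirs = [None, None, None, None]
    history = []
    while True:
        incoming = queue.popleft() if queue else None
        mirs = [incoming] + mirs[:3]      # MIR1 <- queue, MIR2 <- MIR1, ...
        if all(m is None for m in mirs):  # everything has drained
            break
        history.append(tuple(mirs))
    return history

history = run_mirs(["a", "b", "c"])
for cycle, snapshot in enumerate(history):
    print(cycle, snapshot)
```

With three micro-instructions the model runs for 6 cycles: "a" sits in MIR1 in cycle 0 and reaches MIR4 in cycle 3, while "b" and "c" follow one register behind each other.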

Author: William Krieger, Nov 2003

CSC 220 Fall 2003