6. Micro-architecture Speedup Notes

These notes elaborate on the speedups to the Mic-1 micro-architecture described in Chapter 4, section 4.4. The improved architectures are:

  • Mic-2: improvements based on adding hardware, figure 4-29
  • Mic-3: a 4-stage pipeline, figure 4-31
  • Mic-4: a 7-stage pipeline, figure 4-35

General speed-ups

  1. Reduce the number of clock cycles per instruction
  2. Simplify the organization so that the clock cycle is shorter (a faster clock)
  3. Execute more than one instruction at a time using parallelism and pipelining
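
The three speed-ups above map onto the factors of the usual execution-time product (instruction count x cycles per instruction x cycle time). The sketch below is only an illustration: the function name and all the numbers are hypothetical and are not taken from the textbook.

```python
def execution_time_ns(instructions, cycles_per_instruction, cycle_time_ns):
    """Total run time = instruction count x cycles per instruction x cycle time."""
    return instructions * cycles_per_instruction * cycle_time_ns


# Hypothetical baseline: 1000 instructions, 4 cycles each, 10 ns clock period.
baseline = execution_time_ns(1000, 4, 10)

# Speed-up 1: fewer clock cycles per instruction (4 -> 3).
fewer_cycles = execution_time_ns(1000, 3, 10)

# Speed-up 2: a shorter clock cycle (10 ns -> 8 ns).
faster_clock = execution_time_ns(1000, 4, 8)

print(baseline, fewer_cycles, faster_clock)
```

Speed-up 3 (overlapping instructions) shows up in this model as a lower effective cycles-per-instruction figure, since several instructions share the same clock cycles.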

An easy speed-up is to eliminate the Main1 micro-instruction. The work of this "interpreter loop" step, in which Main1 increments PC, fetches the next byte, and dispatches on the opcode, can be folded into the end (or near the end) of the micro-instruction sequence for most instructions.

Mic-2 speed-up: adding hardware

The Mic-2 design adds the following components:

  • A third bus (the A bus) is added so that any register can feed the left ALU input directly. We no longer have to shuffle operands through the H register, which could cost a cycle.
  • An Instruction Fetch Unit (IFU) is added so that the datapath isn't waiting for instructions and operands to be fetched from memory.
    • Concept: get the next few bytes before you need them
    • Done in parallel so there's no penalty for doing this
    • PC is incremented by the IFU, outside of the datapath, so the ALU is free for other work
    • See Figure 4-27
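
The prefetching idea can be sketched as a small queue that is filled in the background and drained by the datapath. This is a toy model, not the IFU of Figure 4-27: the class name, the queue depth, and the one-byte-per-cycle fetch rate are all assumptions made for illustration.

```python
from collections import deque


class PrefetchUnit:
    """Toy model of an instruction fetch unit (hypothetical, simplified).

    It keeps a small queue of bytes fetched ahead of the consumer.
    """

    def __init__(self, memory, depth=4):
        self.memory = memory   # byte sequence standing in for program memory
        self.fetch_addr = 0    # the unit's own address counter
        self.depth = depth     # assumed queue capacity
        self.queue = deque()

    def tick(self):
        """One cycle of background work: fetch one byte if there is room."""
        if len(self.queue) < self.depth and self.fetch_addr < len(self.memory):
            self.queue.append(self.memory[self.fetch_addr])
            self.fetch_addr += 1

    def next_byte(self):
        """Give the oldest prefetched byte to the datapath; None means a stall."""
        return self.queue.popleft() if self.queue else None


# IJVM bytes for: BIPUSH 5, BIPUSH 7, IADD
program = bytes([0x10, 0x05, 0x10, 0x07, 0x60])
ifu = PrefetchUnit(program)

stalled = ifu.next_byte()   # nothing prefetched yet, so this is None
ifu.tick()
ifu.tick()
opcode = ifu.next_byte()    # 0x10 (BIPUSH)
operand = ifu.next_byte()   # 0x05
```

As long as tick() runs at least as often as next_byte() is called, the consumer never sees a stall after the first byte arrives.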

Mic-3 speed-up: a 4-stage pipeline

We can introduce parallelism by adding a latch on each of the three buses: A, B, and C. The impact of this is:

  • Slices the datapath into 3 micro-steps:
    1. Load busses A, B
    2. Perform ALU, shift operations
    3. Write registers from bus C
  • Each micro-step is faster, so we can increase the clock speed
  • Each micro-step is isolated (by latches and registers), so we can execute them independently

The 4th stage of the pipeline is the already-present IFU.
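
The benefit of overlapping the three micro-steps can be sketched with a simple cycle count. This is a minimal model under a strong assumption: no micro-instruction has to wait for a result from an earlier one, so the pipeline never stalls. The function names are hypothetical.

```python
def cycles_sequential(n_microinstructions, n_steps=3):
    """Short cycles needed if each micro-instruction finishes all of its
    micro-steps before the next one starts."""
    return n_microinstructions * n_steps


def cycles_pipelined(n_microinstructions, n_steps=3):
    """Short cycles needed if a new micro-instruction starts every cycle.

    The first one needs n_steps cycles to drain through the pipeline;
    each later one finishes one cycle after its predecessor.
    Assumes no stalls caused by dependences between micro-instructions.
    """
    if n_microinstructions == 0:
        return 0
    return n_steps + (n_microinstructions - 1)


print(cycles_sequential(4), cycles_pipelined(4))
```

For 4 micro-instructions this gives 12 cycles sequentially versus 6 pipelined. When a micro-instruction needs a value that an earlier one has not yet written back, the real pipeline must stall, so actual counts fall between the two figures.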

Mic-4 speed-up: A 7-stage pipeline

The change in Mic-4 is introducing parallelism to micro-instruction fetching. This is not very different conceptually from the IFU and its fetching of IJVM instructions. The key points are:

  • Micro-instructions in the ROM must be in order... no more jumping around unless you're done with all the micro-instructions for an IJVM instruction.
  • All the micro-instructions for the current IJVM instruction are loaded into a queue immediately when the IJVM instruction begins.
  • The control words for the micro-instructions are loaded into MIR1, then MIR2, then MIR3, then MIR4 as the micro-instruction works its way through the datapath.
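
The movement of control words through MIR1..MIR4 behaves like a shift register: on every clock, each word moves one register further along and a new word enters MIR1. A minimal sketch, assuming one new control word per cycle and no stalls (the function name and the string "control words" are placeholders):

```python
from collections import deque


def run_mir_pipeline(control_words):
    """Return the contents of (MIR1, MIR2, MIR3, MIR4) after each clock cycle.

    Assumes one control word enters MIR1 per cycle and nothing stalls.
    None marks an empty register.
    """
    pending = deque(control_words)
    mirs = [None, None, None, None]
    history = []
    while True:
        incoming = pending.popleft() if pending else None
        # Shift: the new word enters MIR1, the word in MIR4 leaves the pipeline.
        mirs = [incoming] + mirs[:-1]
        if all(word is None for word in mirs):
            break  # the pipeline has fully drained
        history.append(tuple(mirs))
    return history


history = run_mir_pipeline(["w1", "w2"])
for cycle, state in enumerate(history, start=1):
    print(cycle, state)
```

With two control words the pipeline is busy for 5 cycles (2 + 3), and in the second cycle "w2" sits in MIR1 while "w1" has moved on to MIR2.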

Author: William Krieger, Nov 2003 CSC 220 Fall 2003