CPU performance factors
- Instruction count
  - Determined by ISA and compiler.
- CPI and cycle time
  - Determined by CPU hardware.
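The three factors above combine into the usual CPU time relation: CPU time = instruction count × CPI × cycle time. A minimal sketch follows; the function name and the sample numbers are illustrative assumptions, not values from these notes.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds = instruction count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ps

# Hypothetical program: 1,000,000 instructions, CPI of 2, 500 ps cycle time.
total_ps = cpu_time_ps(1_000_000, 2, 500)
print(total_ps)  # 1000000000 ps, i.e. 1 ms
```

Halving any one factor halves the CPU time, which is why the ISA, the compiler, and the hardware all matter.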
We will examine two MIPS implementations
- Simplified version
- More realistic pipelined version
A simple subset of instructions shows most aspects
- Memory reference: lw, sw
- Arithmetic/logical: add, sub, and, or, slt
- Control transfer: beq, j
CPU overview
- All instructions start by using the program counter to supply the instruction address to the instruction memory.
- After the instruction is fetched, the register operands used by an instruction are specified by fields of that instruction.
- Once the register operands have been fetched, they can be operated on to compute a memory address (for a load or store), to compute an arithmetic result (for an integer arithmetic-logical instruction), or to perform a comparison (for a branch).
- If the instruction is an arithmetic-logical instruction, the result from the ALU must be written to a register.
- If the operation is a load or store, the ALU result is used as an address to either store a value from the registers or load a value from memory into the registers.
- The result from the ALU or memory is written back into the register file.
- Branches require the use of the ALU output to determine the next instruction address, which comes either from the ALU (where the PC and branch offset are summed) or from an adder that increments the current PC by 4.
- The thick lines interconnecting the functional units represent buses, which consist of multiple signals.
Multiplexers
- In practice, these data lines cannot simply be wired together; we must add a logic element that chooses from among the multiple sources and steers one of those sources to its destination.
- This selection is commonly done with a device called a multiplexor, although this device might better be called a data selector.
Control
- The top multiplexor (Mux) controls what value replaces the PC; the multiplexor is controlled by the gate that "ANDs" together the Zero output of the ALU and a control signal that indicates that the instruction is a branch.
- The middle multiplexor, whose output returns to the register file, is used to steer the output of the ALU or the output of the data memory for writing into the register file.
- Finally, the bottommost multiplexor is used to determine whether the second ALU input is from the registers or from the offset field of the instruction.
- The added control lines are straightforward and determine the operation performed at the ALU, whether the data memory should be read or written, and whether the registers should perform a write operation.
- The control lines are shown in blue.
Clocking Methodology
- Design of high-performance microprocessors.
- Input comes from state elements; output goes to a state element.
- The longest delay determines the clock period.
Written by Andy Low Fu Hwa
Datapath
- Elements that process data and addresses in the CPU
- Examples of datapath elements are registers, ALUs, multiplexors, memories, …
Instruction Fetch
To execute any instruction, we must start by fetching the instruction from the instruction memory.
• The PC feeds the address of the current instruction to the instruction memory.
• The instruction memory reads the address and fetches the instruction stored there.
• The PC is incremented by 4 so that it holds the address of the next instruction.
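The fetch step above can be sketched in a few lines. This is only an illustration: the dictionary standing in for the instruction memory, the function name `fetch`, and the starting address are assumptions made for the example.

```python
def fetch(instruction_memory, pc):
    """Read the instruction at address pc and return it with the incremented PC."""
    instruction = instruction_memory[pc]  # PC supplies the address
    next_pc = pc + 4                      # each MIPS instruction is 4 bytes long
    return instruction, next_pc

# Two example instruction words at an arbitrary starting address.
instruction_memory = {0x00400000: 0x012A4020, 0x00400004: 0x8D490004}

instruction, pc = fetch(instruction_memory, 0x00400000)
print(hex(instruction), hex(pc))  # 0x12a4020 0x400004
```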
R-Format Instructions
add $t0, $t1, $t2: $t0 = $t1 + $t2
and $t0, $t1, $t2: $t0 = $t1 AND $t2
Portion of datapath for R-format instruction
Register File
• Consists of a set of 32 registers that can be read and written
  - Registers are built from D flip-flops
• Has two read ports and one write port
• Register numbers are 5 bits long
• To write, you need three inputs: a register number, the data to write, and a clock (not shown explicitly) that controls the writing into the register
  - The register contents change on the rising clock edge
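A small software model of the register file described above may help. It is a sketch under stated assumptions: the class and method names are hypothetical, the clock is modelled as an explicit `clock_edge` call, and the `reg_write` flag stands in for the write-enable control signal.

```python
class RegisterFile:
    """32 registers of 32 bits, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        """Two read ports: return the contents of both registers."""
        assert 0 <= read_reg1 < 32 and 0 <= read_reg2 < 32  # 5-bit register numbers
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write, write_reg, write_data):
        """One write port: the contents change only on a clock edge, and only if enabled."""
        assert 0 <= write_reg < 32
        if reg_write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF  # keep 32 bits

rf = RegisterFile()
rf.clock_edge(True, 8, 42)   # write 42 into register 8
print(rf.read(8, 9))         # (42, 0)
```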
R-Format Instructions Datapath
R-Format Instruction (Figure desc)
The figure above shows the operation of the datapath for an R-format instruction, such as add $t0, $t1, $t2. The operation:
1. The instruction is fetched, and the PC is incremented.
2. Two registers, $t1 and $t2, are read from the register file; RegDst, RegWrite and ALUOp are also set.
3. The ALU operates on the data read from the register file, using the function code (bits 5:0, the funct field) to generate the ALU function.
4. The result from the ALU is written into the register file, using bits 15:11 of the instruction to select the destination register ($t0).
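The bit fields mentioned in the steps above can be pulled out of an instruction word with shifts and masks. The sketch below uses the standard MIPS R-format layout (opcode 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0); the function name is an assumption made for illustration, and 0x012A4020 is the encoding of add $t0, $t1, $t2 with the usual register numbers $t0 = 8, $t1 = 9, $t2 = 10.

```python
def decode_r_format(word):
    """Split a 32-bit R-format instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26
        "rs": (word >> 21) & 0x1F,      # bits 25:21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20:16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15:11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10:6
        "funct": word & 0x3F,           # bits 5:0, selects the ALU function
    }

fields = decode_r_format(0x012A4020)  # add $t0, $t1, $t2
print(fields)
# {'opcode': 0, 'rs': 9, 'rt': 10, 'rd': 8, 'shamt': 0, 'funct': 32}
```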
Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: Read memory and update register
- Store: Write register value to memory
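The address calculation for loads and stores can be sketched as follows. The function names are assumptions chosen for this example; the arithmetic is the sign extension of the 16-bit offset followed by a 32-bit addition to the base register.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, offset16):
    """Base register value plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF

print(sign_extend_16(0x0004))                       # 4
print(sign_extend_16(0xFFFC))                       # -4
print(hex(effective_address(0x10010000, 0xFFFC)))   # 0x1000fffc
```

Without sign extension, a negative offset such as -4 would be treated as 65532 and the access would land far above the intended address.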
Store instruction datapath
Load instruction datapath
Memory Unit
• MemRead must be asserted to read
• MemWrite must be asserted to write
• MemRead and MemWrite must not both be asserted in the same clock cycle
• Memory is edge-triggered for writes
- The figure above illustrates the execution of a load word instruction such as lw $t1, 4($t2):
1. The instruction is fetched from the instruction memory, and the PC is incremented.
2. The value of register $t2 is read from the register file.
3. The ALU computes the sum of the value read from the register file and the sign-extended lower 16 bits of the instruction (offset = 4).
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory unit is written into the register file; the destination register is given by bits 20:16 of the instruction ($t1).
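The five steps for lw can be traced in software. This is a rough sketch, not the hardware: the function name, the list used for the registers, and the dictionary used for data memory are all assumptions. The word 0x8D490004 is the encoding of lw $t1, 4($t2) with $t1 = register 9 and $t2 = register 10; the addresses and the stored value are made up for the example.

```python
def execute_lw(word, pc, regs, data_memory):
    """Carry out one already-fetched lw instruction and return the next PC."""
    next_pc = pc + 4                                  # step 1: PC is incremented
    rs = (word >> 21) & 0x1F                          # base register number
    rt = (word >> 16) & 0x1F                          # destination, bits 20:16
    imm = word & 0xFFFF                               # lower 16 bits (offset)
    base = regs[rs]                                   # step 2: read the register file
    offset = imm - 0x10000 if imm & 0x8000 else imm   # sign-extend the offset
    address = (base + offset) & 0xFFFFFFFF            # step 3: ALU computes the sum
    data = data_memory[address]                       # step 4: sum addresses data memory
    regs[rt] = data                                   # step 5: write back to the register file
    return next_pc

regs = [0] * 32
regs[10] = 0x10010000                  # $t2 holds a hypothetical base address
data_memory = {0x10010004: 1234}       # hypothetical memory contents

next_pc = execute_lw(0x8D490004, 0x00400000, regs, data_memory)
print(regs[9], hex(next_pc))           # 1234 0x400004
```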
Branch Instruction
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch
The figure shows that the datapath for a branch uses the ALU to evaluate the branch condition and a separate adder to compute the branch target as the sum of the incremented PC and the sign-extended lower 16 bits of the instruction (the branch displacement), shifted left 2 bits.
Example: Branch-on-equal
- The figure above shows the operation of the branch-on-equal instruction, such as beq $t1, $t2, offset. Execution takes four steps:
1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. Two registers, $t1 and $t2, are read from the register file.
3. The ALU performs a subtract on the data values read from the register file. The value of PC+4 is added to the sign-extended lower 16 bits of the instruction (offset), shifted left by two; the result is the branch target address.
4. The Zero result from the ALU is used to decide which adder result to store into the PC.
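The choice between the two adder results can be written out directly. The sketch below follows the four steps above; the function name and the example addresses are assumptions for illustration.

```python
def beq_next_pc(pc, rs_value, rt_value, offset16):
    """Return the PC that follows a beq located at address pc."""
    pc_plus_4 = pc + 4                       # result of the PC + 4 adder
    offset = offset16 & 0xFFFF
    if offset & 0x8000:                      # sign-extend the 16-bit displacement
        offset -= 0x10000
    branch_target = (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF  # shift left 2, add to PC + 4
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0          # ALU subtract, Zero output
    return branch_target if zero else pc_plus_4

print(hex(beq_next_pc(0x00400000, 5, 5, 3)))       # 0x400010 (taken: PC + 4 + 12)
print(hex(beq_next_pc(0x00400000, 5, 6, 3)))       # 0x400004 (not taken)
print(hex(beq_next_pc(0x00400000, 5, 5, 0xFFFF)))  # 0x400000 (offset -1 branches to itself)
```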
Implementing Jumps
- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00 (the jump address is a word address, so the two low-order bits are zero)
- Need an extra control signal decoded from opcode
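The concatenation for a jump can be sketched with a mask and a shift. The function name is hypothetical. One assumption to note: the sketch takes the top 4 bits from the incremented PC (PC + 4), as the MIPS definition of j does; this differs from using the old PC only when the jump sits at the very end of a 256 MB region.

```python
def jump_target(pc, address26):
    """Top 4 bits of PC + 4, then the 26-bit word address, then two zero bits."""
    upper = (pc + 4) & 0xF0000000            # top 4 bits
    lower = (address26 & 0x03FFFFFF) << 2    # word address turned into a byte address
    return upper | lower

print(hex(jump_target(0x00400000, 0x0100004)))  # 0x400010
print(hex(jump_target(0x10400000, 0x0100004)))  # 0x10400010
```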
Datapath With Jumps Added
Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining
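To see why the load sets the clock period, each instruction class can be listed with the units it passes through and the delays summed. The delay values below are illustrative assumptions, not measurements from these notes; only the ordering of the units on the load path comes from the text above.

```python
# Hypothetical unit delays in picoseconds (assumed for illustration only).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Units used by each instruction class in the single-cycle datapath.
PATHS = {
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["instr_mem", "reg_read", "alu", "data_mem"],
    "R-format": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq": ["instr_mem", "reg_read", "alu"],
}

def path_delay(units):
    """Total delay of one instruction's path through the datapath."""
    return sum(DELAY_PS[u] for u in units)

delays = {name: path_delay(units) for name, units in PATHS.items()}
clock_period = max(delays.values())  # the longest delay determines the clock period

print(delays)        # {'lw': 800, 'sw': 700, 'R-format': 600, 'beq': 500}
print(clock_period)  # 800
```

With these assumed numbers every instruction takes 800 ps, even a beq that needs only 500 ps, which is the inefficiency pipelining is meant to address.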
Written By Soo Pheng Kian
PIPELINING
- Pipelining is an implementation technique where multiple instructions are overlapped in execution.
- A useful method of demonstrating this is the laundry analogy. Let's say that there are four loads of dirty laundry that need to be washed, dried, and folded. We could put the first load in the washer for 30 minutes, dry it for 40 minutes, and then take 20 minutes to fold the clothes. Then we would pick up the second load and wash, dry, and fold it, and repeat for the third and fourth loads. Supposing we started at 6 PM and worked as efficiently as possible, we would still be doing laundry until midnight.
  However, a smarter approach to the problem would be to put the second load of dirty laundry into the washer as soon as the first was clean and whirling happily in the dryer. Then, while the first load was being folded, the second load would dry, and a third load could be added to the pipeline of laundry. Using this method, the laundry would be finished by 9:30 PM.
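The two finishing times in the laundry analogy can be checked with a short calculation. The sketch assumes one washer, one dryer, and one folder, and that a finished load can wait until the next machine is free; the function names are made up for this example.

```python
STAGES = [("wash", 30), ("dry", 40), ("fold", 20)]  # minutes per stage, from the text

def sequential_minutes(loads):
    """Each load goes through all three stages before the next load starts."""
    return loads * sum(minutes for _, minutes in STAGES)

def pipelined_minutes(loads):
    """A load enters a stage as soon as it is ready and that machine is free."""
    machine_free = [0] * len(STAGES)  # time at which each machine becomes free
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for i, (_, minutes) in enumerate(STAGES):
            start = max(ready, machine_free[i])
            ready = start + minutes
            machine_free[i] = ready
        finish = ready
    return finish

print(sequential_minutes(4))  # 360 minutes after 6 PM: midnight
print(pipelined_minutes(4))   # 210 minutes after 6 PM: 9:30 PM
```

The 40-minute dryer is the slowest stage, so once the pipeline is full a load finishes every 40 minutes rather than every 90.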
- Five stages, one step per stage
  1. IF: Fetch instruction from memory
  2. ID: Read registers and decode the instruction
  3. EX: Execute the instruction or calculate an address
  4. MEM: Access an operand in data memory
  5. WB: Write the result into a register
- Pipeline Performance
  - The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
  - The previous pipeline is said to have been stalled for two clock cycles.
  - Any condition that causes a pipeline to stall is called a hazard.
  - Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. So some operation has to be delayed, and the pipeline stalls.
  - Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
  - Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.
  - Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases.
  - Throughput is measured by the rate at which instruction execution is completed.
  - A pipeline stall causes degradation in pipeline performance.
  - We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.
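The effect of pipeline depth and of stalls on throughput can be estimated with a simple cycle count. This is a sketch under idealized assumptions: every stage takes one clock cycle, the unpipelined machine spends one cycle per stage on each instruction, and stalls simply add cycles. The function names are hypothetical.

```python
def pipelined_cycles(instructions, stages=5, stall_cycles=0):
    """Cycles to fill the pipeline, then one instruction per cycle, plus stalls."""
    return stages + (instructions - 1) + stall_cycles

def unpipelined_cycles(instructions, stages=5):
    """Each instruction runs through every stage before the next one starts."""
    return instructions * stages

def speedup(instructions, stages=5, stall_cycles=0):
    return unpipelined_cycles(instructions, stages) / pipelined_cycles(
        instructions, stages, stall_cycles
    )

print(pipelined_cycles(4))                   # 8
print(pipelined_cycles(4, stall_cycles=2))   # 10, a two-cycle stall costs two cycles
print(round(speedup(1000), 2))               # 4.98, close to the number of stages
```

Each instruction still takes five cycles from fetch to write-back; only the rate at which instructions complete improves.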
- Hazards
- There are three classes of hazards:
  - Structural hazards: They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
  - Data hazards: They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
  - Control hazards: They arise from the pipelining of branches and other instructions that change the PC.
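The dependence behind a data hazard can be checked mechanically: a later instruction reads a register that an earlier one writes. The sketch below is only an illustration. The dictionary layout and the function name are assumptions, and whether such a dependence actually stalls a given pipeline depends on how close the two instructions are and on the hardware, which this check does not model.

```python
def reads_result_of(earlier, later):
    """True if `later` uses as a source the register that `earlier` writes."""
    dest = earlier["dest"]
    return dest is not None and dest in later["sources"]

# Hypothetical instruction records: destination register and source registers.
add_instr = {"op": "add", "dest": "$t0", "sources": ["$t1", "$t2"]}  # add $t0, $t1, $t2
sub_instr = {"op": "sub", "dest": "$t3", "sources": ["$t0", "$t4"]}  # sub $t3, $t0, $t4
sw_instr = {"op": "sw", "dest": None, "sources": ["$t3", "$t5"]}     # sw  $t3, 0($t5)

print(reads_result_of(add_instr, sub_instr))  # True: sub needs $t0 from add
print(reads_result_of(add_instr, sw_instr))   # False
print(reads_result_of(sub_instr, sw_instr))   # True: sw stores $t3 from sub
```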
Written by Ng Wui Sheng
Pipelined datapath
The MIPS datapath has 5 stages, which are carried out one after another: IF (instruction fetch), ID (instruction decode and register read), EX (execute operation or calculate address), MEM (access memory operand) and WB (write result to register).
· Pipelining is a method used to increase performance and reduce total execution time by carrying out different stages of several instructions at the same time, within a single clock cycle.
· However, a pipelined datapath needs a way to store the information produced in the previous cycle.
· Therefore, a register is needed between each pair of stages to store that information.
· As shown in the figure below, there are 4 blue bars between the stages; these are the labelled pipeline registers.
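· The first pipeline register (IF/ID) is 64 bits wide, holding the 32-bit instruction and the 32-bit incremented PC. In the figures, each pipeline register is drawn as two halves, so that reading and writing the register can be shown separately.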
1. During the first stage, instruction fetch (IF), only one half (32 bits) of the IF/ID pipeline register is accessed, as shown in the figure below.
2. During the second stage, instruction decode (ID), only one half of each of the first and second pipeline registers (IF/ID and ID/EX) is accessed, as shown in the figure below.
3. In the third pipe stage of a load instruction, the register contents are added to the sign-extended immediate, and the sum is placed in the EX/MEM pipeline register.
   In the third pipe stage of a store instruction, however, the value of the second register is also loaded into the EX/MEM pipeline register so that it can be used in the next stage.
4. In the fourth pipe stage of a load instruction, data memory is read using the address in the EX/MEM pipeline register, and the data is placed in the MEM/WB pipeline register.
   In the fourth pipe stage of a store instruction, the data is written into data memory. The data comes from the EX/MEM pipeline register; the MEM/WB pipeline register is not involved.
5. In the fifth pipe stage of a load instruction, the data is read from the MEM/WB pipeline register and written into the register file, shown in the middle of the figure.
   In the fifth pipe stage of a store instruction, nothing happens: the data came from the EX/MEM pipeline register and has already been written into memory, so nothing in MEM/WB is used.
6. However, the datapath shown so far has a bug in the fourth and fifth pipe stages of load instructions, which is repaired in the figure below. The write register number and the data both come from the MEM/WB pipeline register. The register number is passed from the ID pipe stage until it reaches the MEM/WB pipeline register, adding five more bits to the last three pipeline registers. This is the corrected pipelined datapath, which handles the problematic load instruction.
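The reason the write register number must travel with the instruction can be shown with a toy simulation. This is a sketch, not the real datapath: each pipeline register holds only an instruction index and its destination register number, and the function and variable names are assumptions. In the uncorrected version the register number used at write-back is taken from IF/ID, which by then holds a later instruction; in the corrected version it is carried through ID/EX, EX/MEM and MEM/WB.

```python
def run_pipeline(dest_regs):
    """dest_regs[i] is the destination register number of instruction i."""
    n = len(dest_regs)
    if_id = id_ex = ex_mem = mem_wb = None  # each holds (index, dest) or None
    buggy, fixed = {}, {}
    for cycle in range(n + 4):
        # Clock edge: every pipeline register takes the value from the one before it.
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, if_id
        if_id = (cycle, dest_regs[cycle]) if cycle < n else None
        if mem_wb is not None:
            index, dest = mem_wb
            fixed[index] = dest  # number carried along to MEM/WB
            buggy[index] = if_id[1] if if_id is not None else None  # number taken from IF/ID
    return buggy, fixed

# Instruction 0 is a load into register 9; the others write registers 8, 10, 11, 12.
buggy, fixed = run_pipeline([9, 8, 10, 11, 12])
print(fixed[0])  # 9: the load writes its own destination
print(buggy[0])  # 11: the destination of the instruction three places later
```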
Written by Mohd Safar