
Sunday, 16 December 2012

Processor

1. Introduction


  v  CPU performance factors
o   Instruction count
·   Determined by ISA and compiler
o   CPI and cycle time
·   Determined by CPU hardware
  v  We will examine two MIPS implementations
·   A simplified version
·   A more realistic pipelined version
  v  A simple subset that shows most aspects
·   Memory reference: lw, sw
·   Arithmetic/logical: add, sub, and, or, slt
·   Control transfer: beq, j
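The three factors combine in the familiar performance equation: CPU time = instruction count × CPI × clock cycle time. A minimal Python sketch; the instruction count, CPI and clock rate in the usage lines are made-up values for illustration only:

```python
def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x average cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1 million instructions, average CPI of 2, 1 ns cycle (1 GHz clock).
t = cpu_time_seconds(1_000_000, 2.0, 1e-9)
print(f"{t * 1e3:.3f} ms")  # 2.000 ms
```

The ISA and compiler change the first argument; the CPU hardware changes the other two.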
    
 CPU overview

v  All instructions start by using the program counter (PC) to supply the instruction address to the instruction memory.
v  After the instruction is fetched, the register operands used by an instruction are specified by fields of that instruction.
v  Once the register operands have been fetched, they can be operated on to compute a memory address (for a load or store), to compute an arithmetic result (for an integer arithmetic-logical instruction), or a compare (for a branch).
v  If the instruction is an arithmetic-logical instruction, the result from the ALU must be written to a register.
v  If the operation is a load or store, the ALU result is used as an address to either store a value from the registers or load a value from memory into the registers.
v  The result from the ALU or memory is written back into the register file.
v  Branches require the use of the ALU output to determine the next instruction address, which comes either from the ALU (where the PC and branch offset are summed) or from an adder that increments the current PC by 4.
v  The thick lines interconnecting the functional units represent buses, which consist of multiple signals.

Multiplexers



v  In practice, these data lines cannot simply be wired together; we must add a logic element that chooses from among the multiple sources and steers one of those sources to its destination.
v  This selection is commonly done with a device called a multiplexor, although this device might better be called a data selector.   
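A multiplexor can be modelled as a function that returns one of its data inputs according to a select signal. A sketch of the 2-to-1 case used throughout this datapath (the name `mux2` is our own, chosen for illustration):

```python
def mux2(select: int, input0: int, input1: int) -> int:
    """2-to-1 multiplexor: pass input0 when select is 0, input1 when select is 1."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return input1 if select else input0


# Second ALU input: a register value (select = 0) or a sign-extended offset (select = 1).
print(mux2(0, 0x1234, 4))  # 4660 (the register value)
print(mux2(1, 0x1234, 4))  # 4 (the offset)
```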

   Control


v  The top multiplexor (Mux) controls what value replaces the PC; the multiplexor is controlled by the gate that “ANDs” together the Zero output of the ALU and a control signal that indicates that the instruction is a branch.
v  The middle multiplexor, whose output returns to the register file, is used to steer the output of the ALU or the output of the data memory for writing into the register file.
v  Finally, the bottommost multiplexor is used to determine whether the second ALU input is from the registers or from the offset field of the instruction.
v  The added control lines are straightforward and determine the operation performed at the ALU, whether the data memory should be read or written, and whether the registers should perform a write operation.
v  The control lines are shown in blue.
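The settings of these control lines for the instruction subset can be written as a table indexed by opcode. The sketch below uses the signal names and values commonly given for this single-cycle MIPS design (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp); `None` marks a don't-care. It is an illustration, not a complete control unit: `j` is left out because it needs the extra jump signal discussed later.

```python
# Main control signals, keyed by the 6-bit opcode. None = don't care.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),     # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),     # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),     # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),     # beq
}


def pc_src(opcode: int, alu_zero: int) -> int:
    """Select line of the top (PC) multiplexor: Branch AND the ALU's Zero output."""
    return CONTROL[opcode]["Branch"] & alu_zero


print(CONTROL[0b100011]["ALUSrc"])  # 1: lw takes the offset as its second ALU input
print(pc_src(0b000100, 1))          # 1: beq with equal operands selects the branch target
```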



Clocking Methodology




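v  A clocking methodology defines when signals can be read and when they can be written; it is central to the design of high-performance microprocessors.
v  Combinational logic takes its inputs from state elements and sends its outputs to a state element, between clock edges.
v  The longest delay determines the clock period.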
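Because every instruction must finish within a single clock cycle, the period is set by the slowest instruction path. The functional-unit delays below are hypothetical values chosen only to illustrate the calculation:

```python
# Hypothetical functional-unit delays in picoseconds (illustrative only).
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Functional units used, in order, by each instruction class.
PATHS = {
    "R-format": ["imem", "reg_read", "alu", "reg_write"],
    "lw":       ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":       ["imem", "reg_read", "alu", "dmem"],
    "beq":      ["imem", "reg_read", "alu"],
}


def path_delay(units):
    """Total combinational delay along one instruction's path."""
    return sum(DELAY_PS[u] for u in units)


delays = {name: path_delay(units) for name, units in PATHS.items()}
clock_period = max(delays.values())  # the longest delay sets the clock period
print(delays)        # {'R-format': 600, 'lw': 800, 'sw': 700, 'beq': 500}
print(clock_period)  # 800
```

Even though `beq` needs only 500 ps here, it still occupies a full 800 ps cycle.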
   
   Written by Andy Low Fu Hwa

Datapath
-          Elements that process data and addresses in the CPU
-          Examples of datapath elements: registers, ALUs, multiplexors, memories, …

Instruction Fetch


To execute any instruction, we must start by fetching the instruction from Instruction Memory.
      The PC feeds the address of the current instruction to the instruction memory.
      The instruction memory reads that address and outputs the instruction stored there.
      An adder adds 4 to the PC so that it holds the address of the next instruction.
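The fetch step can be sketched as a function that reads the word addressed by the PC and returns it together with PC + 4. Modelling instruction memory as a Python list of 32-bit words is an assumption made for the sketch; the two words are the encodings of `add $t0, $t1, $t2` and `and $t0, $t1, $t2`.

```python
def fetch(pc: int, instruction_memory: list) -> tuple:
    """Read the instruction at address pc and return (instruction, pc + 4)."""
    if pc % 4 != 0:
        raise ValueError("MIPS instruction addresses are word-aligned")
    instruction = instruction_memory[pc // 4]  # one list entry per 4-byte word
    return instruction, pc + 4


imem = [0x012A4020, 0x012A4024]  # add $t0,$t1,$t2 ; and $t0,$t1,$t2
instr, pc = fetch(0, imem)
print(hex(instr), pc)  # 0x12a4020 4
```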
   
R-Format Instructions

add $t0, $t1, $t2:   $t0= $t1 + $t2
and $t0, $t1, $t2:   $t0= $t1 AND $t2
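R-format instructions pack six fields into one 32-bit word: op (bits 31:26), rs (25:21), rt (20:16), rd (15:11), shamt (10:6) and funct (5:0). The sketch below encodes and decodes these fields, using the standard MIPS register numbers $t0 = 8, $t1 = 9, $t2 = 10 and the funct codes 0x20 (add) and 0x24 (and); the helper names are our own.

```python
REG = {"$t0": 8, "$t1": 9, "$t2": 10}
FUNCT = {"add": 0x20, "and": 0x24}


def encode_r(mnemonic: str, rd: str, rs: str, rt: str) -> int:
    """Build an R-format word for 'mnemonic rd, rs, rt' (op = 0, shamt = 0)."""
    return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | FUNCT[mnemonic]


def decode_r(word: int) -> dict:
    """Split a 32-bit R-format word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


word = encode_r("add", "$t0", "$t1", "$t2")
print(f"0x{word:08X}")  # 0x012A4020
print(decode_r(word))   # {'op': 0, 'rs': 9, 'rt': 10, 'rd': 8, 'shamt': 0, 'funct': 32}
```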










Portion of datapath for R-format instruction




Register File

         Consists of a set of 32 registers that can be read and written
                         Registers are built from D flip-flops
         Has two read ports and one write port
         Register numbers are 5 bits long
         To write, you need three inputs:
            a register number, the data to write, and a clock (not shown explicitly) that controls the writing into the register

The register contents change on the rising clock edge.
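A small software model of the register file: two read ports, one write port, 5-bit register numbers, and a write that only takes effect at the clock edge. The `clock_edge()` method is a hypothetical stand-in for the rising edge, and the rule that register 0 ($zero) always reads as 0 is a MIPS convention not mentioned above.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32
        self._pending = None  # (register number, data) held until the clock edge

    def read(self, read_reg1: int, read_reg2: int) -> tuple:
        """Both read ports are combinational: they return the current contents."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg: int, write_data: int, reg_write: int) -> None:
        """Present the write inputs; nothing changes until clock_edge()."""
        if not 0 <= write_reg < 32:
            raise ValueError("register numbers are 5 bits")
        self._pending = (write_reg, write_data & 0xFFFFFFFF) if reg_write else None

    def clock_edge(self) -> None:
        """Rising edge: perform the pending write, if any."""
        if self._pending is not None:
            reg, data = self._pending
            if reg != 0:  # writes to $zero are ignored
                self.regs[reg] = data
        self._pending = None


rf = RegisterFile()
rf.write(8, 42, reg_write=1)  # present write inputs for $t0
print(rf.read(8, 0))          # (0, 0): not written until the clock edge
rf.clock_edge()
print(rf.read(8, 0))          # (42, 0)
rf.write(0, 5, reg_write=1)   # attempt to write $zero
rf.clock_edge()
print(rf.read(0, 8))          # (0, 42)
```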

R-Format Instructions Datapath


R-Format Instruction (Figure desc)

The figure above shows the operation of the datapath for an R-format instruction such as add $t0, $t1, $t2. The operation proceeds in four steps:
1.    The instruction is fetched, and the PC is incremented.
2.    Two registers, $t1 and $t2, are read from the register file; the main control unit also sets the control lines (such as RegDst, RegWrite and ALUOp) during this step.
3.    The ALU operates on the data read from the register file, using the function code (bits 5:0, the funct field) to generate the ALU function.
4.    The result from the ALU is written into the register file using bits 15:11 of the instruction to select the destination register ($t0)
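The four steps can be sketched with a small ALU keyed on the funct field. The funct codes are the standard MIPS ones for the subset (add 0x20, sub 0x22, and 0x24, or 0x25, slt 0x2A); registers are modelled as a plain list, `execute_r` is a hypothetical name, and overflow exceptions on `add`/`sub` are ignored in this sketch.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit word as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


ALU_BY_FUNCT = {
    0x20: lambda a, b: (a + b) & MASK32,                         # add
    0x22: lambda a, b: (a - b) & MASK32,                         # sub
    0x24: lambda a, b: a & b,                                    # and
    0x25: lambda a, b: a | b,                                    # or
    0x2A: lambda a, b: 1 if to_signed(a) < to_signed(b) else 0,  # slt
}


def execute_r(word: int, regs: list) -> None:
    rs = (word >> 21) & 0x1F                           # step 2: read two registers
    rt = (word >> 16) & 0x1F
    funct = word & 0x3F                                # bits 5:0 select the ALU function
    result = ALU_BY_FUNCT[funct](regs[rs], regs[rt])   # step 3: ALU operation
    rd = (word >> 11) & 0x1F                           # bits 15:11 select the destination
    if rd != 0:
        regs[rd] = result                              # step 4: write back


regs = [0] * 32
regs[9], regs[10] = 5, 7      # $t1 = 5, $t2 = 7
execute_r(0x012A4020, regs)   # add $t0, $t1, $t2
print(regs[8])                # 12
```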


Load/Store Instructions
-          Read register operands
-          Calculate address using 16-bit offset
·      Use ALU, but sign-extend offset
-          Load: Read memory and update register
-          Store: Write register value to memory
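Before the 16-bit offset can be added to a 32-bit register value, it must be sign-extended: bit 15 is copied into the upper 16 bits. A minimal sketch of the sign extension and the resulting address calculation (the function names are our own):

```python
def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to a 32-bit two's-complement word."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def effective_address(base: int, imm16: int) -> int:
    """Address = base register + sign-extended offset, wrapping at 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(hex(sign_extend16(0x0004)))                  # 0x4
print(hex(sign_extend16(0xFFFC)))                  # 0xfffffffc, i.e. -4
print(hex(effective_address(0x10010008, 0xFFFC)))  # 0x10010004
```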

           Store instruction datapath



           Load instruction datapath


Store instruction datapath


Memory Unit
         MemRead must be asserted to read
         MemWrite must be asserted to write
         MemRead and MemWrite must not both be asserted in the same clock cycle
         Memory is edge-triggered for writes
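The rules for the memory unit can be sketched as a small class. The word-addressed list, the 1024-word default size and the `clock_edge()` method (standing in for the edge that triggers a write) are assumptions made for illustration.

```python
class DataMemory:
    """Data memory controlled by MemRead and MemWrite; writes happen at the clock edge."""

    def __init__(self, size_words: int = 1024):
        self.words = [0] * size_words
        self._pending = None  # (word index, data) waiting for the clock edge

    def access(self, address: int, write_data: int, mem_read: int, mem_write: int):
        if mem_read and mem_write:
            raise ValueError("MemRead and MemWrite must not both be asserted")
        if address % 4:
            raise ValueError("lw/sw addresses must be word-aligned")
        index = address // 4
        self._pending = (index, write_data & 0xFFFFFFFF) if mem_write else None
        return self.words[index] if mem_read else None

    def clock_edge(self) -> None:
        """Edge-triggered write: the stored value changes only here."""
        if self._pending is not None:
            index, data = self._pending
            self.words[index] = data
        self._pending = None


dm = DataMemory()
dm.access(8, 99, mem_read=0, mem_write=1)         # sw: present address and data
dm.clock_edge()                                   # the write happens on the edge
print(dm.access(8, 0, mem_read=1, mem_write=0))   # 99
```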



Load/Store Datapath



  • The figure above illustrates the execution of a load word instruction such as lw $t1, 4($t2):
1.    Instruction is fetched from the instruction memory, and PC is incremented.
2.    Value of register $t2 is read from the register file.
3.    The ALU computes the sum of the value read from the register file and the sign-extended, lower 16 bits of the instruction (offset = 4).
4.    The sum from the ALU is used as the address for the data memory.
5.    The data from the memory unit is written into the register file; the register destination is given by bits 20:16 of the instruction ($t1)
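The five steps for `lw $t1, 4($t2)` can be traced in a few lines. In this sketch registers are a plain list, data memory is a dict from byte address to word, the base address 0x1000 and the stored value 77 are made up, and `execute_lw` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def execute_lw(word: int, regs: list, dmem: dict) -> None:
    """Single-cycle execution of lw rt, offset(rs)."""
    rs = (word >> 21) & 0x1F                                      # step 2: base register
    rt = (word >> 16) & 0x1F                                      # bits 20:16: destination
    address = (regs[rs] + sign_extend16(word & 0xFFFF)) & MASK32  # step 3: ALU add
    data = dmem.get(address, 0)                                   # step 4: read data memory
    if rt != 0:
        regs[rt] = data                                           # step 5: write back


regs = [0] * 32
regs[10] = 0x1000                                  # $t2 holds a (made-up) base address
dmem = {0x1004: 77}
word = (0x23 << 26) | (10 << 21) | (9 << 16) | 4   # lw $t1, 4($t2); opcode 0x23
execute_lw(word, regs, dmem)
print(regs[9])  # 77
```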

Branch Instruction       
  •    Read register operands
  •    Compare operands
-          Use ALU, subtract and check Zero output
  •    Calculate target address
-          Sign-extend displacement
-          Shift left 2 places (word displacement)
-          Add to PC + 4
§ Already calculated by instruction fetch
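The target-address steps listed above (sign-extend, shift left 2, add to PC + 4) can be sketched as follows; the function names and example addresses are our own:

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def branch_target(pc: int, imm16: int) -> int:
    """Target = (PC + 4) + (sign-extended displacement << 2), wrapping at 32 bits."""
    pc_plus_4 = (pc + 4) & MASK32          # already computed during instruction fetch
    offset = sign_extend16(imm16) << 2     # word displacement -> byte displacement
    return (pc_plus_4 + offset) & MASK32


print(hex(branch_target(0x00400000, 3)))       # 0x400010: three words past PC + 4
print(hex(branch_target(0x00400000, 0xFFFF)))  # 0x400000: displacement of -1 word
```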

 

The figure shows that the datapath for a branch uses the ALU to evaluate the branch condition and a separate adder to compute the branch target as the sum of the incremented PC and the sign-extended lower 16 bits of the instruction (the branch displacement), shifted left 2 bits.

Example: Branch-on-equal


·         The figure above shows the operation of a branch-on-equal instruction such as beq $t1, $t2, offset. It executes in four steps:

1.      An instruction is fetched from the instruction memory, and the PC is incremented.
2.      Two registers, $t1 and $t2, are read from the register file.
3.      The ALU performs a subtract on the data values read from the register file. The value of PC+4 is added to the sign-extended, lower 16 bits of the instruction (offset) shifted left by two; the result is the branch target address.
4.      The Zero result from the ALU is used to decide which adder result to store into the PC.
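The four beq steps can be combined into one function that returns the next PC. The sketch assumes the instruction is already known to be beq (so the Branch control signal is 1); `beq_next_pc` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def beq_next_pc(pc: int, rs_value: int, rt_value: int, imm16: int) -> int:
    pc_plus_4 = (pc + 4) & MASK32                                 # step 1: PC incremented
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32   # step 3: branch adder
    zero = 1 if ((rs_value - rt_value) & MASK32) == 0 else 0      # step 3: ALU subtract
    return target if zero else pc_plus_4                          # step 4: Zero picks the result


print(hex(beq_next_pc(0x00400000, 5, 5, 3)))  # 0x400010 (branch taken)
print(hex(beq_next_pc(0x00400000, 5, 6, 3)))  # 0x400004 (not taken)
```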

Implementing Jumps



  • Jump uses word address
  • Update PC with concatenation of
-          Top 4 bits of the incremented PC (PC + 4)
-          26-bit jump address
-          00 (two low-order zero bits)
  • Need an extra control signal decoded from opcode
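The concatenation can be sketched with shifts and masks. This sketch takes the top 4 bits from PC + 4, as MIPS hardware does; that only differs from the old PC when the jump is the last word of a 256 MB region. The opcode 2 for `j` is the standard MIPS value; the function names are our own.

```python
def jump_target(pc: int, addr26: int) -> int:
    """New PC = top 4 bits of (PC + 4) : 26-bit jump address : 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    return (pc_plus_4 & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)


def encode_j(target_address: int) -> int:
    """Encode 'j target': opcode 2, then the word address in the low 26 bits."""
    return (0b000010 << 26) | ((target_address >> 2) & 0x03FFFFFF)


word = encode_j(0x00400000)
print(hex(word))                                       # 0x8100000
print(hex(jump_target(0x00400018, word & 0x03FFFFFF)))  # 0x400000
```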

Datapath With Jumps Added




Performance Issues

  • Longest delay determines clock period
-          Critical path: load instruction
-          Instruction memory → register file → ALU → data memory → register file

  • Not feasible to vary period for different instructions
  • Violates design principle
-          Making the common case fast

  • We will improve performance by pipelining

Written By Soo Pheng Kian

PIPELINING

-Pipelining is an implementation technique where multiple instructions are overlapped in execution.
-A useful way of demonstrating this is the laundry analogy. Let's say that there are four loads of dirty laundry that need to be washed, dried, and folded. We could put the first load in the washer for 30 minutes, dry it for 40 minutes, and then take 20 minutes to fold the clothes. Then we pick up the second load and wash, dry, and fold it, and repeat for the third and fourth loads. Supposing we started at 6 PM and worked as efficiently as possible, we would still be doing laundry until midnight.



    However, a smarter approach would be to put the second load of dirty laundry into the washer as soon as the first load was clean and whirling happily in the dryer. Then, while the first load was being folded, the second load would dry, and a third load could be added to the pipeline of laundry. Using this method, the laundry would be finished by 9:30 PM.
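Both finishing times can be checked with a short simulation in which each machine (washer, dryer, folding) handles one load at a time, and a load starts a stage once both the load and the machine are free. The sketch assumes a finished load can wait between machines; the function name is our own.

```python
def pipelined_finish(stage_minutes, loads):
    """Minutes until the last load is done when the stages overlap."""
    free_at = [0] * len(stage_minutes)    # time at which each machine becomes free
    finish = 0
    for _ in range(loads):
        t = 0                             # time this load is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(t, free_at[s])    # wait for both the load and the machine
            t = start + duration
            free_at[s] = t
        finish = t
    return finish


stages = [30, 40, 20]                 # wash, dry, fold (minutes)
sequential = 4 * sum(stages)
print(sequential)                     # 360 minutes: 6 PM to midnight
print(pipelined_finish(stages, 4))    # 210 minutes: 6 PM to 9:30 PM
```

The 40-minute dryer is the slowest stage, so after the first load one load finishes every 40 minutes.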


-Five stages, one step per stage
1. IF: Fetch instructions from memory
2. ID: Read registers and decode the instruction
3. EX: Execute the instruction or calculate an address
4. MEM: Access an operand in data memory
5. WB: Write the result into a register
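With one stage per clock cycle and a new instruction entering every cycle, instruction i occupies stage s in cycle i + s. A sketch that builds this pipeline chart, assuming an ideal pipeline with no stalls (the function name is our own):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(instructions):
    """One row per instruction: the stage it occupies in each clock cycle."""
    cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage   # instruction i is in stage s during cycle i + s
        rows.append((name, cells))
    return rows


chart = pipeline_chart(["lw", "sw", "add"])
for name, cells in chart:
    print(f"{name:<4}" + "".join(f"{c:<5}" for c in cells))
```

Three instructions finish in 7 cycles instead of the 15 they would need one after another.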

-Pipeline Performance
·         Ideally, with perfectly balanced stages, the potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.

·         The previous pipeline is said to have been stalled for two clock cycles.

·         Any condition that causes a pipeline to stall is called a hazard.

·         Data hazard – any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls.

·         Instruction (control) hazard – a delay in the availability of an instruction causes the pipeline to stall.

·         Structural hazard – the situation when two instructions require the use of a given hardware resource at the same time.

·         Again, pipelining does not make individual instructions execute faster; rather, it is the throughput that increases.

·         Throughput is measured by the rate at which instruction execution is completed.

·         Pipeline stalls degrade pipeline performance.

·         We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.
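Under ideal assumptions (balanced stages, one instruction entering per cycle) a k-stage pipeline needs k cycles to fill and then completes one instruction per cycle. The sketch compares this with an unpipelined machine that spends k cycles of the same length on every instruction, an assumption made for the comparison; `stall_cycles` models the extra cycles hazards cost.

```python
def pipeline_speedup(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> float:
    """Speedup of an ideal k-stage pipeline over running the k steps one instruction at a time."""
    unpipelined_cycles = n_instructions * n_stages
    pipelined_cycles = n_stages + (n_instructions - 1) + stall_cycles  # fill, then 1 per cycle
    return unpipelined_cycles / pipelined_cycles


print(pipeline_speedup(1, 5))                          # 1.0: one instruction is not faster
print(round(pipeline_speedup(1000, 5), 3))             # 4.98: approaches the 5 stages
print(round(pipeline_speedup(1000, 5, stall_cycles=200), 3))  # 4.153: stalls cost throughput
```

A single instruction gains nothing (latency is unchanged), while a long run approaches the stage count, which is the throughput gain described above.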
-Hazards

-There are three classes of hazards:
·         Structural hazards: They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.

·         Data hazards: They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.

·         Control hazards: They arise from the pipelining of branches and other instructions that change the PC.
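Data hazards can be found by comparing each instruction's source registers with the destination registers of the instructions just ahead of it. This sketch flags read-after-write dependences within a window of two instructions. The window is an assumption: it fits the classic five-stage MIPS pipeline without forwarding, where the register file is written in the first half of a cycle and read in the second half, so a producer three or more instructions ahead has already written its result. The tuple format (text, destination, sources) is our own.

```python
def raw_hazards(program, window=2):
    """Return (producer index, consumer index, register) for each read-after-write
    dependence whose distance is at most `window` instructions."""
    hazards = []
    for j, (_, _, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            if dest is not None and dest != "$zero" and dest in sources:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("sub $2, $1, $3",  "$2",  ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2",  "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", None,  ("$15", "$2")),  # a store writes no register
]
print(raw_hazards(program))  # [(0, 1, '$2'), (0, 2, '$2')]
```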

   Written by Ng Wui Sheng



Pipelined datapath




1. The MIPS datapath is divided into five stages, performed step by step: IF (instruction fetch), ID (instruction decode and register read), EX (execute operation or calculate address), MEM (access memory operand) and WB (write result back to a register).

          ·       Pipelining increases performance by overlapping instructions: in each clock cycle several instructions are in progress, each in a different stage.

          ·       However, a pipelined datapath needs a way to keep the information produced by each stage in the previous cycle.

          ·       Therefore, a register is needed between each pair of adjacent stages to hold that information.
          ·       As shown by the figure below, there are four blue bars between the stages; these are the pipeline registers (IF/ID, ID/EX, EX/MEM and MEM/WB).
          ·       The pipeline registers are not all the same size: IF/ID is 64 bits wide (the 32-bit instruction plus the 32-bit incremented PC), while ID/EX, EX/MEM and MEM/WB are 128, 97 and 64 bits wide.




1.    During the first stage, instruction fetch (IF), only half (32 bits) of the pipeline register is accessed, as shown by the figure below.





2.    During the second stage, only half (32 bits) of the first and second pipeline registers is accessed, as shown by the figure below.






3.    In the third pipe stage of a load instruction, the register value is added to the sign-extended immediate, and the sum is placed in the EX/MEM pipeline register.





However, in the third pipe stage of a store instruction, the value of the second register is also loaded into the EX/MEM pipeline register, alongside the computed address, to be used in the next stage.




4.    In the fourth pipe stage of a load instruction, the address in the EX/MEM pipeline register is used to read the data memory, and the data is placed in the MEM/WB pipeline register.




     In the fourth pipe stage of a store instruction, the data is written into the data memory. The data comes from the EX/MEM pipeline register; the MEM/WB pipeline register is not involved.

5.    In the fifth pipe stage of a load instruction, the data is read from the MEM/WB pipeline register and written into the register file in the middle of the datapath.

     For a store instruction, nothing happens in the fifth stage: the data was already written into memory in the fourth stage, so nothing needs to be done with the MEM/WB pipeline register.


6. However, the datapath shown for the load instruction in the fourth and fifth pipeline stages has a bug: the write register number is taken from the IF/ID pipeline register, which by then holds a later instruction. This is repaired in the figure below, where both the write register number and the data come from the MEM/WB pipeline register. The register number is passed from the ID pipe stage until it reaches the MEM/WB pipeline register, adding five more bits to the last three pipeline registers. This is the corrected pipelined datapath, which handles the load instruction properly.
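The fix can be sketched by carrying the 5-bit destination number along with the load through ID/EX, EX/MEM and MEM/WB, so that write-back uses the number that belongs to the load. The classes and fields below are a hypothetical, simplified model (only what a load needs), and the stages are called back to back rather than simulated cycle by cycle.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


@dataclass
class IDEX:
    rs_value: int
    imm: int
    write_reg: int   # 5-bit destination number, carried forward


@dataclass
class EXMEM:
    alu_result: int
    write_reg: int


@dataclass
class MEMWB:
    mem_data: int
    write_reg: int


def id_stage(word: int, regs: list) -> IDEX:
    # For lw the destination is rt (bits 20:16); record it in ID/EX.
    return IDEX(regs[(word >> 21) & 0x1F], sign_extend16(word & 0xFFFF), (word >> 16) & 0x1F)


def ex_stage(idex: IDEX) -> EXMEM:
    return EXMEM((idex.rs_value + idex.imm) & MASK32, idex.write_reg)


def mem_stage(exmem: EXMEM, dmem: dict) -> MEMWB:
    return MEMWB(dmem.get(exmem.alu_result, 0), exmem.write_reg)


def wb_stage(memwb: MEMWB, regs: list) -> None:
    # Both the data and the register number come from MEM/WB, not from IF/ID.
    if memwb.write_reg != 0:
        regs[memwb.write_reg] = memwb.mem_data


regs = [0] * 32
regs[10] = 0x1000                                    # $t2 = (made-up) base address
dmem = {0x1004: 77}
lw_word = (0x23 << 26) | (10 << 21) | (9 << 16) | 4  # lw $t1, 4($t2)

memwb = mem_stage(ex_stage(id_stage(lw_word, regs)), dmem)
wb_stage(memwb, regs)   # a later instruction in IF/ID has no effect here
print(memwb.write_reg, regs[9])  # 9 77
```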


Written by Mohd Safar