The B32P has a 5-stage pipeline, very similar to that of a classic MIPS CPU. The stages are as follows:
- FE (fetch): Obtain instruction from memory bus
- DE (decode): Decode instruction and read register bank
- EX (execute): ALU operation and jump/branch calculation
- MEM (memory access): Stack access and memory bus access
- WB (write back): Write to register bank
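
To make the overlap between stages concrete, the following is a minimal sketch of which instruction occupies which stage in each cycle. It assumes an ideal pipeline with no stalls or flushes; the function name `pipeline_timeline` and the use of instruction indices are illustrative only and are not taken from the B32P design.

```python
# Stage names as listed above, in pipeline order.
STAGES = ["FE", "DE", "EX", "MEM", "WB"]


def pipeline_timeline(num_instructions):
    """Return {cycle: {stage: instruction_index}} for an ideal pipeline.

    Assumption: one instruction enters FE per cycle and nothing stalls
    or flushes, so instruction i is in stage number d at cycle i + d.
    """
    timeline = {}
    total_cycles = num_instructions + len(STAGES) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        timeline[cycle] = occupancy
    return timeline


if __name__ == "__main__":
    for cycle, occupancy in pipeline_timeline(3).items():
        print(cycle, occupancy)
```

Under these assumptions, N instructions finish in N + 4 cycles rather than 5 × N, which is the benefit the pipeline is designed to provide.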
Hazard detection and branch prediction
The CPU detects pipeline hazards in hardware, so the programmer does not need to account for them. Depending on the situation, it does one of the following:
- Flush (if jump/branch/halt)
- Stall (if EX uses result of MEM)
- Forwarding (MEM -> EX and WB -> EX)
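
The choice between stalling and forwarding can be sketched as a small decision function. This is an illustration, not the actual hardware logic: it assumes that a stall is needed only when the instruction in MEM is still waiting for its result (for example a memory read), and that otherwise the newest matching result is forwarded. The names `resolve_operand`, `mem_dest`, `mem_result_pending` and `wb_dest` are hypothetical.

```python
def resolve_operand(src_reg, mem_dest, mem_result_pending, wb_dest):
    """Decide where EX obtains the value of src_reg.

    src_reg            -- register number read by the instruction in EX
    mem_dest           -- destination register of the instruction in MEM, or None
    mem_result_pending -- True if the MEM result is not yet available (assumption)
    wb_dest            -- destination register of the instruction in WB, or None
    """
    # MEM holds the most recent write, so it is checked before WB.
    if mem_dest is not None and src_reg == mem_dest:
        if mem_result_pending:
            return "stall"
        return "forward_from_MEM"
    if wb_dest is not None and src_reg == wb_dest:
        return "forward_from_WB"
    # No hazard: the value read from the register bank in DE is valid.
    return "register_bank"
```

Checking MEM before WB matters when both stages write the same register: the instruction in MEM is younger, so its result is the one EX should see.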
Branch prediction is done by always assuming that the branch is not taken, which is the easiest scheme to implement.
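
The cost of this scheme can be estimated with a short sketch. It assumes that a branch is resolved in EX and that a taken branch flushes the two younger instructions in FE and DE; the penalty of 2 cycles is an assumption derived from the stage list, not a measured figure, and `flush_cycles` is a hypothetical helper.

```python
def flush_cycles(branch_outcomes, flush_penalty=2):
    """Count cycles lost to flushes under always-not-taken prediction.

    branch_outcomes -- iterable of booleans, True where the branch was taken
    flush_penalty   -- cycles discarded per taken branch (assumed: FE and DE)
    """
    # A not-taken branch matches the prediction and costs nothing extra.
    return sum(flush_penalty for taken in branch_outcomes if taken)
```

For example, a loop whose closing branch is taken nine times and falls through once would lose 18 cycles to flushes under these assumptions.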
Because the FPGC does not have separate instruction and data memories, the FE and MEM stages may need access to the memory bus at the same time. To handle this, an arbiter gives priority to the MEM request, since such requests only occur for READ/WRITE instructions, and stalls the FE request until the bus is free.
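
The arbiter's priority rule can be sketched as follows. This is a simplified model that grants the bus once per decision; the function name `arbitrate` and its boolean inputs are illustrative and do not correspond to actual signal names in the design.

```python
def arbitrate(fe_request, mem_request, bus_busy):
    """Return which stage is granted the memory bus: "MEM", "FE" or None.

    fe_request  -- True if FE wants to fetch an instruction
    mem_request -- True if MEM wants to perform a READ/WRITE
    bus_busy    -- True while an earlier access is still in progress
    """
    if bus_busy:
        return None  # both requesters wait until the bus is free
    if mem_request:
        return "MEM"  # MEM has priority; a pending FE request stalls
    if fe_request:
        return "FE"
    return None
```

Giving MEM priority lets the older instruction complete first, so the pipeline drains instead of filling up behind a blocked memory access.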
Access to memory via the memory bus takes a variable number of cycles, depending on the type of memory being accessed. The pipeline stalls for the duration of this delay.
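
The effect of variable memory latency on execution time can be sketched with a simple cycle count. The memory type names and latency values below are placeholders chosen for illustration; they are not the real B32P memory types or timings, and `cycles_with_memory_stalls` is a hypothetical helper.

```python
# Placeholder latencies in extra stall cycles per access (not real figures).
HYPOTHETICAL_LATENCY = {"fast": 1, "slow": 5}


def cycles_with_memory_stalls(accesses, latency):
    """Count cycles spent in the MEM stage for a sequence of instructions.

    accesses -- one entry per instruction: a memory type name, or None
                if the instruction does not use the memory bus
    latency  -- mapping from memory type name to extra stall cycles
    """
    total = 0
    for mem_type in accesses:
        total += 1  # every instruction spends one cycle in the stage
        if mem_type is not None:
            total += latency[mem_type]  # the pipeline stalls while waiting
    return total
```

With the placeholder values, the sequence `[None, "fast", None, "slow"]` takes 10 cycles instead of 4, which shows how a single slow access can dominate the run time of a short code sequence.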