The B32P has a 5-stage pipeline, very similar to a MIPS CPU. The stages are as follows:
The CPU detects pipeline hazards in hardware, so the programmer does not need to account for them. Depending on the situation, it does the following:
Branch prediction is static: the CPU always assumes that the branch is not taken, which is the simplest scheme to implement.
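As a rough illustration of a not-taken prediction, the sketch below models what happens once a branch is resolved. It is not the B32P implementation: the function name `resolve_branch`, its parameters, and the flush behaviour are assumptions made for the example, based on how this kind of predictor usually works.

```python
from typing import Tuple


def resolve_branch(sequential_pc: int, taken: bool, target: int) -> Tuple[int, bool]:
    """Return (next PC to fetch from, whether younger instructions must be flushed).

    Hypothetical model: the fetch stage has already continued at
    `sequential_pc`, because the branch was predicted as not taken.
    """
    if taken:
        # Misprediction: the instructions fetched after the branch are wrong,
        # so they are discarded and fetching restarts at the branch target.
        return target, True
    # Prediction was correct: keep going, nothing to discard.
    return sequential_pc, False
```

Under this model a branch that is not taken costs nothing extra, while a taken branch costs the cycles spent on the discarded instructions.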
Because the FPGC does not have separate instruction and data memories, the FE and MEM stages may need access to the memory bus at the same time. To handle this, an arbiter gives priority to the MEM request, since such requests only occur for READ/WRITE instructions, and stalls the FE request until the bus is free.
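The priority rule described above can be sketched as a small decision function. This is only a behavioural model under assumed names (`Grant`, `arbitrate`, `bus_busy`); the actual arbiter is hardware and its signals may differ.

```python
from enum import Enum


class Grant(Enum):
    NONE = 0  # nobody may start a bus transaction this cycle
    FE = 1    # instruction fetch gets the bus
    MEM = 2   # data access gets the bus


def arbitrate(fe_request: bool, mem_request: bool, bus_busy: bool) -> Grant:
    """Pick which stage may start a bus transaction this cycle."""
    if bus_busy:
        # Assumption: a transaction in flight is never interrupted.
        return Grant.NONE
    if mem_request:
        # MEM wins whenever both stages ask at once; FE keeps waiting.
        return Grant.MEM
    if fe_request:
        return Grant.FE
    return Grant.NONE
```

For example, `arbitrate(True, True, False)` yields `Grant.MEM`, and the fetch is only granted on a later call once `mem_request` is no longer asserted and the bus is free.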
Access to memory via the memory bus takes a variable number of cycles, depending on the type of memory being accessed. The pipeline stalls for the duration of this delay.
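To get a feel for the cost of these stalls, a very rough cycle estimate can be written as follows. The memory kinds and latency values in `ASSUMED_LATENCY` are placeholders invented for the example, not measured FPGC figures, and the model ignores pipeline fill and other hazards.

```python
from typing import List

# Hypothetical stall cycles per access; real values depend on the memory type.
ASSUMED_LATENCY = {"fast": 1, "slow": 4}


def total_cycles(instruction_count: int, accesses: List[str]) -> int:
    """Estimate cycles as one per instruction plus the stall cycles of each access."""
    stall_cycles = sum(ASSUMED_LATENCY[kind] for kind in accesses)
    return instruction_count + stall_cycles
```

With these placeholder numbers, 10 instructions that include one "fast" and one "slow" access would take 10 + 1 + 4 = 15 cycles in this model.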