Lab 4: (Near) Complete Processor + Pipelining + Bells + Whistles
Lab 4 consists of two compulsory tasks; the rest is open-ended.
Demonstrate all the compulsory and open-ended parts in both post-synthesis simulation and FPGA hardware. Use a single assembly program, which can be entirely compiler-generated, entirely hand-crafted, or a combination of the two.
Compulsory Task 1 [5 marks]
Add support for the previously unsupported instructions/features listed below. They are fairly easy to add at this point and will allow you to run almost all the code generated by a compiler. See the Using Compiled Code and New Peripherals page for more information. However, using hand-coded assembly is perfectly fine too.
Unlike in the previous labs, there is no need to craft a program specifically to demonstrate everything below. The only requirement is that your program contains at least one function call, which causes the compiler to emit jal (call) and jalr (ret), unless you do whole-program optimization with GCC.
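As a rough illustration, a C program as small as the following should be enough to make GCC emit a jal (call) and a jalr (ret). The function names are hypothetical, and __attribute__((noinline)) is a GCC/Clang extension used here, as an assumption about your toolchain, to keep the callee from being inlined at higher optimization levels.

```c
#include <stddef.h>

/* Hypothetical helper: noinline (a GCC/Clang attribute) stops the compiler
 * from inlining it, so each call site needs jal (call) and the function
 * returns with jalr (ret). */
__attribute__((noinline)) int max_int(int a, int b)
{
    return (a > b) ? a : b; /* signed compare: likely blt/bge or slt */
}

/* Hypothetical caller: returns the largest element of a non-empty array. */
int array_max(const int *v, size_t n)
{
    int best = v[0];
    for (size_t i = 1; i < n; i++)   /* unsigned loop bound: likely bltu */
        best = max_int(best, v[i]);  /* one real call per element */
    return best;
}
```

Keeping the callee non-trivial but small also makes it easy to check the generated assembly for the call/return pair.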
* DP instructions: xor, xori, slt, sltu, slti, sltiu, slli, srli, srai.
* Branch and jump instructions: blt, bge, bltu, bgeu, jal (with support for linking, i.e., rd = PC + 4), jalr.
* M-extension (multiply/divide) instructions: mulh, mulhu, div, rem, remu. mulhsu is NOT a requirement, as it takes some additional effort. mulhsu can be avoided in compiler-generated code by avoiding signed-by-unsigned multiplications.
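When writing testbenches for these instructions, a software reference for the corner cases can help. The following C sketch models the results defined by the RISC-V unprivileged specification for the trickier operations above: signed vs. unsigned comparison (which blt/bge and bltu/bgeu also rely on), arithmetic right shift, the high halves of 64-bit products, division by zero and signed overflow in div/rem/remu, and the jal/jalr link value and jalr target. The rv_* names are hypothetical, and the sketch assumes two's-complement conversion between int32_t and uint32_t, as GCC and Clang provide.

```c
#include <stdint.h>

/* slt/sltu; blt is taken iff rv_slt(rs1, rs2), bge iff !rv_slt(rs1, rs2),
 * and likewise bltu/bgeu with rv_sltu. */
uint32_t rv_slt(uint32_t a, uint32_t b)  { return (int32_t)a < (int32_t)b; }
uint32_t rv_sltu(uint32_t a, uint32_t b) { return a < b; }

/* sra/srai: only the low 5 bits of the shift amount are used; the sign bit
 * is replicated (written portably, without relying on signed >>). */
uint32_t rv_sra(uint32_t a, uint32_t b)
{
    unsigned s = b & 31u;
    return (a & 0x80000000u) ? ~(~a >> s) : (a >> s);
}

/* mulh/mulhu: upper 32 bits of the full 64-bit signed/unsigned product. */
uint32_t rv_mulh(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}
uint32_t rv_mulhu(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* div/rem: round toward zero; x/0 = -1 and x%0 = x; overflow
 * (INT32_MIN / -1) gives INT32_MIN with remainder 0. No trap is raised. */
uint32_t rv_div(uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)a, sb = (int32_t)b;
    if (sb == 0) return UINT32_MAX;
    if (sa == INT32_MIN && sb == -1) return a;
    return (uint32_t)(sa / sb);
}
uint32_t rv_rem(uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)a, sb = (int32_t)b;
    if (sb == 0) return a;
    if (sa == INT32_MIN && sb == -1) return 0u;
    return (uint32_t)(sa % sb);
}
uint32_t rv_remu(uint32_t a, uint32_t b) { return (b == 0u) ? a : a % b; }

/* jal/jalr: rd receives PC + 4; jalr clears bit 0 of rs1 + imm. */
uint32_t rv_link(uint32_t pc) { return pc + 4u; }
uint32_t rv_jalr_target(uint32_t rs1, int32_t imm)
{
    return (rs1 + (uint32_t)imm) & ~1u;
}
```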
Support for byte and half-word loads/stores is NOT a basic requirement. They can mostly be avoided by using only ints in your program. However, incorporating them can make programs easier to write and more efficient, especially when using byte-oriented peripherals such as the UART console, 8-bit images with the OLED, etc.
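If you do add byte and half-word accesses, one common approach with a 32-bit-wide data memory is to read the whole aligned word and select the addressed lane, and, for stores, to enable only the lanes being written. The C sketch below models that selection, assuming little-endian byte order (as RISC-V uses), naturally aligned addresses, and a memory with per-byte write enables; the helper names are hypothetical and your memory interface may differ.

```c
#include <stdint.h>

/* Loads: the memory returns the aligned word; addr[1:0] picks the lane. */
uint32_t load_byte(uint32_t word, uint32_t addr, int is_unsigned)
{
    uint32_t b = (word >> (8u * (addr & 3u))) & 0xFFu;
    if (!is_unsigned && (b & 0x80u))
        b |= 0xFFFFFF00u;              /* lb: sign-extend bit 7 */
    return b;                          /* lbu: zero-extended */
}

uint32_t load_half(uint32_t word, uint32_t addr, int is_unsigned)
{
    /* addr[0] is assumed to be 0 (aligned half-word) */
    uint32_t h = (word >> (16u * ((addr >> 1) & 1u))) & 0xFFFFu;
    if (!is_unsigned && (h & 0x8000u))
        h |= 0xFFFF0000u;              /* lh: sign-extend bit 15 */
    return h;                          /* lhu: zero-extended */
}

/* Stores: a 4-bit byte-enable mask plus the data replicated into every
 * lane, so the memory writes only the enabled lanes. In HDL the
 * replication is plain concatenation. */
uint32_t sb_byte_enable(uint32_t addr) { return 1u << (addr & 3u); }
uint32_t sh_byte_enable(uint32_t addr) { return (addr & 2u) ? 0xCu : 0x3u; }

uint32_t sb_write_data(uint32_t rs2)
{
    uint32_t b = rs2 & 0xFFu;
    return b | (b << 8) | (b << 16) | (b << 24);
}

uint32_t sh_write_data(uint32_t rs2)
{
    uint32_t h = rs2 & 0xFFFFu;
    return h | (h << 16);
}
```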
Compulsory Task 2 [10 marks]
Implement basic pipelining. Hazard hardware is optional; if implemented, it counts as an open-ended enhancement, not toward the compulsory task marks. The pipelined processor must support all the requirements of Lab 2, Lab 3, and the additional instructions/features in Lab 4 Compulsory Task 1 above.
This requirement is satisfied as long as your code works after inserting sufficient NOPs. If you have full hazard hardware, NOPs are of course unnecessary.
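How many NOPs count as "sufficient" depends on your pipeline. The sketch below assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB) in which registers are read in ID and written in WB, with no forwarding, stalling, or flushing; the stage numbering and function names are hypothetical. It computes the NOPs needed before a dependent instruction and after a taken branch or jump under those assumptions.

```c
#include <stdbool.h>

/* Hypothetical stage numbering for a classic 5-stage pipeline. */
enum { IF_STAGE, ID_STAGE, EX_STAGE, MEM_STAGE, WB_STAGE };

/* NOPs to insert between a producer and a consumer that is `distance`
 * instructions later (1 = the very next instruction). */
int raw_nops_needed(int write_stage, int read_stage, int distance,
                    bool rf_write_before_read)
{
    /* The consumer's read must not happen before the producer's write;
     * the same cycle is only fine if the register file writes first
     * (e.g. write in the first half-cycle, read in the second). */
    int min_distance = write_stage - read_stage + (rf_write_before_read ? 0 : 1);
    int nops = min_distance - distance;
    return nops > 0 ? nops : 0;
}

/* If the PC is loaded with the branch target at the end of the cycle in
 * which the branch is in `resolve_stage`, this many fall-through
 * instructions are already fetched; without flushing they must be NOPs. */
int branch_nops_needed(int resolve_stage)
{
    return resolve_stage - IF_STAGE;
}
```

For example, with a register file that writes before it reads, raw_nops_needed(WB_STAGE, ID_STAGE, 1, true) gives 2, i.e. two NOPs between an instruction and the next instruction that uses its result.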
For some tips on implementing pipelining, see this page.
The real benefit of pipelining is a higher clock frequency. For example, your design is highly unlikely to run at 100 MHz without pipelining: you will most likely get a critical warning that some timing constraints are not met. The design may or may not run on the board, but if you get a timing warning, its functionality is not reliable. Conversely, a pipelined design that runs at 1 Hz is pointless, so it is a good idea to think of an application program that benefits from higher clock speeds. Frequencies of up to 100 MHz can be achieved by changing CLK_DIV_BITS in TOP.vhd.
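To choose a clock frequency, a rough estimate of the maximum frequency can be read off the timing report: the achievable period is approximately the constrained period minus the worst negative slack (WNS). The C sketch below does that arithmetic and then picks a divider setting. It ASSUMES that TOP.vhd divides the 100 MHz board clock by 2^CLK_DIV_BITS; check how the template actually uses CLK_DIV_BITS before relying on it. The function names are hypothetical, and the estimate is only a starting point, since implementing the design at a new frequency changes placement and routing.

```c
#define BOARD_CLK_HZ 100000000.0 /* Nexys 4 DDR on-board oscillator: 100 MHz */

/* ASSUMPTION: the clock seen by the processor is the board clock divided
 * by 2^CLK_DIV_BITS. The real TOP.vhd may wire the divider differently. */
double divided_clock_hz(unsigned clk_div_bits)
{
    return BOARD_CLK_HZ / (double)(1ull << clk_div_bits);
}

/* Rough Fmax: constrained period minus WNS (both in ns). A negative WNS
 * lengthens the period; a positive WNS shortens it. */
double estimated_fmax_hz(double constrained_period_ns, double wns_ns)
{
    return 1e9 / (constrained_period_ns - wns_ns);
}

/* Smallest CLK_DIV_BITS (fastest clock) not exceeding the estimate. */
unsigned min_clk_div_bits(double fmax_hz)
{
    unsigned bits = 0;
    while (bits < 63u && divided_clock_hz(bits) > fmax_hz)
        bits++;
    return bits;
}
```

For instance, a design constrained at 10 ns that reports a WNS of -2.5 ns would be estimated at about 80 MHz, so under the stated divider assumption the fastest safe setting would be the 50 MHz one.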
Open-ended Enhancement [10 marks]
This is the fun part: you get 10 marks for Lab 4 for implementing performance enhancements of your choice. There is no fixed requirement for these marks, and while we suggest some enhancements below, you need not implement all (or any!) of them. A single significant performance enhancement will suffice, and it need not be one of those listed.
Some potential improvements you can think about implementing are:
- Implement additional instructions
- Implement exception handling and interrupt support
- Implement basic branch prediction
- Use other devices, especially if they can be used to illustrate performance enhancements, e.g., the built-in mic (could be interesting), RGB LEDs (not too interesting), USB host, etc. This is not too difficult using components/modules from https://github.com/Digilent/Nexys-4-DDR-OOB
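One common form of basic branch prediction is a table of 2-bit saturating counters indexed by low PC bits (a bimodal predictor). The C model below is a sketch for reasoning about how such a predictor behaves, not a hardware design; the table size, the indexing, and all names are assumptions. In hardware, the prediction alone is not enough: the fetch stage also needs the target address, e.g. from a branch target buffer or by computing PC + imm early.

```c
#include <stdbool.h>
#include <stdint.h>

#define BHT_ENTRIES 64u /* hypothetical table size (power of two) */

/* 2-bit saturating counters: 0,1 = predict not taken; 2,3 = predict taken. */
typedef struct { uint8_t ctr[BHT_ENTRIES]; } bht_t;

/* Instructions are word-aligned, so drop PC[1:0] before indexing. */
static unsigned bht_index(uint32_t pc) { return (pc >> 2) & (BHT_ENTRIES - 1u); }

bool bht_predict(const bht_t *t, uint32_t pc)
{
    return t->ctr[bht_index(pc)] >= 2u;
}

void bht_update(bht_t *t, uint32_t pc, bool taken)
{
    uint8_t *c = &t->ctr[bht_index(pc)];
    if (taken && *c < 3u)
        (*c)++;
    else if (!taken && *c > 0u)
        (*c)--;
}

/* Simulate a loop branch that is taken (iterations - 1) times and then
 * falls through, repeated `trips` times; count mispredictions. */
int loop_mispredictions(int iterations, int trips)
{
    bht_t t = {{0}};            /* all counters start at strongly not taken */
    const uint32_t pc = 0x40u;  /* hypothetical address of the loop branch */
    int misses = 0;
    for (int trip = 0; trip < trips; trip++) {
        for (int i = 1; i <= iterations; i++) {
            bool taken = (i < iterations); /* back edge taken except on exit */
            if (bht_predict(&t, pc) != taken)
                misses++;
            bht_update(&t, pc, taken);
        }
    }
    return misses;
}
```

For a loop whose back edge is taken three times before falling through, loop_mispredictions(4, 3) returns 5: three misses on the first pass while the counter warms up, then one miss per loop exit, because a single not-taken outcome does not flip a saturated counter.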
Design Instructions
- You are encouraged to write your own comprehensive programs for a convincing demo.
- You may have to tweak the templates given in Lab 2 for use in Lab 4. Specifically, you will have to change ALUControl to 4 bits for ARM, and the PCSrc and ALUSrcA/B widths for RISC-V, etc.
- All 32-bit combinational arithmetic and logical operations have to be performed inside the ALU.
  - Exceptions: using + to calculate PC+4 (and PC+8 in ARM), multiplication, division, and enhancements such as branch prediction. Pipeline hazard detection will also require comparisons (for equality only), but these are on 5-bit values.
  - In the ALU, DO NOT use additional + signs, as this could infer additional adders. The existing addition framework should be good enough.
  - All operators are permitted on 32-bit values outside the ARM/RISC-V module. For example, you will have to do 32-bit comparisons in the wrapper for address decoding.
- Use of arithmetic operators
  - Do not use the * operator. It is synthesizable, but in this course, we are implementing multi-cycle multiplication.
  - / is not synthesizable, except on constants.
  - All operators (including **, *, /, sll, rem, mod, etc.) are allowed on constants; operations on constants are done at synthesis time and do not infer any hardware.
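To respect the single-adder rule while adding slt, sltu, and the signed/unsigned branches, the comparisons can reuse the ALU's subtraction: A - B is computed as A + ~B + 1 on the same adder, and its flags give both orderings (signed less-than is N xor V, unsigned less-than is NOT carry-out). The C model below sketches this, assuming the ALU's adder takes an optionally inverted B and a carry-in, as many textbook ALU designs do; the names are hypothetical, and in HDL the inversion and carry-in would be muxes feeding the one existing adder rather than new + operators.

```c
#include <stdint.h>

/* Result and NZCV-style flags of the single shared adder. */
typedef struct { uint32_t result; int n, z, c, v; } adder_out_t;

/* One 32-bit adder: addition is A + B, subtraction is A + ~B + 1. */
adder_out_t alu_adder(uint32_t a, uint32_t b, int subtract)
{
    uint32_t bb = subtract ? ~b : b;
    uint64_t wide = (uint64_t)a + bb + (subtract ? 1u : 0u); /* 33-bit sum */
    adder_out_t o;
    o.result = (uint32_t)wide;
    o.c = (int)((wide >> 32) & 1u);  /* carry out (1 = no borrow on sub) */
    o.n = (int)(o.result >> 31);
    o.z = (o.result == 0u);
    /* overflow: both adder inputs share a sign that the result lacks */
    o.v = (int)((~(a ^ bb) & (a ^ o.result)) >> 31);
    return o;
}

/* slt/blt: signed A < B is N xor V of A - B. */
uint32_t alu_slt(uint32_t a, uint32_t b)
{
    adder_out_t o = alu_adder(a, b, 1);
    return (uint32_t)(o.n ^ o.v);
}

/* sltu/bltu: unsigned A < B is the absence of carry-out from A - B. */
uint32_t alu_sltu(uint32_t a, uint32_t b)
{
    adder_out_t o = alu_adder(a, b, 1);
    return (uint32_t)!o.c;
}
```

bge and bgeu are then just the inverted outputs, so no extra comparator or adder is needed.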
Submission Info
- Lab 4 will be evaluated in Week 12. The presentation schedule can be found on Canvas.
- Include the following files in an archive named GroupXX_Monday/Friday_Lab4.zip (replace XX with your group number) and upload it to Canvas:
  - .v/.vhd files you have created/modified [RTL sources, testbench(es)]
  - .c/.h files used to generate assembly
  - .bit files
  - .s/.asm files (assembly programs)
  - .ppt file: 2 to 6 slides showing the performance enhancement techniques you have implemented
- One submission per group is sufficient. If there are multiple submissions, the file with the latest timestamp will be taken as the final submission.
- Do not zip and upload the complete project folder; include only the files mentioned above. They must be exactly the same files that you used for the demo.