Qualification Requirements
Entries must meet the following requirements to qualify for the competition:
- All collateral is contained in a GitHub repository shared with the contest operator (Supranational), including everything needed to run your model. This may include:
  - Code - RTL, software, scripts
  - Documentation
  - Constraint files
  - TCL scripts
  - Makefiles
- Reasonable documentation of the design, including:
  - High level algorithm - include architectural drawings, formulas, pseudo-code, models (python, etc.)
  - Key implementation details
  - Detailed instructions to reproduce all inputs and results
- Conforms to the specified modular squaring interface
- Simulates successfully with the provided modulus
  - Vivado behavioral simulation passes to 10k iterations
  - SDAccel hardware emulation passes to 10 iterations
- Synthesizes and implements successfully in the AWS F1 SDAccel flow
- Executes and produces the correct result on AWS F1 FPGA hardware for 1B iterations using a random input
- Complies with AWS F1 usage agreements
- Complies with the official rules of this contest
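
As a rough software reference for the correctness requirement above, the sketch below models repeated modular squaring in Python. The function name `repeated_square` and the constant `EXAMPLE_MODULUS` are hypothetical and used only for illustration; the contest-provided modulus and the hardware interface are not reproduced here.

```python
import secrets


def repeated_square(x: int, modulus: int, iterations: int) -> int:
    """Square x modulo `modulus`, `iterations` times in sequence."""
    for _ in range(iterations):
        x = (x * x) % modulus
    return x


# Placeholder modulus for illustration only (an assumption, NOT the contest modulus).
EXAMPLE_MODULUS = (2**61 - 1) * (2**31 - 1)

# Random input, as the qualification run uses a random starting value.
x0 = secrets.randbelow(EXAMPLE_MODULUS)

# Step-by-step squaring, the form a sequential hardware design computes.
stepwise = repeated_square(x0, EXAMPLE_MODULUS, 10)

# Cross-check against Python's built-in modular exponentiation: x0^(2^10) mod N.
assert stepwise == pow(x0, 2**10, EXAMPLE_MODULUS)
```

A model like this could be used to generate expected values for short simulation runs; for 1B iterations a pure Python loop would be slow, so it is best treated as a checking aid rather than a performance reference.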
Performance Evaluation
1. Estimate performance for all qualifying designs using the SDAccel synthesis clock frequency and the simulated cycles per squaring (cycles/sq). For example, given 8 cycles/sq and 161 MHz, total latency is (1/161)*1000*8 = 49.7 ns.
2. Select the design with the highest estimated performance, as well as any designs within 3 ns of that result.
3. Execute these designs on AWS F1 using the available (granular) clocking. Measure performance and functional correctness of 1B repeated squarings. The clock frequencies available natively from AWS F1 are documented here: https://github.com/aws/aws-fpga/blob/master/hdk/docs/dynamic_clock_config.md.
4. If this produces a clear winner, stop. This is the most likely outcome.
5. It is possible to have a result from #4 where the interaction of F1 auto frequency scaling and the available granular clock inverts the ordering between two designs. In this situation we will endeavor to run the affected designs with a more precise MMCM-generated clock running at the SDAccel auto-scaling recommended frequency, on either AWS F1 or a standalone VCU118 board, to determine the winner. In the event that this is not practical, the winner will be determined based on the auto-scaling recommended frequency.
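
The latency estimate and the 3 ns shortlist described above can be sketched as follows. This is a minimal illustration, not the operator's evaluation script: the function names `latency_ns` and `shortlist` are hypothetical, and treating the 3 ns margin as inclusive is an assumption.

```python
def latency_ns(cycles_per_sq: int, clock_mhz: float) -> float:
    """Latency of one squaring in ns: cycles * clock period, with period = 1000 / MHz."""
    return cycles_per_sq * 1000.0 / clock_mhz


def shortlist(designs: dict, margin_ns: float = 3.0) -> list:
    """Return the best design plus any design within margin_ns of it (assumed inclusive).

    `designs` maps a design name to a (cycles_per_sq, clock_mhz) tuple.
    """
    latencies = {name: latency_ns(c, f) for name, (c, f) in designs.items()}
    best = min(latencies.values())
    return sorted(name for name, lat in latencies.items() if lat - best <= margin_ns)


# The worked example from the text: 8 cycles/sq at 161 MHz.
example = latency_ns(8, 161)
print(f"{example:.1f} ns")

# Hypothetical candidate set; C reuses the 8 cycles/sq, 161 MHz example figures.
candidates = {"A": (8, 249), "B": (15, 460), "C": (8, 161)}
print(shortlist(candidates))
```

With these inputs, A and B land within 3 ns of each other and both would go on to hardware runs, while C would not.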
Example of Result Inversion
To further illustrate the scenario described in #4 consider the following outcome with designs A and B.
This table shows how, using the estimated frequency from auto scaling, design A wins with a lower overall latency:
| Design | Estimated Clock (MHz) | Cyc/Sq | Latency (ns) | Result |
|--------|-----------------------|--------|--------------|--------|
| A | 249 | 8 | 32.1 | A WINS! |
| B | 460 | 15 | 32.6 | |
However, when you apply the available granular clocking options, the results flip and B wins:
| Design | Granular Clock (MHz) | Cyc/Sq | Latency (ns) | Result |
|--------|----------------------|--------|--------------|--------|
| A | 227 | 8 | 35.2 | |
| B | 458 | 15 | 32.8 | B WINS! |
Executing at the estimated maximum frequency resolves this issue.
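
The inversion in this example can be reproduced with a few lines of Python. The sketch below takes the clock and cycle figures directly from the two tables; the helper names `latency_ns` and `winner` are hypothetical and the code is illustrative only.

```python
def latency_ns(cycles_per_sq: int, clock_mhz: float) -> float:
    """Latency of one squaring in ns for a given cycle count and clock in MHz."""
    return cycles_per_sq * 1000.0 / clock_mhz


def winner(designs: dict) -> str:
    """Return the name of the design with the lowest latency."""
    return min(designs, key=lambda name: latency_ns(*designs[name]))


# (cycles_per_sq, clock_mhz) pairs taken from the tables above.
estimated = {"A": (8, 249), "B": (15, 460)}
granular = {"A": (8, 227), "B": (15, 458)}

for label, designs in (("estimated", estimated), ("granular", granular)):
    for name, (cycles, mhz) in designs.items():
        print(f"{label:9s} {name}: {latency_ns(cycles, mhz):.1f} ns")

# An inversion occurs when the two clocking schemes disagree on the winner.
inverted = winner(estimated) != winner(granular)
print("ordering inverted:", inverted)
```

Design A loses far more from clock granularity (249 MHz down to 227 MHz) than design B does (460 MHz down to 458 MHz), which is what flips the ordering.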