
The contest will be run as winner-take-all, measured by the lowest latency per squaring over 1B repeated squarings on AWS F1.

The baseline latency for Round 1 will be set at 50 ns per squaring. The winner of Round 1 will receive $3,000 per ns of improvement over the baseline.
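As a rough illustration of the Round 1 payout rule, the sketch below computes a prize from a measured latency. It assumes the payout is prorated linearly for fractional nanoseconds and is zero at or above the baseline. Neither assumption is stated in the rules, and the function name `round1_prize` is hypothetical.

```python
BASELINE_NS = 50.0      # Round 1 baseline latency per squaring, from the rules
PRIZE_PER_NS = 3000.0   # dollars per ns of improvement, from the rules


def round1_prize(latency_ns: float) -> float:
    """Prize in dollars for a winning latency.

    Assumption: linear proration, and no payout at or above the baseline.
    """
    improvement_ns = max(0.0, BASELINE_NS - latency_ns)
    return improvement_ns * PRIZE_PER_NS


if __name__ == "__main__":
    # A winning design at 45 ns improves on the baseline by 5 ns.
    print(round1_prize(45.0))  # 15000.0
```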

The baseline for Round 2 will be set based on the results of Round 1.

Qualification Requirements

Entries must meet the following requirements to qualify for the competition:

All collateral is contained in a GitHub repository shared with the contest operator (Supranational), including everything needed to run your model. This may include:
- Code - RTL, software, scripts
- Documentation
- Constraint files
- TCL scripts
- Makefiles
Reasonable documentation of the design, including
- High level algorithm - include architectural drawings, formulas, pseudo-code, models (python, etc.)
- Key implementation details
- Detailed instructions to reproduce all inputs and results
Conforms to the specified modular squaring interface
Simulates successfully with the provided modulus
- Vivado behavioral simulation to 10k iterations
- SDAccel hardware emulation passes to 10 iterations
Synthesizes and implements successfully in the AWS F1 SDAccel flow
Executes and produces the correct result on AWS F1 FPGA hardware for 1B iterations using a random input
Complies with AWS F1 usage agreements
Complies with the official rules of this contest
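The simulation and hardware checks above compare a design's output against the expected result of repeated modular squaring. A minimal software reference model might look like the sketch below. The modulus here is a small placeholder, not the contest-provided modulus, and the function name `repeated_square` is hypothetical.

```python
import random


def repeated_square(x: int, modulus: int, iterations: int) -> int:
    """Square x modulo `modulus`, `iterations` times in sequence."""
    for _ in range(iterations):
        x = (x * x) % modulus
    return x


if __name__ == "__main__":
    # Placeholder modulus for illustration only; substitute the provided one.
    modulus = 101 * 103
    rng = random.Random(0)          # seeded so the run is reproducible
    x0 = rng.randrange(2, modulus)  # random input, as in the hardware test

    t = 10
    result = repeated_square(x0, modulus, t)

    # Cross-check: t sequential squarings equal x0^(2^t) mod modulus.
    assert result == pow(x0, 2 ** t, modulus)
    print(result)
```

The `pow` cross-check is only practical for small iteration counts such as the 10 and 10k simulation runs. For 1B iterations the exponent is too large to form directly, so the sequential loop (or an equivalent trusted model) is the reference.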

...

However when you apply the available granular clocking options the results flip and B wins:

...

Estimate performance for all qualifying designs using the SDAccel synthesis clock frequency and the simulation cycles per squaring. For example, given 8 cycles/sq and 161 MHz, total latency is (1/161)*1000*8 = 49.7 ns.
Select the design with the highest estimated performance as well as any designs within 3ns of that result.
Execute these designs on AWS F1 using the available (granular) clocking. Measure performance and functional correctness of 1B repeated squarings. Contestants should be aware that only certain clock frequencies are natively available on AWS F1, as documented here: https://github.com/aws/aws-fpga/blob/master/hdk/docs/dynamic_clock_config.md.
If this produces a clear winner, stop. This is the most likely outcome.
It is possible to have a result from #4 where the interaction of F1 auto frequency scaling and the available granular clock inverts the ordering between two designs. In this situation, we will endeavor to run the affected designs with a more precise MMCM-generated clock running at the SDAccel auto-scaling recommended frequency, on either AWS F1 or a standalone VCU118 board, to determine the winner. In the event that this is not practical, the winner will be determined based on the auto-scaling recommended frequency.
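The estimation and shortlisting steps above can be sketched as follows. The design names and numbers are made up for illustration, the function names are hypothetical, and treating "within 3ns" as an inclusive bound is an assumption.

```python
from typing import Dict, List, Tuple


def estimated_latency_ns(cycles_per_sq: int, freq_mhz: float) -> float:
    """Latency per squaring in ns: clock period (1000 / MHz) times cycles."""
    return cycles_per_sq * 1000.0 / freq_mhz


def shortlist(designs: Dict[str, Tuple[int, float]],
              window_ns: float = 3.0) -> List[str]:
    """Names of designs within `window_ns` of the best estimated latency.

    `designs` maps a name to (cycles per squaring, clock frequency in MHz).
    Assumption: the window bound is inclusive.
    """
    latencies = {
        name: estimated_latency_ns(cycles, freq)
        for name, (cycles, freq) in designs.items()
    }
    best = min(latencies.values())
    return sorted(name for name, lat in latencies.items()
                  if lat - best <= window_ns)


if __name__ == "__main__":
    # The example from the text: 8 cycles/sq at 161 MHz.
    print(round(estimated_latency_ns(8, 161.0), 1))  # 49.7

    # Hypothetical entries: name -> (cycles/sq, MHz).
    entries = {"A": (8, 161.0), "B": (8, 155.0), "C": (9, 150.0)}
    print(shortlist(entries))  # ['A', 'B']
```

Only the shortlisted designs would then be run on AWS F1 hardware, where the measured latency, not this estimate, decides the outcome.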

To further illustrate the scenario described in #4 consider the following outcome with designs A and B.

This table shows how, using the estimated frequency from auto-scaling, design A wins with a lower overall latency:

...

Designs may use an MMCM or clock generator to operate at alternate frequencies.
The winner will be the design with the lowest latency per squaring over 1B iterations.