
Quantum computers, which have the potential to revolutionize medicine, materials science, and artificial intelligence, also represent a threat to the security of our communications. Cryptographic systems like RSA, Diffie-Hellman, and ECC have been shown to be vulnerable to quantum attacks. Post-quantum cryptography (PQC) aims to develop systems resistant to these attacks.
The HQC algorithm, which is based on error-correcting codes, was selected for standardization by NIST in 2025. Existing software implementations of HQC perform relatively poorly compared to alternative lattice-based solutions. Hardware implementations, while faster, rely on the multiplication of very large polynomials, which requires a very large silicon area and is the algorithm's main bottleneck.

Tightly-coupled acceleration, which encapsulates the algorithm's recurring operations in custom instructions added to the processor's instruction set, offers an alternative to conventional hardware acceleration methods. The accelerator can be tightly integrated into the execution pipeline of a RISC-V processor using the Core-V eXtension Interface (CV-X-IF). This solution offers transparent speed increases while overcoming the traditional hurdles to ISA extension.

We applied this tightly-coupled acceleration strategy and designed three dedicated hardware accelerators targeting the HQC algorithm's main bottlenecks. The first, R-Unit, uses a multi-level Karatsuba algorithm to speed up polynomial multiplication on 32-bit blocks, with four custom instructions that give full control over the results. The second, RS-Decoder, provides several specialized instructions that eliminate loops and intermediate results, speeding up key Galois field operations such as carry-less multiplication and trailing-zero counting. The third, the Keccak accelerator, uses a dedicated register and three custom instructions to load, process, and store the 1,600-bit permutation state efficiently, minimizing overhead compared to conventional loosely-coupled approaches.
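
As an illustration of the multi-level Karatsuba idea mentioned above, the following is a minimal software sketch of carry-less (GF(2)[x]) polynomial multiplication that recurses down to 32-bit blocks. It is not the R-Unit design and does not model its four custom instructions; the function names `clmul` and `karatsuba_gf2` are hypothetical, and the operand width is assumed to be 32 times a power of two.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product: polynomials over GF(2) packed into integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2)[x] is XOR, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, bits: int) -> int:
    """Carry-less product of two `bits`-bit polynomials.

    Assumption: `bits` is 32 * 2**k, so every split lands on a 32-bit boundary.
    """
    if bits <= 32:
        # Base case: one 32-bit block product (done in software here).
        return clmul(a, b)

    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three half-size products instead of four.
    lo = karatsuba_gf2(a_lo, b_lo, half)
    hi = karatsuba_gf2(a_hi, b_hi, half)
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half)

    # Over GF(2), subtraction is also XOR.
    mid ^= lo ^ hi

    return lo ^ (mid << half) ^ (hi << bits)


if __name__ == "__main__":
    x = 0x0123456789ABCDEFFEDCBA9876543210
    y = 0xFFFFFFFF00000001DEADBEEFCAFEBABE
    # The recursive result should match the schoolbook product.
    print(karatsuba_gf2(x, y, 128) == clmul(x, y))
```

Each recursion level replaces four half-size multiplications with three, at the cost of a few extra XORs. This trade-off is the reason Karatsuba is attractive when the multiplier dominates area and cycle count.
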
TYRCA, with its tightly-coupled acceleration, delivers substantial performance improvements over the original HQC software implementation. The number of clock cycles for key generation (KeyGen), encapsulation (Encaps), and decapsulation (Decaps) is reduced by around 95% at all security levels (HQC-128/192/256). This approach also substantially reduces instruction memory use. Implemented on a Kintex-7 FPGA target, TYRCA occupies less than 26% of the total system-on-chip area. R-Unit, which delivers the highest performance improvement, takes up less than 10% of the area. Area-normalized performance metrics (speed/area) confirm that TYRCA outperforms the existing loosely-coupled approaches.

