Speed means low cost and low-power operation

The speed of cryptographic code is important for maximizing throughput and for minimizing latency in protocols that use the code. This can be relevant for meeting timing constraints or for obtaining a good user experience. However, faster code (i.e., code that performs the same work in fewer processor cycles) could also allow you to select slower microcontrollers and thereby reduce the bill-of-materials (BOM) cost of your hardware. Moreover, as CMOS circuits mainly consume power when switching states, using fewer processor cycles means fewer state switches and therefore less power drain, which is particularly relevant for battery-powered devices.

Algorithmic innovations

We have developed, analyzed and optimized the cryptographic code of ocrypto since 2013. During that time, we have introduced several unique algorithmic innovations, in order to achieve state-of-the-art performance while ensuring constant-time code execution:

  • Combination of known algorithms for multiplication in a prime field, including modular reduction. This combination reduces the number of expensive instructions. For example, it brings down the number of multiplications for SRP from 64 million to 8 million.
  • New bitslice implementation for AES. A new field-theoretical approach for the S-box calculation allows an efficient and table-free implementation of AES without the overhead and complications of handling multiple blocks in parallel.
  • New mathematical approach for the NIST P-256 curve. Our enhanced co-Z implementation of the NIST P-256 curve is unique in that it is complete, correct, efficient, table-free, and executes in constant time even in all edge cases.
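
To illustrate what "constant-time" and "table-free" mean in practice, here is a minimal sketch in C. It is not taken from ocrypto; the function names ct_compare and ct_select are hypothetical and chosen only for this example. The idea is that neither the sequence of executed instructions nor the memory addresses accessed depend on secret data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ocrypto API): compare two byte strings
 * without an early exit. The loop always runs len iterations, so the
 * running time does not reveal the position of the first difference.
 * Returns 0 if the strings are equal, nonzero otherwise. */
int ct_compare(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);  /* accumulate any differing bits */
    }
    return diff;
}

/* Hypothetical helper (not an ocrypto API): return a if choice is 1,
 * b if choice is 0, using a bit mask instead of a branch or a table
 * lookup indexed by the secret. choice must be exactly 0 or 1. */
uint32_t ct_select(uint32_t choice, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - choice;  /* 0x00000000 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}
```

Note that a C compiler is free to transform such code, so source-level patterns like these do not by themselves guarantee constant-time machine code; the generated instructions need to be checked for the target core.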

Assembly-language optimizations

Going beyond algorithmic innovations, we have carefully written the most critical parts of the code in assembly language for popular microcontroller cores. The result is typically more than three times as fast as a good implementation in C. ocrypto thus makes advanced communication protocols and advanced firmware security features feasible even on low-power, low-cost 32-bit microcontrollers without hardware accelerators. It is also useful on processors with hardware acceleration: in situations where the hardware accelerator does not cover all relevant algorithms, is not available to all microcontroller cores, or in systems where real-time threads compete for the accelerator hardware (e.g., where the application code competes with a BLE stack, or multiple encrypted connections run in parallel).

Core                                                                  | Cortex-M0 | Cortex-M3 | Cortex-M4F                      | microAptiv UP
Instruction set architecture                                          | ARMv6-M   | ARMv7-M   | ARMv7E-M with FPv4-SP extension | MIPS32 with DSP enhancements
Clock frequency                                                       | 16 MHz    | 48 MHz    | 64 MHz                          | 200 MHz
Set up HomeKit accessory, first phase (static setup code) [1]         | 3.9 s     | 1.1 s     | 0.4 s                           | 0.1 s
Set up HomeKit accessory, second phase (static or dynamic setup code) | 15.0 s    | 4.3 s     | 1.4 s                           | 0.4 s
Open HomeKit session                                                  | 940 ms    | 260 ms    | 60 ms                           | 20 ms

[1] With a dynamic setup code, the first phase takes three times as long.

The above HomeKit-related numbers only include the time for cryptographic processing. The communication protocol, application logic, and operating system at the other end will add to the experienced round-trip times. Note that accessory setup usually occurs only once in the lifetime of a HomeKit accessory and happens in two phases (before the setup code is entered on the iOS device, and after the setup code has been entered).
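
Because each core in the table runs at a different clock frequency, the times are easier to compare when converted to approximate cycle counts. The following sketch does that conversion. The function name cycles_from_ms is hypothetical, and the calculation assumes that the core runs at the stated clock frequency for the whole duration; since the published times are rounded, the results are only rough estimates.

```c
#include <stdint.h>

/* Hypothetical helper: approximate number of processor cycles spent
 * during duration_ms milliseconds at a clock of clock_mhz megahertz.
 * 1 MHz corresponds to 1000 cycles per millisecond. */
uint64_t cycles_from_ms(uint32_t duration_ms, uint32_t clock_mhz)
{
    return (uint64_t)duration_ms * clock_mhz * 1000u;
}
```

For example, opening a HomeKit session in 940 ms at 16 MHz corresponds to roughly 15.0 million cycles on the Cortex-M0, while 60 ms at 64 MHz corresponds to roughly 3.8 million cycles on the Cortex-M4F.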