Saving two birds with one stone: A new fast and robust coding scheme

There is a lot of important data in our digital world, which – whether in transit or at rest – we want to keep secure and available in the face of unexpected loss or corruption. Error correcting codes (ECCs) are an important tool for achieving reliable data transmission or storage over unreliable networks or media, and are widely used in practice – from digital media broadcast, cellular networks and satellite communications to high-end RAM, DVDs, disk and cloud storage.

An ECC encodes data by adding redundancy. Data is viewed as a message split into symbols that get transformed into a larger number of code symbols, often by simply appending additional symbols to the message. Even if some of the code symbols are altered or deleted, there typically remains enough information to decode them and recover the original message. Most ECCs are designed to protect against a small number of random errors, but there are important applications of ECCs in which an attacker may be able to introduce errors that are specifically targeted to cause the most harm.

Most known ECCs intended to protect against such adversarial data corruption, like Reed-Solomon codes, are computationally costly, requiring large decoding effort. Fountain codes, like LT [1] and Raptor [2] codes, are on the other end of the spectrum. They produce a potentially unbounded stream of new code symbols on demand and decode symbols as they become available, providing extremely efficient message encoding and recovery. Raptor codes are the most advanced fountain codes that have been standardized by IETF (RFCs 5053 and 6330) and also adopted in other standards to support wide application areas, such as multimedia services (e.g., MBMS, DVB-H, IP-TV) streaming media, or satellite communications.

But this great efficiency comes at a qualitative limitation. By design, LT and Raptor codes can operate reliably only in the face of random erasures where each code symbol is independently assumed lost with some small probability. As soon as they deviate from this benign operational setting, their decoding guarantees abruptly vanish. Even simple targeted erasures can increase the likelihood of covertly inflicting decoding failures or delays, because an attacker can exploit the code structure to select those erasures that maximally disrupt message recovery.

In practice, targeted erasures can be easily enforced, for instance, if a compromised router selectively erases symbols that go through it. Lopes and Neves [3] recently demonstrated such an attack against a TFTP application for transferring files that had been encoded via a standardized Raptor version. They exploited some weak elements of the code to fully predict its structure and pre-compute the most harmful erasure patterns. Alarmingly, the selected erasures are data independent and in most configurations consist of just a few code symbols, yet, they almost always make the message irrecoverable. The attack can run at line-speed and is very hard to detect because very few symbols are dropped. Moreover, it can effectively disrupt file transfers even if their encoding is authenticated or encrypted i.e., when TFTP runs on top of IPsec/ESP.

We have recently provided a solution to the above problem. In joint work with our collaborators Ari Juels, James Kelley and Roberto Tamassia, we have shown that it is possible to devise a crypto-enhanced coding scheme that achieves both practical efficiency and strong tolerance against targeted errors. Our solution, called Falcon codes [4], is a class of authenticated ECCs that are based on LT codes and enjoy a nice ‘best-of-two-worlds’ quality. In nearly all cases they can correct adversarial symbol corruptions while preserving the main capabilities of standard LT codes, namely fast generation of a ‘fountain’ of code symbols and fast message recovery, as long as sufficient symbols remain intact.

The new design enhances LT codes in two fundamental ways: a secret key is used for data encoding and message recovery, and the code structure is fully protected via a simple combination of a strong pseudorandom number generator (to randomize encoding and disallow its prediction), optionally with a secure cipher (to encrypt symbols and disallow leakage) and an unforgeable MAC (to authenticate symbols and disallow error propagation). When carefully combined with data striping, the new encoding provides scalable variants that can handle large messages, e.g., of several GB. Through judicious use of crypto at the core LT-coding level, Falcon codes also lend themselves to secure extensions of Raptor codes that still achieve high efficiency, e.g., throughput speeds of 300MB/s. Conveniently, Falcon codes can also leverage any security provided by the communication channel (e.g., tunneling over SSH or IPsec) to further limit the required overhead.

Overall, fast, on-the-fly generation of code symbols that guarantee fast message recovery in the presence of limited malicious corruptions is an attractive property that renders Falcon codes useful in many practical scenarios. They can provide significantly stronger security than LT/Raptor codes, while only incurring a small overhead allowing them to remain practical. In particular, Falcon codes can provably harden Raptor codes against the targeted-erasure attacks of Lopes and Neves. They can also provide comparable security to existing attack-tolerant codes, like Reed-Solomon, but with greater efficiency. In particular, when employed as a drop-in replacement for a custom-designed such code that is used in the cloud storage security protocol by Shi et al. [5], Falcon codes result in a 35% speed-up. Their design allows Falcon codes to offer the best of both worlds: security and efficiency.

[1] M. Luby: LT Codes. IEEE Symposium on Foundations of Computer Science 2002: 271.

[2] A. Shokrollahi: Raptor codes. IEEE Transactions on Information Theory 52(6): 2551-2567, 2006.

[3] J. Lopes, N. Neves: Stopping a Rapid Tornado with a Puff. IEEE Symposium on Security and Privacy 2014: 509-523.

[4] A. Juels, J. Kelley, R. Tamassia, N. Triandopoulos: Falcon Codes: Fast, Authenticated LT Codes (Or: Making Rapid Tornadoes Unstoppable). ACM Conference on Computer and Communications Security 2015: 1032-1047.

[5] E. Shi, E. Stefanov, C. Papamanthou: Practical Dynamic Proofs of Retrievability. ACM Conference on Computer and Communications Security 2013: 325-336.

No Comments