Stream cipher
In cryptography, a stream cipher is a symmetric cipher which encrypts data by combining the plaintext with pseudorandom data to generate the ciphertext. The pseudorandom data is provided by a pseudorandom number generator whose output depends on a cryptographic key. To decrypt, run the generator with the same key to generate the same pseudorandom data, then reverse the combining operation to convert ciphertext back to plaintext.
Most stream ciphers use bitwise exclusive OR as the combining operation. This is convenient since XOR is its own inverse (abb=a for any a, b), so encryption and decryption use exactly the same operations. However, any operation that has an inverse may be used. Some stream ciphers use bytewise (mod 256) addition to encrypt, subtraction to decrypt. Solitaire is intended to be used manually rather than on a computer; it uses addition and subtraction mod 26 to encrypt at letter level.
The hard part of stream cipher design is the pseudorandom generator. An enemy who knows or guesses some plaintext and has intercepted the matching ciphertext can easily get some of the pseudorandom data. It must be very difficult for an enemy who has part of the pseudorandom data to make any inferences about other parts of it. If he could make such inferences, that would reveal information on other parts of the plaintext. It should be effectively impossible for an enemy — even with a huge sample of output — to infer the internal state of the generator; that would break the cipher completely.
It is essential that the period before the pseudorandom output begins to repeat should be very long. Without this, a variant of code book attack can break the cipher; an enemy with enough known plaintext can collect the whole pseudorandom stream, or at least enough of it to do damage. Some designs combine several generators with relatively short periods to achieve a long period overall; the base periods should be relatively prime for this to work well. Some ciphers allow the input plaintext to affect the generator state, a technique known as auto-keying; this breaks up the cycles.
It is normal practice to re-key any cipher from time to time to prevent an enemy from acquiring too much data encrypted with a single key. Such an accumulation facilitates some attacks and makes the payoff for breaking a key quite large. In the case of stream ciphers, the re-keying interval should be much shorter than the cipher's period.
A one-time pad is in effect the ultimate stream cipher. It requires a truly random key as long as the data to be encrypted, which makes it impractical for many applications. Given such a key, however, there is no need to generate pseudorandom material and therefore no need to worry about the quality of the generator.
One way to look at a one-time pad would be to consider it a stream cipher with a perfect generator and infinite period. One way to look at a stream cipher would be to say that, as the generator is improved, the security asymptotically approaches the provable security of a one-time pad. Of course it can never reach that, but with a good generator it may be close enough for practical purposes.
The other main type of symmetric cipher is a block cipher which encrypts data in fixed-size blocks rather than generating an arbitrary-length stream.
Stream ciphers from other primitives
Any pseudorandom number generator which accepts an initialisation value or key can be used to make a stream cipher. However, the stream cipher will be insecure if the period is short, if key is too small (see cryptographic key), or if the generator design is not adequate in cryptographic terms. Given some of the generator's output, it must be very difficult for an enemy to make inferences about previous output or later output. In particular, it should be completely infeasible for him to determine the internal state of the generator; that would give him all future output.
Any stream cipher can also be used as a pseudorandom number generator. For example, RC4 is used in the OpenBSD random device. Going in this direction, there are fewer design issues. Any generator that is adequate for stream cipher use will be fine for any other application, provided it is intialised properly and re-initialised often enough.
Any block cipher can be used to construct a stream cipher; there are block cipher modes of operation designed for this. Either output feedback mode or counter mode may be used.
In theory, any hash algorithm could be used to construct a stream cipher by repeatedly hashing some sort of buffer which would initially include the key and might include a counter. Only part of the hash output would be used as pseudorandom output, so that an enemy would not have enough data to determine the hash state; the excess hash output would be fed back into the buffer. This technique is not used in practice for stream ciphers since the resulting cipher would be slower than other techniques. However, it is common in operating system random number generators.
Shift register stream ciphers
Many stream ciphers are based on shift registers, usually linear feedback shift registers or LFSRs, though other types have been used. These are easily implemented in hardware and have been a common device in secure communications systems, especially military systems, for decades. They have been extensively analysed and the theory is well understood. Applied Cryptography [1] gives much of the theory and a catalog of implementations.
There are two equivalent ways to implement an LFSR. Here we describe the Galois configuration; see the LFSR article for the Fibonacci configuration.
In the Galois method, the shift register is initialised by loading the key into it. You then loop forever
output the lowest bit of the register shift the register right one bit if the output was a one, XOR a constant into the register
With a register of length L and a well-chosen constant (Applied Cryptography has a table with recommended values [2]), this outputs 2L-1 bits before it repeats. Longer periods are easily obtained; just use two or more registers whose periods are relatively prime and XOR their outputs together. The period is then the product of the individual periods.
LFSRs alone, however, are useful as random number generators in some applications but are not effective as ciphers; they are too easily broken. The "L" in LFSR is for linear, and that is a highly undesirable property in a cipher. For an LFSR of length L, the Berlekamp-Massey algorithm can determine the register's internal state from 2L output bits. XORing the output of several LFSRs does not help much; that construction is provably equivalent to a single longer shift register, and Berlekamp-Massey breaks that. Various other simple tricks fail in a similar way.
However, techniques are available to avoid this difficulty. The most common is to use some non-linear function to combine LFSR-produced bits and generate the output; this may be applied either to a sequence of bits from a single LFSR or to bits taken from two or more LFSRs. It is also possible to clock an LFSR in such a way as to make the output sequence non-linear.
Specific stream ciphers
Many stream ciphers are built into hardware, protecting things like military radio or the links between a bank's computers and its ATM machines. Details of such systems are generally not made public, so our examples below almost certainly omit some widely deployed stream ciphers.
RC4
RC4, Rivest Cipher number four, was designed by Ron Rivest. It has a size parameter; the 8-bit version is in widespread use. This generates pseudo-random data one byte at a time and maintains a 256-byte internal state. The combining operation is XOR. The key can be any size up to the state size, 256 bytes or 2048 bits for the 8-bit version.
RC4 is quite simple to implement in software and is very widely used. Among other applications, it is used (at least as one option) in Internet protocols such as TLS (RFC 2246) for secure web browsing and SSH (RFC 4251) for secure remote login. It is also use in WEP wireless networking, in PPTP (RFC 2637), and in many other applications.
RC4 was originally proprietary to RSA Security; that company has never released a specification of the algorithm. However, "Alleged RC4" or "ARC4" or "Arcfour" has appeared on the net.
Solitaire
Solitaire was designed by Bruce Schneier and is used by characters in Neal Stephenson's novel Cryptonomicon. It is a manually operated stream cipher whose key is a shuffled deck of cards, designed for use by people who may not have access to a computer and do not want to risk being caught with incriminating cipher equipment. It uses arithmetic mod 26, convenient for 26 letters in the English alphabet and 52 cards in a deck.
A5
A5 is one of the ciphers used in GSM cellphones. There are two widely used versions, A5/1 and A5/2. A5/1 uses three LFSR shift registers and make the overall system non-linear by having clocking depend on some of the registers' middle bits. One bit is selected from each register; if all three bits match, then all three registers are clocked in the next round. If only two match, only those two are clocked. A5/2 has four registers. Both of these have been broken completely.
A newer version, A5/3, is now part of the standard but not widely deployed. It is a different design, based on a block cipher.
Fish and Pike
Fish [3], named from fibonacci shrinking, was a stream cipher based on the idea of a lagged Fibonacci generator. This uses three arrays of 32-bit words and relations similar to those in the Fibonacci sequence, which is generated by successive applications of xi = xi-1 + xi-2. In the cipher, however, the right side has more than two terms and the coefficients are not just i-1 and i-2, but things like i-7 or i-39. The coefficients and array size are different for each array. Fish was quite fast in software and supported large key sizes.
Ross Anderson broke Fish and, in the same paper [4] proposed an improved version called Pike. In Fish, the clocking depends on the lowest bits of the sum. Anderson made it depend on carry bits instead, which prevents the attack he had on Fish. The rule used is similar to that for A5; if all three carry bits match, all arrays are clocked and if only two match, two are clocked. The stream cipher output is the XOR of the three sums.
Estream ciphers
The Estream project [1] was a European search for good new stream ciphers, 2004-2008. It ran in three phases, each shortening what was initially a list of over 40 candidate ciphers, and eventually gave two portfolios of ciphers, four ciphers (HC-128, Rabbit, Salsa20/12 and SOSEMANUK) for software implementations and three (Grain v1, MICKEY v2 and Trivium) for hardware. The linked site has details on all candidates.
The eSCARGOT (European stream ciphers are ready to go) project [2] has chip implementations of all the 15 phase three hardware profile ciphers.
HC-128
HC-128 [3] is a cipher from Hongjun Wu. It uses two arrays, each of 512 32-bit entries. 32 bits are generated for each iteration. It takes a 128-bit key and 128-bit initialisation vector (IV). The designers claim security up to 264 bits of generated keystream. There was also an HC-256 cipher with key, IV and arrays all twice as large.
Rabbit
The Rabbit stream cipher [4] was proposed by its authors for both Estream profiles; they suggested it would be useful for either software or hardware implementation. However, Rabbit made it into the final portfolio for software-implemented ciphers, but not the final hardware portfolio.
Salsa20/12
Dan Bernstein has devised a whole family of Snuffle or Salsa stream ciphers [5] with different parameters. Salsa 20, or Snuffle 2005 [6], is the member of that family entered in the Estream competition. There is a variant called XSalsa 20, extended with a larger nonce.
SOSEMANUK
Sosemanuk [7] is a software-oriented stream cipher from a French team. It uses design ideas from an earlier stream cipher Snow 2.0 and some transformations borrowed from the block cipher Serpent.
Grain v1
Grain [8] was designed as "A stream cipher for constrained environments".
MICKEY v2
Mickey v2 [9] is a strengthened second version of a stream cipher called Mickey for Mutual Irregular Clocking KEYstream generator. It is designed for constrained environments and uses two 100-bit shift registers. It takes an 80-bit key and is designed to be secure for up to 240 bits of generated keystream.
Trivium
Trivium [10] is a synchronous stream cipher designed to generate up to 264 bits of key stream from an 80-bit secret key and an 80-bit initial value (IV). The state is an array of 288 bits. Each step takes 15 bits from the state, updates three state bits and generates one output bit. The state is rotated between steps.
Py
Py [11] is a Hebrew name; it is pronounced "Roo" as in "kangaroo". The two designers were the Israeli Eli Biham and the Australian Jennifer Seberry. The cipher was designed for the Ecrypt stream cipher contest, Estream. They have a page on Py with much detail. [12] Py was one of the fastest candidate ciphers. It introduced a new technique which the developers called "rolling arrays". This somewhat resembles a rotor machine, but on a larger scale and using operations which are efficient in software.
There have been a number of published attacks. see links on Estream page. However, the originators deny that Py is broken because their stated design goals included security for up to 264 bytes of output and the attacks require more than that amount of data. Despite this, Py was not chosen for consideration in the contest's third and final round.
There is a smaller version called Py6 and the same authors have produced later variants called TPy, TPypy and TPy6. Some of the attackers have proposed variants [5] which block their attacks; these are called RCR32 and RCR64.
Restrictions and weaknesses
All of the problems described below for stream ciphers are also issues for one-time pad systems.
Maintaining synchronisation
A design problem for stream cipher implementations is how to maintain synchronisation; if the encrypting and decrypting devices get out of sync, generating different pseudorandom output, then communication is lost. In many applications, this is an easy problem but in some cases it can be quite difficult. Consider battlefield communications with noise everywhere and an enemy who actively attempts to disrupt your communications.
Output feedback stream ciphers — ones such as Solitaire or RC4 in which the internal state changes with every output operation — are fragile in relation to this. Make a single error and the state will be incorrect, so all further encryption or decryption will be wrong. Moreover, there is no way to recover short of negotiating a new key or restarting the whole process from the beginning with another copy of the original key.
Reusing pseudorandom material
There is one essential usage restriction: never encrypt two different messages with the same pseudorandom data, or even the same truly random data.
Using P for plaintext, C for ciphertext, R for (pseudo)random, and for XOR this makes the encryptions:
C1 = P1R C2 = P2R
The enemy can intercept both ciphertexts, so he does:
X = C1C2 = P1RP2R
and the Rs cancel out!
This is extremely good for the attacker who now has:
X = P1P2
This is very weak indeed. Because the Rs cancel out, the attacker gets the same result no matter what R is used, so the encryption no longer has any real effect and the quality of the random numbers becomes irrelevant.
From that point on, the attacker need not worry about the encryption, only about properties of the plaintexts. If he knows or can guess part of either plaintext, he can immediately infer the corresponding part of the other. A zero byte in X means the corresponding bytes in P1 and P2 are equal. Other simple relations exist and can readily be exploited. He can also play a sort of ping-pong; make some inferences or guesses about one text, see what that tells him about the other, make some inferences there, see what that tells him about the first text, and so on.
Given moderately strong knowledge of plaintext properties — for example, that it is English or Russian text — P1P2 can be broken using techniques such as frequency counting that have been well-known for centuries and can be done with pencil and paper; you don't even need a computer. See History of cryptography for details. Even with weaker knowledge of the input format, it is still easily breakable.
Of course in a real application, such as the NSA and British attack on Soviet one-time pads called VENONA, the attack may be far more difficult. Suppose you have a substantial archive of intercepted messages and you know or suspect that the enemy sometimes reuses the random material, There is still a huge amount of work to do to discover what was reused where.
On the other hand, the attack might be easy. In some versions of Word and Excel, Microsoft mis-used RC4 [6]; when multiple versions of an encrypted file were saved, the same pseudorandom stream was used each time. In this case the attacker has more to go on since he knows the texts are multiple versions of the same document. Also, there may be more than two versions.
Microsoft also mis-used RC4 in their first version of PPTP, the point-to-point tunneling protocol used to implement VPNs [7]. They used the same key for encryption in both directions on a connection; the attacker can just XOR the two message streams together. This was fixed in their next version of the protocol.
There is a rather nice graphical illustration of this attack at Cryptosmith.
Rewrite attacks
Stream ciphers (including one-time pads) are inherently vulnerable to a rewrite attack. If an attacker knows some plaintext and has intercepted the matching ciphertext, then he can discover that portion of the pseudorandom data.
This does not matter if the attacker is just a passive eavesdropper. It gives him no plaintext he didn't already know and we don't care that he learns some pseudorandom data. We will never re-use that data, for reasons given in the previous section. For a one-time pad, having a portion of the truly random key tells him absolutely nothing about the rest of the key. For a stream cipher, having some of pseudorandom generator's output does not allow any effective attack if the generator is well designed.
However, an active attacker who knows the plaintext can recover the pseudorandom data, then use it to encode whatever he chooses. If he can get his version delivered, and accepted as genuine by the recipient, this is a disaster. If you send "attack target 372", the delivered message can be anything the same length; perhaps the enemy would prefer "withdraw East now". An active attacker with only a reasonable guess at the plaintext can try the same attack. If the guess is correct, this works and the attacker's bogus message is delivered. If the guess is wrong, a garbled message is delivered.
The attack is completely blocked if an effective cryptographic authentication mechanism is used at either packet or message level. That mechanism detects the bogus message and either discards it or warns the user. Note, however, that while authenticating the players at channel setup time is a required step to prevent man-in-the-middle attacks, it does not prevent this attack. To stop rewrite attacks, you need authentication later on, at packet or message level.
Even ignoring whatever authentication is used, this attack is generally not practical against high-speed systems with a good synchronisation mechanism, such as military systems or network encryption boxes. Performing the attack in real time — fast enough to get a bogus message delivered without losing synchronisation — would be very difficult. Systems such as TLS that use a stream cipher for a slower and less synchronised connection are more vulnerable; they definitely need authentication. Systems such as Microsoft Word encryption which apply stream ciphers to static data are even more vulnerable.
References
- ↑ Schneier, Bruce (Second Edition, 1996), Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley & Sons p 369
- ↑ Schneier, p 376
- ↑ Uwe Blöcher & Markus Dichtl (1994), Fish: A fast software stream cipher, Springer-Verlag, DOI:10.1007/3-540-58108-1_4
- ↑ Anderson, Ross J. (1995), On Fibonacci keystream generators, Proc. Fast Software Encryption 1994, Lecture Notes in Computer Science, vol. 1008, Springer-Verlag, DOI:10.1007/3-540-60590-8_26, at 346–352
- ↑ Gautham Sekar, Souradyuti Paul & Bart Preneel (2007), Related-key Attacks on the Py-family of Ciphers and an Approach to Repair the Weaknesses
- ↑ Hongjun Wu (2005), The Misuse of RC4 in Microsoft Word and Excel
- ↑ Bruce Schneier and Mudge, Cryptanalysis of Microsoft's Point-to-Point Tunneling Protocol (PPTP), ACM Press