Recent blog entries for linergy

Bruce Hauman
03/04/2001

bhauman@cs.wcu.edu

A Simple Symmetric Large-Key Vector Cipher

Abstract:

It is the intention of this paper to propose a method of encryption. In this method the key is a map where the points on the map refer to data values. The size of this key is flexible and can be decided when the key is generated. The message is mapped to the key based on where the data to be encrypted resides in the key. The locations of the data to be found in the key can be recorded to encrypt a message. The encoded message, however, is not a list of positions but is basically a list of directions (vectors) from one point in the key to another point where each point is some data value. This makes it possible for a single message to be mapped to the same key in many different ways. So in effect one message can be encrypted repeatedly with the same key without being encrypted the same way twice. This type of encryption would appear to resist many types of cryptanalytic attacks.

Introduction:

In modern cryptography there are several assumptions which guide the design of an encryption algorithm. It is assumed that the cipher itself is available to everyone. Also it is assumed that the opponent will be able to get a hold of the plaintext (the original message before encryption) along with its ciphertext (the encoded counterpart). Strong encryption therefore is not dependent upon the secrecy of the algorithm at all.

According the the sci.crypt FAQ strong encryption is defined as follows:

  • "A strong cryptosystem has a large keyspace, as mentioned above. It has a reasonably large unicity distance; see question 8.8.

  • A strong cryptosystem will certainly produce ciphertext which appears random to all standard statistical tests (see, for example, [CAE90]).

  • A strong cryptosystem will resist all known previous attacks. A system which has never been subjected to scrutiny is suspect."

There are several attacks that one has to consider when evaluating a cipher. One such attack is the 'known-plaintext' attack. This attack is where the attacker has the plaintext along with the encoded version of a message. The attacker then makes an attempt to determine the key from the relationship between the two documents.

Another attack is the 'adaptive-chosen-plaintext' attack. This attack is where the attacker is able to trick the encoder into encrypting a message of his choice. The attacker tries to determine the key by first analyzing the encrypted message and then choosing another message to be encoded that he feels will lead him closer to finding the original key.

The 'ciphertext-only' attack is where the attacker has only the encoded message and tries to determine the plaintext and or the key only from that ciphertext.

The Encryption Algorithm:

During the explanation of this algorithm the author will use certain abstractions to aid in the explanation. As the explanation proceeds these abstractions will be peeled away to reveal the actual implementation of this algorithm.

This method of encryption is symmetric. This means that it relies on both parties that are trying to communicate to each have a copy of the key which will encode and decode the encrypted data. This is also known as Private-Key encryption.

As an aid to understanding the system we will present the key as a two dimensional array. The array is filled with randomly chosen characters from a set of characters which includes a space-bar (' ') character as well as the characters A - Z.

Here is an example:

Q G   S Y D Y I H O J U P K L B
N V H C B S D F X D A   Q U D O
C S U X C M A F U X H K R Y U I
U X C   J S D H F U S D L K H V
L Z I S U D H   S H V L Z S D H
V L U H S D Q A Z W A G Q G H
Y D K Y I O J U P K L B N V H C
B D F X   A R Q U D O J S U X C
M A F U X H K R   U Z I U X C K
J S D H F U S D L K V L Z I S U
D H K S H V L Z S D   V L U H S

Now every character in this key has its own point or position (x, y). To encode a message with this key a starting point is needed. This location of this point can be considered part of the key itself. Imagine setting this starting point in the center of a circle whose radius is long enough that the perimeter of the circle passes through 432 characters. In other words, from this point in the array there are 432 directions to choose from. The field of characters was generated randomly. So, among these directions there is a high probability that there are 16 representations of each character in some random order.

To encode the message "HELLO" one begins at the starting point in the key and then searches these 432 directions in a random order for 'H' the first character of the message. When one finds 'H' one records the direction to it from the starting point (in this case the direction is a number 1 - 432). Let's say that direction is 134. Now one changes the point of reference from the starting point to the found character 'H' and performs another random search for 'E' the next letter to be encoded from there. When 'E' is found the direction from 'H' to 'E' is recorded. Let's say that direction is 54. Then the point of reference is changed to 'E' and a random search is made for the next letter 'L' and again the direction to the found 'L' is recorded. Again, let's say that this direction is 325 and that the directions to the next letters 'L' and 'O' are respectively 100 and 412. This process is repeated for every letter of the message to be encoded. The recorded directions are the encrypted message. In this case the letters "HELLO" are encrypted as 134, 54, 325, 100, 412.

Now there are some interesting things to observe here. Notice that there are two L's in the encoded word but they do not map to the same encoded direction. In fact you could have a message that was all L's and it would appear completely random even though the encoded message is completely uniform. Let's expand this observation the include the word "HELLO." What does the reader suppose would happen if "HELLO HELLO HELLO HELLO" was encoded? The encoded output would appear completely random. In other words, the frequency of the letters or words used would not affect the encoded message. There would be no patterns resulting from repetitive word use for cryptanalysts to grab a hold of.

Now consider what would happen if "HELLO HELLO HELLO HELLO" is encoded yet another time using the same key and the same starting point. The new encoded message would bear no resemblance to the to the first one. There is a 1 in 16 chance that the first letter will be the encoded to the same direction of a previously encoded message. There is a 1 in 256 chance that the second direction will be the same and there is only a 1 in 4096 chance that the third letter will be the encoded to the same direction of a previously encoded message. The chances that a sequence of letters will be encoded the same way a second time around decreases exponentially 1/(16^n) with the length of the message. Therefore it is doubtful that any message longer than four characters would ever be encoded the same way twice.

The method uses the directions from one letter to another in a key, when from any point there are probably 16 directions to the next letter, to encode a message. This effectively means that no direction or sequence of directions maps directly to any letter or sequence of letters. Every letter maps exactly to all the cumulative directions from the starting point. If the message is "HELLO THERE I REALLY HOPE THAT YOU ARE NOT ABLE TO READ THIS" then the letter 'H' in "hope" maps only to the all of the directions from the starting point to that particular 'H'. The deeper one gets into a message the longer the map is to that particular letter. This means that a part of a message can not be treated as separate and that every part of a message is completely dependent on what came prior to it. This dependence effectively means that each letter of an encoded message is represented by long string of directions. The number of combinations that these long strings of directions can take becomes very large as one moves into the message. So instead of encoding a 64 bit block of message directly to 64 bit (2^64 combinations) block of encryption we are mapping one character to a string of n directions where each direction can be 1 of 432 directions or (432^n combinations).

Vulnerabilities:

In this method the key could be vulnerable to mapping by gathering a large amount of encoded material along with its plaintext. This is possible if one records all the directions and converts them to coordinates by having a knowledge of the distance between points and then placing the plaintext value of the message at that point. This is possible only if the distance between the encoded characters is constant and known. If the distances between the characters are varied according to a predetermined pattern which is also part of the key then the distances between the points would be unknown. This should render the key un-mappable because the encoded directions have unknown distances related to them.

The method is also especially vulnerable to mapping the key around the starting point. This can be remedied by encoding a random number of random characters at the beginning of an encoded message. Which effectively moves the starting point of the message out of this initial mappable area.

Actual Implementation:

The following is just a test implementation that demonstrates one way to to actually put this algorithm to work. The specifics in this example were chosen to make the implementation easy. These specifics can be altered to either increase the effectiveness or the efficiency of the algorithm.

The actual test implementation of this cipher differs from the abstract representation in several ways. First of all, the key does not hold a random array of characters. It holds a random array of bytes (4 bit binary representations of the integers 1 - 16). The actual implementation encodes the individual bytes of a message not the characters. If we would like to maintain the ability to find 16 members of each byte in all the search-able directions around a particular byte in the array we need to have at least 256 search-able directions.

In other words, at each point in the array there is a random integer in the range 1 - 16. From any point in the array there are 256 search-able directions. The number of directions at which one can find one particular byte directly affects the number of ways that one message can be represented. In this example a message of n bytes can be represented 16^n ways.

The total number of directions that one can search for a byte of data directly affects the size of each encoded byte. If one four bit byte is to be represented by a 1 of 256 directions then each byte has to be represented by at least eight bits (2^8).

This implementation inflates any message to double its original size.

The array that these bytes are held in is not a two dimensional array but an eight dimensional array declared as follows:

byte keyArray[size][size][size][size][size][size][size][size];

The variable 'size' determines the size of the array and is another component of the key. The size of the entire key will be 'size'^8. If 'size' is 5 then the actual size of the key will be 390625 bytes plus the negligible size of the starting point, 'size' variable, and distance pattern.

The way a direction could be applied to this array from any particular point is by adding the individual bits of the binary representation of the direction to the corresponding subscript of the array and a modulus operation is performed on the sum to keep it within the bounds of the array. For example if our starting point in the array is the compound index 0,0,0,0,0,0,0,0 or :

keyArray[0][0][0][0][0][0][0][0]

Then one would find the byte at direction 156 from this point by adding the individual bits of the binary representation of 156 (10011100) to the corresponding subscripts of the array as follows:

keyArray[ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ]
        + 1  + 0  + 0  + 1  + 1  + 1  + 0  + 0
_________________________________________________
keyArray[ 1 ][ 0 ][ 0 ][ 1 ][ 1 ][ 1 ][ 0 ][ 0 ]

In this implementation there is also a distance as well as a direction. To accomplish this there are two distance patterns which can be thought of as streams of random numbers that are in the range 0 through ('size' - 1). These streams are of different length. These streams are iterated through in order and divided into chunks of eight values. From each stream a chunk of eight values is taken to apply a 'distance' to the chosen direction. What follows is an example of two chunks A and B:

A 2 0 3 1 0 4 3 2
B 3 1 4 0 4 2 3 1

Now the two sets are shown with the binary direction above them.

  1 0 0 1 1 1 0 0

A 2 0 3 1 0 4 3 2
B 3 1 4 0 4 2 3 1

The way a distance is calculated is by choosing the value of the A set in every column where the binary direction has the value 1. Every column where the direction has the value 0 gets the value from the B set.

Dir   1 0 0 1 1 1 0 0

A     2 0 3 1 0 4 3 2
B     3 1 4 0 4 2 3 1

Direction
with  2 1 4 1 0 4 3 2
Distance

This new direction with distance is added to the previous index as follows:

keyArray[ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ]
        + 2  + 1  + 4  + 1  + 0  + 4  + 3  + 2
_________________________________________________
keyArray[ 2 ][ 1 ][ 4 ][ 1 ][ 0 ][ 4 ][ 3 ][ 2 ]

One would then perform a modulus operation on these individual subscripts to keep it in the bounds of the array. If the 'size' variable in this case is 5 then the operation would be as follows:

keyArray[ 2 ][ 1 ][ 4 ][ 1 ][ 0 ][ 4 ][ 3 ][ 2 ]
        % 5  % 5  % 5  % 5  % 5  % 5  % 5  % 5 
_________________________________________________
keyArray[ 2 ][ 1 ][ 4 ][ 1 ][ 0 ][ 4 ][ 3 ][ 2 ]

In order to encode a message it is necessary to search randomly through these 256 directions to find the byte to be encoded. This should be done efficiently because the running time of this search has a large effect on the running time of the algorithm. Basicly we need a function that will iterate randomly through the integers 1 - 255 without repeating. The author doubts that the following is an original algorithm. He includes it because it makes the implementation of this encryption algorithm more efficient. He also has not made an attempt to find a better one which he is sure exists.

The algorithm initializes a 256 element array to the values 0 - 255. It then randomly generates a number in the range 0 - 255. The generated value is used as an index into the array. The value that is found in the array at that index is the random value to be returned from the function. Here is the important part. The value in the array at that index is replaced by the last value in the array. Then the next call to the function generates an index in the decremented range 0 - 254. Again the value at that index is to be returned. The value that is returned is replaced by the value in the array at the index 254. 254 is the largest index of the decremented range. This process is repeated until all integers in the range 0 - 255 have been returned.

Again there are probably 16 representations of each byte in the 256 possible directions. Therefore the average search will consist of around 16 calls to the function that returns random indices.

After finding the direction to a particular byte, the direction is recorded as a part of the encoded message. The next byte to be found is sought after from the location of the last byte found taking into account the next distance in the distance pattern. It's direction is added to the encoded message. This process is repeated until the message is completely encoded.

To decode the message all one needs to do is to start with the starting index and add the first direction of the encoded message after it has been modified by the distance pattern. This will be the index of a byte that matches the first byte of the original message. Then the next index is calculated by adding the the second direction-distance to the current index. This process is repeated until the message is decoded.

The running time of this encryption algorithm for both encoding and decoding is linear. Decoding is extremely fast because it consists mainly of looking up values in an array. Encoding is a bit slower because of the search that has to be made for each byte but on average the search only goes through 16 iterations.

So, over all the algorithm is definitely fast.

Is this Algorithm Strong?:

It is beyond the current ability of the author to determine whether or not this is a strong cipher. At the same time the author would like to point out why he feels it would be based on the definitions strong encryption found in the sci.crypt FAQ.

The properties of a strong cipher according to the sci.crypt FAQ are listed at the beginning of this paper. The first of these was that the cipher must have "a large keyspace" where the keyspace is the number of distinct keys. This system does have a large keyspace. First put aside the fact that the keys can range in size which in effect provides an even larger keyspace. Then consider that the 'size' variable is set to 5. So the size of this key is 5^8 or 390625. Now if there is an equal representation of each byte in the key then one could represent the keyspace for a key of this size as 390625!/((24414!)^16) minus all the cases where there are consecutive numbers. This is a very large keyspace indeed. A brute force attack would never be able to iterate through that keyspace.

The system also needs a "reasonably large unicity distance." Unicity distance according to The Handbook of Applied Cryptography "is the minimum amount of ciphertext(number of characters) required to allow a computationally unlimited adversary to recover the unique encryption key." The unicity distance for this system appears to be huge. There seems to be no reasonable way to generate an entire key from say a paragraph of encoded messages. Of course this is highly dependent on the way that random numbers are generated but it is assumed that the random number generator is a secure one. When compared to the fact that the unicity distance for DES is 17.5 characters(according to the FAQ) the unicity distance of this method could be very large.

A strong system "will certainly produce ciphertext which appears random to all standard statistical tests." Since the cipher text in this case consists of directions, all of which are equally probable from any point in the key it would seem that this requirement is met as well.

It was also stated that a strong cipher "will resist all known previous attacks." The three attacks mentioned were the ciphertext-only, the known-plaintext, and the adaptive-chosen-plaintext.

The author is only speculating, but a ciphertext-only attack would seem very difficult for several reasons. Letter, word or pattern frequency is not transparent in the encoded message. One can encode n bytes in 16^n different ways. That number does not include the many variations of encoding the message provided by the distance pattern. Another reason a ciphertext-only attack is improbable is that the key space is very large and impossible to iterate through.

A known-plaintext attack will most likely be difficult. This is because having the plaintext next to an encrypted message in this case is almost meaningless because of the many different ways that the text could have been encoded. Now if the distance between the points in the array was static and known then the key would be mappable given a large amount of plaintext. However, the distances are unknown and vary according to a pattern that is as unknown as the key.

An adaptive-chosen-plaintext attack will be difficult as well for the same reasons that the known-plaintext attack will be. The varied distances are important in preventing the mapping of the key.

Conclusions:

It seems to the author that this encryption method may have merit.

It is very flexible with regard to key size. The limitations are really only the limitations of the machine on which it is implemented. This means that the key can grow as the necessity arises. This can be done without changing the implementation. In other words if tomorrow there is a super computer that can iterate through a really large keyspace then the next moment you can have a key that cannot be iterated through.

It is also very easy to increase the relative strength of the algorithm by changing the mapping. In the example a 4 bit value was mapped to an eight bit direction. But we could just as easily mapped a 1 bit value to a 16 bit direction. In other words from any point we could chose 65536 directions and half of them would be 1's while the other half would be 0's. Now that would definitely be a hard message to perform cryptanalysis on. It is also possible to map say 7 bits to a 9 bit direction and thereby decreasing the amount of message inflation.

The major weakness of the method is its heavy dependence on random number generation.

Sources:

Bach, Eric, Steve Bellovin, Dan Bernstein, Nelson Bolyard,
Carl
  Ellison, Jim Gillogly, Mike Gleason, Doug Gwyn, Luke
O'Connor,
  Tony Patti, and William Setzer.  sci.crypt
Cryptography-faq.
  Last-modified: 06/27/1999.  Accessed on 02/25/2000.
 
http://www.faqs.org/faqs/cryptography-faq/part01/index.html

Menezes, Alfred J., P. C. vanOorschot and S.A. VanStone. Handbook of Applied Cryptography. New York: CRC Press, 1997.

I came up with an encryption scheme and I am wondering if the idea is original or has any merit. I have neither taken a class on encryption or read a book on the subject. I am hoping someone out there has and can tell me what they think.

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!