Authenticity and Confidentiality: Building Blocks of Digital Character

When J.P. Morgan was asked one day "What is the first factor of credit worthiness, income flow or asset balance?", he responded: "No sir, the first thing is character."

Professional relationships require character. And building and maintaining character requires truthful communication. So, when others eavesdrop, tamper with sensitive material, impersonate a party you rely upon, masquerade with your identity, repudiate statements of fact, disclose confidential information, or in any other way impede your ability to clearly communicate the truth -- then your character, and thus your ability to participate in a professional world of trust, is under attack.

In everyday interactions, we protect our communication. We regularly verify authenticity: picture identification is required for writing a check or boarding a plane. If something is questionable, we look for reassurance in shared experience, by asking for the name of a common acquaintance or the location of a building. And, in some circumstances, we check a fingerprint or listen closely to a voice. In a similar manner, establishing confidentiality comes naturally. Privacy is as simple as shutting a door. An envelope signed across the back seals a letter. An inside joke carries private meaning in public. These techniques are obvious and easy to administer, yet take significant effort to foil. As we regularly use them to protect our character and the character of others, they become habit.

On the Internet, similar habits to protect communication have yet to mature. The Internet was created to enable efficient communication of shared knowledge. Constant authentication is not efficient, and confidentiality inhibits sharing. Every message sent across the Internet is open to snooping. Impersonating the sender of a plain text e-mail is easy. There is no independent verification that a message arrived. Communication in transit can be fabricated, altered, eliminated, recorded, and replayed. In short, security on the Internet is not automatic; it must be added.

Current digital security measures, such as firewalls and passwords, are based on the existence of clear boundaries. They are usually all-or-nothing approaches: once an attacker passes through the boundary, the attacker can often move unchecked from system to system within an organization. Nor do the boundaries always hold. A firewall can often be fooled by disguise, and passwords are either easy to guess or too hard to remember. Furthermore, the world is moving from protected islands of trust to an Internet where security is important everywhere -- borders are more numerous and far less clear.

To establish and maintain character on the Internet, what we need is a collection of techniques analogous to those we already use in everyday life -- those which verify authenticity and establish confidentiality.

Communication

Before we continue, it is important to review basic communication theory. In the familiar model, all communication follows the Sender, Message, Receiver pattern as shown below.

Sender ---> [Message] ---> Receiver

Unfortunately, this simple model omits an essential element: code. A code is a collection of symbols and rules for attaching meaning to those symbols. A language such as French or English consists of at least two codes, a spoken code and a written code. The symbols of a spoken code are the various sounds woven together during speech; the symbols of a written code are the letters assembled on a page. To learn how we can verify authenticity and establish confidentiality, we must study how messages are transformed from one code to another.

In 1949, Shannon and Weaver presented a different model of communication, which focuses on three processes and four distinct states of the text being communicated:

[Source] --> [Message] --> [Message] --> [Destination]
       (Encode)    (Transmit)      (Decode)

A classic example is the telegraph, which sends a letter from one location to another over a wire. An operator on the sending end presses a lever in short or long pulses, character by character, as the letter is read. The lever causes an electric signal to travel over the wire to an operator on the other end, who reproduces the letter, character by character, on a blank page. The original letter is the source document, the message is the sequence of dots and dashes transmitted over the wire, and the destination is the reproduced letter. If the source and destination letters are to match, the person doing the encoding (characters on the page to dots and dashes) and the person doing the decoding (dots and dashes to characters on the page) must agree on the mapping. This agreement is called Morse code; it defines which dots and dashes represent which characters.
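To make the shared-mapping point concrete, here is a minimal Python sketch of both operators' roles. It assumes International Morse code for the letters A to Z only, with " / " standing in for the longer pause between words (a written convention, not part of the wire signal); the function names are illustrative.

```python
# International Morse code for the letters A-Z (both operators must share this table).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}
# The receiving operator uses the same agreement, read in reverse.
DECODE = {code: letter for letter, code in MORSE.items()}


def encode(text: str) -> str:
    """Sending operator: letters on the page -> dots and dashes.
    Letters are separated by spaces, words by ' / '."""
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in text.upper().split())


def decode(signal: str) -> str:
    """Receiving operator: dots and dashes -> letters on the page."""
    return " ".join("".join(DECODE[sym] for sym in word.split()) for word in signal.split(" / "))


wire = encode("hello world")
print(wire)          # .... . .-.. .-.. --- / .-- --- .-. .-.. -..
print(decode(wire))  # HELLO WORLD
```

If either end used a different table, the destination text would no longer match the source, which is exactly why the two operators must agree on the mapping.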

A modern and far more automated example is the telephone. Consider someone who says "HELLO" to a friend over the telephone. On the sending end, sound waves are picked up by a microphone in the bottom part of the handset and encoded into electrical impulses. These impulses are transmitted over the telephone wire to the receiving party, where the speaker in the upper part of the handset decodes them back into sound waves, which the listener hears as "HELLO". Although its encoding is far more fine-grained than Morse code, this is also an example of a code: the handset does both the encoding and the decoding, from sound to electrical impulses and back to sound.

In communication, we use different codes for different reasons. A written code has persistence and can be transported; Morse code can be used to send letters over a wire; and so on. In each case, by transforming a text into an alternative code, a transmission process with different characteristics becomes available.

Cryptography

Cryptography is the study of how people can use codes to obtain private communication over a public channel. A code used for this purpose is called a crypt. Likewise, encoding is called encrypting and decoding is called decrypting.

[Source] --> [Message] --> [Message] --> [Destination]
      (Encrypt)    (Transmit)     (Decrypt)

Pig Latin is an example of everyday cryptography. It converts source text into a less obvious form that can be spoken in public. To encrypt source text into a Pig Latin message, the consonants before the first vowel of each word are moved to the end of the word, and the sound ay is added, so "hello" becomes "ellohay". To decrypt a Pig Latin message into destination text, you do the opposite: for each word, drop the ending ay and move the consonants that were moved to the end back to the front. This is a simple crypt, though one that fools only casual listeners.
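A minimal Python sketch of the Pig Latin crypt. It assumes lowercase words separated by spaces, treats a, e, i, o, u as the only vowels, and, as an illustrative convention not found in spoken Pig Latin, writes a hyphen before the moved consonants so that decryption is unambiguous.

```python
VOWELS = set("aeiou")


def encrypt_word(word: str) -> str:
    """Move the consonants before the first vowel to the end, after a hyphen, then add 'ay'."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word[i:]}-{word[:i]}ay"


def decrypt_word(word: str) -> str:
    """Undo encrypt_word: drop 'ay' and move the consonants back to the front."""
    stem, _, suffix = word.rpartition("-")
    if not suffix.endswith("ay"):
        raise ValueError(f"not a Pig Latin word: {word!r}")
    return suffix[:-2] + stem


def encrypt(text: str) -> str:
    return " ".join(encrypt_word(w) for w in text.lower().split())


def decrypt(text: str) -> str:
    return " ".join(decrypt_word(w) for w in text.split())


message = encrypt("hello string apple")
print(message)           # ello-hay ing-stray apple-ay
print(decrypt(message))  # hello string apple
```

Anyone who knows these rules can run either function, which is what makes the crypt symmetric and also what makes it weak.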

We are all born with the ability to transform meaning from one code to another; we need only experience to put that ability to use. In a similar way, computers can transform text from one code to another if they are programmed to do so. Such a program is called a cipher. To make a cipher reusable, it is configured much like a lock, requiring a key to determine its specific operation. The correct key gives a correct transformation; an incorrect key gives an invalid one. To stretch the analogy, the rules of Pig Latin act like a key: they tell us the specific operation to perform to encrypt and decrypt properly. Thus if someone does not know that, for each word, the ending ay must be dropped and the moved consonants returned to the front, they will be unable to decrypt messages encrypted in Pig Latin. As a rule of thumb, the longer the key, the more possible keys there are, and thus the harder it is for an attacker to guess the key by trying them all. Today, a 128-bit key strikes the balance: trying every possible key would take vastly longer than any realistic amount of computer time, yet using the key takes only a fraction of a second. In general, the business rationale for cryptography is not to make decrypting impossible, but to make it so expensive that the costs far outweigh any benefit.
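The key-length rule of thumb is simple arithmetic: each extra bit doubles the number of keys an attacker must try. The sketch below assumes a hypothetical attacker who can test 10^12 keys per second and computes the worst-case time to try them all; that rate is an illustrative assumption, not a measured figure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_search(key_bits: int, guesses_per_second: float = 1e12) -> float:
    """Worst-case time, in years, to try every key of the given length."""
    return 2 ** key_bits / guesses_per_second / SECONDS_PER_YEAR


for bits in (40, 56, 128):
    print(f"{bits:3d}-bit key: {years_to_search(bits):.3g} years")
```

At the assumed rate, a 40-bit key space is exhausted in about a second, while a 128-bit space needs on the order of 10^19 years.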

A key that enables both encryption and decryption is called symmetric. Pig Latin is symmetric in nature: if you know the process, you can both encrypt and decrypt. Although a symmetric crypt works well for private communication over a public channel, it can only work if both parties know the "secret" key beforehand. However, to send a secret key, private communication is required! This chicken-and-egg problem was solved by Whitfield Diffie and Martin Hellman, who in 1976 introduced public key, or asymmetric, cryptography, which uses a pair of mathematically linked keys. If one key is used for encrypting, then the other key is required for decrypting, and vice versa. The best-known cipher of this kind, RSA, published by Rivest, Shamir, and Adleman in 1977, is based on prime numbers and modular arithmetic; it is thought to be secure because factoring the product of two large primes is computationally difficult.
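To see a key pair in action, here is RSA, the factoring-based cipher, with the small textbook numbers often used to teach it (Python 3.8 or later, for `pow(e, -1, phi)`). Keys this small are trivially broken; real keys use primes hundreds of digits long, and real systems add padding that this sketch omits.

```python
# Toy RSA key pair built from two small primes.
p, q = 61, 53
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # easy to compute only if you know p and q
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: the modular inverse of e


def apply_key(m: int, exponent: int) -> int:
    """Transform a number 0 <= m < n with one key of the pair."""
    return pow(m, exponent, n)


m = 65
c = apply_key(m, e)            # encrypt with the public key
assert apply_key(c, d) == m    # only the private key undoes it
s = apply_key(m, d)            # encrypt with the private key instead
assert apply_key(s, e) == m    # now the public key undoes it: "vice versa"
print(n, d, c)
```

Publishing `(n, e)` reveals nothing useful about `d` unless an attacker can factor `n` back into `p` and `q`, which is trivial here and impractical for real key sizes.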

Both symmetric and asymmetric ciphers allow private transmission over public channels. Symmetric cryptography is often called "secret key" cryptography, since the method requires that both parties share a key, not known to the public, before the communication can proceed. Asymmetric key pair cryptography is often called "public key" cryptography since the method does not have this restriction. One key from the pair, the public key, is distributed in the clear and the other key in the pair, the private key, is securely retained.

Back to Confidentiality (Encrypted Messaging)

To establish confidentiality with a second party over a public communication channel, an asymmetric crypt is used. If the first party wants to receive confidential communication from the second party, they distribute one key from their asymmetric key pair, the public key, unencrypted over the public channel. The second party can then use this public key to encrypt a source text into a message, which can be sent over the public channel to the first party. This message can then be decrypted using the retained private key. The process also works the other way: if the second party wants to receive confidential information from the first party, they must distribute their own public key. The first party can then encrypt the source with the public key received and send the resulting message back, and the second party uses their private key to decrypt it. The communication is confidential because, although a third party can intercept both the public keys and the messages, they cannot read the messages without the corresponding private keys!
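The exchange can be sketched in Python with textbook RSA. As a labeled shortcut, each hypothetical party builds its key pair from well-known Mersenne primes (2^521 - 1 and 2^607 - 1 for the first party, 2^127 - 1 and 2^1279 - 1 for the second) instead of generating secret random primes, and messages are encrypted directly without padding. Both shortcuts make this illustrative only: anyone who knows those primes can rebuild the private keys.

```python
def make_keypair(p: int, q: int, e: int = 65537):
    """Return (public_key, private_key) as (n, exponent) pairs. Needs Python 3.8+."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)


def encrypt(text: str, public_key) -> int:
    n, e = public_key
    m = int.from_bytes(text.encode(), "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)


def decrypt(c: int, private_key) -> str:
    n, d = private_key
    m = pow(c, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode()


# Each party keeps its private key and publishes only its public key.
first_pub, first_priv = make_keypair(2**521 - 1, 2**607 - 1)
second_pub, second_priv = make_keypair(2**127 - 1, 2**1279 - 1)

# Second party -> first party, using the first party's public key.
to_first = encrypt("HELLO", first_pub)
assert decrypt(to_first, first_priv) == "HELLO"

# First party -> second party, using the second party's public key.
to_second = encrypt("HELLO BACK", second_pub)
assert decrypt(to_second, second_priv) == "HELLO BACK"
```

An eavesdropper sees `first_pub`, `second_pub`, and both ciphertexts, but without a private exponent cannot reverse `pow(m, e, n)`.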

Unfortunately, because of the mathematics involved, asymmetric (public key) cryptography is far more computationally expensive than symmetric (secret key) cryptography. So, for bulk transmissions, things are a bit different. Instead of sending a long message encrypted with a public key, the two parties use their public keys to negotiate a secret key. Bulk transmission can then use the shared, computationally cheaper secret crypt. For even more secure communication, this secret key could be renegotiated several times during the transmission, much like switching languages during a conversation: if someone breaks one language, they will only get a portion of the communication -- anything to make decrypting so expensive that it isn't worth the effort.
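One well-known way for two parties to negotiate a secret key over a public channel is Diffie-Hellman key agreement, sketched here in Python with the small textbook parameters p = 23 and g = 5. Real deployments use standardized groups of 2048 bits or more; the function names are illustrative.

```python
import secrets

# Public parameters, agreed on in the clear.
P = 23  # prime modulus
G = 5   # generator


def make_keypair() -> tuple[int, int]:
    """Return (private, public): a random secret x and G**x mod P."""
    private = secrets.randbelow(P - 3) + 2  # random integer in 2 .. P-2
    return private, pow(G, private, P)


def shared_secret(my_private: int, their_public: int) -> int:
    """Combine my private number with the other party's public number."""
    return pow(their_public, my_private, P)


a_priv, a_pub = make_keypair()  # first party
b_priv, b_pub = make_keypair()  # second party
# Only a_pub and b_pub cross the public channel.
k1 = shared_secret(a_priv, b_pub)
k2 = shared_secret(b_priv, a_pub)
assert k1 == k2  # both sides now hold the same secret key
```

Each party keeps its private number to itself, yet both arrive at the same value, which can then key the cheaper secret crypt. Another common approach is for one party to create the secret key and send it encrypted under the other party's public key.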

Authenticity (Signed Messaging)

Verifying authenticity over a public channel is a bit more surprising: asymmetric cryptography is used in reverse! The private key is used to encrypt the message, and the corresponding public key is used to decrypt it. So, if the first party wants to send authentic communication to the second party, they encrypt the source text with their private key and publish the encrypted message, along with their public key, on the public channel. In this way, anyone can obtain the public key and read the message, but only the holder of the private key is able to create such messages. So, unless the private key is compromised, the second party, and any third party who intercepts the message, can be certain that it is authentic, provided they can trust that the public key really belongs to the sender.

However, we have one slight problem. For backwards compatibility, so that recipients without cryptographic software can still read it, the message should be sent without being encrypted. So, an alternative method is used in practice. The sender first computes a small identifier, a "hash", from the text, one that changes dramatically when the message is changed even slightly. For example, a simple (and very poor) hash could assign 1 to every occurrence of A, 2 to every B, and so on, and then take the sum, so the value for "ABC" is 6. In practice, much more complicated hash functions are used, so that "ABC" does not have the same hash as "CBA" and so that finding two messages with the same hash is computationally infeasible. Regardless of method, this hash value, often called a message digest, is encrypted with the sender's private key in lieu of the message, and the result is called a signature. The source text, which may be unencrypted, and the signature are then sent together. This technique results in what is called a signed message. If the recipient questions the authenticity of a signed message, they can compute the hash on their own, decrypt the signature with the sender's public key to recover the sender's hash value, and compare the two. If they do not match, then the message was altered somewhere in the communication channel, or was not signed with the sender's private key.
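A minimal Python sketch of a signed message. It contrasts the essay's toy letter-sum hash with SHA-256 from the standard `hashlib` module, then signs the SHA-256 value with textbook RSA. The key pair is built from well-known Mersenne primes (2^521 - 1 and 2^607 - 1) purely for illustration, and no padding scheme is applied, so this shows the idea rather than a usable signature system.

```python
import hashlib


def poor_hash(text: str) -> int:
    """The essay's toy hash: A=1, B=2, ..., summed. Ignores order, so trivially forged."""
    return sum(ord(ch) - ord("A") + 1 for ch in text.upper() if "A" <= ch <= "Z")


def sha256_int(text: str) -> int:
    """A real hash function, read as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")


def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def sign(text: str, private_key) -> int:
    n, d = private_key
    return pow(sha256_int(text), d, n)  # encrypt the hash with the private key


def verify(text: str, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == sha256_int(text)  # decrypt and compare hashes


public, private = make_keypair(2**521 - 1, 2**607 - 1)
msg = "Pay Bob 100 dollars"
sig = sign(msg, private)

print(poor_hash("ABC"), poor_hash("CBA"))          # 6 6
print(verify(msg, sig, public))                    # True
print(verify("Pay Bob 900 dollars", sig, public))  # False
```

The message itself travels in the clear; only the small signature depends on the private key, so a recipient without cryptographic software can still read the text.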

With this alternative method of encrypting a hash value with the sender's private key, we gain one other significant benefit: we can also encrypt the source using the recipient's public key. With this variant, the communication gets the best of both worlds. Its authenticity can be verified using the sender's public key, and its confidentiality is established, since the recipient's private key is required for decryption.
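Combining the two techniques, a hypothetical sender signs the hash with its own private key and encrypts the text with the recipient's public key. This is textbook RSA without padding, with key pairs built from well-known Mersenne primes, so it only illustrates the flow; a real mailer would typically encrypt a random secret key rather than the message itself.

```python
import hashlib


def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")


def rsa(m: int, key) -> int:
    """Apply either key of a pair: key is (n, exponent)."""
    n, exponent = key
    return pow(m, exponent, n)


def to_int(text: str) -> int:
    return int.from_bytes(text.encode(), "big")


def to_text(m: int) -> str:
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode()


sender_pub, sender_priv = make_keypair(2**521 - 1, 2**607 - 1)
recip_pub, recip_priv = make_keypair(2**127 - 1, 2**1279 - 1)

source = "Meet at the north gate at noon."

# Sender: sign with its own private key, encrypt with the recipient's public key.
signature = rsa(digest(source), sender_priv)
ciphertext = rsa(to_int(source), recip_pub)

# Recipient: decrypt with its own private key, verify with the sender's public key.
received = to_text(rsa(ciphertext, recip_priv))
authentic = rsa(signature, sender_pub) == digest(received)
print(received, authentic)
```

Each key is used for exactly one job: the sender's private key proves authorship, and the recipient's private key is the only way to read the text.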

Conclusion

When used together, encrypted and/or signed messages form the basis of what is called the public key infrastructure. These two methods allow confidentiality to be established and authenticity to be verified on an as-needed basis. Together they make possible digital certificates, issued by organizations that can formally verify identity, as well as the digital equivalent of a notary public. Many other analogies to real-world techniques are being implemented, such as digital vaults, message delivery receipts, and other trusted third party arrangements. Public key infrastructure provides the building blocks for the creation and maintenance of digital character.

Examples

Verisign is an organization that issues digital certificates on the Internet. Certificates can be looked up in a public access directory by legal name to obtain a registrant's public key. This reinforces authenticity, since the sender of a signed message may first register with this trusted third party, which performs limited background checks, so a recipient has some assurance that a public key really belongs to the named sender.

Secure Sockets Layer (SSL) is probably the most common example. When you go to a web site and want to make a secure transaction, your browser asks for an SSL connection. In return, the server sends its public key, in the form of a certificate, to the browser. The browser then creates a secret key, encrypts it with the server's public key, and sends it back to the server, which decrypts it with its private key. This secret key is then used to encrypt the rest of the session.
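The SSL exchange can be sketched in Python under heavy simplification. The server's key pair is textbook RSA built from well-known Mersenne primes, and the cheap secret crypt is a toy stream cipher (XOR with SHA-256 blocks), chosen only because it fits in a few lines of standard library code. Real SSL and its successor TLS use certificates, padding, key derivation, and vetted ciphers that this sketch leaves out.

```python
import hashlib
import secrets


def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with SHA-256(key || block counter).
    Encrypting and decrypting are the same operation."""
    out = bytearray()
    for i in range(0, len(data), 32):
        counter = (i // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


# 1. The server sends its public key to the browser.
server_pub, server_priv = make_keypair(2**521 - 1, 2**607 - 1)

# 2. The browser creates a random secret key and encrypts it with the server's public key.
session_key = secrets.token_bytes(16)
n, e = server_pub
wrapped = pow(int.from_bytes(session_key, "big"), e, n)

# 3. The server recovers the secret key with its private key.
n, d = server_priv
unwrapped = pow(wrapped, d, n).to_bytes(16, "big")

# 4. Both sides use the cheaper symmetric cipher for the rest of the session.
request = b"GET /account HTTP/1.0"
on_the_wire = keystream_xor(session_key, request)
assert keystream_xor(unwrapped, on_the_wire) == request
```

The expensive public key operation happens once per session, on 16 bytes; everything after that uses the symmetric cipher.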

Secure/Multipurpose Internet Mail Extensions (S/MIME) is another up-and-coming example. With many new mailers, it is possible to send signed messages. If the recipient's mailer understands how to verify signatures, it will do so; otherwise the signature is ignored. S/MIME also allows encrypted messages to be sent to those who have published their public key as a certificate.