Hiding Data in Data

Does your encrypted data stick out like a sore thumb? Try these techniques for hiding it.

Gary C. Kessler

March 24, 2002

Further increase the security of your communications with steganography

Cryptography—the science of writing messages in secret code—addresses all the elements necessary for secure communication over an insecure communications medium: namely, privacy, confidentiality, key exchange, authentication, and nonrepudiation. However, cryptography doesn't always provide safe communication.

Consider an environment in which the use of encrypted messages causes suspicion. If a nefarious government or an ISP is looking for encrypted messages, it can easily find them because encrypted data sticks out like a sore thumb. For example, Figure 1, page 38, contains text that's encrypted with Pretty Good Privacy (PGP), a popular email-encryption program. The government or ISP would easily spot this encrypted message because it's nonsensical to casual readers. In addition, the characters that make up the message appear random and don't follow the relative letter frequencies that you'd expect in a plaintext message (i.e., many occurrences of commonly used letters such as e, t, o, and a, and few occurrences of less commonly used letters such as q, w, and z).
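To make the frequency argument concrete, here's a minimal Python sketch that measures how evenly byte values are spread in a block of data. It rests on some assumptions: the function name shannon_entropy is illustrative, the sample sentence is made up, and os.urandom stands in for ciphertext (an ASCII-armored PGP message uses a smaller character set, so its score would be lower than that of raw random bytes). English text, with its lopsided letter frequencies, should score well below the random-looking data.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up English sample, repeated to get a reasonable amount of text.
plaintext = (b"the quick brown fox jumps over the lazy dog and then "
             b"heads home to eat a hearty dinner with the other foxes ") * 20

# Stand-in for ciphertext: random bytes of the same length (an assumption).
random_like = os.urandom(len(plaintext))

print(f"plaintext entropy:   {shannon_entropy(plaintext):.2f} bits/byte")
print(f"random-like entropy: {shannon_entropy(random_like):.2f} bits/byte")
```

A scanner that flags high-entropy attachments or message bodies would be using the same idea, which is why encrypted traffic is easy to single out.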

Steganography is the science of hiding information. The goal of cryptography is to make data unreadable by third parties, whereas the goal of steganography is to hide the data from third parties. Although many different steganographic methods are available, the basic process is the same. Let's look at the various steganographic methods, the steganographic process, and how to hide data manually and with an automated program.

Steganographic Methods
If you've watched a lot of spy movies, you're probably familiar with such steganographic methods as using invisible ink, embedding hidden messages within text, and using microdots. (The sidebar "Other Forms of Steganography" details some of these methods.) With computers and networks, you can hide information many other ways, including the following:

  • You can hide files in plain sight. For example, you can hide a file by giving it an important-sounding filename such as system_save.exe and placing it in the C:\winnt\system32 directory.

  • You can use covert channels. Loki and some Distributed Denial of Service (DDoS) tools use the Internet Control Message Protocol (ICMP) as the communications channel. Similarly, you can use ICMP to hide information.

  • You can use digital watermarking, one of the most widely used steganographic applications. Historically, a watermark is the replication of an image, logo, or text on paper stock so that the source of the document can be at least partially authenticated. A digital watermark can accomplish the same function. For example, on a Web site, a graphic artist might post sample images that have an embedded signature. If someone plagiarizes an image, the graphic artist can reveal the signature to prove ownership of that image.

  • You can hide information in image or audio files you post on the Web or send in an email message. People often use this form of steganography in conjunction with cryptography to doubly protect the information. The information is encrypted, then hidden so that intruders have to first find the information (an often difficult task in itself) before they can decrypt it.

The Steganographic Process
The basic steganographic process follows a generic formula:

cover_medium + hidden_data + stego_key = stego_medium

where cover_medium is the file in which you hide the hidden_data, which you've encrypted with the stego_key. The resultant file is stego_medium, which will be the same type of file as cover_medium. The cover_medium (and thus the stego_medium) is typically an image or audio file. In this article, I focus on image files, so I refer to cover_medium and stego_medium as cover_image and stego_image, respectively.

Before I discuss how to hide data, I want to discuss how images are stored. An image is a binary file that contains a binary representation of the color or light intensity of each picture element (pixel). Images typically use 8-bit or 24-bit color. An 8-bit color image uses 8 bits per pixel and provides up to 256 colors in a palette. A 24-bit color image uses 24 bits per pixel and provides a much better set of colors (more than 16 million colors). In 24-bit color images, each pixel is represented by 3 bytes, one byte each for the intensity of the primary colors red, green, and blue.

The HTML format for specifying colors in a Web page often uses the 24-bit format. To specify a color, you use a six-digit value consisting of three hexadecimal numbers that represent the amount of red, green, and blue, respectively. For example, suppose you want to specify the color orange, which consists of 100 percent red, 50 percent green, and 0 percent blue. First, you need to specify 100 percent of the red byte:

binary 11111111 = decimal 255 = hex FF

Next, you need to specify 50 percent of the green byte:

binary 01111111 = decimal 127 = hex 7F

Finally, you need to specify 0 percent of the blue byte:

binary 00000000 = decimal 0 = hex 00

So, in the HTML code, you would specify #FF7F00 as the color. The number sign (#) specifies that it's a hex value.
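The same conversion can be sketched in a few lines of Python. The function names percent_to_byte and html_color are illustrative, and the sketch assumes that fractional intensities are truncated, which is how 50 percent becomes 127 rather than 128 in the example above.

```python
def percent_to_byte(percent: float) -> int:
    """Map 0-100 percent to a 0-255 intensity, truncating any fraction."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return int(255 * percent / 100)


def html_color(red_pct: float, green_pct: float, blue_pct: float) -> str:
    """Build a six-digit HTML hex color from red, green, and blue percentages."""
    r, g, b = (percent_to_byte(p) for p in (red_pct, green_pct, blue_pct))
    return f"#{r:02X}{g:02X}{b:02X}"


print(html_color(100, 50, 0))  # orange from the example: #FF7F00
```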

The size of an image file relates directly to the number of pixels and the granularity of the color definition. For example, an 8-bit color image that's 640 * 480 pixels results in a 307KB file (640 * 480 bytes), whereas a 24-bit color image that's 1024 * 768 pixels results in a 2.36MB file (1024 * 768 * 3 bytes).
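The arithmetic behind those two figures is easy to check with a short Python sketch. The function name raw_image_bytes is illustrative, and the calculation covers uncompressed pixel data only; it ignores headers and palettes, which a real file would add.

```python
def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the size in bytes of the uncompressed pixel data."""
    return width * height * bits_per_pixel // 8


print(raw_image_bytes(640, 480, 8))    # 307200 bytes, about 307KB
print(raw_image_bytes(1024, 768, 24))  # 2359296 bytes, about 2.36MB
```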

As you can see, color image files can get quite large. Fortunately, several file compression schemes have been developed to decrease the storage and communications requirements of handling image files. Bitmap (.bmp) and GIF (.gif) files use a lossless compression algorithm. With the lossless compression algorithm, the decompressed image is identical to the original image (i.e., the image before compression). JPEG (.jpg) files use a lossy compression algorithm that approximates the image being compressed. With the lossy compression algorithm, the decompressed image is nearly the same as, but not identical to, the original image. Although you can use both types of compression with steganography, tools that hide information in .jpg files are more complex than tools that hide information in .bmp and .gif files.

Hiding Data Manually
The simplest approach to hiding data in an image file is called least significant bit (LSB) insertion. In this method, you take the binary representation of the hidden_data and overwrite the LSB of each byte in the cover_image. (The LSB of a binary number is typically the rightmost bit.) If you use 24-bit color, the change will be minimal and indiscernible to the eye. For example, suppose you want to hide 9 bits of data—101101101—in three adjacent pixels (i.e., 9 bytes) that have the following Red-Green-Blue (RGB) encoding:

10010101   00001101   11001001
10010110   00001111   11001010
10011111   00010000   11001011

Working from left to right and top to bottom, you overlay the 9 bits you're hiding over the LSBs of the 9 bytes representing the 3 pixels, changing the LSB to match the hidden bit if necessary. Here are the results; the LSBs that changed are in the second byte of the first pixel and in all three bytes of the second pixel:

10010101   00001100   11001001
10010111   00001110   11001011
10011111   00010000   11001011

You've successfully hidden 9 bits but had to change only 4 bits, or roughly 50 percent of the LSBs. You can apply similar steganographic methods to gray-scale images and 8-bit color images. With 8-bit color images, however, the changes in the image are more noticeable.
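Here's a minimal Python sketch of LSB insertion that reproduces the 9-byte example. The function names embed_lsb and extract_lsb are illustrative, and the sketch works on a bare sequence of bytes; a real tool would also have to parse the image file format and would normally encrypt the hidden data first.

```python
def embed_lsb(cover: bytes, bits: str) -> bytes:
    """Overwrite the LSB of each cover byte with the next hidden bit."""
    if any(bit not in "01" for bit in bits):
        raise ValueError("bits must contain only the characters 0 and 1")
    if len(bits) > len(cover):
        raise ValueError("not enough cover bytes for the hidden bits")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the rightmost bit, then set it to the hidden bit.
        stego[i] = (stego[i] & 0b11111110) | int(bit)
    return bytes(stego)


def extract_lsb(stego: bytes, count: int) -> str:
    """Read back the first count hidden bits from the LSBs."""
    return "".join(str(byte & 1) for byte in stego[:count])


# The three pixels (9 bytes) and the 9 hidden bits from the example.
cover = bytes([
    0b10010101, 0b00001101, 0b11001001,
    0b10010110, 0b00001111, 0b11001010,
    0b10011111, 0b00010000, 0b11001011,
])
hidden = "101101101"

stego = embed_lsb(cover, hidden)
for row in range(0, len(stego), 3):
    print("   ".join(f"{byte:08b}" for byte in stego[row:row + 3]))

changed = sum(a != b for a, b in zip(cover, stego))
print("bytes changed:", changed)                       # 4
print("recovered:", extract_lsb(stego, len(hidden)))   # 101101101
```

The recipient needs to know only how many bits to read and in what order. That is why practical tools add a key or passphrase on top of the basic scheme.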

A potential problem with any steganographic method is that someone who's looking for the hidden message can find it. Steganalysis is the art of detecting and breaking steganography. For example, one way to detect messages hidden with LSB insertion is to analyze an image's color palette. In most images, each color has a unique binary encoding. If the image contains hidden data, the color palette will tend to have duplicate or near-duplicate binary encodings because some of the LSBs were changed to hide the data. When an analysis of an image's color palette yields many such duplicates, you have good reason to suspect that the image has hidden information.
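A rough Python sketch of such a palette check follows. It assumes that the telltale sign is palette entries that become identical once their LSBs are ignored. The function name near_duplicate_count and both sample palettes are made up for illustration, and a high count is a hint worth investigating, not proof.

```python
from collections import Counter


def near_duplicate_count(palette):
    """Count palette entries that match another entry once LSBs are ignored."""
    masked = Counter((r & 0xFE, g & 0xFE, b & 0xFE) for r, g, b in palette)
    return sum(n for n in masked.values() if n > 1)


# Hypothetical palettes: every color distinct vs. pairs that differ only in LSBs.
clean = [(0, 0, 0), (64, 128, 192), (200, 40, 10), (255, 255, 255)]
suspect = [(0, 0, 0), (0, 1, 0), (64, 128, 192), (65, 128, 193), (200, 40, 10)]

print(near_duplicate_count(clean))    # 0
print(near_duplicate_count(suspect))  # 4
```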

Although this type of steganalysis can reveal hidden messages, the quantity of potential images to analyze can make finding the message a Herculean task. For example, suppose you suspect someone has hidden a message in a downloadable image file on an auction site on the Internet. How do you know which file of the hundreds or thousands of downloadable files to analyze? You don't.

Hiding Data with Automated Programs
You can find many steganography programs for just about any platform. Some popular steganography freeware programs for Windows 2000 and Windows NT are

  • S-Tools 4.0 (available on the WebAttack.com Web site at http://www.webattack.com/freeware/security/fwencrypt.shtml)

  • Hide4PGP 2.0 (available on Heinz Repp's Web site at http://www.heinz-repp.onlinehome.de/index.html)

  • MP3Stego 1.1.15 (available on Fabien A.P. Petitcolas' Web site at http://www.cl.cam.ac.uk/~fapp2/steganography/mp3stego)

  • Stash-It (available on the Smaller Animals Software Web site at http://www.smalleranimals.com/stash.htm)

To learn about more steganography programs, check out the steganography tool table on the Johnson & Johnson Technology Consultants (JJTC) Web site at http://www.jjtc.com/Steganography/toolmatrix.htm.

Let's take a closer look at how to use S-Tools. S-Tools lets you hide information in a .bmp, .gif, or .wav file (a type of audio file). S-Tools is easy to use: You simply drag an image or audio file onto the S-Tools active window to act as the cover_medium, drag the hidden_data file onto the cover_medium, then provide a stego_key for encryption. The result is the stego_medium.

For example, I highlighted an image file called 5th wave.gif and dragged it onto the S-Tools active window so that it became the cover_image. A note at the bottom of the image told me that I could hide up to 138,547 bytes, so I highlighted a 14KB text file called virusdetectioninfo.txt and dragged it onto the image. As Figure 2, page 41, shows, a dialog box appeared that stated I was hiding 6019 bytes of data and asked for a passphrase with which to encrypt the hidden text. I entered the passphrase twice and kept the default secret-key encryption scheme of International Data Encryption Algorithm (IDEA).

Figure 3 compares the image that contains the hidden data with the original image. To the eye, the images are identical.

When the intended recipient receives the image, he or she merely drags the file onto the S-Tools active window, right-clicks the image, and selects the Reveal option. A dialog box then asks the recipient for the passphrase. After the recipient enters the correct passphrase, S-Tools displays information about the hidden file, as Figure 4 shows. To open the file, the recipient clicks OK.

Steganography in the Real World
Steganography is an interesting subject that's outside the mainstream of systems administration. However, steganography isn't an arcane subject of study in academia or a laboratory. People use steganography, sometimes with dire consequences—several reports suggest that the terrorist organization behind the September 11, 2001, attacks in New York, in Washington, DC, and near Pittsburgh used steganography as one means of communication.
