Lee's Stego Research Notes: November 2007

Saturday, November 24, 2007

Structure of AC code table

在 JPEG 規格書 P. 89 談到了 AC code table 的結構。每一個非零的 AC 係數都是以一個 8-bit RS (Run/Size) 的形式來描述。
RS = binary 'RRRRSSSS'
4 個低位元 SSSS 定義非零的 AC 係數所屬的類別 (category), 分成 10 個類別, 可由 Table F.2 查到每個類別的範圍。高位元的 4 個 RRRR 則是指出非零 AC 係數之前存在多少個 0。由於有可能超過 15 個 0, 4-bit 的 RRRR 無法表示, 因此定義了 'RRRRSSSS' = X'F0' 來代表 15 個 0 外加一個值為 0 的 AC 係數, 換句話說, 共16個 0。除此, 如果在 Zigzag 序列中, 後面的 AC 係數已經全部為 0 了, 就用一個 EOB (end-of-block), 'RRRRSSSS'= '00000000' 來表示。

Table F.2 Categories assigned to coefficient values
SSSS 　　AC coefficients
1 　　　　　　　-1, 1
2 　　　　　 -3,-2, 2, 3
3 　　　　　 -7..-4, 4..7
4 　　　　　 -15..-8, 8..15
5 　　　　　 -31..-16, 16..31
6 　　　　　 -63..-32, 32..63
7 　　　　　-127..-64, 64..127
8 　　　　　-255..-128, 128..255
9 　　　　　-511..-256, 256..511
10 　　　　-1023..-512, 512..1023

Huffman encoding procedures for DC coefficients

在 JPEG 影像壓縮標準中, 對於 DC 係數的編碼, 霍夫曼編碼程序(Huffman encoding procedures) 使用了下列 2 個擴展表格(extended tables):
　　1) XHUFCO
　　2) XHUFSI
這兩個表格都是以相鄰兩個 block 區塊的差值(DIFF) 為索引, XHUFCO 存放 DIFF 所屬類別(category) 的 Huffman code, 再接上為了區分 DIFF 在此類別中的位置的附加位元(additional bits)。XHUFSI 則是存放 DIFF 值所對應的 XHUFCO 表格中的位元長度, 即 Huffman code 長度加上附加位元的長度。

XHUFCO 與 XHUFSI 這兩個表格則是由 Annex C, P. 52 所談到的 EHUFCO 與 EHUFSI 兩個表格加上附加位元擴展而來。

有了 XHUFCO 與 XHUFSI 兩個表格, 當我們有一個 DIFF 需要編碼時, 只要快速查表即可很快地產生 binary data。針對 DC 差值 (DIFF), 霍夫曼編碼程序如下:

　　SIZE = XHUFSI(DIFF)
　　CODE = XHUFCO(DIFF)
　　code SIZE bits of CODE
　

Wednesday, November 14, 2007

Huffman decoding of DC coefficients

由於以下兩個原因:

1) DCT 的 DC 係數就是代表 8*8 區塊的像素平均值,
2) 影像中相鄰區塊的像素平均值有很大的機率是相近的

因此 JPEG 的 Huffman encoding procedure 並不是直接對 DC coefficients 編碼, 而是對兩個相鄰區塊的 DC difference 編碼。由於 difference 可能的值涵蓋太廣了, 對每一個可能值採取直接編碼的方式並不可行 (decoding tree 太大了), 因此 JPEG 採用的作法是先對所有可能值分成12個類別(category), 然後統計每一類別的機率, 有了機率後, 再依機率做 Huffman 編碼。JPEG 規格書 P.89 的 Table F.1 列出了 DC difference 是如何分類的

Table F.1 Difference magnitude categories for DC coding
SSSS 　　DIFF values
0 　　　　　　　 0
1 　　　　　　　-1, 1
2 　　　　　 -3,-2, 2, 3
3 　　　　　 -7..-4, 4..7
4 　　　　　 -15..-8, 8..15
5 　　　　　 -31..-16, 16..31
6 　　　　　 -63..-32, 32..63
7 　　　　　-127..-64, 64..127
8 　　　　　-255..-128, 128..255
9 　　　　　-511..-256, 256..511
10 　　　　-1023..-512, 512..1023
11 　　　　-2047..-1024, 1024..2047

除了第 0 類 (SSSS=0) 之外, 每一類中, difference 的可能值並不只有一個, 因此還另外需要一些額外的位元(appended additional bits) 來標示出確切的 difference 值。所需要的額外位元總數就剛好是 SSSS 值, 原因很簡單, 第 1 類 (SSSS=1) 中有 -1, 1 兩個可能值, 因此只需要 1 個額外位元來指出真正的 difference 值是哪一個。同理, 第 2 類 (SSSS=2) 中有 -3, -2, 2, 3 共 4 個可能值, 因此需要 2 個額外位元來指出真正的 difference 值是哪一個。其餘類別的個數剛好就是 2^SSSS 個, 因此共需要 SSSS 個額外位元才夠指出真正的 difference 值為何。

如果 DIFF > 0, 額外位元就是 DIFF 的二進位表示法中的 SSSS 個低位元(low order bits), 如果 DIFF < 0, 則是用 (DIFF-1) 的 SSSS 個低位元。我們舉例說明: a) 假設 DIFF 值為 -9, SSSS=4, 因為DIFF < 0, 因此 (DIFF-1) = -10, 用 2-complement 來表示 -10, 為 11110110, 那麼 4 個低位元就是 0110。 JPEG 解壓縮時, Huffman decoding procedure 如果使用 decoding tree 解碼得到 SSSS=4, 就會繼續取得 4 個額外位元 0110, 根據第 1 個位元判斷 DIFF<0, 然後使用後面 3 個位元計算得到 6, SSSS=4 這個類別負數區間的最小值為 -15, 將 -15 + 6 = -9 , 就可以得到 DIFF = -9。 b) 假設 DIFF 值為 6, SSSS=3, 因為 DIFF > 0, 因此用 2-complement 來表示 6 為 00000110, 那麼 3 個低位元就是 110。

使用 decoding tree 得到 SSSS=3 後, 繼續取得 3 個位元 110, 由於第 1 個位元為 1, 因此判斷 DIFF 屬於正數區間, 後面兩個位元為 10, 值為 2, 正數區間的最小值為 4, 4 + 2 =6 因此得到 DIFF=6。　

Saturday, November 10, 2007

Y'CbCr in JPEG Standard

JPEG allows Y'CbCr where Y', Cb and Cr have the full 256 values:

　JPEG-Y'CbCr (601) from "digital 8-bit R'G'B' "
　===========================================
　Y' = + 0.299 * R'd + 0.587 * G'd + 0.114 * B'd
　Cb = 128 - 0.168736 * R'd - 0.331264 * G'd + 0.5 * B'd
　Cr = 128 + 0.5 * R'd - 0.418688 * G'd - 0.081312 * B'd
　===========================================
　R'd, G'd, B'd in {0, 1, 2, ..., 255}
　Y', Cb, Cr in {0, 1, 2, ..., 255}

from Wikipedia: YCbCr

Monday, November 05, 2007

關於 JPHide 的點點滴滴 (一) : N. Provos

Niels Provos 在 "Detecting Steganographic Content on the Internet" 這篇論文的 Section 5.2 整節都在談論 JPHide 這個隱藏軟體。內文如下:

　JPHide is a steganographic system by Allan Latham. There are two versions: 0.3 and 0.5. Version 0.5 supports additional compression of the hidden message. As a result, they use slightly different headers to store embedding information. Before the content is embedded, it is Blowfish encrypted with a usersupplied pass phrase.

Because the DCT coefficients are not selected continuously from the beginning, JPHide is more difficult to detect.

The program uses a fixed table that defines classes of DCT coefficients to determine in which order to modify the coefficients. All coefficients in the current class are used first to hide information before the next class is chosen. As a result, coefficients are selected in such a way that they those likely to be numerically high are used first.

One artifact of the implementation is that the information hiding continues in the current coefficient class even after the complete message has been embedded. The first class in the table are the DC coefficients of color component zero. An image with a resolution of 600 * 480 has approximately five thousand DC coefficients. Even if the message is only eight bits long, JPHide modifies all five thousand coefficients in such an image.

這邊提到 JPHide 有一個特殊方式來定義嵌入次序, JPHide 使用一個固定的表格來將 DCT 係數分成不同的 classes, 整張影像相同 class 中的 DCT 係數會依序拿來嵌入機密訊息, 直到此 class 的係數用完了, 才會動用到下一個 class 的 DCT 係數。接著, 相同 class 的係數, 數值較大者也會優先拿來嵌入機密訊息。個人覺得這樣做是有道理的, 因為嵌入影響對較大值的係數來說, 比例相對較小, 因此優先使用。

另外一點令人匪宜所思的是: 即使所有的機密訊息已經嵌入完畢了, JPHide 依然會繼續修改目前這個 class 的所有 DCT 係數。論文中提到一個例子, 第一個 class 就是 DC 係數, 假設一張 600*480的影像, 就會有 (600/8)*(480/8)= 75*60 = 4500 個 DC 係數, 那麼即使機密訊息只有 8 bits, JPHide 依然會去修改這所有的 DCT 係數。

　A pseudo-random number generator determines if coefficients are skipped. The probability of skipping bits depends on the length of the hidden message and how many bits have been embedded already.

JPHide modifies not only the least-significant bits of the DCT coefficients, it can also switch to a mode where the second-least-significant bits are modified.

如其他軟體一般, JPHide 用一個 PRNG 來決定哪些係數該跳過不嵌入機密訊息。然而, 較特殊的作法是跳過的機率是和 1) 機密訊息的長度, 2) 已經嵌入多少資料量。這代表每嵌入 1 個位元, 機率值就隨時進行更新, 用以控制所有的訊息可以完全順利嵌入。另外, JPHide 也會將機密訊息嵌入到次低位元中。

From StegoRN

Figure 6: JPHide has a signature similar to JSteg. The major difference is the order in which the DCT coefficients are modified.

Figure 6 shows the probability of embedding for an image containing information hidden with JPHide. Because JPHide can skip DCT coefficients, the probability is not as high as with JSteg.

Figure 6 是使用 Chi-Square Attack 來針對 JPHide stego-images 分析, 橫軸是將影像平分成 100 等份, 每一等份都用 Chi-Square Attack 計算嵌入機率 p。由於 JPHide 會跳過部份的 DCT 係數不藏, 因此所得到的 P 值並不像 Jsteg 那麼高。

Niels Provos and Peter Honeyman, "Detecting Steganographic Content on the Internet,"ISOC NDSS'02, San Diego, CA, February 2002.

Sunday, November 04, 2007

關於 Jsteg 的點點滴滴 (七) : N. Provos

這張圖出現在 Niels Provos & Peter Honeyman 的 2002年 ISOC NDSS'02 研討會論文 "Detecting Steganographic Content on the Internet" 中 ( Figure 4, P. 4 )。原文是這樣描述的:

　Figure 4 shows the result of the X²-test for an image that contains information hidden with JSteg. In this case, the first chapter of “The Hunting of the Snark” has been bzip2 compressed prior to embedding. The low probability at the beginning of the graph is caused by the dictionary at the beginning of a bzip2 compressed file. The dictionary does not look like encrypted data and is not detected by the test.

這邊提到 bzip2 這個壓縮軟體, 作者先將機密訊息 “The Hunting of the Snark” 的第一章內容用 bzip2 壓縮至 15 KB, 然後用 Jsteg 將其藏到影像中。由於 bzip2 壓縮檔的檔頭存放著解壓縮時需要用到的 dictionary, 因此在 Figure 4 的最左端 - 約 5% 的影像 - 用 X²-test 所得到的 p 值並不像 5% ~ 25% 區間的 p = 100% 那麼高。

Figure 5: Using JSteg-Shell with RC4 encryption causes the probability of embedding to be high for all embedded data.

這張圖則是針對 Jsteg-Shell stego-image 分析所得到的結果。原文描述如下:

　JSteg-Shell is a Windows user interface to JSteg developed by Korejwa. It supports encryption and compression of the content before embedding the data with JSteg. JSteg-Shell uses the RC4 stream cipher for encryption. However, the RC4 key space is restricted to 40 bits.

　When encryption is being employed, we expect the probability of embedding to be high at the beginning of the image. There should be no exception.

　An example of JSteg-Shell is shown in Figure 5. Just observing the graph allows us to determine the size of the embedded message. Later we show how this can help to improve the automatic detection of steganographic content.

JSteg-Shell 在隱藏前, 針對機密訊息提供壓縮和加密的功能。因此, 沒有意外地, Figure 5 從一開始就有很高的 p 值。觀察上圖, 我們很容易就可獲知嵌入的資料量, 這項資訊可以用來改善自動偵測隱藏的訊息。

Niels Provos and Peter Honeyman, "Detecting Steganographic Content on the Internet,"ISOC NDSS'02, San Diego, CA, February 2002.