Lee's Stego Research Notes: Defending Against Statistical Steganalysis (part 1)

N. Provos 在 10th USENIX Security Symposium, August 13-17, 2001 發表了 "Defending Against Statistical Steganalysis" 這篇論文, 內容就是闡述 OutGuess 0.2 這個隱藏軟體是如何運作的。

本篇文章所要討論的主軸是論文中有關 OutGuess 核心技術的部分 - Section 3 。

Section 3 Embedding Process

作者將 embedding Process 切割成兩個獨立的步驟:

1. Identification of redundant bits.
Redundant bits can be modified without detectably degrading the cover medium.
作者指出所謂的冗餘位元(redundant bits) 就是經過修改也不會在掩護媒體中產生會被偵測出來的品質下降現象(degrading)。

2. The selection of bits in which the hidden information should be placed.

切割成兩個步驟的好處是容易取代(easy replacement), 如果要將本篇論文提出的方法在別的資料格式中實作出來, 只要將 identification algorithm 換掉, 然後用新的選擇策略(selection strategy)即可。

Section 3.1 Identification of Redundant Bits

作者闡述了一個觀念, 用來嵌入機密訊息的冗餘位元通常和影像的儲存格式相關。整個嵌入程序自然也和輸出格式有關。通常壓縮程序也包含其中。要最小化對掩護媒體(cover-medium)的修改(modification), 必須具備有關冗餘位元的相關知識才做得到, 作者提到 OutGuess 實作了整個輸出影像的運算。

For example, the OutGuess system performs all operations involved in created the output object and saves the redundant bits encountered. For the JPEG image format, this might be the LSB of the discrete cosine transform coefficients.

Section 3.2 Selection of Bits

探討如何從影像的 redundant bits 中選取一些 bits 來嵌入機密訊息。OutGuess 是使用 RC4 串流加密器(stream cipher)對機密訊息加密, 同時也用 RC4 來建立一個 PRNG (pseudo-random number generator), 然後再將選定的 seed 餵進這個 PRNG 來選擇冗餘位元。

32 state bits = 16-bit seed + 16 bit integer
16-bit seed: 由於不同的 seeds 會選取不同的冗餘位元來作為嵌入機密訊息之用, 因此, 不同的 seeds 自然對原始影像造成的 change, 也會有所不同。當接收端(receiver)收到偽裝影像(stego-image)後, 必須知道當初所選定的 seed, 因此必須把這16-bit seed 也嵌入到掩護影像(cover-image) 之中。
16-bit integer: containing the length of the hidden message.

冗餘位元的選取方式是利用上述的 PRNG 來計算下一個 bit 的隨機距離(random offset) R i(x),

　b0 = 0,
　bi = bi-1 + Ri(x) 　for i = 1, 2, ... , n

bi 表示第 i 個選取位元的位置, Ri(x) 表示與上個選取位元之間的隨機距離, 值介於 [1, x] 之間。x 為最大的間隔(interval), 這個值在每嵌入 8 個位元, 就會重新使用下列的公式重新計算, 目的就是讓所有的機密訊息可以分布到整個可以使用的位元中。

　interval = 2 * remaining redundant bits / remaining length of message.

用上述的方法來設定 interval, 會使得機密訊息的長度限制在 50% 嵌入空間之內。

Section 3.3 Beneficial Reseeding of the PRNG

談論如何靠著選擇不同的 seeds, 智慧地選擇不同的嵌入位置的子集合, 不但可以讓 changed bits 的總數降低, 而且使得嵌入行為較不容易被偵測出來 (Detectability is also used as a bios in the selection process.)。

由於掩護影像(cover-image)中的冗餘位元, 不是 1 就是 0, 加上要嵌入的資料先用 RC4 stream cipher 加密, 變成一串二元的隨機資料流(binary random stream), 將機密訊息嵌入到冗餘位元, 造成這些冗餘位元被改變的機率期望值為 0.5。因此, 統計學中的二元分布(binomial distribution)正好可以用來描述一般的 LSB 嵌入行為。

假設, 我們從冗餘位元之中, 將一個 seed 餵進 PRNG 選擇了 4430 個位元, 並將同樣長度的機密訊息嵌入其中, 便可以去計算此次嵌入動作一共改變了多少個 redundant bits。注意: 不同的 seed 餵進同一個 PRNG 將使得所選擇的嵌入位置不同。Figure 1 就是重複使用不同的 seeds 來統計這 4430 個redundant bits 被改變的總數, 累計其統計值所畫出來的結果。

Figure 1: Probability distribution of changed bits for different seeds compared to a binomial distribution with n=4430 and p=0.5.

不管是從 binomial distribution 公式推論, 或是從 Figure 1 中的實驗中, 我們都可以觀察到當我們選定一個 seed 時, changed bits 的個數是以 n/2 = 2215 的可能性(機率)最高, 不過, 還是存在一些 seeds 會使得 changed bits 的個數小於 2150。論文中是這樣討論的:

Picking a seed that represents the changed bits at the lower end of the binomial distribution allows us to reduce the number of bits that have to be changed; see Figure 1. It becomes harder to detect the modifications, as more of the hidden message is already naturally represented in the redundant bits.

除了降低修改之外, 可偵測性(detectability)也是 selection process 要考量的一個因素。

Detectability is also used as a bias in the selection process. The selector does not try to reduce only the number of changed bits but also the overall detectability. Whenever a bit has to be modified, its detectability will be added to a global bias. A higher accumulated bias reduces the likelihood that this specific embedding will be used.

Section 3.4 Choices with Coding Theory

作者在這邊提到 Coding Theory 的考量為使用 PRNG 去選擇冗餘位元就無可避免地選到
1. locked bits
2. bits with a high detectability
上述兩類冗餘位元是作者不想去更改的。因此, 作者想使用錯誤更正碼(error-correcting codes)來解決上述問題。

[n, k, d] coding 指的是長度為 k 位元的機密訊息(k-bit data block), 將被編碼成長度為 n 位元的編碼區塊(n-bit code block), 每個 code 之間的 Hamming distance 至少是 d, 假設 d = 2t + 1, 那麼這個編碼就具備了可以更正 t 個錯誤位元的能力。換句話說, n 個位元的編碼區塊之中, 如果發生 t 個位元的錯誤, 那麼使用解碼程序, 就可以偵測出哪 t 個位元發生錯誤, 進而更正回來, 因此原先的 k 位元的資料, 是可以完全解碼出來的。

將機密訊息用錯誤更正碼來編碼, 無疑也會增加要嵌入的長度。然而, 觀察整個嵌入過程獲知:
1. 有一半的資訊嵌入是不會改變到冗餘位元的 ( n / 2),
2. 可以有 t 個位元可以不用嵌入(更改冗餘位元)
因此, 假如
　( n /2 ) - t = ( k / 2 )
成立, 那麼作者希望嵌入 n 位元的編碼區塊需要修改的位元數(上述式子的等號左邊)要和嵌入未經編碼的 k 位元的資料區塊需要更改的位元數(上述式子的等號右邊)相等。將上述式子通分得到 n -2t = k, 並將 d+1 = 2t 帶入可以得到
　d = n - k + 1,
剛好就是 MDS (maximum distance separable) code 的 Singleton bound。因此, 作者在這邊得到一個結論就是只要選擇 MDS codes, 就可以滿足上述作者期望的。

不幸地, 值得一提的(non-trivial)二元 MDS code 就只有重複碼(repetition code), 主要缺點就是編碼必須將資料重複 n 次, 因此, 重複碼僅使用在隱藏訊息很小的情況。

資料經過編碼後, 每個編碼區塊可以選擇 t 個位元不去修改冗餘位元。作者所使用的選擇策略是:
欲嵌入的位元與冗餘位元不同(conflict), 且冗餘位元先前已經被嵌入過資料, 被 locked bits 鎖住。

OutGuess 所使用的技術與 Ross J. Anderson and Fabien A. P. Petitcolas 發表在 Journal on Selected Areas in Communication, 16(4): 474-481, May, 1998 的論文 On the Limits of Steganography 中所建議的 parity encoding 相似。然而, 使用 error-correcting codes 的好處要比使用 parity encoding 多。透過選擇一種不是 MDS 的 code, 我們可以犧牲些許的嵌入容量(capacity) 而得到更高的安全性(security)。除此, 對照使用 parity encoding 必須 lock 住 n 個位元, 使用 error-correcting codes 則僅僅需要 lock 住 n-t 個位元。

Section 3.5 Plausible Deniability

為了嵌入機密訊息, 我們修改掩護媒體中的冗餘位元。這些冗餘位元可能存在一些我們沒有感知, 或是對手比我們了解的自然統計特質。假如嵌入程序改變了上述特質, 在這方面知識淵博的觀察者, 不用指出哪些特定位元被改變, 就可以推論出隱藏訊息是存在的。

偽裝媒體的創造者必須面對的是: 欲隱藏的通訊行為可能被揭露出來。然而, 我們假設觀察者僅僅可以確定的事實是掩護媒體被更改了。假如傳訊者嵌入多重訊息, 其中可以包含一份無害的訊息, 讓它和真正想要傳送的訊息(request)攪在一起, 然後宣稱沒有任何訊息隱藏在偽裝媒體之中, 偽裝媒體並沒有遭受破壞(沒有遭到修改, 換句話說就像原始掩護影像一樣, 沒有破壞原先存在的特質)。這就是所謂的似乎合理的可否認性(plausible deniability)。

　　

實際上, 整個 Section 3 所描述的技術已經隱含地支援上述所提到的似乎合理的可否認性。可以隱藏不只一份的訊息, 使用 locked bits 來避免先嵌入的訊息被後嵌入的訊息覆蓋掉。即使是與嵌入訊息的大小相關, 不與先前 locked 住的冗餘位元重疊的可能性是很小的, 在這種情況下, 使用 error-correcting codes 則是可以增加選擇的彈性。

Section 3.6 Hidden Message Determines Cover

針對特定的隱藏訊息, 可以在不同的掩護媒體中, 選擇一個機密訊息對掩護媒體本身影響較小(with minimal modification)的掩護媒體, 來嵌入機密訊息。這和 Section 3.3 中有關 binomial distribution 的系列討論是差不多的。

Go to: Defending Against Statistical Steganalysis (part 2)

Go to: Defending Against Statistical Steganalysis (part 3)