Created
October 22, 2012 11:14
-
-
Save 23Skidoo/3931049 to your computer and use it in GitHub Desktop.
Don Stewart's 2004 Obfuscated Haskell Contest entry updated to work with GHC 7.4.2
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- This is the original README from here: http://www.haskell.org/haskellwiki/Obfuscation | |
How does it work? | |
----------------- | |
* unsafeCoerce# in GHC 6.3 | |
* indexWordArray# | |
* behaviour of a [Int] optimisation in GHC | |
More detail | |
----------- | |
We use unsafeCoerce# and indexWordArray# to walk the heap looking for | |
values. Happily, GHC optimises [Int] such that we can directly index | |
the list without dereferencing any pointers... | |
The full story | |
-------------- | |
Firstly, some rewriting has been applied to the code to make it | |
syntactically obfuscated, by replacing keywords with lambda | |
abstractions, and renaming values as symbols. Without this | |
obfuscation, the code is: | |
module Main where | |
import GHC.Base | |
main = print ( tail ( ( \x -> map ( \y -> | |
C# ( unsafeCoerce# ( indexWordArray# | |
( unsafeCoerce# y ) (4#) ))) x) ( | |
[106,117,115,116,32,97,32,115,105,109,112,108,101, | |
32,102,117,110,99,116,105,111,110,97,108,32,108, | |
97,110,103,117,97,103,101, 0x11 ] :: [Int] | |
))) | |
The final output chars are firstly ord'ed to their ascii value. | |
Crucially, GHC optimises some Int lists, containing only small values, | |
into consecutive unboxed cells in the heap. Each cell appears at | |
consecutive slots in the heap 5 words apart, starting 5 words in from | |
the beginning of the list. We depend on this behaviour ;) | |
Using -keep-hc-files we can see what code GHC generates: | |
> Hp[-170]=(W_)(17); | |
SET_HDR_(Hp-169,(P_)&GHCziBase_ZC_con_info,0,0); | |
Hp[-168]=(W_)(Hp-171); | |
Hp[-167]=(W_)((P_)&GHCziBase_ZMZN_closure); | |
SET_HDR_(Hp-166,(P_)&GHCziBase_Izh_con_info,0,0); | |
> Hp[-165]=(W_)(101); | |
SET_HDR_(Hp-164,(P_)&GHCziBase_ZC_con_info,0,0); | |
Hp[-163]=(W_)(Hp-166); | |
Hp[-162]=(W_)(Hp-169); | |
SET_HDR_(Hp-161,(P_)&GHCziBase_Izh_con_info,0,0); | |
> Hp[-160]=(W_)(103); | |
SET_HDR_(Hp-159,(P_)&GHCziBase_ZC_con_info,0,0); | |
Hp[-158]=(W_)(Hp-161); | |
Hp[-157]=(W_)(Hp-164); | |
SET_HDR_(Hp-156,(P_)&GHCziBase_Izh_con_info,0,0); > Hp[-155]=(W_)(97); | |
SET_HDR_(Hp-154,(P_)&GHCziBase_ZC_con_info,0,0); | |
Hp[-153]=(W_)(Hp-156); | |
Hp[-152]=(W_)(Hp-159); | |
SET_HDR_(Hp-151,(P_)&GHCziBase_Izh_con_info,0,0); | |
> Hp[-150]=(W_)(117); | |
So our Int fields (i.e, (W_)(117)) have been slotted in at Hp[-5], | |
Hp[-10] .. Hp[-170]. Note that the first element in the list starts at | |
Hp[-5], not Hp[0]. So Hp[0] will actually contain meta data, and is | |
not a value we want. We have to discard this value later using tail. | |
Strangely, this behaviour only occurs if you annotate the whole list | |
as :: [Int]. If you instead constrain one of the values of the list to | |
type Int, the optimisation won't occur, and you'll just get a list of | |
pointers. So GHC treates [Int], where the values are small, magically. | |
So, in the heap is a list of Int cells, with their Int# fields | |
unpacked. We then map unsafeCoerce# over the list, coercing each cell | |
to a ByteArray#. This lets us treat each 5 word cell (including STG | |
machine data) as a raw byte array, letting us roam around and inspect | |
the cell. We can use unsafeCoerce# to ignore the type system, and | |
worry instead about low-level representations and core dumps. Fun! ;) | |
Also, it appears from GHC 6.2 to 6.3 the typing of unsafeCoerce | |
changed to allow coercion of unboxed values, which we depend upon. So | |
our code only typechecks with 6.3. With 6.2 or 6.0.1, we get kind | |
errors: | |
Couldn't match `#' against `*' | |
When matching types `Char#' and `b' | |
Expected type: Char# | |
Inferred type: b | |
So what we want to do is extract the Int# field manually. So we index | |
the 5th word of the cell using indexWordArray#. Thanks to GHC's | |
unpacking, this field is available, otherwise there would be a level | |
of indirection. Pointers make this unsafeCoerce/byteArray trick much | |
harder. | |
We then apply unsafeCoerce# to force the raw Int# field to Char#. This | |
works because Char# are the same size as Word#/Int# in GHC. We can | |
then box up the raw field with the C# constructor, generating a Char. | |
Our map thus generates a String. | |
However, there is a bogus char at the front (the metadata at Hp[0]), | |
and there needs to be an extra garbage field at the end of the list, | |
otherwise the map truncates our 'string' in the heap, losing the last | |
cell. This is marked by the 0x11 element at the tail of the list. | |
(0x11 is the smallest value that is still unpacked directly into the | |
heap, smaller values are turned into pointers to magic closures). | |
So we drop the first garbage char with tail, giving us the original | |
list, which we can then print out. The quote itself is SPJ's remark | |
about the simplicity of the GHC Core language. | |
-- Don |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{-# LANGUAGE MagicHash #-} | |
module Main where | |
import GHC.Base | |
main = let l = | |
[106,117,115,116,32,97,32,115,105,109,112,108,101, | |
32,102,117,110,99,116,105,111,110,97,108,32,108, | |
97,110,103,117,97,103,101,0x11] :: [Int] | |
in | |
print $ tail (map (\v -> (C# (unsafeCoerce# | |
(indexWordArray# (unsafeCoerce# (plusAddr# (unsafeCoerce# v) 3#)) 3#)))) l) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment