aheadley · March 10, 2015 22:32
diff --git a/big-hw1.txt b/big-hw1.txt
 ----------------------------------------------------------------------------- 
 Originally hosted at: http://mods.relicnews.com/misc/BIGSpec.shtml
 ----------------------------------------------------------------------------- 



 BIG FILE FORMAT SPECIFICATION -- RBF1.23 -- v0.8 
 (incomplete) 
 01/04/2000

 by: 
 _!Lachesis_atatata! 
 pronounced: 
 shaka >yell< lachesis shaka atatata >yell< 
 (you may yell or scream at your capability. what you yell is irrelevant) 

 member of: 
 ICGOMA (International Cooperative Governance Organization of Meaningless Acronyms) 

 in conjunction with: 
 UCFEDD (Uruguayan Consortium of Ferrets with Erectile Dysfunction Disorder) 
 APDOFF (Agrarian Paramilitary Defense Organization of Feminist Farmers) 
 MOOOOO (MOOOOO) 
 

 ----------------------------------------------------------------------------- 
 A. INTRODUCTION 
 ----------------------------------------------------------------------------- 
 This specification describes the format of .BIG archives used in 
 Homeworld by Relic Entertainment Inc. a BIG archive is a container for 
 other files. instead of distributing hundreds of files separately, files are 
 "packed" together into one larger file. 

 at this time, it is not a complete specification. enough information has 
 been learned to extract all files from a BIG archive, but not enough to 
 be able to create one. 

 if information from this document is used for implementations or in 
 other documents, please credit me. it's just the nice thing to do. 
 i'll be haunting the message boards, if you have a question, post it. if 
 i don't reply i: 
 1. don't know the answer 
 2. am busy with something at some point on the globe 
 3. am dead 

 if there are errors in this document ... oops :) 

 ----------------------------------------------------------------------------- 
 B. OVERALL LAYOUT 
 ----------------------------------------------------------------------------- 
 there are three main areas to the BIG file format arranged 
 sequentially one after the other:
 
 1. header 
 2. table of contents 
 3. name/data blobs 

 each is described in the sections below. 

 ----------------------------------------------------------------------------- 
 B1. HEADER AREA 
 ----------------------------------------------------------------------------- 

 the header area comes at the beginning of the file. it has the normal 
 elements you would expect to be there. there is only one header. 

 ------------------------------------------------------------------------- 

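 /* every ulong_t field is 4 bytes on disk. where unsigned long is wider 
    than that, use uint32_t from <stdint.h> instead */ 
 typedef unsigned long ulong_t; 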
 typedef unsigned char ubyte_t; 

 const char BIG_FILE_ID[] = { 0x52, 0x42, 0x46, 0x31, 0x2E, 0x32, 0x33 }; 

 // note: "RBF1.23" 
 struct header { 
    char magic_cookie[7]; 
    ulong_t toc_size; 
    ulong_t header_unknown; 
 }; 

 ------------------------------------------------------------------------- 

 magic_cookie 
 identifier for the file. the only currently known value for this is 
 contained in BIG_FILE_ID. 

 toc_size 
 number of toc_entry structures (*see B2) that are contained in the 
 file. (note: toc stands for Table of Contents) 

 header_unknown 
 this field's purpose is unknown. its value is always 1. does not seem 
 to play a role in anything yet. 
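
 a minimal sketch of reading the header from a byte buffer. two things here 
 are assumptions, not facts from this spec: that the three fields sit back 
 to back on disk with no padding (15 bytes total), and that the numbers are 
 little-endian. the names big_header, big_rd32 and big_parse_header are 
 made up for the example. 

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BIG_HEADER_SIZE 15 /* 7 + 4 + 4, assuming no padding on disk */

typedef struct {
    uint32_t toc_size;
    uint32_t header_unknown;
} big_header;

/* read a 4-byte little-endian value (the byte order is an assumption) */
static uint32_t big_rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* returns 1 on success, 0 if the magic cookie does not match;
   buf must hold at least BIG_HEADER_SIZE bytes */
static int big_parse_header(const unsigned char *buf, big_header *out)
{
    static const char magic[7] = { 'R', 'B', 'F', '1', '.', '2', '3' };

    assert(buf != NULL && out != NULL);
    if (memcmp(buf, magic, sizeof magic) != 0)
        return 0;
    out->toc_size = big_rd32(buf + 7);
    out->header_unknown = big_rd32(buf + 11);
    return 1;
}
```

 reading byte by byte like this avoids depending on how a compiler lays 
 out struct header in memory. 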

 ----------------------------------------------------------------------------- 
 B2. TABLE OF CONTENTS AREA 
 ----------------------------------------------------------------------------- 
 the table of contents area holds a list of toc_entry structures which 
 comes directly after the header. header.toc_size tells you how many 
 there are. for each file that is in the BIG archive, there is a toc_entry 
 structure. 

 ------------------------------------------------------------------------- 

 struct toc_entry { 
    ulong_t crc_msb; 
    ulong_t crc_lsb; 
    ulong_t name_size; 
    ulong_t data_compressed_size; 
    ulong_t data_uncompressed_size; 
    ulong_t file_offset; 
    time_t timestamp; 
    ubyte_t toc_unknown[4]; 
 }; 

 -------------------------------------------------------------------------

 crc_msb/crc_lsb 
 this is some sort of crc value. crc_msb is the most significant half 
 (4 bytes) and crc_lsb is the least significant half (4 bytes) of a larger 
 8-byte (64-bit) crc number. exactly how it's generated and how to use 
 it is unknown. (*see crc note below) 

 name_size 
 the size(length) of the name in the name_data_blob (*see B3) that 
 this toc_entry refers to. 

 data_compressed_size 
 the size of the data in the name_data_blob (*see B3) in compressed 
 form. 

 data_uncompressed_size 
 the size of the data in the name_data_blob (*see B3) in uncompressed 
 form. 

 file_offset 
 the offset in bytes from the beginning of the BIG file to the 
 name_data_blob (*see B3) that this toc_entry refers to. 

 timestamp 
 the date/time of the entry. 

 toc_unknown 
 what part this element plays is unknown. what is known is that 
 toc_unknown[0] will be 1 if the data has been compressed and 0 if it 
 has not. (*see toc_unknown note below) toc_unknown[1..3] always 
 seem to equal {0xC9, 0xCA, 0xCB}. 

 ------------------------------------------------------------------------- 

 toc_unknown note: 
 sometimes the data is compressed, sometimes it's not. compression 
 can be determined by doing either: 

 1. data_compressed_size < data_uncompressed_size or ... 
 2. toc_unknown[0] == 1 

 I would personally suggest sticking to option 1 until toc_unknown is 
 fully understood. 

 ------------------------------------------------------------------------- 

 crc note: 
 the entire table of contents seems to be sorted by the 8-byte crc 
 value. why is a complete mystery. the crc itself is most likely used as 
 a data validation mechanism. but to sort on it? it may be that the crc 
 is also being used as a mechanism to uniquely identify a file. i *think* 
 that there are no duplicate crc's. 

 ----------------------------------------------------------------------------- 
 B3. NAME/DATA BLOB AREA 
 ----------------------------------------------------------------------------- 
 the name/data blob area holds a list of name_data_blob structures 
 which comes directly after the table of contents area. a 
 name_data_blob holds the name and data of a file that exists in the 
 BIG archive. each name_data_blob is referred to by a toc_entry 
 structure which gives the sizes of the name and data fields. 

 ------------------------------------------------------------------------- 

 struct name_data_blob { 
    char name[]; 
    char data[]; 
 }; 

 quick note: 
 the structure above is not legal C. the fields are not fixed length and 
 i've used this notation because it's useful. use the standard techniques 
 of dynamic memory when implementing.

 ------------------------------------------------------------------------- 

 name 
 this is the file name of the original file. its length is 
 toc_entry.name_size + 1 (note: there IS a terminating null byte), 
 taken from the toc_entry that refers to this name_data_blob. it also 
 happens to be encrypted. (*see encryption note below) 

 data 
 this is the actual file data. its length is given by the 
 data_compressed_size field of the toc_entry that refers to this 
 name_data_blob. if the data is compressed, the algorithm used is 
 LZSS. (*see compression note below) 
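
 a sketch of locating the two fields of a name_data_blob inside a BIG file 
 that has been loaded into memory. it assumes the data starts right after 
 the name's terminating null byte, which is how the description above 
 reads. big_blob_view and big_locate_blob are made-up names. 

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const unsigned char *name; /* name_size bytes, still encrypted, then a null */
    const unsigned char *data; /* data_compressed_size bytes */
} big_blob_view;

/* returns 1 and fills *out if the whole blob lies inside the file image,
   otherwise returns 0 and leaves *out untouched */
static int big_locate_blob(const unsigned char *file, size_t file_len,
                           uint32_t file_offset, uint32_t name_size,
                           uint32_t data_compressed_size, big_blob_view *out)
{
    uint64_t end;

    assert(file != NULL && out != NULL);
    /* 64-bit sum so a corrupt entry cannot wrap around */
    end = (uint64_t)file_offset + name_size + 1u + data_compressed_size;
    if (end > file_len)
        return 0;
    out->name = file + file_offset;
    out->data = out->name + name_size + 1u;
    return 1;
}
```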

 ------------------------------------------------------------------------- 

 encryption note: 
 as if the world isn't evil enough as it is, some sick sick sick bastard > 
 my kinda programmer actually :) < decided to encrypt the file names 
 so that those lurking hacker types would get all confused and bleary- 
 eyed from trying to hack the format. luckily, i knew this silly 
 encryption trick ... i've used it on others myself. ;) 
 what's used is an XOR run. it's really simple, really fast, and has no 
 redeeming value (ie. compression, secure encryption) other than to 
 screw with people. here's the code for doing it. 

 void xor_run(char* buffer, ulong_t buffer_size) 
 { 
    char last_char; 
    ulong_t i; 
    last_char = (char)0xD5; 

    for (i = 0; i < buffer_size; i++) 
    { 
        last_char ^= buffer[i]; 
        buffer[i] = last_char; 
    } 
 } 

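 careful: even though it's a plain XOR, this routine is NOT its own 
 inverse. each output byte depends on the previous output byte, so 
 running it twice only guarantees that the first character comes back. 
 the opposite direction XORs each byte with the previous byte of its 
 input (starting from 0xD5). if names come out garbled after the first 
 character, you need the other direction. this particular version works 
 "in-place" and writes over the buffer you pass in. 
 as an implementation note, don't touch the terminating null when 
 decrypting. pass in toc_entry.name_size, not toc_entry.name_size + 1. 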
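
 a sketch of the XOR run in both directions. big_xor_forward does the same 
 arithmetic as xor_run above; big_xor_inverse undoes it. which of the two 
 the game uses to encrypt and which to decrypt is not something this 
 example settles, so try one and switch if the names look wrong. the 
 function names are made up. 

```c
#include <assert.h>
#include <stddef.h>

#define BIG_XOR_SEED 0xD5u

/* running XOR, same arithmetic as xor_run above: each output byte is the
   previous OUTPUT byte XOR the current input byte */
static void big_xor_forward(unsigned char *buf, size_t n)
{
    unsigned char last = BIG_XOR_SEED;
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++) {
        last = (unsigned char)(last ^ buf[i]);
        buf[i] = last;
    }
}

/* inverse of the above: each output byte is the previous INPUT byte XOR
   the current input byte */
static void big_xor_inverse(unsigned char *buf, size_t n)
{
    unsigned char prev = BIG_XOR_SEED;
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned char cur = buf[i];
        buf[i] = (unsigned char)(cur ^ prev);
        prev = cur;
    }
}
```

 applying one and then the other (in either order) gives back the original 
 bytes. applying the same one twice does not. 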

 ------------------------------------------------------------------------- 

 compression note: 
 compression (decompression) of data is done using the LZSS 
 algorithm. for this particular implementation: 

 > very basic LZSS (ie. no huffman a la ZIP) 
 > marker bit of 1 signals passthrough character 
 > marker bit of 0 signals dictionary entry 
 > dictionary entry is composed of 12-bit index and 4-bit length fields 

 don't know the LZSS algorithm? go to: 

 http://dogma.net/DataCompression/ 

 if you look carefully, you will even find a clean LZSS implementation 
 in C up amongst the links that works perfectly. of course, since life 
 isn't fair in the least, i found it about an hour AFTER implementing it 
 myself. ;( 

 note: i'm not associated with the link above, or any link off of it. so 
 don't waste your time speculating. 
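
 a sketch of a decoder matching the four points in the compression note 
 above (1 = passthrough byte, 0 = dictionary entry, 12-bit index, 4-bit 
 length). everything else is an assumption borrowed from common textbook 
 LZSS code and is NOT confirmed by this spec: bits are read most 
 significant first, the window starts filling at index 1, index 0 ends the 
 stream, and a stored length of 0 means a 2-byte match. all names are made 
 up. 

```c
#include <assert.h>
#include <stddef.h>

#define LZ_INDEX_BITS  12
#define LZ_LENGTH_BITS 4
#define LZ_WINDOW_SIZE (1u << LZ_INDEX_BITS) /* 4096-byte dictionary */
#define LZ_MIN_MATCH   2 /* assumption: stored length 0 means 2 bytes */
#define LZ_END_INDEX   0 /* assumption: index 0 marks end of stream */

typedef struct {
    const unsigned char *src;
    size_t len; /* input size in bytes */
    size_t pos; /* number of bits consumed so far */
} lz_bits;

/* read n bits, most significant bit first (bit order is an assumption);
   bits past the end of the input read as zero */
static unsigned lz_get(lz_bits *b, int n)
{
    unsigned v = 0;

    while (n-- > 0) {
        size_t byte = b->pos >> 3;
        unsigned bit = 0;
        if (byte < b->len)
            bit = ((unsigned)b->src[byte] >> (7u - (unsigned)(b->pos & 7u))) & 1u;
        v = (v << 1) | bit;
        b->pos++;
    }
    return v;
}

/* expands src into dst; returns the number of bytes written (at most
   dst_cap, which would normally be data_uncompressed_size) */
static size_t lz_expand(const unsigned char *src, size_t src_len,
                        unsigned char *dst, size_t dst_cap)
{
    unsigned char window[LZ_WINDOW_SIZE] = { 0 };
    lz_bits b;
    unsigned cur = 1; /* assumption: window writes start at index 1 */
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    b.src = src;
    b.len = src_len;
    b.pos = 0;

    while (out < dst_cap && (b.pos >> 3) < src_len) {
        if (lz_get(&b, 1)) { /* marker 1: passthrough character */
            unsigned char c = (unsigned char)lz_get(&b, 8);
            dst[out++] = c;
            window[cur] = c;
            cur = (cur + 1u) % LZ_WINDOW_SIZE;
        } else { /* marker 0: dictionary entry */
            unsigned idx = lz_get(&b, LZ_INDEX_BITS);
            unsigned n, i;
            if (idx == LZ_END_INDEX)
                break;
            n = lz_get(&b, LZ_LENGTH_BITS) + LZ_MIN_MATCH;
            for (i = 0; i < n && out < dst_cap; i++) {
                unsigned char c = window[(idx + i) % LZ_WINDOW_SIZE];
                dst[out++] = c;
                window[cur] = c;
                cur = (cur + 1u) % LZ_WINDOW_SIZE;
            }
        }
    }
    return out;
}
```

 if real archives come out wrong with this, the assumed constants 
 (LZ_MIN_MATCH, the starting index, the bit order) are the first things 
 to vary. 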

 ----------------------------------------------------------------------------- 
 C. >gratzi< 
 -----------------------------------------------------------------------------
 >gratzis< go out to Relic for coming up with Homeworld. very nice. 
 artsy >gratzi< goes to the cutscenes which, though simple, were very 
 effective. special >gratzi< for the music and the voice acting. though 
 not enough music :( 

 >gratzi< to the person who eventually creates me this ship: 
 Light Carrier 
 (to support guerilla tactics) 
 as compared to the standard carrier: 
 no space consuming construction facilities 
 a bit smaller 
 a bit lighter 
 a bit faster 
 a bit cheaper 
 more docking ports for faster docking of large wings 
 capacity for more fighters/corvettes 
 a bit faster fighter/corvette repair cycle 
 enough small guns to give a scout/interceptor wing a hard time 

 >!bye!< 





 ----------------------------------------------------------------------------- 
 Originally provided in Relic's source: BIGaddendum.doc
 ----------------------------------------------------------------------------- 


 .BIG file specification addendum
 By B1FF

 The article listed on RelicNews is pretty complete WRT the .BIG file format.  It was really neat to download the program for viewing and extracting the contents of a bigfile.  We will probably release our bigfile creation program but you will note that our version of the ‘extract’ command was never finished.  Oh the pains of finalling!  

 The only thing that was not picked up on was the CRCs of the bigfile.  The CRC is an 8-byte CRC actually made up of 2 standard 32-bit CRCs.  Included is some sample code to create these CRCs.  I think I originally copied this code from Graphics Gems several games ago.  It’s pretty standard.  Make note of this algorithm.  It is also used in the .CRC format.

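 /* Assumed definitions so the sample compiles on its own; the Relic
    headers that normally provide these types are not part of this text. */
 typedef unsigned int  udword;   /* must be 32 bits wide */
 typedef unsigned char ubyte;
 typedef udword        crc32;

 udword CRCTable[] =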
 {
     0x00000000,0x77073096,0xEE0E612C,0x990951BA,
     0x076DC419,0x706AF48F,0xE963A535,0x9E6495A3,
     0x0EDB8832,0x79DCB8A4,0xE0D5E91E,0x97D2D988,
     0x09B64C2B,0x7EB17CBD,0xE7B82D07,0x90BF1D91,
     0x1DB71064,0x6AB020F2,0xF3B97148,0x84BE41DE,
     0x1ADAD47D,0x6DDDE4EB,0xF4D4B551,0x83D385C7,
     0x136C9856,0x646BA8C0,0xFD62F97A,0x8A65C9EC,
     0x14015C4F,0x63066CD9,0xFA0F3D63,0x8D080DF5,
     0x3B6E20C8,0x4C69105E,0xD56041E4,0xA2677172,
     0x3C03E4D1,0x4B04D447,0xD20D85FD,0xA50AB56B,
     0x35B5A8FA,0x42B2986C,0xDBBBC9D6,0xACBCF940,
     0x32D86CE3,0x45DF5C75,0xDCD60DCF,0xABD13D59,
     0x26D930AC,0x51DE003A,0xC8D75180,0xBFD06116,
     0x21B4F4B5,0x56B3C423,0xCFBA9599,0xB8BDA50F,
     0x2802B89E,0x5F058808,0xC60CD9B2,0xB10BE924,
     0x2F6F7C87,0x58684C11,0xC1611DAB,0xB6662D3D,

     0x76DC4190,0x01DB7106,0x98D220BC,0xEFD5102A,
     0x71B18589,0x06B6B51F,0x9FBFE4A5,0xE8B8D433,
     0x7807C9A2,0x0F00F934,0x9609A88E,0xE10E9818,
     0x7F6A0DBB,0x086D3D2D,0x91646C97,0xE6635C01,
     0x6B6B51F4,0x1C6C6162,0x856530D8,0xF262004E,
     0x6C0695ED,0x1B01A57B,0x8208F4C1,0xF50FC457,
     0x65B0D9C6,0x12B7E950,0x8BBEB8EA,0xFCB9887C,
     0x62DD1DDF,0x15DA2D49,0x8CD37CF3,0xFBD44C65,
     0x4DB26158,0x3AB551CE,0xA3BC0074,0xD4BB30E2,
     0x4ADFA541,0x3DD895D7,0xA4D1C46D,0xD3D6F4FB,
     0x4369E96A,0x346ED9FC,0xAD678846,0xDA60B8D0,
     0x44042D73,0x33031DE5,0xAA0A4C5F,0xDD0D7CC9,
     0x5005713C,0x270241AA,0xBE0B1010,0xC90C2086,
     0x5768B525,0x206F85B3,0xB966D409,0xCE61E49F,
     0x5EDEF90E,0x29D9C998,0xB0D09822,0xC7D7A8B4,
     0x59B33D17,0x2EB40D81,0xB7BD5C3B,0xC0BA6CAD,

     0xEDB88320,0x9ABFB3B6,0x03B6E20C,0x74B1D29A,
     0xEAD54739,0x9DD277AF,0x04DB2615,0x73DC1683,
     0xE3630B12,0x94643B84,0x0D6D6A3E,0x7A6A5AA8,
     0xE40ECF0B,0x9309FF9D,0x0A00AE27,0x7D079EB1,
     0xF00F9344,0x8708A3D2,0x1E01F268,0x6906C2FE,
     0xF762575D,0x806567CB,0x196C3671,0x6E6B06E7,
     0xFED41B76,0x89D32BE0,0x10DA7A5A,0x67DD4ACC,
     0xF9B9DF6F,0x8EBEEFF9,0x17B7BE43,0x60B08ED5,
     0xD6D6A3E8,0xA1D1937E,0x38D8C2C4,0x4FDFF252,
     0xD1BB67F1,0xA6BC5767,0x3FB506DD,0x48B2364B,
     0xD80D2BDA,0xAF0A1B4C,0x36034AF6,0x41047A60,
     0xDF60EFC3,0xA867DF55,0x316E8EEF,0x4669BE79,
     0xCB61B38C,0xBC66831A,0x256FD2A0,0x5268E236,
     0xCC0C7795,0xBB0B4703,0x220216B9,0x5505262F,
     0xC5BA3BBE,0xB2BD0B28,0x2BB45A92,0x5CB36A04,
     0xC2D7FFA7,0xB5D0CF31,0x2CD99E8B,0x5BDEAE1D,

     0x9B64C2B0,0xEC63F226,0x756AA39C,0x026D930A,
     0x9C0906A9,0xEB0E363F,0x72076785,0x05005713,
     0x95BF4A82,0xE2B87A14,0x7BB12BAE,0x0CB61B38,
     0x92D28E9B,0xE5D5BE0D,0x7CDCEFB7,0x0BDBDF21,
     0x86D3D2D4,0xF1D4E242,0x68DDB3F8,0x1FDA836E,
     0x81BE16CD,0xF6B9265B,0x6FB077E1,0x18B74777,
     0x88085AE6,0xFF0F6A70,0x66063BCA,0x11010B5C,
     0x8F659EFF,0xF862AE69,0x616BFFD3,0x166CCF45,
     0xA00AE278,0xD70DD2EE,0x4E048354,0x3903B3C2,
     0xA7672661,0xD06016F7,0x4969474D,0x3E6E77DB,
     0xAED16A4A,0xD9D65ADC,0x40DF0B66,0x37D83BF0,
     0xA9BCAE53,0xDEBB9EC5,0x47B2CF7F,0x30B5FFE9,
     0xBDBDF21C,0xCABAC28A,0x53B39330,0x24B4A3A6,
     0xBAD03605,0xCDD70693,0x54DE5729,0x23D967BF,
     0xB3667A2E,0xC4614AB8,0x5D681B02,0x2A6F2B94,
     0xB40BBE37,0xC30C8EA1,0x5A05DF1B,0x2D02EF8D,
 };

 /*=============================================================================
    Functions:
 =============================================================================*/
 /*-----------------------------------------------------------------------------
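    Name        : crc32Compute
    Description : Compute a 32-bit CRC
    Inputs      : packet - pointer to the bytes to checksum
                  length - number of bytes in packet
    Outputs     :
    Return      : the 32-bit CRC of the data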
 ----------------------------------------------------------------------------*/
 crc32 crc32Compute(ubyte *packet, udword length)
 {
   udword index, tableIndex;
   crc32  crc;

   crc = 0xffffffff;
   for (index = 0; index < length; index++)
   {
      tableIndex = (crc ^ *(packet++)) & 0x000000FF;
      crc = ((crc >> 8) & 0x00FFFFFF) ^ CRCTable[tableIndex];
   }
   return(~crc);
 }
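
 The table above is the standard CRC-32 table, so it does not have to be typed in by hand.  As a sketch, it can be generated from the reflected polynomial 0xEDB88320 (which is also the table's entry 128) and checked against the well-known CRC-32 test value for the string "123456789".  The names crc_table, crc_table_init and crc32_compute are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* fill crc_table from the reflected CRC-32 polynomial 0xEDB88320 */
static void crc_table_init(void)
{
    uint32_t n, c;
    int k;

    for (n = 0; n < 256; n++) {
        c = n;
        for (k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
}

/* same loop as crc32Compute above, using the generated table;
   crc_table_init() must have been called first */
static uint32_t crc32_compute(const unsigned char *packet, size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;

    assert(packet != NULL || length == 0);
    for (i = 0; i < length; i++)
        crc = (crc >> 8) ^ crc_table[(crc ^ packet[i]) & 0xFFu];
    return ~crc;
}
```

 After crc_table_init(), crc_table[1] is 0x77073096 and crc_table[255] is 0x2D02EF8D, matching the first and last rows of the table above, and crc32_compute over "123456789" gives 0xCBF43926.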

 The first CRC is computed from the first half of the file name and the second CRC from the second half of the file name.  Why do such a silly scheme?  It makes it easy to sort the TOC by CRC and do a binary search for a filename.  This makes for faster lookups.  All file requests in our file layer are resolved from the text name to an 8-byte CRC.
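
 A sketch of that lookup: turn a name into its 8-byte key, then binary-search a TOC sorted by that key.  Two details are assumptions, since the text only says "first half" and "second half": the name is split at length/2, and the first half's CRC is the more significant word.  Whether the game normalizes the name first (case, slashes) is not covered here.  All names in the code are made up, and the bitwise CRC is just a compact stand-in for the table-driven crc32Compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t crc_msb, crc_lsb; } big_key;

/* bitwise CRC-32, equivalent to the table-driven crc32Compute above */
static uint32_t key_crc32(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;

    for (i = 0; i < n; i++) {
        crc ^= p[i];
        for (k = 0; k < 8; k++)
            crc = (crc & 1u) ? (0xEDB88320u ^ (crc >> 1)) : (crc >> 1);
    }
    return ~crc;
}

/* assumption: split at strlen/2, first half goes into crc_msb */
static big_key big_name_key(const char *name)
{
    size_t len, half;
    big_key k;

    assert(name != NULL);
    len = strlen(name);
    half = len / 2;
    k.crc_msb = key_crc32((const unsigned char *)name, half);
    k.crc_lsb = key_crc32((const unsigned char *)name + half, len - half);
    return k;
}

/* order by crc_msb first, then crc_lsb */
static int big_key_cmp(const void *a, const void *b)
{
    const big_key *x = a, *y = b;

    if (x->crc_msb != y->crc_msb) return x->crc_msb < y->crc_msb ? -1 : 1;
    if (x->crc_lsb != y->crc_lsb) return x->crc_lsb < y->crc_lsb ? -1 : 1;
    return 0;
}

/* returns the index of name in a key array sorted with big_key_cmp,
   or -1 if it is not there */
static long big_find(const big_key *toc, size_t count, const char *name)
{
    big_key want = big_name_key(name);
    const big_key *hit = bsearch(&want, toc, count, sizeof *toc, big_key_cmp);

    return hit ? (long)(hit - toc) : -1;
}
```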

 As for some unknown data members, the header_unknown member you refer to is always 1.  A bit redundant?  Yes.  The toc_unknown[1..3] can be ignored.  They’re padding that is cleared to something by the compiler.
diff --git a/big-hw2.txt b/big-hw2.txt
 All indexes, offsets, and counts are little-endian and require
 conversion for the Mac PowerPC architecture, but are ok as-is on the
 Intel platform.  Items that are numeric and described as 4 bytes are of 
 type uint32_t.  Items that are numeric and described as 2 bytes are of
 type uint16_t.

 Overall format is:
    Archive Header
    Section Header describing the four sections immediately
        following the Archive Header (TOC List,
        Folder List, File Info List, and File Name List)
    TOC (Table of Contents) List
    Folder List
    File Info List
    File Name List
    File Data for all the files (including the 264 byte header
       preceding the file data of each file)

 The format of each of the above is:
    
 180 byte archive header
 	8 bytes of "_ARCHIVE"
 	4 bytes version
 	16 bytes for MD5 tool signature of archive (MD5 of tool security
             key and full file data excluding the archive header)
 	128 bytes for 64 utf16 chars for archive name
 	16 bytes for MD5 signature of archive (MD5 of HW2 Root Security Key
             and archive header data)
 	4 bytes section header size
 	4 bytes exact file data offset

 24 byte section header consisting of four 6 byte sections.
 Each 6 byte section has:
 	4 byte offset relative to archive header
 	2 byte count

 The four sections are:
 	TOC List (describes each TOC entry, that is, each folder hierarchy)
 	Folder List (describes the folder hierarchy for each TOC)
 	File Info List (describes each file)
 	File Name List (the list of file names, including folder names)

 TOC list entry (138 bytes)
 	64 character alias name
 	64 character name
 	2 byte first folder index
 	2 byte last folder index
 	2 byte first filename index
 	2 byte last filename index
 	2 byte start folder index for hierarchy

 Folder list entry (12 bytes)
 	4 bytes file name offset (relative to file name list offset)
 	2 bytes first subfolder index
 	2 bytes last subfolder index
 	2 bytes first filename index
 	2 bytes last filename index

 File info list entry (17 bytes)
 	4 bytes file name offset (relative to file name list offset)
 	1 byte flags (0x00 if uncompressed
                      0x10 to decompress during read -- used for large files
                      0x20 to decompress all at once -- used for small files, like .lua files)
 	4 bytes file data offset (relative to overall file data offset)
 	4 bytes compressed length
 	4 bytes decompressed length

 File header preceding file data for each file (264 bytes)
 	256 chars for file name
 	4 bytes file modification date
 	4 bytes CRC of uncompressed file data.

 Note that the file data offset in the file info list entry indicates the
 location of the file data.  In order to access the file header
 preceding the file data you must subtract 264 from the offset.

 The HW2 Root Security Key is an ASCII string that is passed first to
 the MD5 algorithm followed by the archive header data to create the
 archive's 128 bit (16 byte) MD5 signature.  The MD5 algorithm used is
 standard.  The Root Security Key is embedded in the HW2 application
 and also in Relic's archive tool.

 The tool security key is an ASCII string that is passed first to the MD5
 algorithm followed by the full data in the archive excluding the archive
 header to create the archive's 128 bit (16 byte) MD5 tool signature.
 The MD5 algorithm is standard.  The tool security key is embedded in
 Relic's archive tool.

 The file modification date appears to be the number of seconds
 since UTC 00:00:00 January 1st, 1970.  This date is the Unix epoch.
 The C runtime's time_t on Windows counts from the same date, although
 native Windows FILETIME values count from January 1st, 1601.

 The CRC algorithm used to calculate the uncompressed file data CRC
 is the exact same algorithm used for Homeworld.  The algorithm and
 table appear to be those of the standard 32-bit CRC (CRC-32, as used
 by Ethernet and ZIP; the table value 0xEDB88320 is its reflected
 polynomial).  Since that is a published mathematical formula, there
 shouldn't be any concerns over copyright in this case.
	-----------------------------------------------------------------------------
	Originally hosted at: http://mods.relicnews.com/misc/BIGSpec.shtml
	-----------------------------------------------------------------------------



	BIG FILE FORMAT SPECIFICATION -- RBF1.23 -- v0.8
	(incomplete)
	01/04/2000

	by:
	_!Lachesis_atatata!
	pronounced:
	shaka >yell< lachesis shaka atatata >yell<
	(you may yell or scream at your capability. what you yell is irrelevant)

	member of:
	ICGOMA (International Cooperative Governance Organization of Meaningless Acronyms)

	in conjunction with:
	UCFEDD (Uraguayan Consortium of Ferrets with Erectile Dysfunction Disorder)
	APDOFF (Agrarian Paramilitary Defense Organization of Feminist Farmers)
	MOOOOO (MOOOOO)


	-----------------------------------------------------------------------------
	A. INTRODUCTION
	-----------------------------------------------------------------------------
	This specification describes the format of .BIG archives used in
	Homeworld by Relic Entertainment Inc. a BIG archive is a container for
	other files. instead of distributing hundreds of files separately, files are
	"packed" together into one larger file.

	at this time, it is not a complete specification. enough information has
	been learned to extract all files from a BIG archive, but not enough to
	be able to create one.

	if information from this document is used for implementations or in
	other documents, please credit me. it's just the nice thing to do.
	i'll be haunting the message boards, if you have a question, post it. if
	i don't reply i:
	1. don't know the answer
	2. am busy with something at some point on the globe
	3. am dead

	if there are errors on this document ... oops :)

	-----------------------------------------------------------------------------
	B. OVERALL LAYOUT
	-----------------------------------------------------------------------------
	there are three main areas to the BIG file format arranged
	sequentially one after the other:

	1. header
	2. table of contents
	3. name/data blobs

	each is described in the sections below.

	-----------------------------------------------------------------------------
	B1. HEADER AREA
	-----------------------------------------------------------------------------

	the header area comes at the beginning of the file. it has the normal
	elements you would expect to be there. there is only one header.

	-------------------------------------------------------------------------

	typedef unsigned long ulong_t;
	typedef unsigned char ubyte_t;

	const char BIG_FILE_ID[] = { 0x52, 0x42, 0x46, 0x31, 0x2E, 0x32, 0x33 };

	// note: "RBF1.23"
	struct header {
	char magic_cookie[7];
	ulong_t toc_size;
	ulong_t header_unknown;
	};

	-------------------------------------------------------------------------

	magic_cookie
	identifier for the file. The only currently known value for this is
	contained in BIG_FILE_ID;

	toc_size
	number of toc_entry structures (*see B2) that are contained in the
	file. (note: toc stands for Table of Contents)

	header_unknown
	this field's purpose is unknown. it's value is always 1. does not seem
	to play a role in anything yet.

	-----------------------------------------------------------------------------
	B2. TABLE OF CONTENTS AREA
	-----------------------------------------------------------------------------
	the table of contents area holds a list of toc_entry structures which
	comes directly after the header. header.toc_size tells you how many
	there are. for each file that is in the BIG archive, there is a toc_entry
	structure.

	-------------------------------------------------------------------------

	struct toc_entry {
	ulong_t crc_msb;
	ulong_t crc_lsb;
	ulong_t name_size;
	ulong_t data_compressed_size;
	ulong_t data_uncompressed_size;
	ulong_t file_offset;
	time_t timestamp;
	ubyte_t toc_unknown[4];
	};

	-------------------------------------------------------------------------

	crc_msb/crc_lsb
	this is some sort of crc value. crc_msb is the most significant byte and
	crc_lsb is the least significant byte of a larger 8-byte(64-bit) crc
	number. exactly how it's generated and how to use it is unknown.
	(*see crc note below)

	name_size
	the size(length) of the name in the name_data_blob (*see B3) that
	this toc_entry refers to.

	data_compressed_size
	the size of the data in the name_data_blob (*see B3) in compressed
	form.

	data_uncompressed_size
	the size of the data in the name_data_blob (*see B3) in uncompressed
	form.

	file_offset
	the offset in bytes from the beginning of the BIG file to the
	name_data_blob (*see B3) that this toc_entry refers to.

	timestamp
	the date/time of the entry.

	toc_unknown
	what part this element plays is unknown. what is known is that
	toc_unknown[0] will be 1 if the data has been compressed and 0 if it
	has not. (*see toc_unknown note below) toc_unknown[1..3] always
	seems to equal {0xC9, 0xCA, 0xCB)

	-------------------------------------------------------------------------

	toc_unknown note:
	sometimes the data is compressed, sometimes it's not. compression
	can be determined by doing either:

	1. data_compressed_size < data_uncompressed_size or ...
	2. toc_unknown[0] == 1

	I would personally suggest sticking to option 1 until toc_unknown is
	fully understood.

	-------------------------------------------------------------------------

	crc note:
	the entire table of contents seems to be sorted by the 8-byte crc
	value. why is a complete mystery. the crc itself is most likely used as
	a data validation mechanism. but to sort on it? it may be that the crc
	is also being used as a mechanism to uniquely identify a file. i think
	that there are no duplicate crc's.

	-----------------------------------------------------------------------------
	B3. NAME/DATA BLOB AREA
	-----------------------------------------------------------------------------
	the name/data blob area holds a list of name_data_blob structures
	which comes directly after the table of contents area. a
	name_data_blob holds the name and data of a file that exists in the
	BIG archive. each name_data_blob is referred to by a toc_entry
	structure which gives the sizes of the name and data fields.

	-------------------------------------------------------------------------

	struct name_data_blob {
	char name[];
	char data[];
	};

	quick note:
	the structure above is not legal C. the fields are not fixed length and
	i've used this notation because it's useful. use the standard techniques
	of dynamic memory when implementing.

	-------------------------------------------------------------------------

	name
	this is the file name of the original file. it's length is determined from
	toc_entry.name_size + 1 (note: there IS a terminating null byte)
	which this name_data_blob is being referred by. it also happens to be
	encrypted. (*see encryption note below)

	data
	this is the actual file data. it's length is determined from the
	toc_entry.data_compressed_size field which this name_data_blob is
	being referred by. if the data is compressed, the algorithm used is
	LZSS. (*see compression note below)

	-------------------------------------------------------------------------

	encryption note:
	as if the world isn't evil enough as it is, some sick sick sick bastard >
	my kinda programmer actually :) < decided to encrypt the file names
	so that those lurking hacker types would get all confused and bleary-
	eyed from trying to hack the format. luckily, i knew this silly
	encryption trick ... i've used it on others myself. ;)
	what's used is an XOR run. it's really simple, really fast, and has no
	redeeming value (ie. compression, secure encryption) other than to
	screw with people. here's the code for doing it.

	void xor_run(char* buffer, ulong_t buffer_size)
	{
	char last_char;
	ulong_t i;
	last_char = (char)0xD5;

	for (i = 0; i < buffer_size; i++)
	{
	last_char ^= buffer[i];
	buffer[i] = last_char;
	}
	}

	for those of you who don't catch on, that's both the encryption AND
	decryption routine. this particular version de/encrypts "in-place" and
	writes over the buffer you pass in.
	as an implementation note, don't touch the terminating null when
	decrypting. pass in toc_entry.name_size, not toc_entry.name_size + 1.

	-------------------------------------------------------------------------

	compression note:
	compression (decompression) of data is done using the LZSS
	algorithm. for this particular implementation:

	> very basic LZSS (ie. no huffman a la ZIP)
	> marker bit of 1 signals passthrough character
	> marker bit of 0 signals dictionary entry
	> dictionary entry is composed of 12-bit index and 4-bit length fields

	don't know the LZSS algorithm? go to:

	http://dogma.net/DataCompression/

	if you look carefully, you will even find a clean LZSS implementation
	in C up amongst the links that works perfectly. of course, since life
	isn't fair in the least, i found it about an hour AFTER implementing it
	myself. ;(

	note: i'm not associated with the link above, or any link off of it. so
	don't waste your time speculating.

	-----------------------------------------------------------------------------
	C. >gratzi<
	-----------------------------------------------------------------------------
	>gratzis< go out to Relic for coming up with Homeworld. very nice.
	artsy >gratzi< goes to the cutscenes which, though simple, were very
	effective. special >gratzi< for the music and the voice acting. though
	not enough music :(

	>gratzi< to the person who eventually creates me this ship:
	Light Carrier
	(to support guerilla tactics)
	as compared to the standard carrier:
	no space consuming construction facilities
	a bit smaller
	a bit lighter
	a bit faster
	a bit cheaper
	more docking ports for faster docking of large wings
	capacity for more fighters/corvettes
	a bit faster fighter/corvette repair cycle
	enough small guns to give a scout/interceptor wing a hard time

	>!bye!<





	-----------------------------------------------------------------------------
	Originally provided in Relic's source: BIGaddendum.doc
	-----------------------------------------------------------------------------


	.BIG file specification addendum
	By B1FF ( HYPERLINK "mailto:[email protected]" [email protected])

	The article listed on RelicNews is pretty complete WRT the .BIG file format. It was really neat to download the program for viewing and extracting the contents of a bigfile. We will probably release our bigfile creation program but you will note that our version of the ‘extract’ command was never finished. Oh the pains of finalling!

	The only thing that was not pick up on was the CRC’s of the bigfile. The CRC is an 8-byte CRC actually made up of 2 standard 32-bit CRC’s. Included is some sample code to create these CRC’s. I think I originally copied this code from Graphics Gem’s several games ago. It’s pretty standard. Make note of this algorithm. It is also used in the .CRC format.

	udword CRCTable[] =
	{
	0x00000000,0x77073096,0xEE0E612C,0x990951BA,
	0x076DC419,0x706AF48F,0xE963A535,0x9E6495A3,
	0x0EDB8832,0x79DCB8A4,0xE0D5E91E,0x97D2D988,
	0x09B64C2B,0x7EB17CBD,0xE7B82D07,0x90BF1D91,
	0x1DB71064,0x6AB020F2,0xF3B97148,0x84BE41DE,
	0x1ADAD47D,0x6DDDE4EB,0xF4D4B551,0x83D385C7,
	0x136C9856,0x646BA8C0,0xFD62F97A,0x8A65C9EC,
	0x14015C4F,0x63066CD9,0xFA0F3D63,0x8D080DF5,
	0x3B6E20C8,0x4C69105E,0xD56041E4,0xA2677172,
	0x3C03E4D1,0x4B04D447,0xD20D85FD,0xA50AB56B,
	0x35B5A8FA,0x42B2986C,0xDBBBC9D6,0xACBCF940,
	0x32D86CE3,0x45DF5C75,0xDCD60DCF,0xABD13D59,
	0x26D930AC,0x51DE003A,0xC8D75180,0xBFD06116,
	0x21B4F4B5,0x56B3C423,0xCFBA9599,0xB8BDA50F,
	0x2802B89E,0x5F058808,0xC60CD9B2,0xB10BE924,
	0x2F6F7C87,0x58684C11,0xC1611DAB,0xB6662D3D,

	0x76DC4190,0x01DB7106,0x98D220BC,0xEFD5102A,
	0x71B18589,0x06B6B51F,0x9FBFE4A5,0xE8B8D433,
	0x7807C9A2,0x0F00F934,0x9609A88E,0xE10E9818,
	0x7F6A0DBB,0x086D3D2D,0x91646C97,0xE6635C01,
	0x6B6B51F4,0x1C6C6162,0x856530D8,0xF262004E,
	0x6C0695ED,0x1B01A57B,0x8208F4C1,0xF50FC457,
	0x65B0D9C6,0x12B7E950,0x8BBEB8EA,0xFCB9887C,
	0x62DD1DDF,0x15DA2D49,0x8CD37CF3,0xFBD44C65,
	0x4DB26158,0x3AB551CE,0xA3BC0074,0xD4BB30E2,
	0x4ADFA541,0x3DD895D7,0xA4D1C46D,0xD3D6F4FB,
	0x4369E96A,0x346ED9FC,0xAD678846,0xDA60B8D0,
	0x44042D73,0x33031DE5,0xAA0A4C5F,0xDD0D7CC9,
	0x5005713C,0x270241AA,0xBE0B1010,0xC90C2086,
	0x5768B525,0x206F85B3,0xB966D409,0xCE61E49F,
	0x5EDEF90E,0x29D9C998,0xB0D09822,0xC7D7A8B4,
	0x59B33D17,0x2EB40D81,0xB7BD5C3B,0xC0BA6CAD,

	0xEDB88320,0x9ABFB3B6,0x03B6E20C,0x74B1D29A,
	0xEAD54739,0x9DD277AF,0x04DB2615,0x73DC1683,
	0xE3630B12,0x94643B84,0x0D6D6A3E,0x7A6A5AA8,
	0xE40ECF0B,0x9309FF9D,0x0A00AE27,0x7D079EB1,
	0xF00F9344,0x8708A3D2,0x1E01F268,0x6906C2FE,
	0xF762575D,0x806567CB,0x196C3671,0x6E6B06E7,
	0xFED41B76,0x89D32BE0,0x10DA7A5A,0x67DD4ACC,
	0xF9B9DF6F,0x8EBEEFF9,0x17B7BE43,0x60B08ED5,
	0xD6D6A3E8,0xA1D1937E,0x38D8C2C4,0x4FDFF252,
	0xD1BB67F1,0xA6BC5767,0x3FB506DD,0x48B2364B,
	0xD80D2BDA,0xAF0A1B4C,0x36034AF6,0x41047A60,
	0xDF60EFC3,0xA867DF55,0x316E8EEF,0x4669BE79,
	0xCB61B38C,0xBC66831A,0x256FD2A0,0x5268E236,
	0xCC0C7795,0xBB0B4703,0x220216B9,0x5505262F,
	0xC5BA3BBE,0xB2BD0B28,0x2BB45A92,0x5CB36A04,
	0xC2D7FFA7,0xB5D0CF31,0x2CD99E8B,0x5BDEAE1D,

	0x9B64C2B0,0xEC63F226,0x756AA39C,0x026D930A,
	0x9C0906A9,0xEB0E363F,0x72076785,0x05005713,
	0x95BF4A82,0xE2B87A14,0x7BB12BAE,0x0CB61B38,
	0x92D28E9B,0xE5D5BE0D,0x7CDCEFB7,0x0BDBDF21,
	0x86D3D2D4,0xF1D4E242,0x68DDB3F8,0x1FDA836E,
	0x81BE16CD,0xF6B9265B,0x6FB077E1,0x18B74777,
	0x88085AE6,0xFF0F6A70,0x66063BCA,0x11010B5C,
	0x8F659EFF,0xF862AE69,0x616BFFD3,0x166CCF45,
	0xA00AE278,0xD70DD2EE,0x4E048354,0x3903B3C2,
	0xA7672661,0xD06016F7,0x4969474D,0x3E6E77DB,
	0xAED16A4A,0xD9D65ADC,0x40DF0B66,0x37D83BF0,
	0xA9BCAE53,0xDEBB9EC5,0x47B2CF7F,0x30B5FFE9,
	0xBDBDF21C,0xCABAC28A,0x53B39330,0x24B4A3A6,
	0xBAD03605,0xCDD70693,0x54DE5729,0x23D967BF,
	0xB3667A2E,0xC4614AB8,0x5D681B02,0x2A6F2B94,
	0xB40BBE37,0xC30C8EA1,0x5A05DF1B,0x2D02EF8D,
	};

	/*=============================================================================
	    Functions:
	=============================================================================*/
	/*-----------------------------------------------------------------------------
	    Name        : crc32Compute
	    Description : Compute a 32-bit CRC
	    Inputs      : packet - bytes to checksum, length - number of bytes
	    Outputs     :
	    Return      : the 32-bit CRC of the data
	    Note        : udword and crc32 are taken to be unsigned 32-bit types and
	                  ubyte an unsigned 8-bit type; the typedefs are not shown.
	----------------------------------------------------------------------------*/
	crc32 crc32Compute(ubyte *packet, udword length)
	{
	    udword index, tableIndex;
	    crc32 crc;

	    crc = 0xffffffff;                           /* initial value */
	    for (index = 0; index < length; index++)
	    {
	        /* low byte of the running CRC xor the next data byte picks the entry */
	        tableIndex = (crc ^ *(packet++)) & 0x000000FF;
	        crc = ((crc >> 8) & 0x00FFFFFF) ^ CRCTable[tableIndex];
	    }
	    return(~crc);                               /* final inversion */
	}

	The first CRC is the CRC of the first half of the file name and the second CRC is the CRC of the second half of the file name. Why use such a silly scheme? It makes it easy to sort the TOC by CRC and do a binary search for a filename. This makes for faster lookups. All file requests in our file layer are resolved from the text name to an 8-byte CRC.
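
As a rough illustration of the two-CRC name key described above, here is a minimal sketch. It is not Relic's code. The names NameCrc, name_crc and name_crc_compare are made up for this example, and the split point (strlen/2, with any odd character going to the second half) is an assumption, since the text does not say how the name is divided or whether it is case-folded first. The bitwise CRC routine is a table-free stand-in that yields the same values as crc32Compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value
   0xFFFFFFFF, final inversion). Same results as the table-driven
   crc32Compute, just without the 256-entry table. */
static uint32_t crc32_bitwise(const unsigned char *data, size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < length; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical 8-byte name key: one CRC per half of the file name. */
typedef struct {
    uint32_t first_half;
    uint32_t second_half;
} NameCrc;

/* ASSUMPTION: the name is split at strlen/2 and any odd character goes
   to the second half. The real split rule is not documented here. */
static NameCrc name_crc(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    size_t half = len / 2;
    NameCrc key;
    key.first_half = crc32_bitwise((const unsigned char *)name, half);
    key.second_half = crc32_bitwise((const unsigned char *)name + half,
                                    len - half);
    return key;
}

/* Ordering usable for sorting TOC entries and for a binary search:
   compare the first CRC, then the second. */
static int name_crc_compare(const NameCrc *a, const NameCrc *b)
{
    if (a->first_half != b->first_half)
        return a->first_half < b->first_half ? -1 : 1;
    if (a->second_half != b->second_half)
        return a->second_half < b->second_half ? -1 : 1;
    return 0;
}
```

With a comparison like this, a sorted TOC can be searched with the standard bsearch() after wrapping name_crc_compare in a function that takes const void pointers.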

	As for the unknown data members: the header_unknown member you refer to is always 1. A bit redundant? Yes. The toc_unknown[1..3] members can be ignored. They’re padding that the compiler sets to some value.

	All indexes, offsets, and counts are little-endian and require
	conversion for the Mac PowerPC architecture, but are ok as-is on the
	Intel platform. Items that are numeric and described as 4 bytes are of
	type uint32_t. Items that are numeric and described as 2 bytes are of
	type uint16_t.
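
One portable way to handle the byte order is to assemble each value from individual bytes, which works unchanged on both big-endian and little-endian hosts. The helper names below (read_u16_le, read_u32_le) are made up for this sketch and are not part of any Relic tool.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 2-byte little-endian value from a byte buffer. */
static uint16_t read_u16_le(const unsigned char *p)
{
    assert(p != 0);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 4-byte little-endian value from a byte buffer. */
static uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != 0);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Because the bytes are combined arithmetically, no separate swap step is needed on PowerPC.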

	Overall format is:
	Archive Header
	Section Header describing the four sections immediately
	following the Archive Header (TOC List,
	Folder List, File Info List, and File Name List)
	TOC (Table of Contents) List
	Folder List
	File Info List
	File Name List
	File Data for all the files (including the 264-byte header
	preceding the file data of each file)

	The format of each of the above is:

	180 byte archive header
	8 bytes of "_ARCHIVE"
	4 bytes version
	16 bytes for MD5 tool signature of archive (MD5 of tool security
	key and full file data excluding the archive header)
	128 bytes for 64 utf16 chars for archive name
	16 bytes for MD5 signature of archive (MD5 of HW2 Root Security Key
	and archive header data)
	4 bytes section header size
	4 bytes exact file data offset
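
The header layout above could be read along these lines. This is only a sketch: the struct and field names are invented here, the fields are assumed to appear in exactly the order listed, the archive name is left as raw UTF-16 bytes, and the caller is assumed to supply at least 180 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ARCHIVE_HEADER_SIZE = 180 };

/* Hypothetical in-memory form of the 180-byte archive header. */
typedef struct {
    uint32_t version;
    unsigned char tool_signature[16];    /* MD5 tool signature */
    unsigned char name_utf16[128];       /* 64 UTF-16 units, not decoded */
    unsigned char header_signature[16];  /* MD5 signature of the header */
    uint32_t section_header_size;
    uint32_t file_data_offset;
} ArchiveHeader;

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the "_ARCHIVE" tag is missing.
   buf must hold at least ARCHIVE_HEADER_SIZE bytes. */
static int parse_archive_header(const unsigned char *buf, ArchiveHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (memcmp(buf, "_ARCHIVE", 8) != 0)
        return 0;
    out->version = read_u32_le(buf + 8);
    memcpy(out->tool_signature, buf + 12, 16);
    memcpy(out->name_utf16, buf + 28, 128);
    memcpy(out->header_signature, buf + 156, 16);
    out->section_header_size = read_u32_le(buf + 172);
    out->file_data_offset = read_u32_le(buf + 176);
    return 1;
}
```

The byte offsets (8, 12, 28, 156, 172, 176) are just the running totals of the field sizes listed above, ending at 180.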

	24 byte section header consisting of four 6 byte sections.
	Each 6 byte section has:
	4 byte offset relative to archive header
	2 byte count

	The four sections are:
	TOC List (describes each TOC entry, that is, each folder hierarchy)
	Folder List (describes the folder hierarchy for each TOC)
	File Info List (describes each file)
	File Name List (the list of file names, including folder names)
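
A sketch of reading the 24-byte section header follows. The type and function names are hypothetical, and the four 6-byte pairs are assumed to be stored in the order the sections are listed above (TOC, Folder, File Info, File Name). The offsets are kept as read; what exactly they are measured from is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

enum { SECTION_HEADER_SIZE = 24 };

/* One 6-byte entry: a 4-byte offset and a 2-byte count. */
typedef struct {
    uint32_t offset;
    uint16_t count;
} SectionRef;

/* Hypothetical container for the four entries. */
typedef struct {
    SectionRef toc;
    SectionRef folders;
    SectionRef file_infos;
    SectionRef file_names;
} SectionHeader;

static uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* buf must hold at least SECTION_HEADER_SIZE bytes. */
static void parse_section_header(const unsigned char *buf, SectionHeader *out)
{
    assert(buf != 0 && out != 0);
    SectionRef *refs[4] = {
        &out->toc, &out->folders, &out->file_infos, &out->file_names
    };
    for (int i = 0; i < 4; i++) {
        refs[i]->offset = read_u32_le(buf + 6 * i);
        refs[i]->count = read_u16_le(buf + 6 * i + 4);
    }
}
```

Parsing field by field avoids relying on struct packing, since a C compiler would normally pad a 4+2 byte struct to 8 bytes.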

	TOC list entry (138 bytes)
	64 character alias name
	64 character name
	2 byte first folder index
	2 byte last folder index
	2 byte first filename index
	2 byte last filename index
	2 byte start folder index for hierarchy

	Folder list entry (12 bytes)
	4 bytes file name offset (relative to file name list offset)
	2 bytes first subfolder index
	2 bytes last subfolder index
	2 bytes first filename index
	2 bytes last filename index

	File info list entry (17 bytes)
	4 bytes file name offset (relative to file name list offset)
	1 byte flags (0x00 if uncompressed
	0x10 to decompress during read -- used for large files
	0x20 to decompress all at once -- used for small files, like .lua files)
	4 bytes file data offset (relative to overall file data offset)
	4 bytes compressed length
	4 bytes decompressed length
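
The 17-byte file info entry could be decoded roughly as follows. The names are invented for this sketch, and the flag constants simply mirror the three values listed above. Whether other flag values or combinations occur is not known here.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FILE_INFO_SIZE = 17,
    FLAG_UNCOMPRESSED = 0x00,
    FLAG_DECOMPRESS_DURING_READ = 0x10,  /* large files */
    FLAG_DECOMPRESS_ALL_AT_ONCE = 0x20   /* small files, like .lua files */
};

/* Hypothetical in-memory form of one file info list entry. */
typedef struct {
    uint32_t name_offset;          /* relative to file name list offset */
    uint8_t flags;
    uint32_t data_offset;          /* relative to overall file data offset */
    uint32_t compressed_length;
    uint32_t decompressed_length;
} FileInfo;

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* buf must hold at least FILE_INFO_SIZE bytes. */
static void parse_file_info(const unsigned char *buf, FileInfo *out)
{
    assert(buf != 0 && out != 0);
    out->name_offset = read_u32_le(buf + 0);
    out->flags = buf[4];
    out->data_offset = read_u32_le(buf + 5);
    out->compressed_length = read_u32_le(buf + 9);
    out->decompressed_length = read_u32_le(buf + 13);
}

/* True when either of the two compression flags is set. */
static int file_info_is_compressed(const FileInfo *info)
{
    assert(info != 0);
    return (info->flags
            & (FLAG_DECOMPRESS_DURING_READ | FLAG_DECOMPRESS_ALL_AT_ONCE)) != 0;
}
```

Note that the 1-byte flags field leaves the following 4-byte fields unaligned, which is another reason to read them byte by byte rather than overlaying a struct.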

	File header preceding file data for each file (264 bytes)
	256 chars for file name
	4 bytes file modification date
	4 bytes CRC of uncompressed file data.

	Note that the file data offset in the file info list entry indicates the
	location of the file data. To access the file header preceding the
	file data, you must subtract 264 from the offset.
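
The offset arithmetic can be sketched as below. The function and struct names are made up for illustration. The file name field is assumed to hold single-byte characters, and the date and CRC are assumed to follow it in the order listed in the file header layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FILE_HEADER_SIZE = 264, FILE_NAME_FIELD = 256 };

/* Absolute position of the per-file header: the archive's file data
   offset plus the entry's file data offset, minus 264. */
static uint64_t file_header_position(uint32_t file_data_base,
                                     uint32_t entry_data_offset)
{
    uint64_t data_pos = (uint64_t)file_data_base + entry_data_offset;
    assert(data_pos >= FILE_HEADER_SIZE);
    return data_pos - FILE_HEADER_SIZE;
}

/* Hypothetical in-memory form of the 264-byte file header. */
typedef struct {
    char name[FILE_NAME_FIELD + 1];  /* extra byte guarantees termination */
    uint32_t modification_date;
    uint32_t crc;                    /* CRC of the uncompressed data */
} FileHeader;

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* buf must hold at least FILE_HEADER_SIZE bytes. */
static void parse_file_header(const unsigned char *buf, FileHeader *out)
{
    assert(buf != NULL && out != NULL);
    memcpy(out->name, buf, FILE_NAME_FIELD);
    out->name[FILE_NAME_FIELD] = '\0';
    out->modification_date = read_u32_le(buf + 256);
    out->crc = read_u32_le(buf + 260);
}
```

For example, with an archive file data offset of 1000 and an entry offset of 500, the file data starts at 1500 and its header at 1236.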

	The HW2 Root Security Key is an ASCII string that is passed first to
	the MD5 algorithm followed by the archive header data to create the
	archive's 128 bit (16 byte) MD5 signature. The MD5 algorithm used is
	standard. The Root Security Key is embedded in the HW2 application
	and also in Relic's archive tool.

	The tool security key is an ASCII string that is passed first to the MD5
	algorithm followed by the full data in the archive excluding the archive
	header to create the archive's 128 bit (16 byte) MD5 tool signature.
	The MD5 algorithm is standard. The tool security key is embedded in
	Relic's archive tool.

	The file modification date appears to be the number of seconds
	since UTC 00:00:00 January 1st, 1970. This date is the Unix epoch,
	although it is unknown to the author of this document if that is also
	the Windows epoch.
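
If the modification date really is seconds since January 1st, 1970 UTC, it can be turned into a readable string with the standard C library, as in the sketch below. The function name format_mod_date is made up, and the code assumes the platform's time_t also counts seconds from 1970, which holds on common Unix and Windows C runtimes but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Writes "YYYY-MM-DD HH:MM:SS" (UTC) into out.
   Returns the number of characters written, or 0 on failure.
   ASSUMPTION: time_t counts seconds since 1970-01-01 00:00:00 UTC. */
static size_t format_mod_date(uint32_t seconds, char *out, size_t out_size)
{
    assert(out != NULL);
    time_t t = (time_t)seconds;
    struct tm *utc = gmtime(&t);
    if (utc == NULL)
        return 0;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", utc);
}
```

A buffer of 20 bytes is enough for the 19-character result plus its terminator.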

	The CRC algorithm used to calculate the uncompressed file data CRC
	is the exact same algorithm used for Homeworld. Apparently the algorithm
	and table are taken from the 32-Bit CRC International Standard,
	which is based on a particular mathematical formula. Thus there shouldn't
	be any concerns over copyright in this case.