The MIO0 compressed data format is used in several N64 games, including Super Mario 64. It is related to LZ77.

| Offset | Len | Description |
|---|---|---|
| 0x00 | 0x10 | Header: |
| 0x00 | 0x04 | Signature: “MIO0” |
| 0x04 | 0x04 | Decompressed length |
| 0x08 | 0x04 | Compressed offset (CO) |
| 0x0C | 0x04 | Uncompressed offset (UO) |
| 0x10 | | Layout bits: 0 = compressed, 1 = uncompressed data |
| | | Padding to align compressed data |
| (CO) | | Compressed data: 16-bit length/offset values |
| (UO) | | Uncompressed data: individual data bytes |

In all games, the header is 16 bytes long and aligned to a 4-byte boundary. In SM64, it is aligned to a 16-byte boundary (the last 4 address bits are 0). The header contains the “MIO0” signature followed by three 32-bit values: decompressed length, compressed offset, and uncompressed offset.
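As a minimal sketch of reading the header described above, the following Python parses the four fields. The function name `parse_mio0_header` is hypothetical, and the code assumes the three 32-bit values are stored big-endian (the N64's native byte order) and that both offsets are relative to the start of the header, as the table suggests.

```python
import struct

def parse_mio0_header(data: bytes, start: int = 0):
    """Return (decompressed_length, compressed_offset, uncompressed_offset).

    Assumption: all three 32-bit fields are big-endian and the offsets
    are measured from the start of the MIO0 header.
    """
    signature, dec_len, comp_off, uncomp_off = struct.unpack_from(">4sIII", data, start)
    if signature != b"MIO0":
        raise ValueError("not an MIO0 block")
    return dec_len, comp_off, uncomp_off

# Hand-built 16-byte header: length 0x20, CO = 0x14, UO = 0x18
header = b"MIO0" + bytes([0, 0, 0, 0x20, 0, 0, 0, 0x14, 0, 0, 0, 0x18])
print(parse_mio0_header(header))  # (32, 20, 24)
```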
The layout bits section begins immediately after the header, at offset 0x10. Each bit indicates whether the next piece of output is described by a compressed group (0) or is a single byte pulled from the uncompressed data (1). The bits are packed most significant bit first, moving down to the least significant. Padding bytes of 0x00 may follow the layout bits to align the compressed data section to a 4-byte boundary.
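The bit order can be illustrated with a small Python generator. This is only a sketch; `iter_layout_bits` is a hypothetical helper that takes the raw layout bytes and yields one bit at a time, most significant bit first.

```python
def iter_layout_bits(layout: bytes):
    """Yield the layout bits of each byte, most significant bit first."""
    for byte in layout:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

# 0xA0 = 1010 0000: uncompressed, compressed, uncompressed, compressed, ...
print(list(iter_layout_bits(b"\xA0")))  # [1, 0, 1, 0, 0, 0, 0, 0]
```

A real decoder stops consuming bits once the decompressed length is reached, so trailing bits in the last layout byte (and any padding bytes) are never interpreted.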
The compressed data is located at the “Compressed offset” given in the header and is always aligned to a 4-byte boundary (the last 2 address bits are 0). It is composed of 16-bit values that describe where to copy the next sequence of bytes from. Each 16-bit value consists of a 4-bit length and a 12-bit look-back offset:

- length = upper 4 bits of first byte + 3 [range: 3-18]
- offset = lower 4 bits of first byte and all of second byte + 1 [range: 1-4096]

Note: The length can be greater than the offset. This just means that the copy reads data that was already written earlier during the same compressed block. This often occurs when there are long runs of the same data byte.
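The length/offset arithmetic, and the case where the length exceeds the offset, can be sketched in Python as follows. The names `decode_pair` and `copy_back` are hypothetical helpers introduced for illustration. Copying one byte at a time is what makes the overlapping case work, because each copied byte becomes available as a source for the next one.

```python
def decode_pair(b0: int, b1: int):
    """Split one 16-bit compressed value (two bytes) into (length, offset)."""
    length = (b0 >> 4) + 3                   # upper 4 bits of first byte, range 3-18
    offset = (((b0 & 0x0F) << 8) | b1) + 1   # remaining 12 bits, range 1-4096
    return length, offset

def copy_back(out: bytearray, length: int, offset: int):
    """Append `length` bytes, each read `offset` bytes back from the end of `out`."""
    for _ in range(length):
        out.append(out[-offset])

print(decode_pair(0x00, 0x00))  # (3, 1)
print(decode_pair(0xFF, 0xFF))  # (18, 4096)

# Length 5 with offset 1: the single byte "A" is repeated five more times.
out = bytearray(b"A")
copy_back(out, *decode_pair(0x20, 0x00))
print(bytes(out))  # b'AAAAAA'
```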
The uncompressed data immediately follows the compressed data. It is therefore aligned to a 2-byte boundary, since the compressed data starts on a 4-byte boundary and consists of 16-bit chunks. The uncompressed data is individual bytes, one for each '1' in the layout bits.
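Putting the three sections together, a decoder can be sketched in Python as below. This is an illustrative sketch rather than a reference implementation: `mio0_decode` is a hypothetical name, the header fields are assumed to be big-endian with offsets relative to the start of the header, and no bounds checking is done on malformed input.

```python
import struct

def mio0_decode(data: bytes) -> bytes:
    """Decode one MIO0 block that starts at the beginning of `data`."""
    sig, dec_len, comp_off, uncomp_off = struct.unpack_from(">4sIII", data, 0)
    if sig != b"MIO0":
        raise ValueError("missing MIO0 signature")

    out = bytearray()
    layout_pos = 0x10   # layout bits start right after the header
    layout_byte = 0
    bit_mask = 0        # 0 forces a layout byte to be loaded first

    while len(out) < dec_len:
        if bit_mask == 0:
            layout_byte = data[layout_pos]
            layout_pos += 1
            bit_mask = 0x80  # most significant bit first
        if layout_byte & bit_mask:
            # 1: take the next byte from the uncompressed data
            out.append(data[uncomp_off])
            uncomp_off += 1
        else:
            # 0: read a 16-bit length/offset value and copy from the output
            b0, b1 = data[comp_off], data[comp_off + 1]
            comp_off += 2
            length = (b0 >> 4) + 3
            offset = (((b0 & 0x0F) << 8) | b1) + 1
            for _ in range(length):
                out.append(out[-offset])
        bit_mask >>= 1

    return bytes(out[:dec_len])

# Hand-built block: layout bits 1,1,0 then one (length 6, offset 2) copy.
sample = (
    b"MIO0" + struct.pack(">III", 8, 0x14, 0x16)  # header: length 8, CO, UO
    + b"\xC0\x00\x00\x00"                         # layout byte plus padding
    + b"\x30\x01"                                 # compressed: length 6, offset 2
    + b"AB"                                       # uncompressed bytes
)
print(mio0_decode(sample))  # b'ABABABAB'
```

The final slice guards against a last copy that runs past the decompressed length; whether real files ever rely on that is not covered here.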
Thanks to BGNG, who did the initial legwork for understanding MIO0 and creating M0CK. I am sure there were others who helped along the way, but many of the old links are dead. If anyone has any more info on previous work, let me know and I'll update the post here.