Crash Bandicoot File Formats

This document is intended to contain all the information I have about the file formats used by the original Crash Bandicoot trilogy for the Sony Playstation. This document is not official in any way.


The following basic terminology is used in this document:

PSX
The original Sony Playstation.
Byte
An 8-bit quantity of data. This is the byte size of the PSX.
Word
A 32-bit quantity of data. This is the word size of the PSX.
Intn (e.g. Int16, Int32)
An n-bit signed integer.
UIntn (e.g. UInt16, UInt32)
An n-bit unsigned integer.
n...m
Any integer in the range from n to m inclusive, or the range n to m itself.
CID
Chunk ID.
EID
Entry ID.
TBD
Not yet documented.
KB
Kilobyte; a unit equivalent to 1024 bytes.

All of the formats described in this document store their fields in little-endian byte order unless otherwise stated. All signed integer types are two's complement unless otherwise stated.

General Overview

Contained on the game disc are the following files:

The game executable file contains the code that is loaded and started by the Playstation BIOS.

The SYSTEM.CNF file contains settings used to configure the kernel before booting the game itself, as well as the filename of the executable. If SYSTEM.CNF is not present, the Playstation assumes a default configuration and the executable filename PSX.EXE.

Each NSD/NSF file pair probably contains all of the data necessary for the game engine to play a specific level. The NSF file is an archive of numerically identified file-like structures (referred to in this document as "entries") organized into a series of 64 KB pages (referred to in this document as "chunks"). The NSD file contains supplementary data used to assist the game engine in accessing the NSF file. This document focuses on these two formats and the formats of the files contained within them.

Format Table

The following formats are described in this document, arranged by type and applicable game:

Format Chart
Columns: Crash Prototype, Crash 1, Crash 2, Crash 3 (a single entry may span several games)
Legend: Known Format, Partially Known Format, Known But TBD Format, Unknown Format, Unknown Purpose
NSF: NSF; NSF w/ Compression; NSF
NSD: Prototype NSD; Old NSD; NSD
T0 Chunks: Normal Chunk
T1 Chunks: Texture Chunk
T2 Chunks: Old Sound Chunk
T3 Chunks: Sound Chunk
T4 Chunks: Wavebank Chunk
T5 Chunks: Speech Chunk
T1 Entries: Unknown (Maybe entity model animation?)
T2 Entries: Old Model Entry; Model Entry
T3 Entries: Old Scenery Entry; Scenery Entry
T4 Entries: Unknown (Maybe display lists for level model polygons?)
T5 Entries: Texture Chunk
T6 Entries: Unknown (Maybe missing NSD data?)
T7 Entries: Old Entity Entry; Entity Entry
T11 Entries: Code Entry
T12 Entries: Sound Entry
T13 Entries: Old Music Entry; Music Entry
T14 Entries: Wavebank Entry
T15 Entries: Unknown (Maybe title screen graphics?); Unknown (Maybe special collision detection rules?)
T17 Entries: Unknown (Maybe map data?); Unknown (Maybe data of level-specific purpose?)
T18 Entries: Unknown (Maybe palettes?)
T19 Entries: Demo Entry
T20 Entries: Unknown; Speech Entry
T21 Entries: Unknown

NSF

NSF files contain the actual game data. Each NSF file is directly broken down into 64 KB pages, referred to in this document as "chunks".

NSF files have no header.

Diagram: an NSF file made of Chunk (ID = 1) at 0x00000...0x0FFFF, Chunk (ID = 3) at 0x10000...0x1FFFF, and Chunk (ID = 5) at 0x20000...0x2FFFF.

Chunk

Diagram: a chunk laid out as Chunk Header (Entry Count = 3), Offset 0...Offset 3, unused space (*), Entry 0, Entry 1, Entry 2, unused space (*). (*) Not always present.

Chunks are the top level objects contained in an NSF file. Each chunk is exactly 64 KB in size. Chunks are containers for the second level objects (hereafter referred to as "entries"). Every entry must be aligned on a word boundary.

Every chunk has an identifier based on its position within the NSF file. This chunk ID is calculated as Chunk Offset / 65536 * 2 + 1. Therefore, the first chunk is assigned ID 1, the second ID 3, the third ID 5, etc. Because every chunk ID has its least significant bit set, the game engine can differentiate between chunk IDs and pointers.
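A minimal sketch of the chunk ID arithmetic described above. The function names are hypothetical, and the pointer test assumes that real pointers are word-aligned (least significant bit clear).

```c
#include <stdint.h>

/* Chunk ID of the chunk starting at `offset` (a multiple of 65536). */
static uint32_t nsf_chunk_id_from_offset(uint32_t offset)
{
    return offset / 65536u * 2u + 1u;
}

/* Inverse mapping: byte offset in the NSF of the chunk with this (odd) ID. */
static uint32_t nsf_chunk_offset_from_id(uint32_t chunk_id)
{
    return (chunk_id - 1u) / 2u * 65536u;
}

/* Chunk IDs always have the LSB set; word-aligned pointers never do. */
static int nsf_looks_like_chunk_id(uint32_t value)
{
    return (value & 1u) != 0;
}
```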

Chunks start with a header of the following format:

Chunk Header Format
Type Value
Total Size 16 Bytes
Magic Number Int16 0x1234
Type Int16 Zero for T0, three for T3, etc
Chunk ID Int32
Entry Count Int32
Checksum UInt32

Immediately following this structure is Entry Count + 1 Int32's, each of which is an offset relative to the start of the chunk. The first offset marks the start of the first entry in the chunk; the second offset marks the end of the first entry and the start of the second; the third marks the end of the second and the start of the third; etc. The last offset marks the end of the last entry. Note that, although the first entry typically starts immediately after this list of offsets, it is not required to.
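The header and offset list above can be read roughly as follows. This is a sketch with hypothetical names; it applies to chunks that hold an entry list, not to texture chunks, and it bounds-checks the offsets against the 64 KB chunk size.

```c
#include <stdint.h>

#define NSF_CHUNK_SIZE 65536u

/* Little-endian field readers. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

typedef struct {
    uint16_t magic;        /* 0x1234 */
    uint16_t type;         /* 0 for T0, 3 for T3, etc */
    uint32_t chunk_id;
    uint32_t entry_count;
    uint32_t checksum;
} NsfChunkHeader;

/* Reads the 16-byte chunk header; returns -1 if the magic number is wrong. */
static int nsf_read_chunk_header(const unsigned char *chunk, NsfChunkHeader *h)
{
    h->magic = rd16(chunk);
    h->type = rd16(chunk + 2);
    h->chunk_id = rd32(chunk + 4);
    h->entry_count = rd32(chunk + 8);
    h->checksum = rd32(chunk + 12);
    return h->magic == 0x1234 ? 0 : -1;
}

/* Finds entry `index` from the Entry Count + 1 offsets after the header.
   Offsets are relative to the start of the chunk. */
static int nsf_chunk_entry(const unsigned char *chunk, const NsfChunkHeader *h,
                           uint32_t index, uint32_t *start, uint32_t *size)
{
    /* the offset list itself must fit inside the chunk */
    if (h->entry_count >= (NSF_CHUNK_SIZE - 16) / 4 || index >= h->entry_count)
        return -1;
    uint32_t begin = rd32(chunk + 16 + 4 * index);
    uint32_t end = rd32(chunk + 16 + 4 * (index + 1));
    if (begin > end || end > NSF_CHUNK_SIZE)
        return -1;
    *start = begin;
    *size = end - begin;
    return 0;
}
```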

The checksum can be calculated with the following C code:

// NSF Chunk Checksum Calculator
#include <stdint.h>

uint32_t nsfChecksum(const unsigned char *data)
{
    uint32_t checksum = 0x12345678;
    for (int i = 0; i < 65536; i++) {
        // Bytes 12...15 hold the checksum field itself and are skipped
        if (i < 12 || i >= 16)
            checksum += data[i];
        checksum = checksum << 3 | checksum >> 29; // rotate left by 3 bits
    }
    return checksum;
}
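Since the checksum skips its own field (bytes 12...15), a tool that edits a chunk can recompute and store it without a second pass. A sketch, assuming the stored value is the little-endian UInt32 at offset 12 as in the chunk header; the helper names are hypothetical.

```c
#include <stdint.h>

/* Same algorithm as nsfChecksum. */
static uint32_t nsf_checksum(const unsigned char *data)
{
    uint32_t checksum = 0x12345678u;
    for (uint32_t i = 0; i < 65536u; i++) {
        if (i < 12 || i >= 16)
            checksum += data[i];
        checksum = checksum << 3 | checksum >> 29;
    }
    return checksum;
}

/* Compares the stored checksum with a freshly computed one. */
static int nsf_checksum_ok(const unsigned char *chunk)
{
    uint32_t stored = (uint32_t)chunk[12] | (uint32_t)chunk[13] << 8 |
                      (uint32_t)chunk[14] << 16 | (uint32_t)chunk[15] << 24;
    return stored == nsf_checksum(chunk);
}

/* Recomputes the checksum and writes it back, e.g. after modifying a chunk. */
static void nsf_checksum_fix(unsigned char *chunk)
{
    uint32_t c = nsf_checksum(chunk);
    for (int i = 0; i < 4; i++)
        chunk[12 + i] = (unsigned char)(c >> (8 * i));
}
```

Per the Additional Notes, the checksum is only checked on PAL versions, so a mismatch may go unnoticed on NTSC.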

Entry

Diagram: an entry laid out as Entry Header (Item Count = 5), Offset 0...Offset 5, unused space (*), Item 0...Item 4, unused space (*). (*) Not always present.

Entries are the second level objects contained in an NSF file. All entries are containers for the third level objects (hereafter referred to as "items"). Every item must be aligned on a word boundary. Additionally, the length of every item must be a multiple of 4 bytes to ensure the proper alignment of the following item. Any items not otherwise meeting this requirement are padded at the end with null bytes to satisfy the constraint.

Entries all have a common header format, which is:

Entry Header Format
Type Value
Total Size 16 Bytes
Magic Number Int32 0x100FFFF
Entry ID Int32 See below
Type Int32 One for T1, two for T2, etc
Item Count Int32

Immediately following the header is Item Count + 1 Int32's, each of which is an offset relative to the start of the entry. The first offset marks the start of the first item in the entry; the second offset marks the end of the first item and the start of the second item; the third marks the end of the second and the start of the third; etc. The last offset marks the end of the last item.
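A sketch of locating items inside an entry, using the magic number and Item Count from the entry header and the offset list just described. The names are hypothetical; the padding helper reflects the rule that item lengths are rounded up to a multiple of 4 bytes.

```c
#include <stdint.h>

#define NSF_ENTRY_MAGIC 0x100FFFFu

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Finds item `index` in an entry of `entry_size` bytes.
   Header: magic (0), EID (4), type (8), item count (12); offsets follow at 16. */
static int nsf_entry_item(const unsigned char *entry, uint32_t entry_size,
                          uint32_t index, uint32_t *start, uint32_t *size)
{
    if (entry_size < 16 || rd32(entry) != NSF_ENTRY_MAGIC)
        return -1;
    uint32_t count = rd32(entry + 12);
    /* the Item Count + 1 offsets must fit inside the entry */
    if (index >= count || 16 + 4 * ((uint64_t)count + 1) > entry_size)
        return -1;
    uint32_t begin = rd32(entry + 16 + 4 * index);
    uint32_t end = rd32(entry + 16 + 4 * (index + 1));
    if (begin > end || end > entry_size)
        return -1;
    *start = begin;
    *size = end - begin;
    return 0;
}

/* Item lengths are padded with null bytes up to a multiple of 4. */
static uint32_t nsf_padded_item_size(uint32_t size)
{
    return (size + 3u) & ~3u;
}
```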

Every entry is assigned an ID used to refer to the entry from other locations throughout the game files. This ID is a specially-encoded 5-character string, as a 32-bit value with the following bit layout:

EID Format
Bit Meaning
0 LSB Pointer Identification Bit Set to 1
1...6 Fifth Character
7...12 Fourth Character
13...18 Third Character
19...24 Second Character
25...30 First Character
31 MSB Unknown

The Pointer Identification Bit is used by the game engine to differentiate between EID's and pointers. It must be set for all EID's. The character set used for the characters in the encoding scheme is:

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ !

A special EID, 0x6396347F, is the null EID. It is used to refer to nothing instead of a particular entry. Decoded as a string, its value is "NONE!".
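A sketch of EID decoding and encoding with the character set above. The first character is taken from bits 25...30 and the fifth from bits 1...6; this is the ordering that turns the null EID 0x6396347F into "NONE!". Function names are hypothetical, and bit 31 is left clear since its meaning is unknown.

```c
#include <stdint.h>
#include <string.h>

static const char EID_CHARSET[] =
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "_!";

/* Decodes an EID into a 5-character, null-terminated string. */
static void eid_to_string(uint32_t eid, char out[6])
{
    for (int i = 0; i < 5; i++)
        out[i] = EID_CHARSET[(eid >> (25 - 6 * i)) & 0x3F];
    out[5] = '\0';
}

/* Encodes a 5-character name; returns -1 on a bad length or character. */
static int eid_from_string(const char *name, uint32_t *eid)
{
    uint32_t value = 1; /* pointer identification bit */
    if (strlen(name) != 5)
        return -1;
    for (int i = 0; i < 5; i++) {
        /* strlen == 5 guarantees name[i] is not the terminator */
        const char *pos = strchr(EID_CHARSET, name[i]);
        if (pos == NULL)
            return -1;
        value |= (uint32_t)(pos - EID_CHARSET) << (25 - 6 * i);
    }
    *eid = value;
    return 0;
}
```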

NSF Compression

In retail Crash 1, NSF files include compressed copies of the first few chunks of the file (their count appears to be stored at offset 0x41C in the NSD file). The compressed copies appear at the start of the file, before the usual uncompressed chunks.

Each compressed chunk starts with the following header:

Compressed Chunk Header Format
Type Value
Total Size 12 Bytes
Magic Number Int16 0x1235
Unknown Int16 0
Length Int32
Skip Int32

This structure is followed by a compressed string of bytes which, when decompressed, has a length of Length bytes. The compression format is: TBD

This is then followed by Skip unused bytes, followed by 0x10000 - Length uncompressed data bytes, which are appended to the decompressed data to form a complete 64 KB chunk.

Compressed chunks are always a multiple of 2048 bytes long. The Skip and Length fields are used to facilitate this.
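Because the compression format is still TBD, the size of the compressed stream is only known after decompressing it. The sketch below assumes a hypothetical decompressor that reports how many compressed bytes it consumed, and uses that value to locate the uncompressed tail and the total on-disc size of the record, which should come out as a multiple of 2048. All names are hypothetical.

```c
#include <stdint.h>

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

typedef struct {
    uint16_t magic;   /* 0x1235 */
    uint32_t length;  /* size of the data produced by decompression */
    uint32_t skip;    /* unused bytes between the compressed stream and the tail */
} NsfCompressedHeader;

static int nsf_read_compressed_header(const unsigned char *p, NsfCompressedHeader *h)
{
    h->magic = (uint16_t)(p[0] | p[1] << 8);
    h->length = rd32(p + 4);
    h->skip = rd32(p + 8);
    return (h->magic == 0x1235 && h->length <= 0x10000u) ? 0 : -1;
}

/* Offset (from the start of the 12-byte header) of the 0x10000 - Length tail. */
static uint32_t nsf_compressed_tail_offset(const NsfCompressedHeader *h,
                                           uint32_t compressed_bytes)
{
    return 12 + compressed_bytes + h->skip;
}

/* Total size of the compressed chunk record; expected to be a multiple of 2048. */
static uint32_t nsf_compressed_record_size(const NsfCompressedHeader *h,
                                           uint32_t compressed_bytes)
{
    return nsf_compressed_tail_offset(h, compressed_bytes) + (0x10000u - h->length);
}
```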

NSD

TBD. Very similar to Old NSD.

Old NSD

TBD. Very similar to Prototype NSD, but with some added details.

Prototype NSD

TODO: Put a nice diagram here to show how the hash table works

Prototype NSD files are hash tables linking entry ID's to the chunk ID's of their containing chunks. Texture chunks are also included, linking their respective entry and chunk ID's.

The hashing function is easily described in C-style syntax as ((eid >> 15) & 0xFF). To search for an EID in the table, its hash is computed and then used as an index into an initial 256-word table. The resulting value is then used as the starting index to begin a linear search through the following key-value pair list.

Prototype NSD files start with the following structure:

Prototype NSD Info Format
Type Value
Total Size 1032 Bytes
Initial Table Int32[256]
Chunk Count Int32 The number of chunks in the matching NSF file
Entry Count Int32 The number of entries in the matching NSF file

Following this is the key-value pair list, which is Entry Count of the following structure:

Prototype NSD Key-Value Pair Format
Type Value
Total Size 8 Bytes
Chunk ID Int32 The chunk ID of the chunk containing the specified entry
Entry ID Int32 The entry ID of an entry in the NSF file
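A sketch of an EID lookup in a prototype NSD, following the hash and linear search described above. Where the search should stop is not documented, so this version assumes it simply runs to the end of the key-value list. It returns 0 when the EID is not found, which cannot collide with a real chunk ID because chunk IDs are always odd. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hash used to index the 256-word initial table. */
static unsigned nsd_hash(uint32_t eid)
{
    return (eid >> 15) & 0xFF;
}

/* Returns the chunk ID containing `eid`, or 0 if it is not listed. */
static uint32_t proto_nsd_lookup(const unsigned char *nsd, size_t nsd_size, uint32_t eid)
{
    if (nsd_size < 1032)
        return 0;
    uint32_t entry_count = rd32(nsd + 1028);        /* after the table and chunk count */
    if ((uint64_t)entry_count * 8 > nsd_size - 1032)
        return 0;
    uint32_t start = rd32(nsd + 4 * nsd_hash(eid));
    for (uint32_t i = start; i < entry_count; i++) {
        const unsigned char *pair = nsd + 1032 + 8 * (size_t)i;
        if (rd32(pair + 4) == eid)                  /* pair = { chunk ID, entry ID } */
            return rd32(pair);
    }
    return 0;
}
```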

Normal Chunk

Normal chunks are the standard chunk type. Any entries that do not appear in a different chunk type appear in a normal chunk.

Sound Chunk

Sound chunks are chunks used to hold sound entries. To facilitate this, sound chunks have a special alignment restriction: every entry must begin 8 bytes after a 16-byte boundary. This causes the audio data to be aligned on a 16-byte boundary (a sound entry's header, including its two item offsets, is 24 bytes). Failure to comply with this alignment requirement leads to broken sound effects in-game. (For emulators, this restriction is more lax. See Additional Notes.)
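A sketch of the sound chunk alignment rule, for a tool that packs entries into a chunk. The helper names are hypothetical; the 24-byte figure is the 16-byte entry header plus two 4-byte item offsets.

```c
#include <stdint.h>

/* True if a sound/speech entry may start at this chunk offset. */
static int sound_entry_offset_ok(uint32_t offset)
{
    return offset % 16u == 8u;
}

/* Smallest offset >= `offset` at which a sound/speech entry may start,
   so that its audio item (offset + 24) lands on a 16-byte boundary. */
static uint32_t sound_entry_next_offset(uint32_t offset)
{
    return offset + (24u - offset % 16u) % 16u;
}
```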

For certain old, unused levels in Crash 1, old sound chunks are used instead.

Old Sound Chunk

Old sound chunks are an older variant of the more commonly seen sound chunks. Old sound chunks appear in the unused "waterfall", "cavern", and similar levels. Old sound chunks contain sound entries just as sound chunks do, and follow the same alignment restrictions.

Speech Chunk

Speech chunks have the same format and alignment restrictions as sound chunks. However, speech chunks contain speech entries instead of sound entries.

Wavebank Chunk

TODO: Put a nice diagram here to show how the alignment works.

Wavebank chunks are used to hold wavebank entries. To facilitate this, wavebank chunks have a special alignment restriction: every entry must begin 4 bytes after a 16-byte boundary. This causes the audio data to be aligned 8 bytes after a 16-byte boundary (the entry header, including its three item offsets, is 28 bytes, followed by the 8-byte info item). Why this is necessary, if it even is (IIRC it isn't), I have no idea.
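The same kind of helper for wavebank chunks, as a sketch with hypothetical names. Here the audio starts 36 bytes into the entry (28-byte header plus 8-byte info item), so starting the entry at 4 past a 16-byte boundary puts the audio at 8 past one.

```c
#include <stdint.h>

/* Smallest offset >= `offset` at which a wavebank entry may start. */
static uint32_t wavebank_entry_next_offset(uint32_t offset)
{
    return offset + (20u - offset % 16u) % 16u;
}

/* Chunk offset of the VB data inside a wavebank entry starting at `entry_offset`. */
static uint32_t wavebank_audio_offset(uint32_t entry_offset)
{
    return entry_offset + 28u + 8u;
}
```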

Texture Chunk

Texture chunks, unlike the other types of chunks, do not contain a list of entries. Instead, the entire 64KB block is treated as a special type of entry, which contains graphical data including textures and color palettes.

Texture chunks have a slightly different header format from other chunks. The format is:

Texture Chunk Header Format
Type Value
Total Size 16 Bytes
Magic Number Int16 0x1234
Type Int16 1
Entry ID Int32
Entry Type Int32 5
Checksum Int32

The format of the remaining data is unknown. However, the entire chunk appears to be a 512x128 texture page to be loaded into video memory.

Entity Entry

TBD. This section has not yet been converted from the old Crash 2-only page layout; for now, see T7 Entries under Old Documentation.

Code Entry

The first item is TBD

The second item is compiled GOOL bytecode. TBD, some documentation in Additional Notes.

The third item is read-only(?) data used by the GOOL code.

The fourth item, if present, has the following format:

GOOL Spawn/Event Handler Table Format
Type Value
Total Size Variable Length
Event Handlers Int16[Event Handler Count]
Spawn Handlers Int16[...]

Each array element is an index into the fifth item, which (if present) is a list of the following structure:

GOOL Process Initialization Info Format
Type Value
Total Size 16 Bytes
Unknown Int32
Unknown Int32
Pointer To Code EID Int16
Unknown Int16
OnExit Program Counter Int16
Initial Program Counter Int16

The format of the sixth item, if present, is unknown.

Old Model Entry

Old model entries are the Crash 1 version of the model entries. Old model entries always have 2 items, the format of the first of which is:

Old Model Entry Info Format
Type Value
Total Size Variable Length
Unknown Count 1 Int32
X Scale Int32
Y Scale Int32
Z Scale Int32
Unknown Count 2 Int32
Unknown Int32[Unknown Count 2]

The second item is a list of Unknown Count 1 64-bit values of unknown purpose.

Old Scenery Entry

Old scenery entries are the Crash 1 version of the scenery entries. Old scenery entries always have 3 items, the format of the first of which is:

TBD

The second item is a list of 3-point polygons, each of which is 64 bits with the following layout:

Old Scenery Polygon Format
Bit Meaning
0...19 LSB Unknown
20...31 1st Vertex
32...39 Unknown
40...51 2nd Vertex
52...63 MSB 3rd Vertex
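A sketch of extracting the three vertex indexes from an old scenery polygon, assuming each 64-bit polygon is stored as a single little-endian value whose bit 0 is the LSB in the table above. The type and function names are hypothetical.

```c
#include <stdint.h>

static uint64_t rd64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = v << 8 | p[i];
    return v;
}

typedef struct { uint16_t v[3]; } OldSceneryPolygon;

/* Pulls the three 12-bit vertex indexes; the unknown bits are ignored. */
static OldSceneryPolygon old_scenery_polygon(const unsigned char *p)
{
    uint64_t raw = rd64(p);
    OldSceneryPolygon poly;
    poly.v[0] = (uint16_t)((raw >> 20) & 0xFFF);
    poly.v[1] = (uint16_t)((raw >> 40) & 0xFFF);
    poly.v[2] = (uint16_t)((raw >> 52) & 0xFFF);
    return poly;
}
```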

The "vertex" fields are indexes into the third item, which is a list of vertices. Each vertex is 64 bits with the following layout:

Old Scenery Vertex Format
Bit Meaning
0...7 LSB Red Component
8...15 Green Component
16...23 Blue Component
24...31 Z Coordinate Low Part
32 Special Effects Flag
33...34 Z Coordinate Middle Part
35...47 X Coordinate
48...50 Z Coordinate High Part
51...63 MSB Y Coordinate

The vertex coordinates are all signed 16-bit values with the lower 3 bits removed. As such, they should be left-shifted by 3 bits and sign-extended before use. The special effects flag indicates whether the vertex should be subject to certain effects in game. These effects vary by level, such as lightning effects for Slippery Climb, flickering light effects in Temple Ruins, red-glowing strobe effects in Cortex Power, illumination effects in Lights Out, and water waving/warping/distortion effects in Up The Creek.
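A sketch of decoding an old scenery vertex. It assumes the vertex is one little-endian 64-bit value and that the Z parts are joined as low (8 bits), middle (2 bits), high (3 bits), giving 13 bits like X and Y. Each 13-bit value is sign-extended and multiplied by 8 to restore the removed low bits. Names are hypothetical.

```c
#include <stdint.h>

static uint64_t rd64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = v << 8 | p[i];
    return v;
}

typedef struct { int16_t x, y, z; uint8_t r, g, b; int fx; } OldSceneryVertex;

/* Sign-extends a 13-bit value and restores the 3 removed low bits. */
static int32_t sext13_shl3(uint32_t v13)
{
    int32_t v = (int32_t)(v13 & 0x1FFF);
    if (v & 0x1000)
        v -= 0x2000;
    return v * 8;
}

static OldSceneryVertex old_scenery_vertex(const unsigned char *p)
{
    uint64_t raw = rd64(p);
    OldSceneryVertex vx;
    vx.r = (uint8_t)(raw & 0xFF);
    vx.g = (uint8_t)((raw >> 8) & 0xFF);
    vx.b = (uint8_t)((raw >> 16) & 0xFF);
    vx.fx = (int)((raw >> 32) & 1);                       /* special effects flag */
    uint32_t z = (uint32_t)((raw >> 24) & 0xFF)           /* low part */
               | (uint32_t)((raw >> 33) & 0x3) << 8       /* middle part */
               | (uint32_t)((raw >> 48) & 0x7) << 10;     /* high part */
    vx.x = (int16_t)sext13_shl3((uint32_t)(raw >> 35));
    vx.y = (int16_t)sext13_shl3((uint32_t)(raw >> 51));
    vx.z = (int16_t)sext13_shl3(z);
    return vx;
}
```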

Sound Entry

Each sound entry contains a single sound effect or voice clip which can be played back in-game. A sound entry has a single item, which is raw playstation-format audio. The sample rate is unknown.

For Crash 2 & 3, speech audio is held in speech entries instead.

Speech Entry

Speech entries are identical in format to sound entries, and contain voice clips used during cutscene dialogue and additional dialogue appearing e.g. during boss fights.

Music Entry

Music entries contain data related to the in-game music.

Music entries always have exactly three items. The format of the first item is:

Music Entry Info Format
Type Value
Total Size 36 Bytes
Track Count Int32
VH EID Int32 A reference to the music entry containing the associated VH file
VB Part 0 EID Int32 A reference to the wavebank entry containing part 0 of the associated VB file, or 0x6396347F if there is none
VB Part 1 EID Int32 A reference to the wavebank entry containing part 1 of the associated VB file, or 0x6396347F if there is none
VB Part 2 EID Int32 A reference to the wavebank entry containing part 2 of the associated VB file, or 0x6396347F if there is none
VB Part 3 EID Int32 A reference to the wavebank entry containing part 3 of the associated VB file, or 0x6396347F if there is none
VB Part 4 EID Int32 A reference to the wavebank entry containing part 4 of the associated VB file, or 0x6396347F if there is none
VB Part 5 EID Int32 A reference to the wavebank entry containing part 5 of the associated VB file, or 0x6396347F if there is none
VB Part 6 EID Int32 A reference to the wavebank entry containing part 6 of the associated VB file, or 0x6396347F if there is none

The second item is either a VH file or nothing (empty, no bytes).

The third item is a SEP file containing Track Count tracks.

Wavebank Entry

Wavebank entries are essentially parts of a VB file. Because chunks are 64 KB in size, a larger VB file cannot fit into a single chunk, so the VB files used in-game are split across multiple wavebank entries. They are recombined using the part number from the first item.

Wavebank entries always have exactly two items. The format of the first item is:

Wavebank Entry Info Format
Type Value
Total Size 8 Bytes
Part Number Int32 0...6
Length Int32 The length of the second item

The second item is part of a VB file.
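A sketch of recombining wavebank entries into a VB file by part number. The `WavebankPart` struct is a hypothetical in-memory view of one entry, and the code assumes the parts present are numbered 0...count-1 with no gaps or duplicates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int32_t part_number;          /* Part Number from the first item (0...6) */
    const unsigned char *data;    /* the second item */
    uint32_t length;              /* Length from the first item */
} WavebankPart;

/* Concatenates parts in ascending part-number order into `out`.
   Returns the VB size in bytes, or 0 on error. */
static size_t vb_reassemble(const WavebankPart *parts, size_t count,
                            unsigned char *out, size_t out_size)
{
    const WavebankPart *ordered[7] = {0};
    if (count == 0 || count > 7)
        return 0;
    for (size_t i = 0; i < count; i++) {
        int32_t n = parts[i].part_number;
        if (n < 0 || (size_t)n >= count || ordered[n] != NULL)
            return 0;                      /* out of range or duplicate part */
        ordered[n] = &parts[i];
    }
    size_t total = 0;
    for (size_t n = 0; n < count; n++) {
        if (ordered[n]->length > out_size - total)
            return 0;                      /* output buffer too small */
        memcpy(out + total, ordered[n]->data, ordered[n]->length);
        total += ordered[n]->length;
    }
    return total;
}
```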

Demo Entry

Demo entries contain a series of inputs, and possibly other data, to be played back in "demo mode", which occurs when a player idles on the start menu for a set amount of time. Each demo entry has a single item, which is of unknown format.


Old Documentation

TODO: Reformat this documentation into the newer style.

T4 Entries

T4 entries have an unknown purpose.

The format of T4 entry items is:

T4 Entry Item Format
Type Value
Total Size Variable Length
Count Int16
Unknown Int16 0 if this is the first or last item in the entry, 1 otherwise
Data Int16[Count]

T7 Entries

T7 entries contain the individual in-game entities, such as boxes and penguins.

Maybe they also contain the layout of the level, like floors and walls?

T7 entries always have at least 2 items. The format of the first two items is unknown.

The remaining items represent in-game entities.

Entities

Entities are expressed as collections of values (hereafter referred to as "fields") and some other stuff that I haven't figured out yet. The entity header format is:

Entity Header Format
Type Value
Total Size 16 Bytes
Length Int32 The length of this item
Unknown Int32 0
Unknown Int32 0
Field Count Int32

This header is followed by Field Count of this structure:

Entity Field Info Format
Type Value
Total Size 8 Bytes
Type Int16
Offset UInt16
Flags Byte See "Additional Notes" for more information
Element Size Int8
Unknown Int16

The offsets are each relative to 12 bytes after the start of the item, and point to the following structure:

Entity Field Format
Type Value
Total Size Variable Length
Element Count Int16
Unknown Int16
Data Byte[Element Size][Element Count]
Unknown Byte[???]

The format and usage of the field data depends on the field type.
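A sketch of finding an entity field by type, following the header, field info, and field layouts above (offsets relative to 12 bytes after the start of the item). The element size is read as an unsigned byte, which is an assumption, and the unknown trailing bytes of each field are ignored. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

typedef struct {
    const unsigned char *data;
    uint16_t element_count;
    uint8_t element_size;
} EntityField;

/* Finds the first field of type `type` (e.g. 0x2C for Name). */
static int entity_find_field(const unsigned char *item, size_t item_size,
                             uint16_t type, EntityField *out)
{
    if (item_size < 16)
        return -1;
    uint32_t field_count = rd32(item + 12);
    if (16 + 8 * (uint64_t)field_count > item_size)
        return -1;
    for (uint32_t i = 0; i < field_count; i++) {
        const unsigned char *info = item + 16 + 8 * (size_t)i;
        if (rd16(info) != type)
            continue;
        size_t at = 12 + (size_t)rd16(info + 2);   /* relative to item + 12 */
        if (at + 4 > item_size)
            return -1;
        out->element_count = rd16(item + at);
        out->element_size = info[5];
        out->data = item + at + 4;                 /* after count + unknown Int16 */
        if (at + 4 + (size_t)out->element_count * out->element_size > item_size)
            return -1;
        return 0;
    }
    return -1;
}
```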

2C: Name

Data is an array of characters that forms a null-terminated UTF-8 string. The string is the internal name used to refer to the entity.

It's probably just ASCII instead of UTF-8, but it's best that this be established as the standard now rather than having character encoding fights and incompatibilities later on.

4B: Position

Data is a list of coordinates for the positions used by the entity. Each element in the list has the following structure:

TBD

9F: ID

Data is an Int32. TBD

A4: General Settings

Data is a list of Int32's. TBD

A9: Type

Data is an Int32 giving the type of the entity, e.g. box.

For boxes, the type is 0x22.

AA: Subtype

Data is an Int32 giving the variant (subtype) of the entity's type, e.g. nitro.

For boxes, the subtypes are:

public enum BoxType : int
{
	TNT = 0,
	Normal = 2,
	Arrow = 3,
	Checkpoint = 4,
	Iron = 5,
	Apple = 6,
	ExclamationMark = 7,
	Life = 8,
	Mask = 9,
	QuestionMark = 10,
	IronArrow = 15,
	Nitro = 18,
	Bodyslam = 23,
	NitroDetonator = 24
}
		
287: Victims

Data is a list of Int16's. Each value is the ID of an entity to be destroyed by this entity in various situations.


Appendix A: PSX Formats

The Playstation has some formats used by many of its games. The formats relevant to crash hacking are (or should be) documented here.

Playstation Textures

Note: This does not describe the playstation TIM file format, only the lower level texture format.

Playstation textures can use the following pixel formats:

The 16-bit color format is similar to ABGR1555, and has the following layout:

PSX Color Format
Bit Meaning
0...4 LSB Red Component
5...9 Green Component
10...14 Blue Component
15 MSB Transparency Control Bit

The transparency control bit is not strictly an alpha component, and its meaning depends on certain settings not (yet) documented here.
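A sketch of converting the 16-bit color format to 8-bit-per-channel RGB. Expanding 5-bit components by replicating their top bits is a common choice, not something this document specifies, and the transparency bit is passed through uninterpreted. Names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; int stp; } Rgb888;

/* Expands 0...31 to 0...255 by bit replication. */
static uint8_t expand5(unsigned c)
{
    return (uint8_t)(c << 3 | c >> 2);
}

static Rgb888 psx_color_to_rgb888(uint16_t c)
{
    Rgb888 out;
    out.r = expand5(c & 0x1F);
    out.g = expand5((c >> 5) & 0x1F);
    out.b = expand5((c >> 10) & 0x1F);
    out.stp = (c >> 15) & 1;   /* transparency control bit, meaning context-dependent */
    return out;
}
```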

Playstation Audio

The Playstation utilizes 3 different audio formats. Of these, only the format used by the SPU is relevant. This format is 4-bit ADPCM.

In this format, audio samples are organized into 16-byte blocks. The first byte holds the decoding parameters for the block: a shift amount in its low nibble and a prediction filter index in its high nibble. The second byte is a set of 8 flags used to control the SPU. The remaining 14 bytes are 28 4-bit ADPCM samples.
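A sketch of decoding one block, based on the commonly documented SPU ADPCM scheme: each nibble is scaled by the shift amount, then a prediction from the previous two samples is added using one of five fixed filters. The handling of reserved shift and filter values is hedged in the comments; the flag byte is not used here. Names are hypothetical.

```c
#include <stdint.h>

/* Prediction filter coefficients, in units of 1/64. */
static const int32_t ADPCM_FILTERS[5][2] = {
    {0, 0}, {60, 0}, {115, -52}, {98, -55}, {122, -60}
};

/* Decodes one 16-byte block into 28 PCM samples. hist[0] is the previous
   sample and hist[1] the one before; both are updated for the next block. */
static void psx_adpcm_decode_block(const unsigned char block[16],
                                   int16_t out[28], int32_t hist[2])
{
    int shift = block[0] & 0x0F;
    int filter = (block[0] >> 4) & 0x0F;
    if (shift > 12)
        shift = 9;    /* reserved values; reportedly treated like 9 */
    if (filter > 4)
        filter = 4;   /* only five filters exist; clamping is an assumption */
    for (int i = 0; i < 28; i++) {
        /* two samples per byte, low nibble first */
        int nibble = (block[2 + i / 2] >> ((i & 1) * 4)) & 0x0F;
        int32_t s = nibble >= 8 ? nibble - 16 : nibble;   /* signed 4-bit */
        int32_t sample = s * (1 << (12 - shift));          /* (s << 12) >> shift */
        sample += (ADPCM_FILTERS[filter][0] * hist[0] +
                   ADPCM_FILTERS[filter][1] * hist[1] + 32) >> 6;
        if (sample > 32767) sample = 32767;
        if (sample < -32768) sample = -32768;
        hist[1] = hist[0];
        hist[0] = sample;
        out[i] = (int16_t)sample;
    }
}
```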

Check the links at the bottom of the page for more details.

The command flags are:

PSX ADPCM Command Flags
Bit Meaning
0 LSB Loop end
1 Unknown IRQ-related?
2 Loop start
3 Unknown Unused?
4 Unknown Unused?
5 Unknown Unused?
6 Unknown Unused?
7 MSB Unknown Unused?

SEQ

SEQ files are the playstation analogue of MIDI files.

The format of SEQ files is:

SEQ Format
Type Value
Total Size Variable Length
This format is big-endian.
Magic Number Byte[4] ASCII "pQES"
Version Int32 1
Resolution Int16
Tempo Int24 Yes, 3 bytes
Rhythm Int16
Score Data Byte[...]

Resolution is equivalent to the "time division" found in a Standard MIDI File header.
Tempo is equivalent to one found in a Tempo MIDI meta event.
Rhythm is the first 2 parameters of the Time Signature MIDI meta event.
Score Data is the actual music data. It is the same format as the score data in a Standard MIDI File.
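A sketch of reading the big-endian SEQ header above (4 + 4 + 2 + 3 + 2 = 15 bytes before the score data). As in a MIDI Set Tempo event, the tempo is in microseconds per quarter note. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t version;
    uint16_t resolution;   /* like SMF time division */
    uint32_t tempo;        /* 24-bit, microseconds per quarter note */
    uint8_t rhythm[2];     /* first two Time Signature parameters */
    const unsigned char *score;
    size_t score_size;
} SeqHeader;

/* Reads an n-byte big-endian value. */
static uint32_t be(const unsigned char *p, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v = v << 8 | p[i];
    return v;
}

static int seq_parse(const unsigned char *p, size_t size, SeqHeader *h)
{
    if (size < 15 || memcmp(p, "pQES", 4) != 0)
        return -1;
    h->version = be(p + 4, 4);
    h->resolution = (uint16_t)be(p + 8, 2);
    h->tempo = be(p + 10, 3);
    h->rhythm[0] = p[13];
    h->rhythm[1] = p[14];
    h->score = p + 15;
    h->score_size = size - 15;
    return 0;
}
```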

SEP

SEP files are a format similar to SEQ files, but with multiple scores (or tracks). That is, while SEQ files contain only a single track, SEP files can contain multiple tracks.

SEP files start with a header like so:

SEP Header Format
Type Value
Total Size 6 Bytes
This format is big-endian.
Magic Number Byte[4] ASCII "pQES"
Version Int16 0

Followed by any number of the following structure:

SEP Track Format
Type Value
Total Size Variable Length
This format is big-endian.
Track Number Int16 Zero for the first track, one for the second, etc
Resolution Int16
Tempo Int24 Yes, 3 bytes
Rhythm Int16
Score Length Int32
Score Data Byte[Score Length]

This format has some subtle but important differences from SEQ:

Also note that the SEP header does not indicate the number of tracks in the file.
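Since the track count is not stored, it has to be found by walking the tracks (13-byte track header, then Score Length bytes). A sketch with hypothetical names; it treats trailing bytes too short for a track header as padding, an assumption based on entry items being padded to a multiple of 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t be(const unsigned char *p, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v = v << 8 | p[i];
    return v;
}

/* Returns the number of tracks, or -1 if the data is invalid or truncated. */
static int sep_count_tracks(const unsigned char *p, size_t size)
{
    if (size < 6 || memcmp(p, "pQES", 4) != 0)
        return -1;
    size_t pos = 6;                      /* after magic + Int16 version */
    int tracks = 0;
    while (size - pos >= 13) {
        /* number (2), resolution (2), tempo (3), rhythm (2), score length (4) */
        uint32_t score_len = be(p + pos + 9, 4);
        if (score_len > size - pos - 13)
            return -1;
        pos += 13 + score_len;
        tracks++;
    }
    return tracks;
}
```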

VAB

Diagram: a VAB file is a VH followed by a VB. The VH contains the VH Header (Program Count = 3, Tone Count = 5, Wave Count = 4), program definitions 0...127 (Program 0 with Tone Count = 3, Programs 1 and 2 with Tone Count = 1 each, Programs 3...127 unused), tone slots 0...15 for each of Programs 0...2, and the Waveform Size Table. The VB contains Wave 1...Wave 4.

VAB files are the playstation analogue of DLS files. They contain the audio data and configuration for instruments used to play back SEQ files. A VAB "program" is similar to a DLS "melodic" instrument, and a VAB "tone" is similar to a DLS "region". VAB has no equivalent to a "drum kit", as channel 10 is not given any special treatment by the SEQ player.

A VAB file can contain a maximum of 254 waveforms, 128 programs, and 16 tones per program.

A VAB file is simply a VH file and a VB file concatenated together. Converting VH+VB to VAB is as simple as:

$ cat foo.vh foo.vb > foo.vab

or:

C:\>copy foo.vh+foo.vb foo.vab

VH

A VH file is the "header" portion of a VAB file.

The VH format starts with the following structure:

VH Header Format
Type Value
Total Size 32 Bytes
Magic Number Byte[4] ASCII "pBAV"
Version Int32 7
ID Int32 0
Size Int32 The size of the entire VAB file
Reserved UInt16 0xEEEE
Program Count Int16
Tone Count Int16
Wave Count Int16
Volume Int8
Panning Int8
Attribute 1 Int8
Attribute 2 Int8
Reserved Int32 -1

This is then followed by 128 of the following structure, each of which defines a program (equivalent to a DLS instrument):

VH Program Format
Type Value
Total Size 16 Bytes
Tone Count Int8
Volume Int8
Priority Int8
Mode Int8
Panning Int8
Reserved Int8 -1
Attribute Int16
Reserved Int32 -1
Reserved Int32 -1

Only the first Program Count of the program definitions describe existing programs. The remaining program definitions assume default values.

This is followed by Program Count * 16 instances of the following structure, each of which defines a tone (equivalent to a DLS region):

VH Tone Format
Type Value
Total Size 32 Bytes
Priority Int8
Mode Int8
Volume Int8
Panning Int8
Root Note Int8 The MIDI note number for the associated waveform when played at 44100 Hz
Pitch Shift Int8
Minimum Note Int8 The lowest MIDI note that plays this tone
Maximum Note Int8 The highest MIDI note that plays this tone
Vibrato Width Int8
Vibrato Time Int8
Portamento Width Int8
Portamento Time Int8
Minimum Pitch Bend Int8
Maximum Pitch Bend Int8
Reserved UInt8 0xB1
Reserved UInt8 0xB2
ADSR Byte[4]
Program Int16 The zero-based index for the associated program
Wave Int16 The one-based index for the associated waveform (not zero-based)
Reserved UInt16 0xC0
Reserved UInt16 0xC1
Reserved UInt16 0xC2
Reserved UInt16 0xC3

Only the first Tone Count (from the associated program definition, not from the main header) of each set of 16 tone definitions describe existing tones. The remaining tone definitions assume default values. Note that Tone Count in the main header does not include the unused tones. That is, Tone Count is not necessarily equal to Program Count * 16. Also note that while programs and tones have zero-based indexes, wave indexes are one-based.

After the tone definitions is a set of 256 UInt16's. The first value and the last value are each reserved and should be set to zero. The inner 254 values define the length in 8-byte blocks (not in bytes) of each waveform as it appears in the VB file. Each wave starts immediately after the one preceding it.
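A sketch of computing where each waveform lives inside the VB, using the layout described in this section: a 32-byte header, 128 programs of 16 bytes, Program Count * 16 tones of 32 bytes, then the 256-entry size table in 8-byte units. Note the caveat in Additional Notes that the program layout may not always match this description. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Fills byte offsets and sizes (within the VB) for waves 1...254, stored at
   index wave - 1. Returns the total VB size in bytes, or -1 on error. */
static long vh_wave_layout(const unsigned char *vh, size_t vh_size,
                           uint32_t offsets[254], uint32_t sizes[254])
{
    if (vh_size < 32 || memcmp(vh, "pBAV", 4) != 0)
        return -1;
    uint32_t program_count = rd16(vh + 18);
    size_t table = 32 + 128 * 16 + (size_t)program_count * 16 * 32;
    if (table + 256 * 2 > vh_size)
        return -1;
    uint32_t pos = 0;
    for (int i = 0; i < 254; i++) {
        /* table entries 0 and 255 are reserved */
        sizes[i] = (uint32_t)rd16(vh + table + 2 * (i + 1)) * 8;
        offsets[i] = pos;          /* each wave follows the previous one */
        pos += sizes[i];
    }
    return (long)pos;
}
```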

VB

A VB file is the "body" portion of a VAB file.

VB files contain only PSX audio.

Appendix B: Game Files

TBD

Crash 1

TBD

Crash 2

For PAL, there are multiple copies of the warp room; one for each supported language.

Crash 3

TBD


Additional Notes

PAL seems to have different warp room NSFs for the different languages. They're not just separate speech entries; the entire warproom appears in each file.

Chadderz says the chunk checksum is very poor at detecting errors compared to algorithms such as CRC32. I believe the checksum was not introduced to detect such errors, but rather as a disincentive to modders. It is only in effect on PAL versions of the games, never on the NTSC versions. Maybe SCEE demanded an anti-modding system be implemented, and the checksum was Naughty Dog's answer?

On second thought, the checksum may have been a quick debugging-related hack. The code for checksum calculation appears in NTSC games (at least for Crash 1), but is never executed. Maybe the PAL versions are actually in some kind of semi-debug mode?

The NSF chunk decompression routine for NTSC-U Crash 1 starts at 0x80013970.

T15 entries in Crash 2 and 3 appear in these level types: water (C2), boulder (C2), bee (C2), tomb (C3), and dino (C3). They seem to be collision-related. Modifying them in tomb levels makes some walls non-solid. Deleting them causes an instant crash in boulder and water levels. In bee levels, deleting them causes no crash until you go underground, at which point the game freezes completely and even the music stops; I've never seen that kind of freeze outside this situation.

There is no music heard in the crash 1 "prototype". However, some of the NSFs do contain music. 0C and 19 both have what appears to be a pre-retail version of the Jungle Rollers theme. 01 and 09 contain another theme which appears to be "Dance of the Sugar-Plum Fairy" from "The Nutcracker", most likely as a test of the music playback system. This theme is split into two separate SEQs in the SEP file, which should(?) be played in parallel. Very unusual, as typically an entire score will be held within a single SEQ, and the other SEQ will be a separate theme, usually three-mask invul music. Why does the music never play in-game? Is there code to reset the music volume to zero when starting the game? Need to investigate with an emulator and cheat engine or similar. On another note, while many of the level files have no music, most(?) of them have wavebank bodies (no headers without music, though), so they had the instruments but not the music?

Where are the damned audio sample rates stored? Please let it not be directly in the code.

Sound entries don't need to be aligned as strictly for most (all?) emulators. Instead, the only requirement is that the sound data needs to be 8-byte aligned. Perhaps this is a consequence of the SPU addressing memory in 8-byte blocks instead of single bytes.

What do NSD and NSF stand for?

Models:

Level Models:

Changing the byte in NTSC-U Road to Ruin NSF at 0x104DDE (held within a T11 entry) makes mask boxes give lives instead? Happened when incremented by one, but seemed to happen when changed to other values as well.

T7 second item is collision data? [xoffset,yoffset,zoffset?,collision-xoffset,collision-yoffset,collision-zoffset?,???,???,...]

Retail NTSC-U Crash 1 loads entry types at 0x80013034 and calls a function based on it.

Temporary table of entry type names until I find a place to put it:

Columns: Crash 1, Crash 2, Crash 3 (a single name may span several games)
T0 Entries: NONE
T1 Entries: SVTX
T2 Entries: TGEO
T3 Entries: WGEO
T4 Entries: SLST
T5 Entries: TPAG
T6 Entries: LDAT
T7 Entries: ZDAT
T8 Entries: CPAT
T9 Entries: BINF
T10 Entries: OPAT
T11 Entries: GOOL
T12 Entries: ADIO
T13 Entries: MIDI
T14 Entries: INST
T15 Entries: IMAG; VCOL
T16 Entries: LINK
T17 Entries: MDAT; RAWD
T18 Entries: IPAL
T19 Entries: PBAK
T20 Entries: CVTX; SDIO
T21 Entries: VIDO

The VAB section is incorrect. There may be unused programs between the valid programs. See The High Road Crash 1 NTSC-U Retail NSF.

The code at 0x80015A98 in NTSC-U retail Crash 1 was first thought to involve the third item of T11 (GOOL) entries. It actually appears to turn EIDs into pointers to the NSD links between CIDs and EIDs, and may also load the associated chunk into memory.

There's a bunch of entry ID's near the end of the NSD file. They're references to T11 entries. The "type" field in an entity is actually an index into this table!

Neither GOOL nor GOAL had a runtime GC. GOAL had a kind of runtime compaction: most data structures (including processes) were fully relocatable, and the compactor would bubble up the empty spaces to compact them.

GOOL didn't have true runtime symbols. GOAL did, but the symbol table was global, without namespaces. However, chunks of symbols were loaded and unloaded with levels.

What if entry types aren't actually entry types, but rather identifiers for the in-game subsystem the data is associated with. For example, the GOOL subsystem might be the subsystem #11 in the game, and T11 entries are just data associated with the GOOL subsystem. This would explain the gaps in entry types (10 is OPAT, but there are no T10 entries despite there being OPAT-related code in the game, perhaps OPAT has no need for NSF-stored data). This would also explain why speech and sound effects have different entry types (different subsystems for sfx and speech).

Entity field flags:

Entity Field Flags
Bit Meaning
0...4 LSB Unknown
5 Unknown
6 Unknown
7 MSB Set only if this field is the last field in the entity

T11 (GOOL): (these are also referenced in a table in the NSD file, indexed with the entity type)

Entity type determines what GOOL entry should be used. Is the subtype actually an "initial state" setting?

T11 second item contains legit pieces of code. From the box code in Snow Go:

LW   $V0,-4($S6)
ORI  $V1,$Zero,3686
ADDU $A0,$V1,$V0
ORI  $V0,$Zero,58
MULT $A0,$V0
MFLO $A1
SRA  $A0,$A1,8
SW   $A0,-4($S6)

Crash engine uses psx 1K "scratch pad" at 0x1F800000 as a lookup table for GOOL? See 0x8003A014 in NTSC-U Crash 2 retail. Value at 0x1F80005C is 0x1F800060? This table appears to be copied from 0x8005C514. Also it seems 0x49BE0BE0 marks the beginning of machine code (native MIPS), which must be terminated by the byte string 09 A8 E0 03 00 00 00 00, which is:

JALR $S5,$RA NOP

(cont.) This is because the handler for opcode 49 is like so:

JALR $S5 ; ...

(cont.) Where $S5 is the GOOL interpreter's instruction pointer. Thus, the GOOL interpreter will resume execution of the bytecode starting immediately after the MIPS NOP.

GOOL Instruction Layout
Bit Meaning
0...11 LSB Operand A
12...23 Operand B
24...31 MSB Operator
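A sketch of splitting a GOOL instruction word into its fields per the layout above. As a cross-check, the native-code marker 0x49BE0BE0 mentioned earlier decodes to operator 0x49, which is BEGIN NATIVE CODE in the opcode table. Names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t op;    /* bits 24...31 */
    uint16_t a;    /* bits 0...11 */
    uint16_t b;    /* bits 12...23 */
} GoolInstruction;

static GoolInstruction gool_decode(uint32_t word)
{
    GoolInstruction ins;
    ins.a = (uint16_t)(word & 0xFFF);
    ins.b = (uint16_t)((word >> 12) & 0xFFF);
    ins.op = (uint8_t)(word >> 24);
    return ins;
}
```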

79 GOOL opcodes, 00...4E (with unofficial names; the table is based on NTSC-U retail Crash 2):

GOOL Opcode Table
# Name Operands Pseudocode
00 ADD SS PUSH(A + B);
01 SUBTRACT SS PUSH(A - B);
02 MULTIPLY SS PUSH(A * B);
03 DIVIDE SS PUSH(A / B);
04 CHECK EQUAL TO SS PUSH(A == B);
05 LOGICAL AND SS PUSH(B && A); // Push A if B else zero
06 LOGICAL OR (IMPLEMENTED AS BITWISE) SS PUSH(A | B);
07 BITWISE AND SS PUSH(A & B);
08 BITWISE OR SS [ALIAS FOR 06]
09 CHECK LESS THAN SS PUSH(A < B);
0A CHECK LESS THAN OR EQUAL TO SS PUSH(A <= B);
0B CHECK GREATER THAN SS PUSH(A > B);
0C CHECK GREATER THAN OR EQUAL TO SS PUSH(A >= B);
0D MODULO SS PUSH(A % B);
0E BITWISE EXCLUSIVE OR SS PUSH(A ^ B);
0F BITWISE INVERSE AND SS PUSH(A & ~B);
10 GENERATE RANDOM NUMBER SS PUSH(RANDOM(A,B));
11 MOVE DS *A = B;
12 LOGICAL NOT DS *A = !B;
13
14
15 ARITHMETIC BITSHIFT SS PUSH((B >= 0) ? (A << B) : (A >>> -B));
16
17 BITWISE NOT DS *A = ~B;
18
19 ABSOLUTE VALUE DS *A = (B >= 0) ? B : -B;
1A
1B
1C [GUESS: YIELD]
1D
1E
1F
20 [GUESS: STORE GLOBAL]
21 [GUESS: SUBTRACT FRACTIONAL PART]
22
23
24
25
26
27 DS [GUESS: ACCESS SIXTH ITEM]
28
29
2A SS [GUESS: LOAD WORD FROM ARRAY]
2B
2C
2D
2E
2F NOP --
30
31
32
33 BRANCH IF NOT ZERO BR
34 BRANCH IF ZERO BR
35
36
37
38
39
3A
3B [GUESS: BRANCH OR CALL]
3C
3D
3E [ALIAS FOR 3D]
3F
40
41
42
43
44
45
46 [ALIAS FOR 3F]
47
48
49 BEGIN NATIVE CODE --
4A
4B
4C
4D
4E SS [GUESS: STORE WORD INTO ARRAY]

GOOL source operand format:

GOOL destination operand format:

Stuff: (C2 only, C1 probably has different offsets)

On the topic of the Crash 2 GOOL RNG:

chadderz121: So I think it's essentially a shift register doing this: (B0,B1,B2,B3,B4,B5,B6,B7) = (B7+B2+B5,B0,B1,B2,B3,B4,B5,B6)
chadderz121: Where B0-B3 is A1 and B4-B7 is A2
chadderz121: *A0, not A2

C1 GOOL Events
# Meaning
00 HIT FROM THE TOP?
01
02 ADD FRUIT TO FRUIT COUNTER? (ALSO ON LEVEL COMPLETION SCREEN WHEN BOX BREAKS ON CRASH'S HEAD)
03 INFLICT DAMAGE (TO CRASH?) (ALSO HAPPENED WHEN COLLECTED A LIFE)
04 SPIN AGAINST x
05
06
07
08 TRIGGER LIKE EXCLAMATION MARK BOX (ALSO BONUS EXIT? MAYBE THIS IS A GENERIC TRIGGER EVENT)
09 FALL INTO HOLE (and die)
0A
0B
0C READ SOMETHING FROM MEMORY CARD? ALSO WHEN SPINNING AT END OF IDLE ANIMATION?
0D
0E BOX FALL ONTO BOX
0F DO IDLE ANIMATION (ALSO DO BOUNCE-ON-SOMETHING ANIMATION?)
10 KILL ENEMY (DOESN'T REALLY KILL TURTLE)
11 SOMETHING RELATED TO TAWNA IN BONUS ROUND? COLLECT LIFE? (SENT TO CRASH)
12
13
14
15 BOUNCE ON ARROW BOX? BOUNCE ON TURTLE SHELL?
16 EXIT PORTAL?
17 HIT FROM THE BOTTOM (ALSO ON COMPLETION SCREEN WHEN BOX SMASHES ON CRASH'S HEAD)
18
19 GET CRUSHED BY ROLLER?
1A LOAD NEW CAMERA OBJECT?
1B
1C DO DEATH ANIMATION? (ALSO GETTING SWALLOWED BY A RIVER PLANT?)
1D
1E PASSWORD-RELATED?
1F
20 COLLECT FRUIT?
21
22
23
24 CHAIN KILL?
25 GET CRUSHED BY BOULDER?
26
27 ENTER BONUS ROUND?
28
29 MANY OF THESE ARE RAISED WHEN GOING TO LEVEL COMPLETION SCREEN
2A COLLECT FRUIT? ALSO COLLECT LIFE?
2B TOUCH SOMETHING THAT HURTS YOU?
2C
2D