[Fast3D] Improving the way we handle textures in custom levels
Over the past two and a half years, the SM64 hacking community has made significant progress on getting ROM hacks to work on the original N64 console. macN64 has even written a guide on fixing the textures in some older hacks so that they work on console, which is fantastic. See this thread for the current progress on console compatibility.

A major problem we are having now is the unplayable amount of lag that larger levels cause on console. There are many ways to try to reduce lag, and the one I'm going to introduce in this post is a way to optimize Fast3D texture processing. As an example, I will make a simple level with two 32x32 textures.

[Image: rAqCJQy.png?3][Image: pB07ZZF.png][Image: xO5ingN.png]

I first draw the grass, then the orange rock wall, and finally the top part with the grass texture. Once imported, the Fast3D commands for loading the textures will look something like the diagram below.

(TMEM = Texture memory cache)
[Image: ErNNabd.png]

As you can see, each texture is loaded and bound to the triangles that are drawn after it. However, looking at the image above, we see that the game has to switch from Tex1 to Tex2 and then back to Tex1. Why does it do this when we know that all of the grass triangles are going to be drawn anyway? Also, why are we not taking full advantage of TMEM? Every time we want to switch textures, we load the texture from RAM again. This is just a small example level, but imagine a huge level from Last Impact or Star Road. The game would have to load textures into TMEM from RAM tens to hundreds of times per frame!
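
To put a number on this, here is a minimal Python sketch that counts how many texture loads a given draw order needs. It assumes the simple model described above, where only one texture is resident at a time and a load from RAM happens every time the texture changes between consecutive triangles. The function name, the texture labels, and the triangle counts are made up for illustration.

```python
def count_texture_loads(draw_order):
    """Count loads needed when only one texture is resident at a time.

    draw_order: the texture name of each triangle, in the order drawn.
    A load is counted whenever the texture differs from the previous one.
    """
    loads = 0
    current = None
    for texture in draw_order:
        if texture != current:
            loads += 1
            current = texture
    return loads


# Example level: floor (grass), then walls (rock), then the top (grass again).
example_order = ["grass"] * 2 + ["rock"] * 8 + ["grass"] * 2
print(count_texture_loads(example_order))  # prints 3
```

The grass texture is counted twice because the rock triangles sit between the two grass runs.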

My hypothesis

I believe that a large number of 0xF3 (G_LOADBLOCK) commands can cause a large level to lag, because of the delay when loading from RDRAM into TMEM on console. If we reduce the number of times the game has to load textures, then performance should improve to a semi-playable level. At the very least, we can reduce the number of Fast3D commands the game has to process, which will be a definite benefit.

Does this really matter?

On console, I believe the answer is yes. See the two paragraphs above.

On emulators, the answer is no. The RAM in your modern machine is monumentally faster than anything in a game console from 1996, so moving data from emulated N64 RAM to an emulated N64 TMEM cache is basically just copying memory around inside your computer.

2 Solutions

Group by texture

The way levels are rendered depends on how you draw them in SketchUp. The first triangles you draw usually get rendered first (assuming your textures are fully opaque). In the simple level I made above, I drew the floor first, so that is where the game starts rendering. Then it switches over to the wall texture, because I started drawing the walls after the floor. The game then has to load the first texture again, because I am using the grass texture for the top part. That is why the game has to load the grass texture twice.

If we want to minimize the number of times the textures have to switch, then we need to group all the triangles that use the same texture together. This way we only have to load the texture once, and then the game can draw all the triangles that go with that texture. I found a free SketchUp plugin that can do this easily. It's called GroupByTexture and was created by Rick Wilson. You can find a download for this plugin here: http://www.smustard.com/script/GroupByTexture

This plugin will explode all the groups in the model, and then regroup all the faces according to their texture. This will cause the exported .obj file to be organized properly according to the textures, and not draw order. I would only recommend using this plugin once your level is finalized, so you don't have to keep ungrouping all the faces you want to change.
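
The effect of regrouping can be sketched in a few lines of Python. This is not how the plugin itself works internally; it only illustrates the idea on a list of hypothetical (texture, triangle id) pairs, with made-up function names. Each texture keeps the position where it first appeared, and the triangles keep their relative order within each texture.

```python
def group_by_texture(triangles):
    """Reorder (texture, triangle_id) pairs so each texture forms one run."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for texture, tri in triangles:
        groups.setdefault(texture, []).append((texture, tri))
    return [item for run in groups.values() for item in run]


def count_runs(triangles):
    """Number of texture runs, i.e. loads with one texture resident at a time."""
    runs, previous = 0, None
    for texture, _ in triangles:
        if texture != previous:
            runs += 1
            previous = texture
    return runs


level = [("grass", 0), ("grass", 1), ("rock", 2), ("rock", 3), ("grass", 4)]
grouped = group_by_texture(level)
print(count_runs(level), "->", count_runs(grouped))  # prints: 3 -> 2
```

Keep in mind that this changes the draw order, so it is only safe for fully opaque textures.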

Multi-Texture loading

Remember the unused 2KB of space in TMEM? Since there is enough room, why not use all of it? All we would have to do is change the number of texels (texture pixels) that are loaded with the 0xF3 command to account for both textures. Think of it like loading a 32x64 texture, but we are only going to see half of the texture at a time. If we need to switch to the other texture, then we can call a 0xF5 (G_SETTILE) command to change the TMEM offset to read from.
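
As a rough sketch of the bookkeeping involved, the Python below lays textures out back to back in TMEM and reports the total texel count for a single load plus the TMEM offset of each texture. It rests on a few assumptions: the textures share one texel size (16 bits here), they are stored contiguously in RAM so one 0xF3 load can cover all of them, and the TMEM offset used by the 0xF5 command is expressed in 64-bit (8-byte) words. The function and texture names are hypothetical, and the exact command encoding should be checked against your own tools.

```python
TMEM_BYTES = 4096   # total size of texture memory
WORD_BYTES = 8      # assumption: TMEM offsets are given in 64-bit words


def plan_tmem(textures, bits_per_texel=16):
    """Lay textures out back to back in TMEM.

    textures: list of (name, width, height) sharing one texel size.
    Returns (total_texels, {name: offset_in_64bit_words}).
    """
    offsets = {}
    total_texels = 0
    used_bytes = 0
    for name, width, height in textures:
        size = width * height * bits_per_texel // 8
        if size % WORD_BYTES:
            raise ValueError(f"{name}: size is not a multiple of 8 bytes")
        if used_bytes + size > TMEM_BYTES:
            raise ValueError(f"{name}: does not fit in TMEM")
        offsets[name] = used_bytes // WORD_BYTES
        used_bytes += size
        total_texels += width * height
    return total_texels, offsets


texels, offsets = plan_tmem([("grass", 32, 32), ("rock", 32, 32)])
print(texels, offsets)  # prints: 2048 {'grass': 0, 'rock': 256}
```

Under these assumptions, the two 32x32 textures fill TMEM exactly: one load of 2048 texels, with the second texture starting 256 words (2048 bytes) in.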


Using both the Group by texture and Multi-Texture loading techniques, we can reduce the amount of effort it takes to set up textures. Compare the diagram below to the previous one above. We went from 3 load texture block commands down to just 1, and the number of Fast3D commands has also been significantly reduced, from 21 commands down to just 8.

[Image: XrzXQ6i.png]

So we can only load two (32x32) textures at a time? Big deal.

It's true that RGBA textures are expensive in terms of data size, but think about the 4-bit & 8-bit texture formats. With the 8-bit texture formats, I8 and IA8, you can load up 4 (32x32) textures at a time. If you can somehow get away with using 4-bit textures like I4 and IA4, then you can load up 8 (32x32) textures at a time. CI textures are a little different as half of the TMEM is reserved for the color palettes, so only 2 (32x32) CI8 textures and only 4 (32x32) CI4 textures can fit inside the TMEM.

The actual number of textures you can load varies based on the resolution of the image and the bit depth. As long as the data doesn't exceed 4096 bytes, you can load it into TMEM.
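
The counts above follow directly from the arithmetic, which this small Python sketch reproduces. It assumes the RGBA textures in question are 16-bit (which is what makes two 32x32 textures fill 4096 bytes), and it treats CI formats as having only half of TMEM available for texture data, as described above. The table and function names are made up for illustration.

```python
TMEM_BYTES = 4096

# Bits per texel for each format mentioned above (RGBA assumed to be 16-bit).
BITS_PER_TEXEL = {
    "RGBA16": 16,
    "IA8": 8, "I8": 8, "CI8": 8,
    "IA4": 4, "I4": 4, "CI4": 4,
}


def textures_that_fit(fmt, width=32, height=32):
    """How many width x height textures of one format fit in TMEM at once."""
    # CI formats keep their color palettes in the other half of TMEM.
    available = TMEM_BYTES // 2 if fmt.startswith("CI") else TMEM_BYTES
    size = width * height * BITS_PER_TEXEL[fmt] // 8
    return available // size


for fmt in BITS_PER_TEXEL:
    print(fmt, textures_that_fit(fmt))
```

Passing other dimensions shows how quickly resolution eats into the budget. For example, `textures_that_fit("I8", 64, 32)` gives 2.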

Testing on console hardware. (Does this actually work?)

Yes, but more testing is needed. I did a quick test on my flash cartridge before writing this post, and the optimized Fast3D code above does seem to work. I cannot tell you how this affects performance yet, so take everything I say with a grain of salt.

If you have any questions or updates on the post, then please leave a reply below and I'll try to respond as soon as I can.
A very good read! Thank you for sharing it with us! So if I understand correctly, the game loads every texture one after the other. Do you know if that also happens often in the original SM64 maps? I'm thinking no, because Nintendo tried to optimize the maps as much as possible, but it might not be the case.
(04-22-2018, 04:54 PM)MelonSpeedruns Wrote: A very good read! Thank you for sharing it with us! So if I understand correctly, the game loads every texture one after the other. Do you know if that also happens often in the original SM64 maps? I'm thinking no, because Nintendo tried to optimize the maps as much as possible, but it might not be the case.

As far as I know, vanilla SM64 only loads up one texture at a time. I wouldn't say the original SM64 levels are unoptimized though, since they draw more triangles with every G_VTX command compared to custom levels.

The main point of this post is that one texture can get loaded multiple times, depending on how you draw the level in SketchUp (I'm not sure how Blender handles it). This usually causes unnecessary lag, so a plugin like GroupByTexture should help by making each texture load only once.
