2009-01-17

GL_ARB_vertex_program

As the title states, I've been hardcoring this useful tool in order to combat the 'Dancing Lizards' demo's performance ratio of 98% CPU / 16 lizards. Now,

We have 35 Astys (Asty is just another fat lizard monster, with 800 verts and a 1200-polygon count; the original dancing lizard was 300 verts with 400 polys).

The crazy note is that this runs at 10% CPU for 35 fully animated characters, including cel shading, deformers, AND whatever else the hell I want to do. That is plenty of performance.
However, if you actually look at this ugly screenshot, you'll notice the statistics in the upper left seem to contradict me; be patient. The IFPS value is what matters (~16): it counts how many 1 ms inter-frames fit between render frames. Since all my apps are capped at exactly 50 FPS internally (a 20 ms frame budget), that leaves about 16 ms per frame in which the game can run and poll for whatever, which is what it does.
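The frame-budget arithmetic behind that IFPS reading can be sketched out; the numbers below come straight from the 50 FPS cap and the ~16 IFPS figure above.

```python
# Sketch of the IFPS bookkeeping described above.
FPS_CAP = 50
frame_budget_ms = 1000 // FPS_CAP   # 20 ms between internal frames at 50 FPS
ifps = 16                           # ~16 spare 1 ms inter-frames observed
render_ms = frame_budget_ms - ifps  # so rendering eats only ~4 ms per frame
print(frame_budget_ms, ifps, render_ms)
```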

Now for some nitty gritty. Let's say you have Asty as a creature in your game (he's quite a friendly fellow!) and you want to make him all sorts of animated, and you overkill the bone count (most of my models have ~120 or so bones, including fingers and IK stuff). Sadly, in this shader model, you cannot feasibly have more than ~28 bones in your shader program at once (96 parameters available max, with each bone matrix consuming three of them). This means you have to preprocess groups of vertices that share bones (four shader variants: one for 1-matrix vertices, one for 2-matrix vertices, etc.) so you can actually do all the deformations. That means switching shader programs, which is costly. So, if you want a bunch of fodder enemies, you'll need to create some interesting optimization schemes to lower the bone count so you can avoid switching.
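The grouping step can be sketched like this; the helper below is hypothetical (not the engine's actual code) and assumes unused bone slots carry a zero weight.

```python
from collections import defaultdict

def bucket_by_bone_count(vertices):
    """Group vertices by how many bones influence them (1..4), so each
    bucket can be drawn with the matching 1/2/3/4-matrix shader variant.
    `vertices` is a list of (vertex_id, weights) pairs; a weight of zero
    means that bone slot is unused. A hypothetical preprocessing sketch."""
    buckets = defaultdict(list)
    for vtx, weights in vertices:
        count = sum(1 for w in weights if w > 0.0)
        buckets[count].append(vtx)
    return dict(buckets)

verts = [(0, (1.0, 0.0, 0.0, 0.0)),
         (1, (0.6, 0.4, 0.0, 0.0)),
         (2, (0.5, 0.3, 0.2, 0.0)),
         (3, (0.7, 0.3, 0.0, 0.0))]
print(bucket_by_bone_count(verts))  # {1: [0], 2: [1, 3], 3: [2]}
```

Each bucket is then drawn in one batch, so the costly program switch happens at most once per bone-count class instead of per mesh.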

In this screenshot, each monster gives every vertex a GLubyte[4] of local matrix-palette indices (0 to 96; divide by 3 for the bone number, since each bone occupies three parameter rows) and a GLfloat[4] for the weights. I intend to normalize the weight values so I can use bytes, as you generally don't need a full float of weight precision. Also, each vertex normal is renormalized so I can calculate the nifty cel-shading value (deformed normal dot eye normal = texture coordinate 0).
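The float-to-byte weight packing can be sketched as follows; this is a minimal illustration of the idea, not the engine's actual packing code, and it nudges the heaviest weight so the bytes still sum to exactly 255 (i.e. 1.0 once the shader rescales them).

```python
def quantize_weights(weights):
    """Quantize normalized float weights to GLubyte-style 0..255 values.
    Rounding drift is pushed onto the largest entry so the quantized
    weights still sum to 255. A sketch, not the engine's code."""
    total = sum(weights)
    bytes_ = [round(w / total * 255) for w in weights]
    drift = 255 - sum(bytes_)
    bytes_[bytes_.index(max(bytes_))] += drift
    return bytes_

print(quantize_weights([0.6, 0.4, 0.0, 0.0]))  # [153, 102, 0, 0]
```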

Let's look at some assembly:


PARAM K = {1, 0.5, 0, 3.14159265 };

#Skin with any affine matrix:
#PARAM Matrices[] = { program.local[12..84] };  # <= 28 matrices
#  [ Xx, Yx, Zx, x pos(1) ]
#  [ Xy, Yy, Zy, y pos(1) ]
#  [ Xz, Yz, Zz, z pos(1) ]
#Multiply each axis by its scale to use scaling, but this will require renormalization for normals.
#This matrix is CPU computed from:
#  Let B be the current bone matrix (local to mesh, current pose)
#  Let M be the original bone matrix (local to mesh, default pose)
#  Then, W = (B * M^-1)
#  Send W to the shader as a matrix in the form above.
#Per vertex in a matrix-deformed mesh:
#  vertex.attrib[6] stores n indices as unsigned bytes (n*1 bytes)
#  vertex.attrib[7] stores n weights as floats (n*4 bytes)
#? can we use byte weights?
#Requires: RF, R, addr; ( RF = sum, R = temp, addr = address )
TEMP temp1, temp2, temp3;  #Register declarations
ADDRESS addr;
ALIAS R   = temp1;  #Temp vector
ALIAS RF  = temp2;  #Vector sum (always set to first matrix deform)
ALIAS RFN = temp3;  #Normal sum (always set to first matrix deform)

#For matrix 1 (x)##
ARL addr.x, vertex.attrib[6].x;        #Get matrix array index
DP4 R.x, Matrices[addr.x + 0], wPos;   #Rotate & scale local vector (model position)
DP4 R.y, Matrices[addr.x + 1], wPos;
DP4 R.z, Matrices[addr.x + 2], wPos;
MUL RF, R, vertex.attrib[7].x;         #Multiply vector by weight to start the summed deformation vector
DP3 R.x, Matrices[addr.x + 0], wNorm;  #Rotate normal as needed (don't forget about scale...)
DP3 R.y, Matrices[addr.x + 1], wNorm;
DP3 R.z, Matrices[addr.x + 2], wNorm;
MUL RFN, R, vertex.attrib[7].x;        #Sum normal with weight as well
###################

#For matrix 2 (y)##
ARL addr.x, vertex.attrib[6].y;        #Get matrix array index
DP4 R.x, Matrices[addr.x + 0], wPos;   #Rotate & scale local vector (model position)
DP4 R.y, Matrices[addr.x + 1], wPos;
DP4 R.z, Matrices[addr.x + 2], wPos;
MAD RF, R, vertex.attrib[7].y, RF;     #Multiply vector by weight, add to summed deformation vector
DP3 R.x, Matrices[addr.x + 0], wNorm;  #Rotate normal as needed
DP3 R.y, Matrices[addr.x + 1], wNorm;
DP3 R.z, Matrices[addr.x + 2], wNorm;
MAD RFN, R, vertex.attrib[7].y, RFN;   #Sum normal with weight as well
###################

#For matrix 3 (z)##
ARL addr.x, vertex.attrib[6].z;        #Get matrix array index
DP4 R.x, Matrices[addr.x + 0], wPos;   #Rotate & scale local vector (model position)
DP4 R.y, Matrices[addr.x + 1], wPos;
DP4 R.z, Matrices[addr.x + 2], wPos;
MAD RF, R, vertex.attrib[7].z, RF;     #Multiply vector by weight, add to summed deformation vector
DP3 R.x, Matrices[addr.x + 0], wNorm;  #Rotate normal as needed
DP3 R.y, Matrices[addr.x + 1], wNorm;
DP3 R.z, Matrices[addr.x + 2], wNorm;
MAD RFN, R, vertex.attrib[7].z, RFN;   #Sum normal with weight as well
###################

#For matrix 4 (w)##
ARL addr.x, vertex.attrib[6].w;        #Get matrix array index
DP4 R.x, Matrices[addr.x + 0], wPos;   #Rotate & scale local vector (model position)
DP4 R.y, Matrices[addr.x + 1], wPos;
DP4 R.z, Matrices[addr.x + 2], wPos;
MAD RF, R, vertex.attrib[7].w, RF;     #Multiply vector by weight, add to summed deformation vector
DP3 R.x, Matrices[addr.x + 0], wNorm;  #Rotate normal as needed
DP3 R.y, Matrices[addr.x + 1], wNorm;
DP3 R.z, Matrices[addr.x + 2], wNorm;
MAD RFN, R, vertex.attrib[7].w, RFN;   #Sum normal with weight as well
###################

MOV wNorm, RFN;       #Set final normal
DP3 R, wNorm, wNorm;  #Renormalize normal after deformations (extremely iffy)
RSQ R, R.x;
MUL wNorm, wNorm, R;
MOV wPos.xyz, RF;     #Set final position of the deformation (could use non-normalized weights too...)

And that's all you need to make a skinned model in OpenGL using the common ARB vertex program extension. Naturally, people are idiots, and will ask 'why do this instead of GLSL?' or say 'this is too old to be useful'. Obviously, if you can do it in the card's assembly language, it's a cinch to move to higher-level languages. In fact, it's an incredibly good exercise in understanding not only SIMD instructions, but also general matrix/vector processing units. Plus, this is damned fast and way more portable than GLSL.

As some side notes, I have a lot of other nifty shader code, specifically for things I find important, like:

Normal map colors
Cast deformations (spherical)
Push-cast deformations (point -> sphere, outward from the point)
Water/Perlin noise wobbling
Texture-coordinate lighting
Multiple light cheats
More shading and lighting models
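Stripped of the assembly, the deformation the vertex program performs is just a weighted sum of matrix-transformed positions and normals. Here is a minimal Python sketch of that math; the function names and the flat row-per-entry palette layout are mine, mirroring the shader's three-rows-per-bone parameter layout.

```python
def dp4(row, v):
    """4-component dot product, like the shader's DP4 instruction."""
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3]

def dp3(row, v):
    """3-component dot product, like the shader's DP3 instruction."""
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2]

def skin(position, normal, palette, indices, weights):
    """Weighted matrix-palette skinning, mirroring the vertex program:
    RF accumulates weight * (W * position), RFN accumulates
    weight * (R * normal), and the normal is renormalized at the end.
    `palette` is a flat list of 4-float rows, three rows per bone, and
    each index is already a row offset (bone number * 3). A sketch."""
    pos4 = (position[0], position[1], position[2], 1.0)
    rf = [0.0, 0.0, 0.0]   # RF:  summed deformed position
    rfn = [0.0, 0.0, 0.0]  # RFN: summed deformed normal
    for idx, w in zip(indices, weights):
        if w == 0.0:
            continue  # unused bone slot
        for axis in range(3):
            row = palette[idx + axis]
            rf[axis] += w * dp4(row, pos4)
            rfn[axis] += w * dp3(row, normal)
    length = (rfn[0] ** 2 + rfn[1] ** 2 + rfn[2] ** 2) ** 0.5 or 1.0
    return rf, [c / length for c in rfn]

# Bone 0 (rows 0..2) is identity; bone 1 (rows 3..5) rotates 90 degrees about Z.
palette = [
    [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],
    [0.0, -1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],
]
pos, nrm = skin((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), palette,
                indices=(0, 3, 0, 0), weights=(0.5, 0.5, 0.0, 0.0))
print(pos)  # [0.5, 0.5, 0.0] -- halfway between the two bone poses
```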

More to come once I give Asty a soul using my IK algorithms and rig him up proper with lotsa delicious bones. THEN we'll see how much CPU we get, and who knows, maybe we'll even get a game demo where you can SSBM other Astys.

*Note, I also forgot that this demo tests a sphere-triangle collision against EVERY loaded triangle. After fixing that, the CPU usage was at MOST 1% for 25 Astys. So hah. I can't wait to have 1000 lizards dancing. Eat it, GEICO!

-Z
