2010-05-31

Fencing - Problem Solved

Four days of brain-bashing to track down a very complex error and slay it.

At first, the program would crash if run outside of debug mode.

"Okay, so it must be a memory overrun: I'm writing past the end of an array, and it only survives in debug mode because the debugger pads memory to detect clobbering." An overrun like that looks like this:

unsigned char myarray[ 10 ];
myarray[ 10 ] = 60; // WHOA: myarray holds 10 elements (indices 0-9), so index 10 is the 11th element, one past the end.
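For comparison, here is a minimal C++ sketch (not from the original project) of catching that same off-by-one at the moment of the write. std::array::at() checks the index against size() and throws std::out_of_range, whereas plain [] does no checking at all. The Buffer alias and the checked_write helper are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical alias: ten elements, like myarray above, so valid indices are 0..9.
using Buffer = std::array<unsigned char, 10>;

// Hypothetical helper: writes value at index using at(), which checks the
// index against size() and throws std::out_of_range instead of silently
// writing past the end. Returns false if the write was rejected.
bool checked_write(Buffer& buf, std::size_t index, unsigned char value) {
    try {
        buf.at(index) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // index >= buf.size(): the off-by-one is caught right here
    }
}
```

With a Buffer named myarray, checked_write(myarray, 9, 60) succeeds while checked_write(myarray, 10, 60) returns false. The check costs a comparison per access, which is exactly why [] skips it and why the unchecked version can quietly trample whatever lives next door.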

This kind of bug is often called a fence (fencepost) error, and I couldn't find mine. Weirder still, std::string was throwing errors, std::map was having kittens, and ntdll seemed to vomit all over the floor at this party from hell.

As I burrowed into the code, seething rage ignited: was I being burned by out-of-date compilers? Was my code wrong? What had I done? What had THEY done wrong?

This project had it all: nested complex templates, deep inheritance hierarchies, large blocks of data and nasty C algorithms. Luckily, most of my (good) code has __debug_regression defined, so a single look back at the dates its tests last passed let me quickly rule out very large segments of code.

But my mind was going, Dave.

I ignored the fact that if my code worked under the debugger, then only the heap allocation method could be causing the 0xC0000005 (access violation) errors, thanks to ntdll's nice memory-bounds protection (mainly protection against 0xBAADF00D, but yeah).

So I eventually upgraded my compiler and debugger, brought in the help of my custom-designed memory heap manager (igtl_MMHeapSystem; I love you, I LOVE YOU! OH GOD YOU'RE SO SEXY MMMMM *codepronz*) and tracked down the offending bug. Hours of poking and prodding led me to a surprisingly simple conclusion.

I was off by one.

I'd been accused of being off kilter, off base, and not even on this world, but one? Off by one causes these insanely weird errors? Why did the debugger not detect the stack corruption? Why did the heap manager skip telling me I was writing one integer past my array? Why did absolutely none of the tools find this till I made my own?

Here's why:

When my code wrote past the end of the array, it (coincidentally) overwrote the BAADF00D ("bad food") word. But as the program ran, that error PROPAGATED until some time later, when the error detection actually scanned memory, because no one in their right mind scans every single memory allocation (super slowness). By the time a scan actually occurred, it was too late and bizarre values had multiplied, but only in a small, well-contained region: inside a system DLL I don't have debug points in. Ironically, the error affected nothing else in MY code, but it really screwed up ntdll and caused it to throw weird errors all over the place. So my code was just missing a +1, but the stack was being blown to pieces by ntdll trying to protect itself.
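A minimal C++ sketch of why the error surfaced so late, assuming a simplified heap in which every block is followed by a guard word. FencedBlock, kGuard and count_corrupted are hypothetical names, and this is not how igtl_MMHeapSystem or the Windows heap is actually implemented. The point it illustrates is that the bad write itself is never trapped; it is only noticed when something goes looking at the guards.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical guard pattern; the value is illustrative only.
constexpr std::uint32_t kGuard = 0xBAADF00Du;

// Hypothetical fenced allocation: payload bytes followed by a guard word.
// The guard lives inside `storage`, so an off-by-one write into it is still
// in-bounds for the vector (no undefined behavior), but it corrupts the fence.
struct FencedBlock {
    std::vector<unsigned char> storage;  // payload + guard word
    std::size_t size;                    // usable payload bytes

    explicit FencedBlock(std::size_t n) : storage(n + sizeof kGuard), size(n) {
        std::memcpy(storage.data() + n, &kGuard, sizeof kGuard);
    }

    unsigned char* data() { return storage.data(); }

    // True if the guard word after the payload still holds kGuard.
    bool guard_intact() const {
        std::uint32_t word;
        std::memcpy(&word, storage.data() + size, sizeof word);
        return word == kGuard;
    }
};

// Scans every live block: thorough, and slow if done on every allocation.
std::size_t count_corrupted(const std::vector<FencedBlock>& heap) {
    std::size_t bad = 0;
    for (const auto& block : heap)
        if (!block.guard_intact()) ++bad;
    return bad;
}
```

If a loop over a 16-byte block runs `for (i = 0; i <= 16; ++i)`, nothing complains at the moment of the write; the damage only shows up the next time count_corrupted runs. Scanning every block on every allocation would catch it immediately, at the cost of the super slowness mentioned above.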

Moral of the story?

Make sure to take a nice drink if you run into a hard problem. Usually, it's just something stupid. I use High Gravity Steel Reserve.

-Z

Here are some preliminary results with a nifty three-instruction toon shader I made (<3 Mecha Dragon):
