Using SoftICE as a post mortem analysis tool

Problem:

I need to find out what happened to the system when it crashes. How can I do this using SoftICE?

Resolution:

SoftICE traps unhandled CPU-level exceptions. The most common exceptions are "Invalid Opcode", "Stack Fault", "General Protection Fault" (GPF), and "Page Fault".
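For reference when reading the processor manual, the four exceptions named above correspond to fixed IA-32 vector numbers. The following is a minimal sketch of a lookup table; the names `fault_info` and `lookup_fault` are hypothetical and are not part of SoftICE.

```c
#include <stddef.h>

/* One entry per exception named in this article (IA-32 vector numbers). */
struct fault_info {
    int vector;            /* interrupt vector number            */
    const char *mnemonic;  /* Intel mnemonic                     */
    const char *name;      /* descriptive name                   */
    int has_error_code;    /* 1 if the CPU pushes an error code  */
};

static const struct fault_info faults[] = {
    {  6, "#UD", "Invalid Opcode",           0 },
    { 12, "#SS", "Stack Fault",              1 },
    { 13, "#GP", "General Protection Fault", 1 },
    { 14, "#PF", "Page Fault",               1 },
};

/* Hypothetical helper: return the table entry for a vector,
 * or NULL if the vector is not one of the four listed here. */
const struct fault_info *lookup_fault(int vector)
{
    size_t i;
    for (i = 0; i < sizeof faults / sizeof faults[0]; i++) {
        if (faults[i].vector == vector)
            return &faults[i];
    }
    return NULL;
}
```

Knowing whether an error code was pushed matters when you inspect the stack by hand, because it shifts where the saved EIP and CS are found.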

When one of those exceptions occurs and is not handled by any handler, SoftICE gives you control for a last-chance analysis before the corresponding "unhandled exception handler" is called.

At this point, CS:EIP is at the instruction that caused the exception (or, in the case of a GPF or Page Fault, it may be a linear address with no physical memory mapped to it).

The information that you can obtain at this point depends on the type of exception.
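As one illustration of exception-specific information: on a Page Fault the processor pushes an error code and stores the faulting linear address in CR2. The sketch below decodes the three low bits of that error code as defined by the IA-32 architecture; the struct and function names are hypothetical, and how you read the error code off the stack depends on your situation.

```c
#include <stdint.h>

/* Decoded view of the low bits of an IA-32 page-fault error code. */
struct pf_info {
    int present;  /* bit 0: 1 = protection violation, 0 = page not present */
    int write;    /* bit 1: 1 = write access,         0 = read access      */
    int user;     /* bit 2: 1 = fault in user mode,   0 = supervisor mode  */
};

/* Hypothetical helper: split the error code into its flag bits. */
struct pf_info decode_pf_error(uint32_t error_code)
{
    struct pf_info info;
    info.present = (int)((error_code >> 0) & 1u);
    info.write   = (int)((error_code >> 1) & 1u);
    info.user    = (int)((error_code >> 2) & 1u);
    return info;
}
```

For example, an error code of 2 decodes as a supervisor-mode write to a page that is not present, which often points at a bad pointer rather than a permissions problem.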

First, learn about the nature of the exception from a reference such as the "Intel Pentium Programmer's Reference", because understanding the exception and its potential causes is essential for effective post mortem analysis.

Things you should try:

See if CS:EIP is pointing to a valid instruction. If not, the fault was likely caused by a "wild jump" through a bad function pointer. This manifests as a GPF in 16-bit code and as a Page Fault in 32-bit code. If CS:EIP is in a valid code section, study the assembly code (if the symbol table for that module is loaded, the code is displayed as source). If you have a Pentium Pro or later processor, the Last Branch registers can give you some idea of where the jump came from. At times you may need to jiggle them by setting a BPM breakpoint on the EIP and then exiting with Ctrl-D.

Also use the 'D' command to examine the relevant data variables.

Issue the STACK command to see if a call stack is available. In some cases STACK does not produce any output because of the nature of the exception. For instance, if a local variable overrun has occurred, the stack may be corrupt, which prevents SoftICE from walking the stack frames. If you see only raw addresses, use the MOD and QUERY commands to find out which modules those functions belong to. Once you know which modules are involved, you can pre-load their symbols or exports the next time you try to reproduce the problem, so that the addresses are shown with symbolic names.
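To see why a local variable overrun can defeat a stack walk, consider how a conventional frame-pointer walk works: each frame stores the caller's EBP followed by the return address, and the walker follows that chain. The sketch below simulates this over an array of 32-bit words. It illustrates the general technique only; it is not SoftICE's actual implementation, and `walk_frames` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk a chain of saved-EBP links inside a simulated stack.
 *   stack, nwords : the stack contents as 32-bit words
 *   base          : simulated address of stack[0] (4-byte aligned)
 *   ebp           : simulated frame pointer to start from
 *   out, max      : buffer that receives the return addresses
 * Returns the number of frames recovered before the chain ends
 * or stops looking valid. */
size_t walk_frames(const uint32_t *stack, size_t nwords, uint32_t base,
                   uint32_t ebp, uint32_t *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        uint32_t off, saved_ebp;
        size_t i;

        /* A frame needs two in-range, aligned words:
         * the saved EBP and the return address. */
        if (ebp < base || ebp % 4 != 0)
            break;
        off = ebp - base;
        if ((uint64_t)off + 8 > (uint64_t)nwords * 4)
            break;

        i = off / 4;
        saved_ebp = stack[i];
        out[n++] = stack[i + 1];

        /* Callers live at higher addresses; anything else
         * (including a zero terminator) ends the walk. */
        if (saved_ebp <= ebp)
            break;
        ebp = saved_ebp;
    }
    return n;
}
```

If an overrun overwrites a saved EBP with garbage such as 0x41414141, the next link fails the range or alignment check and the walk stops early, which is the same symptom as a STACK command that shows little or nothing.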

When the STACK command produces no stack, it is likely that you jumped from your code to non-paged memory. This is difficult to trace back because the link is broken. If you have a Pentium Pro processor, SoftICE automatically displays the "Jump From" and "Jump To" addresses using the Pentium Pro's model-specific registers. This way, you can recover one level of jump. Unassemble the "Jump From" location to learn what was happening.

Tips for successful post mortem analysis are:

1. Understand the nature of the exception

2. Narrow down the problem space as much as possible

3. Load exports/symbol information for as many of the likely involved modules as possible (for system modules, get the .DBG files from the OS CD-ROM)

4. Once the code section is identified, load it with debug output enabled so that it emits trace messages.
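Tip 4 can be sketched as a small trace facility compiled in only for debug builds. This is a minimal sketch under stated assumptions: user-mode Win32 code, where `OutputDebugStringA` is the real API for sending text to an attached debugger, and the assumption that your debugger is set up to display such output. The names `trace_vformat`, `trace_format`, `trace_emit`, `TRACE`, and `DEBUG_TRACE` are hypothetical. On non-Windows builds the sketch falls back to stderr so it can be compiled anywhere.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Build "file(line): message" into buf from a va_list.
 * Returns the untruncated length, or a negative value on error. */
int trace_vformat(char *buf, size_t size, const char *file, int line,
                  const char *fmt, va_list args)
{
    int n, m;

    n = snprintf(buf, size, "%s(%d): ", file, line);
    if (n < 0 || (size_t)n >= size)
        return n;
    m = vsnprintf(buf + n, size - (size_t)n, fmt, args);
    return m < 0 ? m : n + m;
}

/* Variadic convenience wrapper around trace_vformat. */
int trace_format(char *buf, size_t size, const char *file, int line,
                 const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = trace_vformat(buf, size, file, line, fmt, args);
    va_end(args);
    return n;
}

/* Format a message and send it to the debugger (or stderr). */
void trace_emit(const char *file, int line, const char *fmt, ...)
{
    char buf[256];
    va_list args;

    va_start(args, fmt);
    trace_vformat(buf, sizeof buf, file, line, fmt, args);
    va_end(args);
#ifdef _WIN32
    OutputDebugStringA(buf);   /* real Win32 API */
#else
    fputs(buf, stderr);        /* portable fallback for this sketch */
#endif
}

/* Trace calls compile to nothing unless DEBUG_TRACE is defined. */
#ifdef DEBUG_TRACE
#define TRACE(...) trace_emit(__FILE__, __LINE__, __VA_ARGS__)
#else
#define TRACE(...) ((void)0)
#endif
```

A call such as `TRACE("entering handler, count=%d\n", count);` placed around the suspect code leaves a trail of the last steps executed before the fault.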

Old KB# 11735