Created On:  14 June 2012

Problem:

When running a large Visual COBOL managed code application, a .NET exception of type Stack Overflow occurs.  What is the cause of this exception, and how can I diagnose and resolve it?

Resolution:

Stack Overflow Exceptions in COBOL for .NET

Overview

                A stack overflow exception is thrown when an application consumes too much stack space. In .NET, an exception of type System.StackOverflowException will be thrown when this occurs. Typically, this is caused by too many nested method calls or unchecked recursion. Usually a coding change can circumvent the issue. It’s worth understanding how this issue can occur, what impact COBOL compiled for .NET may have, and how to diagnose and resolve the problem.

Defining the Stack Size

                The initial allocation of the stack is defined by the main application. The default stack allocation of a .NET application is 1 MB, and this can be modified by the user. In a .NET COBOL executable application, the ILSTACKSIZE directive can be added to the build directives. For example, adding ILSTACKSIZE(2097152) allocates 2 MB of stack space to the application. The Microsoft utility ‘dumpbin’ can be used to determine the stack size of an already built application by running the command “dumpbin /headers [executable]” and looking for the value against the Stack Reserve entry. This value is shown in hexadecimal; in this example, an application built with the directive ILSTACKSIZE(2097152) shows a stack reserve value of 200000.
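The correspondence between the decimal directive value and the hexadecimal stack reserve can be checked with a few lines of Python. This is only an arithmetic sketch, not part of the COBOL or Microsoft toolchain: the function names are hypothetical, and it assumes the hexadecimal value has been copied by hand from the dumpbin output.

```python
def directive_to_reserve_hex(stack_bytes: int) -> str:
    """Return the hexadecimal string expected for an ILSTACKSIZE value given in bytes."""
    return format(stack_bytes, "X")


def reserve_hex_to_bytes(reserve_hex: str) -> int:
    """Convert a hexadecimal stack reserve value back to a byte count."""
    return int(reserve_hex, 16)


two_mb = 2 * 1024 * 1024
print(two_mb)                            # 2097152
print(directive_to_reserve_hex(two_mb))  # 200000
print(reserve_hex_to_bytes("200000"))    # 2097152
```

The same conversion shows that the 1 MB default corresponds to a stack reserve of 100000 in hexadecimal.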
                Of course, COBOL may only be one piece of the application and may be compiled to be used as an external library. Other .NET languages have a similar mechanism for modifying the stack size, but what if the application is already built? One example of this is an ASP.NET application, where some of the user code may run under an ASP.NET worker process. Using the ‘dumpbin’ utility on the 32-bit ASP.NET worker process (w3wp.exe) shows that here the default stack reserve is only 256 KB. It’s worth noting that a 64-bit process may behave differently: the 64-bit ASP.NET worker process has a stack size of 512 KB, but it is not safe to assume that a 64-bit application simply needs twice the stack space of its 32-bit equivalent.
                The final detail to consider is the difference between code built for debug and code built for release. A debug build is not optimized, in order to aid debugging. Optimization improves performance and, as a side effect, usually reduces stack usage. It is entirely possible that a stack overflow could occur in code compiled for debug, but not in the same code built with optimization.

Analyzing Stack Usage

            
                Irrespective of whether an actual stack overflow exception has occurred, it can be useful to examine stack usage of an application. There are a couple of techniques that can be applied. The first is to use an extension to the debugger, named ‘SOS’. If debugging a 32-bit application, this extension can be used from Visual Studio. For 64-bit, the WinDbg tool, which is part of the Debugging Tools for Windows, needs to be used instead.
                To use the SOS extension within Visual Studio you need to enable mixed-mode debugging (sometimes called unmanaged or Interop debugging). For a COBOL project, this option is available on the Debug property page. Once this is enabled, just debug your application as normal. At an appropriate breakpoint or even at the stack overflow exception, switch to the Visual Studio Immediate Window (Debug -> Windows -> Immediate).
Type the following:
.load sos
 
Visual Studio should respond with:
extension C:\Windows\Microsoft.NET\Framework\v4.0.30319\sos.dll loaded
 
 
 You can now view the managed stack, by typing:
!CLRStack
 
The stack addresses are in the leftmost column. The stack grows down rather than up, so frames nearer the top of the listing have lower addresses. To see how much space is being allocated for a given method call, take the address of the frame listed directly below it (its caller) and subtract the address of the frame itself.
As an example, here’s a snapshot debugging the Windows Forms version of the Book demo that ships with Visual COBOL. The above steps have been executed and the top of the stack is displayed:


From here we can see that the amount of stack used between these two frames is:
 0034ee5c - 0034ee10 = 4c (hexadecimal), or 76 bytes.
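The subtraction can be reproduced in Python as a quick check when working through a long !CLRStack listing. This is a sketch only: the helper name is hypothetical, and it assumes the two addresses have been copied from the leftmost column of the debugger output.

```python
def frame_gap(caller_address: str, callee_address: str) -> int:
    """Return the number of bytes between two stack addresses given as hex strings."""
    return int(caller_address, 16) - int(callee_address, 16)


gap = frame_gap("0034ee5c", "0034ee10")
print(hex(gap), gap)  # 0x4c 76
```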
 
There are several elements that impact the amount of stack allocated. Parameters are one, but the major impact comes from the amount of local data space that is allocated. This is governed by many things, not least of which is whether the code is compiled with optimizations or not. As well as using SOS, a simple look at the disassembly view can give some guidance. In the example above, if we switch to the disassembly view and scroll to the top of the method, there will usually be some code that adjusts the stack pointer to allocate space for the method:

From here we can see that 34h, or 52 bytes, of local stack space has been allocated for use during this method call. Obviously, the amount of stack allocated will increase with the complexity of the method and the number of local data items. Don’t assume, though, that large COBOL data items automatically increase stack usage. A local data item such as
01 buffer pic x(10000).
is a reference type and is not allocated on the stack. It is essentially an object pointer, so it occupies the same amount of stack space as, for example:
01 buffer pic x(10).
But what of COBOL data and COBOL procedural programs in general?

Procedural COBOL Programs


In order to run under .NET, COBOL programs are still compiled into .NET classes and objects. The difference here is that, because procedural COBOL has no explicit syntax for methods, the compiler gets to choose how it generates the code. Obviously, as the size of COBOL programs increases, the size of an individual method may increase. The COBOL compiler will generate extra methods for specific COBOL constructs, such as PERFORM statements. That means that as you debug through a program that PERFORMs sections or paragraphs, you will see these being executed as methods if you look at the call stack. The structure of COBOL code can be such that, under the covers, calls might be recursive, and this means that on each call the same amount of stack will be allocated again.

Resolving Stack Overflows

 

So, how best to resolve a stack overflow or minimize the chance of one occurring? The first approach is to modify the amount of available stack space. As noted earlier, a COBOL application can control the stack size using the ILSTACKSIZE directive. A Microsoft utility, ‘editbin’, can be used to modify the size of the stack for an already built executable; however, this is not recommended.
A common technique in .NET that also allows the stack space to be controlled programmatically is to create a worker thread which executes the stack-intensive code. A parameter to the thread constructor allows the amount of stack space to be specified. This approach is especially useful when the code is running inside another application, for example the ASP.NET case previously described. To execute the thread in a synchronous manner, start it and then join it, so that the caller waits for it to complete.

Note, only fully trusted applications can specify a stack size greater than that specified in the executable’s header.
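The worker-thread pattern itself is language-neutral, and can be sketched in Python purely as an analogue. In .NET the stack size is passed to the System.Threading.Thread constructor overload that takes a maxStackSize argument; in Python it is set with threading.stack_size before the thread is created. The names run_with_stack and depth_sum are hypothetical, and the 16 MB figure is an arbitrary illustration rather than a recommendation.

```python
import threading


def run_with_stack(func, stack_bytes, *args):
    """Run func(*args) on a new thread with the requested stack size and wait for it."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except BaseException as exc:  # hand any failure back to the calling thread
            outcome["error"] = exc

    previous = threading.stack_size(stack_bytes)  # applies to threads created from now on
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    worker.join()  # block until the worker finishes: synchronous execution
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def depth_sum(n):
    """Deliberately recursive stand-in for stack-intensive code."""
    return 0 if n == 0 else n + depth_sum(n - 1)


result = run_with_stack(depth_sum, 16 * 1024 * 1024, 500)
print(result)  # 125250
```

Joining the worker keeps the calling code simple, since from the caller's point of view the work still happens in line; the only difference is the stack it runs on.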

 

Of course, enlarging the stack in this way assumes that the application is not using excessive stack space in the first place, but what about reducing the amount required? Consider the following procedural COBOL code:

 


At first glance this looks to be reasonably well structured with isolated perform sections, but in reality there is an error which will not have an obvious impact on the execution of the code but will have a considerable impact on the code generated by the compiler. If we compile and debug through this, we’ll see a call stack something like this:

 


The call stack shows that the execution has become recursive, and using the technique described earlier to calculate the stack size for this particular example gives a stack allocation of 1300 bytes. The issue is triggered by a ‘go to’ that exits the perform range in ‘sect00’. This may well have been caused by a coding error, which we can catch by recompiling with the ‘RESTRICT-GOTO’ directive. This now produces an error:

 

error COBCH0348 : Procedure name EXIT01 OF SECT00 undeclared, line 13 (first usage)

 

The error is caused by a jump out of the performed section. The code in sect00 should have been ‘go to exit00’. Now, if we change this, rebuild and step into the debugger we get:



Not only does the call stack become much more representative of the program, the stack allocation has also been reduced: in the example above, it is now 676 bytes. What has essentially happened is that the compiler previously had to generate a single method for the performed section, whereas now it can produce much simpler, smarter code.
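To put the two measurements in context, the following Python sketch estimates how many nested calls would fit in a given stack reserve. It is a rough upper bound under stated assumptions: it treats the measured figures (1300 and 676 bytes) as the cost of every nested call, and it ignores stack already consumed by the runtime and by other frames. The function name is hypothetical.

```python
KB = 1024


def calls_that_fit(reserve_bytes: int, bytes_per_call: int) -> int:
    """Upper bound on nested calls, assuming each call uses bytes_per_call of stack."""
    return reserve_bytes // bytes_per_call


for label, reserve in (("256 KB", 256 * KB), ("1 MB", 1024 * KB)):
    before = calls_that_fit(reserve, 1300)
    after = calls_that_fit(reserve, 676)
    print(label, before, after)
# 256 KB 201 387
# 1 MB 806 1551
```

Under these assumptions the corrected code can nest almost twice as deeply before exhausting the same reserve, which matters most in hosts with a small stack such as the 32-bit ASP.NET worker process.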

 

Other issues can be diagnosed by compiling the code with warnings or informational messages enabled. Examining these messages can highlight other areas where the code can be improved, leading to better code generated by the compiler.

 

Conclusion

                Stack overflow exceptions can be problematic to handle and difficult to debug. Hopefully, the information provided in this article will give some help on how to diagnose the problems and some guidance on how to solve the underlying issues.

Incident #2571343