Bug of the Month -- June 2012

In June we identified a bug of the worst kind: one introduced during requirements gathering and elicitation, which led ultimately to building the wrong thing. If you go back to my December post, you'll see how the lab busted their butts getting Transaction Tracking working inside DevPartner Studio. We knew it was risky work, so review again the backdrop for the effort we were about to undertake:

"This feature work, which we are calling Transaction Analysis when it goes generally available in a DevPartner Studio release later this year, represented a steep schedule and technical risk. The request mirrored the existing Entry Point Transaction Tracking already shipping in DevPartner Java Edition, yet DevPartner Studio had none of the basis for entry points or transactions readily available, so we couldn’t just steal code. Instead, due to the risks, we followed a formal methodology including requirements capture, requirements elicitation, preparation of primary and alternative implementation strategies, and writing a functional specification detailing implementation work items, testable checkpoint plateaus, and potential future enhancements."

So to minimize risk and ensure that just enough of the right capability was built out, we went through a fast but thorough design and analysis phase with a business analyst, two or three tight iterations of agile implementation and verification, and delivered the desired capability on time, under duress. Not only that, we thought we had exceeded expectations. The DevPartner Java implementation of transaction tracking always suffered from the limitation that you couldn't just click on a package to set up a filter. You actually had to type out the "entry point" (i.e. the fully qualified class and method) without any typos, and usually got it ironed out only through trial and error, rerunning the target app. While it has a handy wildcarding feature, the setup was always perceived as quite spartan, and it placed a high burden on the end user to configure properly. The original DPJ developer's design notes from his initial implementation of transaction tracking showed he wanted to add point-and-click method selection in a later development phase, but alas such a requirement never surfaced again in DevPartner Java.

In the DevPartner Studio implementation this time around, we saw an opportunity to add point-and-click logic for setting up the transaction. We leveraged the fact that DevPartner Studio's session control rules already let you discover the DLLs that make up your application, look into their public interfaces for methods and functions, and pick just what you want. The implementation appeared very clean, leveraged existing UI and infrastructure, and met the goals of transaction filtering and lighter weight overhead, since only the transaction and its call graph executed any extra profiling instructions. All other code would run essentially at full speed. The implementation went so smoothly that we were able not only to do the required .NET methods, but even native C++ with compile time instrumentation. We demonstrated and tested the heck out of the new capability on test applications in C#, VB, and C++, on 32-bit and 64-bit Windows, and even saw a decent reduction in performance overhead per the design basis. The lab congratulated each other for pulling off a miraculous coding challenge, we shipped a build to the sponsoring customer, our regional sales engineers showed it to the customer, who accepted it, and we shipped the new feature live to all customers at the end of March in our DevPartner Studio 10.6 GA release.

But hold on. We missed two critical aspects that fell out of the initial requirement statement. If you look closely, you'll see the request was to mirror the existing Entry Point Transaction Tracking already shipping in DevPartner Java Edition. Both misses concern what mirroring the existing capability really meant, and both either did not get raised with sufficient priority during elicitation, or got steamrolled in our zeal once we locked our laser focus on the final design. The first miss was that in the handoffs from customer to business analyst to designer to coder, we dropped the fact that the customer "liked typing in the transaction." They didn't want to point and click. Typing in the transaction entry point, and using the limited wildcarding, was in fact exactly what they liked most.

The first miss could be dismissed because, indeed, we were trying to shoehorn a new feature into existing code that did not have the notion of entry points to begin with. The other miss, however, is much more significant and more sinister. When the decision whether to handle .NET or native C++ came up, we stretched and handled both. Well done, chaps, right? While it's true that the greater DevPartner user base will appreciate transaction tracking in all of managed, native, and mixed runtime targets, the sponsoring customer uses a very specific subset of .NET, that being ASP.NET, and even more tightly, a set of solutions based off the Web Site project type in Visual Studio. This specific omission led to the crux of the current bug: you cannot actually select an ASP.NET assembly as your DLL and pick out a specific public method as the head of the transaction. Why not? The ASP.NET compiler rewrites the internal AssemblyID GUID and randomizes the DLL file name between builds. It does this very intentionally, to allow hot deploy of new versions of compiled objects within the running web site. The fact that a specific method picked in a fully qualified module filename is wrong on the very next compile and run meant that most of the new transaction tracking feature was rendered approximately useless! A wildcard might have helped a bit, but truly the concept of entry points was needed, and we really should be exposing namespaces akin to Java packages to pull off the wildcard filter, rather than DLL names.
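To make the failure concrete, here is a minimal sketch in Python of the two matching strategies. It is not DevPartner code: the function names, the "Shop.Checkout" namespace, and the two DLL names are all made up for illustration, with the DLL names only imitating the kind of randomized file name a Web Site build produces.

```python
from fnmatch import fnmatchcase

# Hypothetical rule keyed on module file name plus method, in the spirit of
# the point-and-click selection described above.
def module_rule_matches(rule_module, rule_method, module_file, method):
    return rule_module == module_file and rule_method == method

# Hypothetical entry-point rule: a wildcard pattern tested against the fully
# qualified name only, so the DLL file name never enters into it.
def entry_point_matches(pattern, qualified_method):
    return fnmatchcase(qualified_method, pattern)

# The same method as seen in two consecutive builds of a web site.
# Both DLL names are invented for this example.
METHOD = "Shop.Checkout.OrderPage.Submit"
build_1 = ("App_Web_k3x9qz1a.dll", METHOD)
build_2 = ("App_Web_p7m2wd4c.dll", METHOD)

# A rule captured by pointing and clicking during build 1.
module_hits = [
    module_rule_matches("App_Web_k3x9qz1a.dll", METHOD, dll, method)
    for dll, method in (build_1, build_2)
]

# A typed-in entry point with a namespace wildcard.
entry_hits = [
    entry_point_matches("Shop.Checkout.*", method)
    for _dll, method in (build_1, build_2)
]

print("module rule: ", module_hits)
print("entry point: ", entry_hits)
```

The module-based rule stops matching as soon as the file name changes, while the namespace pattern keeps matching across rebuilds, which is the behavior the customer was relying on when they typed in their entry points.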

The fact that the customer used DevPartner Java on Java-based web site applications, and would use DevPartner Studio on ASP.NET web site applications, was overlooked when we created the user acceptance test case. We did think about it at one point, because looking back I found this unanswered question in our test plan: "Does technique work with IIS based ASP.NET web apps?" Answering that question would have caught this early on, and raised its relative priority dramatically back in the elicitation and design phases. In truth, all is not lost, even though the sponsoring customer is a bit peeved with us. The transaction tracking feature is still very slick with its "start disabled" mode, and it does work like a champ across other .NET and native application architectures. It just might take another crash development effort to get web sites working just as smoothly. Sticking it out with our skiing analogy, the project was easily a double black diamond: we made it down over mogul fields, through the trees, over some ice, and back down to base, only to have a wayward helicopter clip the tow cable with a rotor blade on the gondola ride back up (the team watched http://www.youtube.com/watch?v=v5aMT9MBfZI this week, hence the analogy). Hopefully the customer won't rescind our lift tickets before we sharpen and wax up our boards and take aim at another run.