28 July 2008

Memory leaks aren't always the process's fault

See also: How to find components causing memory leaks, a follow-up to this post.

Shell extensions, video codecs and the like can be a real pain. On one hand, they are good because they let people hook their favourite programs into their file manager with minimal effort. On the other hand, any crashes or memory leaks caused by extensions are blamed on the file manager because they are in-process.

Almost all shell extensions and video codecs are in-process DLLs. By definition, these DLLs contain code which is loaded into the parent process, like DOpus.exe or Explorer.exe. If that 3rd-party code then allocates a lot of memory or crashes the program, Task Manager or the crash dialog will point to the file manager's process, not the extension's DLL. You have to investigate to find out what is really to blame.

Here's a good example.

Yesterday I was testing context menus in Opus as there have been some small code changes in that area. I wanted to check for memory leaks, using a folder with a large number of items, so I went to System32, selected everything and right-clicked while watching in Task Manager.

I was alarmed to see the DOpus.exe memory usage shoot up from 20 MB to 1 GB in the space of a few seconds! Even worse, the memory remained allocated after the menu was closed. Houston, we have a problem here...

My Opus configuration has most 3rd-party items hidden from the main context menu. When I right-click I just see items for some internal Opus commands, one or two 3rd-party tools I use a lot (e.g. TortoiseSVN) and a sub-menu where I have moved everything else to reduce clutter. The huge memory allocation only happened when I opened that sub-menu, so I knew it was probably related to context menu extensions. Because of the recent code change I still assumed Opus was to blame at this stage, but I needed to narrow things down.

Context menu extensions can ask for the list of selected files in several different formats and it's up to the file manager (Explorer, Opus, etc.) to translate its internal list into the formats each extension asks for. I figured there might be a bug in the conversion code for a particular format which only a few extensions requested. That would explain why I didn't see the problem with my top-level context menu but did when I opened the sub-menu containing most of the extensions.

To test this theory I decided to check whether the memory allocation happened when one or two particular extensions were initialised, or bit-by-bit for every extension. I enabled the context_menu_debug advanced option in Opus (described in the "Crash/exit/100%CPU when right-clicking certain files" FAQ, under "2. Finding the Culprit") and loaded DebugView. Now I could watch the memory usage change in Task Manager while each extension's name was printed as it was initialised.

It was easy to see that the huge memory allocation happened when just one of the extensions, Acronis True Image Shell Context Menu Extension, was initialised. That seemed odd. It was possible that Opus had a bug producing data in a format which only this one extension requested but, on the balance of probability, I was starting to think the problem was the extension rather than Opus. I temporarily disabled the extension by adding its GUID (the long number in curly brackets that appears before the extension's name in DebugView) to the ignore_context_menus advanced option in Opus and, sure enough, the problem went away.
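The same narrowing-down approach can be sketched in a few lines of Python. This is only an illustration of the idea, not how Opus or DebugView work internally: the extension names, the init functions and the placeholder GUID are all made up, and tracemalloc only sees Python allocations, whereas I was watching the whole process in Task Manager.

```python
import tracemalloc

def measure_init_growth(extensions, ignore=()):
    """Initialise each extension in turn and record how many bytes it leaves allocated."""
    growth = {}
    tracemalloc.start()
    try:
        for guid, init in extensions:
            if guid in ignore:
                # Same idea as an ignore list: never run this extension's code.
                continue
            before, _peak = tracemalloc.get_traced_memory()
            init()
            after, _peak = tracemalloc.get_traced_memory()
            growth[guid] = after - before
    finally:
        tracemalloc.stop()
    return growth

# Hypothetical stand-ins for two context menu extensions.
_retained = []

def well_behaved_init():
    scratch = bytearray(1_000_000)  # temporary buffer, freed when the function returns
    return len(scratch)

def leaky_init():
    _retained.append(bytearray(5_000_000))  # stays referenced, so it is never freed

# Placeholder GUIDs, not real extension identifiers.
WELL_BEHAVED = "{00000000-0000-0000-0000-000000000001}"
LEAKY = "{00000000-0000-0000-0000-000000000002}"

extensions = [(WELL_BEHAVED, well_behaved_init), (LEAKY, leaky_init)]

report = measure_init_growth(extensions)
culprit = max(report, key=report.get)

# Re-run with the suspect ignored to confirm that the growth disappears.
report_without_culprit = measure_init_growth(extensions, ignore={culprit})
```

The point is the per-extension delta: if the growth is spread evenly across every extension, the host is the more likely suspect; if it all lands on one entry and vanishes when that entry is ignored, the extension is.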

This raises another point: you can't always see which extensions are involved when a problem occurs. There are usually no items on the context menu from the Acronis True Image context menu extension. It's invisible unless you right-click the right type of file, yet it's called whenever a context menu is displayed. The Acronis extension, like many others, is added to the * file-type, which means that the file manager always passes it the list of files whenever you right-click something (unless you've hidden 3rd-party context menu items). Given the list of files, the extension returns a list of menu items to add. That list can be empty and often is. Unless you switch on context menu debugging you have no way to see the complete list of 3rd-party code which may be slowing things down, leaking memory or crashing the process.
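To make the "invisible but still called" point concrete, here is a toy model in Python. It is a simplification under my own assumptions, not the real shell API: the registry is a plain dictionary, and the handler names, file types and menu item labels are invented for the example.

```python
import os

def build_context_menu(selection, registry):
    """Return (menu items, names of every handler that was called) for a selection."""
    # Handlers registered under "*" run for every selection, whatever the file types.
    types = ["*"] + sorted({os.path.splitext(path)[1].lower() for path in selection})
    items, called = [], []
    for file_type in types:
        for name, handler in registry.get(file_type, []):
            called.append(name)               # the handler's code runs in our process...
            items.extend(handler(selection))  # ...even if it adds nothing to the menu
    return items, called

def backup_handler(selection):
    # Hypothetical handler: only offers an item when every file is a .bak file.
    if all(path.lower().endswith(".bak") for path in selection):
        return ["Restore backup"]
    return []

def text_handler(selection):
    # Hypothetical handler registered for .txt files only.
    return ["Edit in text editor"]

registry = {
    "*": [("BackupTool", backup_handler)],
    ".txt": [("TextTool", text_handler)],
}

items, called = build_context_menu(["notes.txt"], registry)
```

Here `items` only contains the text editor entry, so the menu gives no hint that BackupTool exists, but `called` shows that its code still ran. That second list is roughly what context menu debugging exposes.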

It was only at this point that I did what I probably should have done right at the start: test the same thing in Explorer. If both Explorer and Opus experience a huge memory leak while initialising a context menu extension then we can safely blame the extension. Sure enough, that's what I saw:

The jump in the memory usage graph is the point where I right-clicked and you can see that the memory stays allocated for good:

Phew, it's not Opus's fault, then.

The even better news is that the problem appears to be fixed in True Image 11.8101. I had been using the previous version, 11.8053, when I discovered the leak. The fix doesn't seem to be mentioned in the list of changes, but updating made the problem go away.