32-bit Debug symbols and how to remove them


Introduction

If you've previously read this article on Clarion debugger internals you'll already have some appreciation as to how the Clarion debugger works and where it gets its debug symbols from.

One of the things that developers may not realise is how useful debug information can be to a hacker/cracker if software is shipped to a customer with debug symbols included (accidentally or on purpose).  It may well be that the customer is experiencing some problems that are particularly difficult to find, so you ship some software that contains more information than your regular version in the hope it may help identify the problem.

Other products, Microsoft VC++ for example, have the option of storing the debug information in a separate file. In VC++ it can be stored in an external .PDB file - a program database.  Clarion 16-bit applications store a lot of their debug information in external .DBD files. However, there is the potential for problems with 32-bit Clarion applications as they store the debug information internally in the application.  The format of the debug symbols is unfortunately undocumented, but even without knowing the format you can still see a lot of information, such as source file names, procedure names and variable names.

File format history

MS-DOS used both the .COM and .EXE extensions to indicate that a file was an executable application.  COM files are raw binary executables and as such are a direct copy of the application as it appears in memory. A .COM file can only have a size of less than one segment (64K) including code and static data, since there are no fixups for segment relocation.  An EXE file can span multiple segments, as it contains a header, a relocation table and binary code.  The first 2 bytes of an MS-DOS .EXE are normally 'MZ', which is why DOS programs are sometimes referred to as "MZ executables" or "having an MZ header".

With the introduction of Windows it was realised that the MZ file format was not good enough to hold the extra information that a Windows program needed, so Microsoft invented the "New Executable" (NE) format.  The NE format holds all of the information that Windows needs to run, but also includes an MZ DOS stub.  When you ran the Windows application outside of Windows, it was the DOS stub that displayed the message "This program cannot be run in DOS mode".  Some Windows applications at the time (I believe Microsoft Word 2.x was one) had a custom stub that ran Windows first, then ran the executable itself.  (For history buffs, version 3.0 of the DOS JPI compilers was able to compile Windows applications that had a custom stub in them - more on that in a future article).

Even though Windows applications use a different file format for the executable (NE) than DOS (MZ), all the development tools of the day (MS C, JPI C++/Modula2, Borland Turbo Pascal etc) used the same basic format for .LIB and .OBJ files, called the Object Module Format (OMF). The OMF file format was invented by Intel and was used by all vendors, although each vendor deviated from the standard enough that you couldn't swap OBJs and LIBs between them. Third-party software vendors typically produced multiple versions of their products, depending on whose development tool you were using.

All was fine in file-format land until the introduction of 32-bit Windows, when again the file format of the day (NE) was unable to contain the extra information that 32-bit Windows required. Drawing on their UNIX and VMS experience, the developers of Windows NT came up with another new format, the Portable Executable (PE) format. The PE format is based on the Common Object File Format (COFF), and is called "portable" because it is the same file format on all processors (x86, MIPS and Alpha) that run Windows. Although the instruction set and byte-ordering are different, the physical file on the disk uses the same format for all 3 architectures.  PE files also get the additional baggage of the same MZ MS-DOS stub that the NE files get, although these days one rarely gets to see its message.

Note also that with the imminent introduction of 64-bit Windows the PE file format has undergone another revision.  The "magic value" in the optional header determines whether the executable is for PE32 or PE32+. If it is the latter then many fields in the optional header become 8 bytes long instead of 4.
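As a minimal sketch of that check, the Python below reads the magic value from the bytes of an image. The function name pe_flavour is made up for illustration; the offsets (the PE header pointer at 0x3C in the MZ header, the 4-byte "PE" signature, the 20-byte COFF header) and the magic values 0x10B and 0x20B come from the published PE/COFF layout.

```python
import struct

PE32_MAGIC = 0x10B       # optional-header magic for PE32
PE32_PLUS_MAGIC = 0x20B  # optional-header magic for PE32+


def pe_flavour(image: bytes) -> str:
    """Return 'PE32' or 'PE32+' for the raw bytes of a PE image.

    pe_flavour is a hypothetical helper name, not part of any tool
    mentioned in the article.
    """
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("no MZ header")
    # The MZ header stores the file offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    # The optional header follows the 4-byte signature and the
    # 20-byte COFF file header; its first field is the 2-byte magic.
    (magic,) = struct.unpack_from("<H", image, pe_offset + 4 + 20)
    if magic == PE32_MAGIC:
        return "PE32"
    if magic == PE32_PLUS_MAGIC:
        return "PE32+"
    raise ValueError("unexpected optional-header magic 0x%04X" % magic)
```

To try it on a real file, pass it the result of open(path, "rb").read().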

Microsoft also changed its development tools so that they produced COFF .OBJs and .LIBs. Other vendors (Borland, Clarion, Symantec) didn't.  It's possible to create a PE file from either format of object file, but it helps explain why you typically can't use MS .LIBs in a Clarion application without running LibMaker first. (That and the fact that Clarion still uses its extra comment records in its own object files and doesn't support some of the extensions that are present in other object files).

TSWD debug information

One of the new concepts introduced in the PE file format is that of the section, documented thus:

"A section is the basic unit of code or data within a PE/COFF file. In an object file, for example, all code can be combined within a single section, or (depending on compiler behavior) each function can occupy its own section. With more sections, there is more file overhead, but the linker is able to link in code more selectively. A section is vaguely similar to a segment in Intel® 8086 architecture. All the raw data in a section must be loaded contiguously. In addition, an image file can contain a number of sections, such as .tls or .reloc, that have special purposes."

There are many different sections that an executable file can contain. Each section has a name of up to 8 characters and is normally associated with a specific function.  Typical sections include:

Name    Function
.arch   Alpha architecture information
.bss    Uninitialized data
.data   Initialized data
.debug  Debug information
.edata  Export tables
.idata  Import tables
.pdata  Exception information
.rdata  Read-only initialized data
.reloc  Image relocations
.rsrc   Resource directory
.text   Executable code
.tls    Thread-local storage
.xdata  Exception information
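
To see which of the sections in the table above a particular executable actually contains, you can walk its section table. The following is a rough sketch only: section_names is a hypothetical helper, and it assumes the standard PE/COFF layout (a 20-byte COFF header holding the section count and the optional header size, followed by 40-byte section headers whose first 8 bytes are the name).

```python
import struct
from typing import List


def section_names(image: bytes) -> List[str]:
    """List the section names found in the raw bytes of a PE image.

    section_names is a hypothetical helper written for this sketch.
    """
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("no MZ header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    coff = pe_offset + 4  # COFF file header starts after the signature
    (section_count,) = struct.unpack_from("<H", image, coff + 2)
    (optional_size,) = struct.unpack_from("<H", image, coff + 16)
    table = coff + 20 + optional_size  # section table follows the optional header
    names = []
    for index in range(section_count):
        start = table + index * 40  # each section header is 40 bytes
        raw = image[start:start + 8]  # name is padded with nulls to 8 bytes
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names
```

Run against a Clarion-built executable, a listing like this should show whether a .debug section is present at all.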

As you can see from the above table there is a section name, .debug, that is specifically set aside for debug information. The Clarion linker (up to and including Clarion 5.5 beta 2) does not use it. Instead, it places the debug information in the .rdata section of the application.

An application can contain more than 1 type of debug information. Each type of debug information is referenced by its own debug directory header.  Once you've located the debug directory header, whether it is in the .debug or .rdata section, its format is as follows:

Offset  Size  Field             Clarion linker value  Description
0       4     Characteristics   0                     A reserved field intended to be used for flags, set to zero for now.
4       4     TimeDateStamp     0                     Time and date the debug data was created.
8       2     MajorVersion      0                     Major version number of the debug data format.
10      2     MinorVersion      0                     Minor version number of the debug data format.
12      4     Type              TSWD                  Format of debugging information: this field enables support of multiple debuggers.
16      4     SizeOfData        As required           Size of the debug data (not including the debug directory itself).
20      4     AddressOfRawData  0                     Address of the debug data when loaded, relative to the image base.
24      4     PointerToRawData  0                     File pointer to the debug data.

Clarion uses only 1 type of debug information (its own) and, as you can see, the Clarion linker uses very few of the fields in the debug directory header.

Specifically, it does not use the AddressOfRawData or PointerToRawData fields to locate the debug information; instead, the linker puts the debug information 512 bytes after the start of the debug directory header. In this screen shot of a hex dump of an application you can see the debug directory header start at offset 0xC00.  None of the fields of the debug directory header are used except the Type (0x54535744, the ASCII bytes "TSWD") and SizeOfData (0x00000164) fields. You can see the real debug information start at offset 0xE00 (0xC00 + 512) with another 0x54535744 signature.
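
The layout just described can be sketched in a few lines of Python. This is an illustration under assumptions, not how any Clarion tool does it: locate_tswd is a made-up name, and it simply takes the first "TSWD" byte sequence in the file to be the Type field of the debug directory header rather than walking the PE headers to find it.

```python
import struct

SIGNATURE = b"TSWD"
TYPE_FIELD_OFFSET = 12   # Type sits 12 bytes into the 28-byte header
HEADER_TO_DATA = 512     # distance from header start to debug data, per the text


def locate_tswd(image: bytes):
    """Return (header_offset, size_of_data, data_offset), or None if not found.

    Assumption: the first 'TSWD' in the file is the header's Type field.
    """
    type_pos = image.find(SIGNATURE)
    if type_pos < TYPE_FIELD_OFFSET:
        return None
    header = type_pos - TYPE_FIELD_OFFSET
    if header + 28 > len(image):
        return None
    # Field order follows the debug directory table:
    # Characteristics, TimeDateStamp, MajorVersion, MinorVersion,
    # Type, SizeOfData, AddressOfRawData, PointerToRawData.
    (_flags, _stamp, _major, _minor, _type,
     size_of_data, _address, _pointer) = struct.unpack_from(
        "<IIHH4sIII", image, header)
    data = header + HEADER_TO_DATA
    # The debug data itself starts with a second TSWD signature.
    if image[data:data + 4] != SIGNATURE:
        return None
    return header, size_of_data, data
```

For the application in the hex dump, a function like this would be expected to report the header at 0xC00, a size of 0x164 and the data at 0xE00.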

Included within the debug information are the names of all source modules for this application (dbg32.clw), the name of the main procedure (_main) and the names of any variables declared (LONGVAR and BYTEVAR). Because the TSWD debug information is currently undocumented we obviously don't know what the rest of the data is.
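
You don't need to understand the format to pull those names out. A minimal sketch, assuming nothing about TSWD beyond the fact that the names are stored as plain ASCII: scan the debug data for runs of printable characters, much as the Unix strings utility does. The function name and the minimum length of 4 are arbitrary choices for illustration.

```python
import re
from typing import List


def printable_runs(blob: bytes, min_len: int = 4) -> List[str]:
    """Return every run of at least min_len printable ASCII bytes in blob."""
    # 0x20-0x7E covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]
```

Fed the bytes of the debug information, a scan like this is enough to list module, procedure and variable names, although it will also pick up any other byte runs that happen to be printable.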

Now, back to the original problem. Assume you have a multi-DLL application that is compiled using a compile manager utility of some sort.  You compile your application with debug information in, because you need the information to debug your application. 

If you want to compile your application without debug symbols you have 2 choices at this point:

  1. Open each .APP and turn off the generation of debug symbols. It's a time-consuming and tedious process which I don't recommend.
  2. Keep a separate .PRJ for each .APP which has debug symbols turned off.  You can either create a project file like this manually (if you know the project language syntax and which compiler pragmas to use), or by extracting the .PRJ from the .APP file using a hex editor.

This is where a utility I wrote, WIPETSWD, becomes useful.  

Download WIPETSWD application

WIPETSWD is a 32-bit Clarion application that reads through the PE file format and looks for Clarion debug information. When it finds the TSWD signature it overwrites the SizeOfData field in the debug directory header with a zero, and overwrites all of the debug information (except the second TSWD identifier) with nulls.
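
The wiping step can be sketched in Python as follows. This is not the WIPETSWD source (which is a Clarion application), just an illustration of the same idea under some loud assumptions: wipe_tswd is a hypothetical name, the first "TSWD" in the file is taken to be the Type field of the debug directory header instead of being found by reading through the PE headers, and SizeOfData is assumed to be counted from the start of the second signature.

```python
import struct

SIGNATURE = b"TSWD"


def wipe_tswd(image: bytes) -> bytes:
    """Return a copy of image with the TSWD debug data nulled out.

    If no TSWD header/data pair is found the copy is returned unchanged.
    """
    out = bytearray(image)
    type_pos = image.find(SIGNATURE)
    if type_pos < 12:
        return bytes(out)
    header = type_pos - 12          # Type is 12 bytes into the header
    data = header + 512             # debug data sits 512 bytes after the header
    if image[data:data + 4] != SIGNATURE:
        return bytes(out)
    (size_of_data,) = struct.unpack_from("<I", image, header + 16)
    # Assumption: SizeOfData is measured from the second signature.
    end = min(data + size_of_data, len(out))
    # Null everything after the second signature, leaving the signature itself.
    out[data + 4:end] = bytes(max(0, end - (data + 4)))
    # Zero the SizeOfData field in the debug directory header.
    struct.pack_into("<I", out, header + 16, 0)
    return bytes(out)
```

The function works on an in-memory copy on purpose; if you adapt it, write the result to a new file and keep the original until you have confirmed the wiped executable still runs.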

If you then try to debug the application with the Clarion debugger it will appear as if there is no debug information present - the Clarion debugger will load the app but will be unable to load any symbol information or locate any source files.  The debugger will basically present you with 4 open, empty windows.

You can run WIPETSWD on one application at a time, or against a whole directory. To clean an entire application suite you can run it twice, once against *.DLL and once for *.EXE. Use the /? or /HELP command line options to view the online help and see the valid command line options.

Obviously it only works as long as SoftVelocity do not change the symbol table format, something that may or may not happen in C6.  It has long been a desire of mine to see Clarion executables support a more standard form of symbol information, so that developers may use more effective debugging techniques against their applications. We may yet live to see the day!

The effects of debug information on a Clarion executable

Some of you may be wondering what effect the debug options have on a Clarion executable.  I know programmers who swear that adding symbol information to an application slows down the execution speed.  Nothing could be further from the truth - depending on the project settings when the application was compiled, the effects of the settings on the 'Debug' tab of the project editor range from 'not much' to 'none'. 

Project setting               Respective compiler pragma                         Effect
Debug setting MIN or FULL     #pragma debug(vid=>min), #pragma debug(vid=>full)  Adds debug symbol information to the end of the application. Makes the application larger and slower to load but does not affect the execution speed of the application.
Generate line numbers         #pragma debug(line_num=>on)                        Adds line number information to the symbol information. The Clarion debugger does not need this information, but (in theory) it makes it easier for 3rd party debuggers to debug a Clarion application. Seeing as the symbol format used by Clarion applications is proprietary and you can't debug a Clarion application with any other debugger anyway, this setting is pretty damn useless.
Stack overflow runtime check  #pragma check(stack=>on)                           Makes the run-time library execute additional code to make sure that the application doesn't run out of stack space.
Array index runtime check     #pragma check(index=>on)                           Makes the run-time library execute additional code to check against an array index larger than the array size.
Nil-pointer runtime check     #pragma check(nil_ptr=>on)                         Makes the run-time library execute additional code to make sure that the application doesn't reference nil pointers.

Personally I would argue that all applications should always be compiled with the 3 runtime checks on. The debug setting is up to you to decide, and the line number setting is superfluous.

But remember - at the very worst a determined cracker with a good disassembler (or debugger), lots of assembly language documentation and an understanding of the PE file format can determine an awful lot of information about your application even if you do remove the debug symbols.

