Note: this project has been effectively supplanted by the LLDB debugger. (This page was last updated 2009-06-27 18:47:37 by MarcelMoolenaar.)

The BSD Debugger

This page contains notes and thoughts about the possibility of replacing the GNU debugger.

See also the BSDdbg SourceForge project here: http://sourceforge.net/projects/bsddbg

Motivation

The GNU debugger is GPL licensed.
Many find that contributing code back to an outside project (the GDB project in this case) is a hassle.
FreeBSD is not a focus of the GDB project.
The understanding of GDB needed to contribute in a way that fits the GDB project is lacking.
Making fundamental changes to, say, core files is difficult as it affects multiple projects.
Threading support is not integrated.
Support for arm, ia64, mips and powerpc is not integrated.
Kernel debugging is not integrated.

Functionality

This section collects notes about the basic functionality we expect the debugger to have, as well as more advanced features. Its purpose is to draw the landscape of what should and could be, so as to have an indication of the extent of the work and the potential complexities we may have to deal with. This helps us come up with a better design.

Basic functionality

The debugger needs to support kernel debugging as well as user space debugging. Both kernel and user space debugging require support for:

Multi-threading.
Multiple load modules (kernel modules or shared libraries).
Live debugging (remote protocol/ptrace).
Retrospective debugging (core file).
Stack unwinding (backtraces).
Source correlation (DWARF).
Disassembly.
Stepping and/or running the target.
Modification of state when live debugging.

Advanced features

Multiple concurrent debugging sessions, with cross-session event handling. This can be used to test inter-process relationships and/or communications.
Cross-debugging, with each session a different ABI, OS and/or architecture.
Remote debugging, such as over a serial connection, but possibly also over FireWire, USB and/or Ethernet.
More elaborate event handling (above and beyond break- or watchpoints). One thing that comes to mind is syscall tracing, so that one can stop the debuggee when it executes a certain syscall.
Injection of code that can test hypotheses and trigger events. This goes above and beyond conditional break- and watchpoints.
Snapshots. This allows saving of debugging sessions that can later be continued.
Scripting.
Expressions, preferably using the same syntax and semantics as the source.

Components

The following sections describe the components or basic building blocks that a debugger can make use of.

ELF (libelf)

The fundamental file format is ELF. It is used for executables, shared libraries, the kernel, kernel modules and core files, for both user space and kernel. At this time only sparc64 does not use ELF-based kernel core dumps, so normalizing on ELF means that sparc64 needs to change its kernel core file format. It also means that platforms that don't support kernel core files should implement them as ELF files. Note that amd64 uses relocatable object files as modules. While these are ELF, relocatable object files are not designed to be used as load modules. This probably needs special treatment.
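
To illustrate how a debugger could tell these kinds of ELF files apart, here is a minimal sketch that reads the e_type field straight from the ELF header bytes. It deliberately does not use libelf. The function names elf_image_type() and elf_is_relocatable() are made up for this example. The constants and field offsets are those of the ELF specification.

```c
#include <stddef.h>
#include <string.h>

/* Object file types from the ELF specification (e_type field). */
enum { ELF_ET_REL = 1, ELF_ET_EXEC = 2, ELF_ET_DYN = 3, ELF_ET_CORE = 4 };

/*
 * Hypothetical helper: return the e_type of an ELF image held in memory,
 * or -1 if the buffer does not look like an ELF file.
 */
int
elf_image_type(const unsigned char *buf, size_t len)
{
	/* e_ident is 16 bytes; e_type is the 16-bit field that follows. */
	if (buf == NULL || len < 18 || memcmp(buf, "\177ELF", 4) != 0)
		return (-1);
	switch (buf[5]) {	/* EI_DATA: byte order of the file */
	case 1:			/* ELFDATA2LSB */
		return (buf[16] | (buf[17] << 8));
	case 2:			/* ELFDATA2MSB */
		return ((buf[16] << 8) | buf[17]);
	default:
		return (-1);
	}
}

/*
 * Hypothetical policy check: relocatable objects (such as amd64 kernel
 * modules) are the ones that would need special treatment when loaded.
 */
int
elf_is_relocatable(const unsigned char *buf, size_t len)
{
	return (elf_image_type(buf, len) == ELF_ET_REL);
}
```

A caller would read at least the first 18 bytes of the file into a buffer and branch on the result: ELF_ET_CORE selects retrospective debugging, while ELF_ET_REL flags a module that needs the special handling mentioned above.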

The ElfToolChain project has created a BSD licensed libelf implementation.

DWARF (libdwarf)

Source-level debugging requires debug information that maps raw machine-level entities to source-level definitions and declarations. The de facto standard for this is DWARF. Significant work is expected to be needed for the source correlation. A BSD-licensed DWARF library would be ideal, but lacking that, an LGPL library should be a good solution to start off with.

http://reality.sgiweb.org/davea/dwarf.html

Sean Farley pointed me to:

Unwinding (libunwind)

http://www.nongnu.org/libunwind/

Disassembly of machine code (libdisasm)

Use an API that is based on VLIW hardware. Scalar or superscalar processors are just a special case in which the number of operations per instruction is 1. Therefore, disassembly of an instruction at a given address can return something like the following:

Length of the instruction in bytes (can be fixed or variable)
Number of independent operations
- For each operation:
  - Length of operation in bits
  - Bit offset of operation within instruction
  - Format string
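
The result described in the list above could be represented roughly as follows. This is only a sketch: the field names, the INSTR_MAX_OPS limit and the instr_layout_ok() helper are assumptions made for illustration, not part of any existing API.

```c
#include <stddef.h>

#define INSTR_MAX_OPS	3	/* assumption: enough for an ia64 bundle */

/* One operation within an instruction (hypothetical layout). */
struct instr_op {
	unsigned int	 op_bits;	/* length of the operation in bits */
	unsigned int	 op_offset;	/* bit offset within the instruction */
	const char	*op_format;	/* format string used for printing */
};

/* One decoded instruction (hypothetical layout). */
struct instr {
	size_t		 in_length;	/* length of the instruction in bytes */
	unsigned int	 in_nops;	/* number of independent operations */
	struct instr_op	 in_ops[INSTR_MAX_OPS];
};

/*
 * Sanity check: there must be at least one operation, and every
 * operation must lie entirely within the instruction.
 */
int
instr_layout_ok(const struct instr *in)
{
	unsigned int i;

	if (in->in_nops < 1 || in->in_nops > INSTR_MAX_OPS)
		return (0);
	for (i = 0; i < in->in_nops; i++) {
		const struct instr_op *op = &in->in_ops[i];

		if (op->op_offset + op->op_bits > in->in_length * 8)
			return (0);
	}
	return (1);
}
```

With this layout, an ia64 bundle is a 16-byte instruction with three 41-bit operations at bit offsets 5, 46 and 87 (after the 5-bit template), while a fixed-width scalar instruction is simply the case in_nops == 1.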

The API needs callback functions so that the disassembler can fetch the instruction bytes as well as possibly obtain register contents. For this it seems logical to use the proc_services API.
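
As a rough sketch of the callback idea, the disassembler could be handed a small table of function pointers and never touch the target directly. Everything named below (struct disasm_callbacks, struct mem_image, mem_image_read(), disasm_fetch()) is hypothetical. In a real debugger the read callback would presumably be a thin wrapper around a proc_services routine such as ps_pread(); here it is backed by a plain byte buffer so the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback table handed to the disassembler. */
struct disasm_callbacks {
	/* Copy len bytes at target address addr into buf; 0 on success. */
	int	(*cb_read)(void *cookie, uint64_t addr, void *buf, size_t len);
	void	*cb_cookie;	/* opaque state owned by the caller */
};

/* Stand-in target: a byte buffer mapped at a fixed base address. */
struct mem_image {
	uint64_t		 mi_base;
	const unsigned char	*mi_data;
	size_t			 mi_size;
};

/* Read callback for the stand-in target; rejects out-of-range reads. */
int
mem_image_read(void *cookie, uint64_t addr, void *buf, size_t len)
{
	const struct mem_image *mi = cookie;
	uint64_t off;

	if (addr < mi->mi_base)
		return (-1);
	off = addr - mi->mi_base;
	if (off > mi->mi_size || len > mi->mi_size - off)
		return (-1);
	memcpy(buf, mi->mi_data + off, len);
	return (0);
}

/* What the disassembler would do: fetch bytes only via the callback. */
int
disasm_fetch(const struct disasm_callbacks *cb, uint64_t addr, void *buf,
    size_t len)
{
	return (cb->cb_read(cb->cb_cookie, addr, buf, len));
}
```

The same table could carry a second callback for register contents. The point of the indirection is that the disassembler works unchanged whether the bytes come from a live process, a core file or a remote target.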

disasm API:

uint64_t disasm_get_ipmask() - returns the mask needed to get the address of the instruction, rather than the address of the operation. It is assumed that the IP or PC register holds the address of the operation. On ia64, for example, the IP contains the slot number in the lowest 4 bits. The mask is used to filter out the operation (or slot number on ia64) so that the address of the instruction (or bundle on ia64) is left. This is the address that the disassembler uses.
struct instr *disasm(uint64_t) - ...
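
To make the role of the mask concrete, the following sketch splits an IP value into the instruction address and the operation selector. The helpers instr_addr() and instr_op() and the IA64_IPMASK constant are assumptions for illustration. The mask value is passed in as a parameter because disasm_get_ipmask() itself is not defined here.

```c
#include <stdint.h>

/* Hypothetical ia64-style mask: clears the lowest 4 bits of the IP. */
#define IA64_IPMASK	(~(uint64_t)0xf)

/* Address of the instruction (bundle on ia64) that contains ip. */
uint64_t
instr_addr(uint64_t ip, uint64_t ipmask)
{
	return (ip & ipmask);
}

/* Operation selector (slot number on ia64) encoded in ip. */
uint64_t
instr_op(uint64_t ip, uint64_t ipmask)
{
	return (ip & ~ipmask);
}
```

On a scalar architecture the mask would be all ones, so the instruction address equals the IP and the operation selector is always 0.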

User interface

The following sections collect thoughts about a user interface and the set of commands the user may want to see implemented. This covers both basic functionality as well as advanced features.

Views

While debugging, the user may want to view (inspect) any of the following:

Machine level information:
- disassembly
- register contents
- threads
- backtrace
- raw memory dump
OS level information:
- memory map
- load modules
- open files
- syscall trace
- shared memory
- environment
Source level information:
- source code
- source files
- static/global functions
- local/static/global variables
- TLS storage
- datatypes
Debugger information:
- breakpoints/watchpoints
- sessions

Actions

While debugging, the user may want to perform any of the following actions:

Machine level actions:
- modify machine instructions
- modify register contents
- modify memory contents
OS level actions:
- modify environment
Source level actions:
- modify variables
Debugger actions:
- set/clear breakpoints/watchpoints
- set/clear events
- run/continue
- run/skip to cursor
- step (into, over, out of)
- save/restore snapshots
- select current session, thread, load module and/or source file