[Dwarf-discuss] Proposal: Describe prologue and epilogue ranges

Robinson, Paul paul.robinson@sony.com
Mon Mar 18 21:05:33 GMT 2024


After today's call, hearing some viewpoints and hopefully learning a
few things, I thought I'd take a stab at reframing 240108.1. (Without
once mentioning CFI!) It ended up becoming an alternative proposal,
but I'm fine with Zoran taking it over if he wants to.

# Describe prologue and epilogue ranges

## Background

### Stopping Points

Ordinarily, a source-level debugger will prefer to pause execution of a
program at instructions identified by the compiler as good places to do
so. These include instructions flagged as `is_stmt`, `prologue_end`, or
`epilogue_begin`. A user expects debug info such as source coordinates
and variable locations to be sensible and useful at those points.

It is entirely possible for execution to pause at other instructions.
There are a number of possible reasons for this.

- The user has chosen to single-step instructions rather than statements.
- The user has requested a breakpoint at a specific instruction that
happens not to have any of the above flags.
- An asynchronous exception has occurred and the debugger intercepted it.
- The program has crashed and the user is looking at a core dump.

This list is not exhaustive.

Let's call the instruction where a debugger has paused execution (or
the instruction where a crash was triggered) a "stopping point."

### Prologue/Epilogue Ranges

In DWARF v3 through v5, a subprogram's prologue(s) and epilogue(s) are
described indirectly by the line table. A prologue generally consists
of all instructions from an entry point up to the first executed
instruction that is flagged as `prologue_end`. An epilogue generally
consists of all instructions from an instruction flagged as
`epilogue_begin` to where the subprogram returns to its caller. These
groups of instructions implicitly form ranges. (These ranges might be
empty.)
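As a rough illustration of how these implicit ranges fall out of the line table, here is a minimal Python sketch. The `Row` type and `implicit_ranges` function are hypothetical names, not part of any DWARF consumer. It assumes a single entry point, rows sorted by address, and an epilogue that runs to the end of the sequence, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One line-table row, reduced to the fields this sketch needs."""
    address: int
    prologue_end: bool = False
    epilogue_begin: bool = False
    end_sequence: bool = False


def implicit_ranges(rows, entry_address):
    """Return (prologue, epilogues) as half-open [low, high) ranges.

    Assumption: the epilogue extends to the end_sequence row. The line
    table does not actually say where an epilogue ends.
    """
    prologue = None
    epilogues = []
    epilogue_start = None
    for row in rows:
        if prologue is None and row.prologue_end:
            # The prologue is everything before the first prologue_end row.
            prologue = (entry_address, row.address)
        if row.epilogue_begin and epilogue_start is None:
            epilogue_start = row.address
        if row.end_sequence and epilogue_start is not None:
            epilogues.append((epilogue_start, row.address))
            epilogue_start = None
    return prologue, epilogues


rows = [
    Row(0x1000),                       # entry point
    Row(0x1008, prologue_end=True),    # first instruction after the prologue
    Row(0x1020, epilogue_begin=True),  # first instruction of the epilogue
    Row(0x1024, end_sequence=True),    # first address past the subprogram
]
prologue, epilogues = implicit_ranges(rows, entry_address=0x1000)
```

If the entry-point row itself is flagged `prologue_end`, the computed prologue range is empty, matching the parenthetical above.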

A subprogram might have multiple prologues if it has multiple entry
points; more often, it might have multiple epilogues if it has multiple
exit or return points. In particular, when there are multiple epilogues
it is not necessarily clear when an epilogue ends and the next basic
block (which might not be part of any epilogue) begins. (Even in the
case of a single epilogue, a cold but functional basic block might be
placed after the epilogue.)

Due to optimization, prologue or epilogue instructions might be mixed
with other instructions, so in practice prologue and epilogue ranges
might not be contiguous. DWARF does not have a way to describe these
non-contiguous prologue and epilogue ranges. Compilers typically have
various heuristics to pick stopping points for optimized prologue and
epilogue ranges.

### Single Location Descriptions

A single location description (which can be either a simple or a composite
location description) has the lifetime of its closest containing scope.
The case we care about here is when that scope is a subprogram, and
therefore the lifetime spans the entire subprogram. Pedantically, that
lifetime includes prologue and epilogue ranges.

It is common practice for unoptimized code to allocate local variables
to a stack frame, and use that stack location in the single location
description. Because the stack frame is not necessarily in a valid state
during prologue or epilogue code, in practice, debuggers typically assume
that a single location description is not valid during a prologue or
epilogue, although the DWARF spec does not explicitly say so (AFAIK).

## Overview

A stopping point might occur during a prologue or epilogue range, which
means single location descriptions for subprogram-scope objects might
not be valid.

- It would be good if the DWARF spec actually said single location
descriptions were not necessarily valid in those ranges. This is simply
codifying existing practice.
- It would be good if debuggers could reliably identify prologue and
epilogue ranges.

The proposal adds text that excludes prologues and epilogues from the
implicit range of a subprogram-scope object, and adds a register to the
line-table state machine to identify prologues and epilogues.
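To make the debugger side concrete, here is a small Python sketch of how a consumer might use the proposed register at a stopping point. The function name `locations_usable` and the `(address, prologue_epilogue)` row shape are assumptions made for illustration. It also assumes the rows of one sequence are sorted by address and that the last row is the `end_sequence` row.

```python
import bisect


def locations_usable(rows, pc):
    """Return True if single location descriptions can be trusted at pc.

    rows: list of (address, prologue_epilogue) tuples for one sequence,
    sorted by address; the final tuple is the end_sequence row.
    """
    addresses = [address for address, _ in rows]
    # Find the last row whose address is <= pc.
    index = bisect.bisect_right(addresses, pc) - 1
    if index < 0 or index >= len(rows) - 1:
        raise LookupError(f"pc {pc:#x} is not covered by this sequence")
    in_prologue_or_epilogue = rows[index][1]
    return not in_prologue_or_epilogue


# Hypothetical decoded rows: prologue at 0x0, body at 0x8, epilogue at 0x18.
rows = [(0x00, True), (0x08, False), (0x18, True), (0x20, False)]
```

With these rows, a stopping point at 0x4 or 0x1c falls in a flagged range, while one at 0x8 does not.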

Unlike `prologue_end` and `epilogue_begin`, the new `prologue_epilogue`
register is "sticky" in that it is not automatically reset on every
row of the line table. At an entry point, it must be set explicitly to
indicate the beginning of a prologue; it is automatically reset by
DW_LNS_set_prologue_end. In an epilogue, it is automatically set by
DW_LNS_set_epilogue_begin, and reset by DW_LNE_end_sequence. This means
that for a function with one contiguous prologue and one contiguous
epilogue, terminated by `end_sequence`, the line-number program needs
only one new opcode to support `prologue_epilogue`.
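The sticky behavior can be sketched with a toy state machine in Python. This is not a real line-number program decoder: the opcode names are plain strings chosen for the example, only the registers relevant here are modeled, and `run` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class State:
    address: int = 0
    prologue_end: bool = False
    epilogue_begin: bool = False
    prologue_epilogue: bool = False  # proposed register, initially "false"


def run(program):
    """Run a toy program; return rows as (address, prologue_epilogue)."""
    state = State()
    rows = []
    for op, arg in program:
        if op == "set_prologue_epilogue":  # proposed DW_LNE_set_prologue_epilogue
            state.prologue_epilogue = arg != 0
        elif op == "set_prologue_end":     # DW_LNS_set_prologue_end
            state.prologue_end = True
            state.prologue_epilogue = False
        elif op == "set_epilogue_begin":   # DW_LNS_set_epilogue_begin
            state.epilogue_begin = True
            state.prologue_epilogue = True
        elif op == "advance_pc":           # DW_LNS_advance_pc
            state.address += arg
        elif op == "copy":                 # DW_LNS_copy
            rows.append((state.address, state.prologue_epilogue))
            # Per-row flags reset; prologue_epilogue is sticky and does not.
            state.prologue_end = False
            state.epilogue_begin = False
        elif op == "end_sequence":         # DW_LNE_end_sequence
            rows.append((state.address, state.prologue_epilogue))
            state = State()                # every register back to initial state
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return rows


program = [
    ("set_prologue_epilogue", 1), ("copy", None),                      # entry
    ("advance_pc", 4), ("copy", None),                                 # prologue
    ("advance_pc", 4), ("set_prologue_end", None), ("copy", None),     # body
    ("advance_pc", 8), ("copy", None),                                 # body
    ("advance_pc", 8), ("set_epilogue_begin", None), ("copy", None),   # epilogue
    ("advance_pc", 4), ("copy", None),                                 # epilogue
    ("advance_pc", 4), ("end_sequence", None),
]
rows = run(program)
```

The program uses the new opcode exactly once, at the entry point; the existing `set_prologue_end`, `set_epilogue_begin`, and `end_sequence` opcodes do the rest.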

Note: I have not tried to determine whether this minimizes size in
practice. It might be that prologues and/or epilogues typically occupy
only one row of the line table, in which case having the flag reset on
every row might take up less space.

## Proposed Changes

In Section 2.6 "Location Descriptions" modify the last sentence of
item 1 to read as follows (adding the parenthetical exclusion).

> They are sufficient for describing the location of any object as long
as its lifetime is either static or the same as the lexical block that
owns it (excluding any prologue or epilogue ranges), and it does not
move during its lifetime.

In Section 6.2.2 "State Machine Registers" add the `prologue_epilogue`
register to Table 6.3.

| Register Name | Meaning |
| ------------- | ------- |
| `prologue_epilogue` | A boolean indicating that the current address is within a prologue or epilogue range. |

(Keep the `prologue_end` and `epilogue_begin` registers.)

In Section 6.2.3 "Line Number Program Instructions" add an entry to
Table 6.4 "Line number program initial state."

| `prologue_epilogue` | "false" |

In Section 6.2.5.2 "Standard Opcodes" modify several descriptions as
follows (exact text changes not specified for simplicity).

- DW_LNS_set_prologue_end: sets the `prologue_epilogue` register
to "false."
- DW_LNS_set_epilogue_begin: sets the `prologue_epilogue` register
to "true."

In Section 6.2.5.3 "Extended Opcodes" add a new opcode at the end.

> 4. DW_LNE_set_prologue_epilogue
>
> The DW_LNE_set_prologue_epilogue opcode takes a single parameter, an
unsigned LEB128 integer. If the parameter is 0, it sets the
`prologue_epilogue` register of the state machine to "false;" for any
other value, it sets the register to "true."

In Section 7.22 "Line Number Information" add a new entry for
DW_LNE_set_prologue_epilogue in Table 7.26 (probably 0x05).
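For a sense of what the new opcode would look like on the wire, here is a Python sketch of a producer-side encoder. It assumes the tentative value 0x05 suggested above and follows the existing extended-opcode layout: a zero byte, an unsigned LEB128 length, the opcode byte, then the operand. The helper names are hypothetical.

```python
# Assumed value; the proposal only says "probably 0x05".
DW_LNE_set_prologue_epilogue = 0x05


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_set_prologue_epilogue(flag):
    """Encode the proposed opcode as an extended line-number instruction."""
    operand = uleb128(1 if flag else 0)
    body = bytes([DW_LNE_set_prologue_epilogue]) + operand
    # Extended opcodes start with 0x00, then the ULEB128 length of the body.
    return b"\x00" + uleb128(len(body)) + body


set_true = encode_set_prologue_epilogue(True)
set_false = encode_set_prologue_epilogue(False)
```

Under these assumptions each explicit use of the opcode costs four bytes, which is relevant to the size question raised in the Overview.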

## Dependencies

Not really a dependency, but an implication:

Assemblers will need to add syntax to the `.loc` directive to support
setting/resetting the `prologue_epilogue` flag.

## References

[Issue 240108.1](https://dwarfstd.org/issues/240108.1.html): Add
prologue_begin and epilogue_end state machine registers to allow
identifying multiple prologue and epilogue regions

