[Dwarf-discuss] Proposal: `DW_LNS_indirect_line`

Tue Jul 23 08:22:19 GMT 2024

On Mon, Jul 22, 2024 at 5:29 PM David Blaikie <dblaikie@gmail.com> wrote:
>
>
>
> On Sun, Jul 21, 2024 at 3:54 PM Jacob Young <jacobly.alt@gmail.com> wrote:
>>
>> > On 12/07/2024 19:04, David Blaikie wrote:
>> > > Thanks for all the context (I noticed you replied directly to me - are
>> > > you happy/OK having this discussion on the mailing list, rather than
>> > > in private? It'd help to keep all the history visible, linkable, etc)
>> > Yes, apologies, that was just a mistake on my part -- I meant to do
>> > this, then realised I accidentally replied to you directly, so went to
>> > forward it to the list, and only now realised that I accidentally
>> > forwarded it to a completely unrelated list :P
>> > > While I can appreciate the desire to make this update O(N) in the
>> > > number of source lines affected - would it be acceptable for this to
>> > > be O(N) in the number of machine/object level functions?
>> > >
>> > > Like if we had a feature for resetting the line table program part-way
>> > > through a line table - would that be adequate? Then your external data
>> > > could keep track of the line number setting operation at the start of
>> > > the new/distinct/independent sequence and update that?
>> > While not ideal for us, this would certainly be an improvement. Dealing
>> > with the relative line number offsets is definitely the biggest pain
>> > point wrt constructing the LNP today.
>> > > Such a feature would have some more generality/usefulness directly
>> > > without external/side data - for instance it would make chunks of the
>> > > line table discardable, which could make it easier for a producer to
>> > > use comdats to isolate the line table data associated with an inline
>> > > function and allow the linker to discard such a contribution if desired.
>>
>> My idea for solving this problem without additional side data is to instead
>> add a line table opcode that references an existing DIE with DECL attributes
>> and sets the state machine's line register to the value of its DW_AT_decl_line
>> attribute.  Similarly, DW_AT_decl_file and DW_AT_decl_column could also
>> be copied to file and column for consistency.  By itself, this doesn't reduce
>> the number of updates required, but it could be combined with an additional
>> DIE tag for representing the source decl before inlining/instantiation and a
>> DIE attribute for referencing the source decl from the inlined/instantiated
>> DIE which would indicate that DECL attributes are copied from the source
>> decl.  That way, the DECL attributes would only ever need to be updated in
>> a single place, the source decl DIE.
>>
>> Instead of creating a new tag, it also seems pretty straightforward to just
>> reuse DW_TAG_subprogram for the source decl.  Since uninstantiated
>> functions would not correspond to any program addresses, a missing
>> DW_AT_low_pc or new flag attribute could indicate this. The intended
>> meaning of DW_AT_abstract_origin already links an inlined function to
>> the source decl.  For instantiations, it's possible to add meaning to either
>> DW_AT_abstract_origin or DW_AT_specification, or create a new attribute.
>> Both of these existing attributes already indicate that the referenced DIE
>> contains some of the attributes of the referencing DIE, but I am probably
>> stretching the intended meaning of the latter too far for this particular use
>> case.  (Actually, I wrote that reading DW_AT_abstract_origin as being only
>> documented as related to inlined functions, but I see now that it explicitly
>> "can be used with almost any debugging information entry" and the name
>> already seems to me to fit this instantiation use case perfectly.)
>>
>> I should note that I'm already in the process of considering whether new
>> tags/attributes will be needed, some to support incremental, and some to
>> support Zig Language concepts that do not exist in DWARF yet.  It seems
>> likely that at least some number of new definitions will be needed anyway,
>> and I would expect that being able to represent source decls in a separate
>> DIE with references from the generated decls is useful independently of
>> whether this proposed line table opcode is accepted.  (Although with my
>> newfound understanding of DW_AT_abstract_origin, I'm now back down
>> to only one new DIE tag and one new DIE attribute to support incremental.)
>>
>> I'm sure this could easily be converted into a more concrete proposal,
>> but I am interested in getting some feedback first, since I don't know if
>> there are any pre-existing constraints preventing .debug_info from being
>> referenced from .debug_line.  As I hope I have demonstrated, this seems
>> much more extensible to new use cases than the previous proposal.
>
>
> Yeah, that'd be the tough part - currently the line table doesn't reference anything in debug_info, and that's both beneficial in some ways (means you can strip everything but the line table and still get symbolized stack traces (& we added debug_line_str and some other changes to make this more feasible in DWARFv5)) and just trickier to implement (the line table is created by the assembler, but the debug_info is just arbitrary assembly code created by the compiler - so having them reference each other beyond that initial stmt_list reference is more complicated than other DWARF features).
>
> Sharing some info between template instantiations might be nice, but I'd like to see an analysis/prototype of the savings to be gained - it'd probably have to/likely go along with something like simplified template names to allow sharing the base name between instantiations, but really I'm not sure if all the extra DIE-to-DIE references would be cheap enough to help with savings from whatever shared data there might be.

I understand that the new line table opcode might be more difficult to emit for
some producers, but in the same way that DW_LNS_fixed_advance_pc was added for
producers who cannot encode LEB128, the rest of the existing opcodes would exist
for producers who cannot encode DIE references.  Moreover, since I wouldn't
necessarily expect the new opcode to produce smaller line tables, I would only
expect it to be used when incremental compiler features are in use, and that the
same compiler would not use this opcode when emitting a release build with
debug info.
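
To make the intended semantics concrete, here is a rough sketch of how a
consumer's line state machine might treat the new opcode.  Everything in it is
an assumption for illustration only: the opcode has no assigned number, the
`Die` and `LineState` classes are hypothetical stand-ins, and the program is a
list of (name, operand) pairs rather than real encoded bytes.

```python
from dataclasses import dataclass


@dataclass
class Die:
    # Hypothetical stand-in for a DIE that carries DECL attributes.
    decl_file: int
    decl_line: int
    decl_column: int = 0


@dataclass
class LineState:
    # Initial register values as defined for the DWARF line state machine.
    file: int = 1
    line: int = 1
    column: int = 0


def run(program, dies):
    """Execute a toy line program and return the final register state."""
    state = LineState()
    for op, operand in program:
        if op == "advance_line":
            # Existing opcode: a relative update of the line register.
            state.line += operand
        elif op == "indirect_line":
            # Proposed opcode (assumed semantics): copy the DECL
            # coordinates of the referenced DIE into the registers.
            die = dies[operand]
            state.file = die.decl_file
            state.line = die.decl_line
            state.column = die.decl_column
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return state


# One function whose declaration sits at line 10, with a row two lines below.
dies = {0x2A: Die(decl_file=1, decl_line=10)}
program = [("indirect_line", 0x2A), ("advance_line", 2)]
before = run(program, dies).line

# The source shifts down by three lines: only the DIE is patched,
# the line program itself is left untouched.
dies[0x2A].decl_line = 13
after = run(program, dies).line
```

The point of the sketch is the last step: the line program is unchanged after
the edit, and only the DECL attribute of the one DIE had to be updated.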

I see now the mention that "It is needed to support the common practice of
stripping all but the line number sections (.debug_line and .debug_line_str)
from an executable."  I agree that binaries that contain this new line table
opcode would not be able to be stripped by simply deleting certain sections
while retaining useful info in the .debug_line section.  However, debug info
aware tooling could be capable of rewriting the line table to remove this opcode
while preserving its meaning by inspecting the referenced DIE and replacing it
with the equivalent state changes before deleting the DIE section.  Even more
importantly, I would not expect this opcode to be used in a non-incremental
output binary, which would still retain the ability to be partially stripped by
simple tools.
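
As a rough illustration of that rewriting step, the sketch below lowers the
proposed opcode into existing ones.  It is only a sketch under assumptions: the
program is modelled as (name, operand) pairs instead of encoded bytes, `dies`
is a hypothetical map from DIE offset to (decl_file, decl_line, decl_column),
and opcodes that also touch the line register in a real program (special
opcodes, DW_LNE_end_sequence) are deliberately left out.

```python
def lower_indirect_line(program, dies):
    """Return an equivalent toy line program that references no DIE.

    Tracks the file/line/column registers while walking the program and
    replaces each "indirect_line" with the state changes it implies.
    """
    file, line, column = 1, 1, 0  # initial register values
    out = []
    for op, operand in program:
        if op == "indirect_line":
            new_file, new_line, new_column = dies[operand]
            # Emit only the changes that are actually needed.
            if new_file != file:
                out.append(("set_file", new_file))
            if new_line != line:
                out.append(("advance_line", new_line - line))
            if new_column != column:
                out.append(("set_column", new_column))
            file, line, column = new_file, new_line, new_column
            continue
        # Keep tracking the registers through the ordinary opcodes.
        if op == "set_file":
            file = operand
        elif op == "advance_line":
            line += operand
        elif op == "set_column":
            column = operand
        else:
            raise ValueError(f"unsupported opcode: {op}")
        out.append((op, operand))
    return out


dies = {0x2A: (1, 10, 5)}
program = [("advance_line", 3), ("indirect_line", 0x2A), ("advance_line", 2)]
lowered = lower_indirect_line(program, dies)
# lowered == [("advance_line", 3), ("advance_line", 6),
#             ("set_column", 5), ("advance_line", 2)]
```

After this pass the DIE section could be deleted without losing the line
information, at the cost of reintroducing the relative offsets that the opcode
was meant to avoid.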

To be clear, incremental binaries are only meant to be used during the
development loop, where compilation speed and good debug info are more important
than compact binaries.  It does seem like sharing info between instantiations is
already possible without changes to the spec, and the incremental use case is
for fewer things to update when source code shifts, even when it comes at the
expense of more compact encodings.

Based on the feedback so far, nothing seems to be blocking this particular use
case, so it seems reasonable to continue development of an incremental mode that
makes use of these features, to see how well they work out in practice.