[Dwarf-Discuss] .debug_addr entry plus offset

David Blaikie dblaikie@gmail.com
Tue Sep 15 18:56:44 GMT 2020

On Tue, Sep 15, 2020 at 10:13 AM Robinson, Paul via Dwarf-Discuss
<dwarf-discuss at lists.dwarfstd.org> wrote:
> David Blaikie has brought this up with me (or in conversations that
> I observed) a couple of times:

Thanks for bringing this up! Not sure if I've raised this on
dwarf-discuss specifically before.. ah, yeah, 3 years ago:

Most recently I had an idea for a workaround that I proposed on the
llvm-dev mailing list:
The idea being that actually using debug_rnglists even for contiguous
ranges would reduce .o/executable file size when using Split DWARF. I
think the data I had even showed breakeven for non-split DWARF object
files, probably slight growth for linked executables in that case,

> It's common to want to refer to a particular address plus an offset,
> for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
> block or inlined subprogram within another subprogram.

Yep - the ones I'm especially interested in now, are those that won't
be addressed even by a "ranges everywhere" approach (though that
approach does have size tradeoffs that I'd like to avoid/improve on
too, for sure!) - DW_TAG_call_site's
DW_AT_call_pc/DW_AT_call_return_pc and DW_TAG_label's DW_AT_low_pc.
The latter isn't super common in code I'm dealing with, but the former
is pretty ubiquitous now.

>  Generally
> the only symbolic address available is the entry point of the
> containing subprogram.  Back when addresses were held directly in
> the .debug_info section, the attributes would have relocations, the
> offset would be encoded into the relocation and the linker would
> just do the right thing.
> With DWARF v5, we now have the .debug_addr section, which contains
> the addresses to be fixed up by the linker.  But, we don't have a
> way to specify an offset to add to an entry in the .debug_addr
> section; instead, each unique addr+offset requires its own entry
> in the .debug_addr table.  This consumes additional space, these
> entries are generally not reusable, and it doesn't reduce the
> overall number of relocations that the linker must process.

If you're encountering size penalties with non-split DWARFv5 due to
debug_addr indirection - we could change LLVM to choose which
addresses to indirect and which ones to use the classic/DWARFv4-esque
(But, yeah, overall, I think it's better for lots of use cases to
support an addr+offset encoding)
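
To put rough numbers on the "each unique addr+offset requires its own
entry" point, here is a minimal sketch. The helper name, the 8-byte
address size, and the sample call-site counts are all assumptions made
up for illustration, not measurements from any real build:

```python
ADDR_SIZE = 8  # assumption: 64-bit target, so each .debug_addr entry is 8 bytes


def debug_addr_entries(call_sites_per_function, addr_plus_offset):
    """Count .debug_addr entries needed for a list of functions.

    call_sites_per_function: number of call-site addresses
    (e.g. DW_AT_call_return_pc) in each function.
    addr_plus_offset: True if an attribute can say "entry N plus offset",
    so only the function entry point needs a pool entry.
    """
    total = 0
    for n_sites in call_sites_per_function:
        total += 1  # the function's entry point
        if not addr_plus_offset:
            total += n_sites  # every distinct address gets its own entry
    return total


funcs = [3, 0, 5]  # hypothetical call-site counts for three functions
separate = debug_addr_entries(funcs, addr_plus_offset=False)
shared = debug_addr_entries(funcs, addr_plus_offset=True)
saved_bytes = (separate - shared) * ADDR_SIZE
print(separate, shared, saved_bytes)
```

In an object file each of those pool entries also needs its own
relocation, so the entry count is a proxy for the relocation count too.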

> It's not feasible to define a new attribute for address+offset,
> because an attribute has only one value, and the attribute would
> have to specify both the .debug_addr index and the offset to add.

I don't follow this ^ - I think previously we've discussed at least 2
representations that could do this:
generalized exprloc support

admittedly uleb+uleb has the problem that it's a variable-length
encoding, but at least LLVM currently is using addrx exclusively, and
not the addrxN fixed length encodings.
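
For concreteness, a uleb+uleb pair could be encoded and decoded roughly
like this. This is only a sketch: ULEB128 itself is standard, but the
encode_addrx_offset/decode_addrx_offset helpers and the idea of a single
form carrying both values are hypothetical, since no such form exists in
DWARF v5:

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data, pos=0):
    """Decode one ULEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_addrx_offset(index, offset):
    """Hypothetical form: .debug_addr index followed by an unsigned offset."""
    return encode_uleb128(index) + encode_uleb128(offset)


def decode_addrx_offset(data, pos=0):
    """Return (index, offset, next position) for the hypothetical form."""
    index, pos = decode_uleb128(data, pos)
    offset, pos = decode_uleb128(data, pos)
    return index, offset, pos


small = encode_addrx_offset(5, 0x40)      # both values fit in one byte each
large = encode_addrx_offset(300, 70000)   # 2 bytes + 3 bytes
print(len(small), len(large), decode_addrx_offset(large))
```

The variable length is visible here: the attribute's size depends on
both values, so a consumer has to decode it to skip over it.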

> But, we could define an "indirect" entry in .debug_addr, and then
> reference it with an attribute in the same way that we reference
> any other .debug_addr entry.

This direction would, for my use case, be unfortunate - since my goal
is to remove as much DWARF from object files as possible under Split
DWARF - so leaving anything extra in debug_addr works against that.

> An indirect entry would be the same size as all other entries in
> .debug_addr (i.e., the size of an address on the target).  The
> upper half would be another index into .debug_addr and the lower
> half would be the addend.  The consumer adds the addend to the
> value from the entry specified by the "another index."
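
To make the quoted layout concrete, here is how I read it, as a sketch
only. It assumes an 8-byte address (so a 32-bit index and a 32-bit
addend) and an unsigned addend; the function names are made up, and how
a consumer tells an indirect entry from an ordinary one is left open,
as it is in the description above:

```python
def make_indirect_entry(base_index, addend, addr_size=8):
    """Pack (index, addend) into one address-sized integer.

    Upper half: index of another .debug_addr entry.
    Lower half: unsigned addend (assumption; signedness is undecided).
    """
    half_bits = addr_size * 4  # half of the entry, in bits
    limit = 1 << half_bits
    if not (0 <= base_index < limit and 0 <= addend < limit):
        raise ValueError("index or addend does not fit in half an entry")
    return (base_index << half_bits) | addend


def resolve_indirect_entry(entry, addr_table, addr_size=8):
    """Consumer side: look up the referenced entry and add the addend."""
    half_bits = addr_size * 4
    base_index = entry >> half_bits
    addend = entry & ((1 << half_bits) - 1)
    return addr_table[base_index] + addend


# Hypothetical already-relocated entries (function entry points).
addr_table = [0x401000, 0x402000]
entry = make_indirect_entry(1, 0x2C)
resolved = resolve_indirect_entry(entry, addr_table)
print(hex(entry), hex(resolved))
```

The packed value is a plain constant, which is why an entry like this
would not need a relocation of its own.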

If it's OK to use such a small fixed-length encoding (addrx comes in a
variable-length form plus fixed-length forms of 1/2/3/4 bytes, and
offsets in LLVM are emitted as data4), then we could introduce that as
a FORM_addrx4_offset4 form (or make its size depend on pointer size,
though that seems less relevant when it isn't in the debug_addr
section), plus a uleb+uleb form, without providing all the possible
combinations of addrx{1,2,3,4,N}_offset{1,2,3,4,M}.

In any case, I think of these forms as sort of special
case/compact/easier to parse encodings of the generalized exprloc
(DW_OP_addrx(N), DW_OP_constu(M), DW_OP_plus).
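
As a sketch of what a consumer would do with that generalized exprloc,
here is a toy evaluator covering just those three operations. The
operations are written as symbolic tuples rather than encoded bytes, and
the address table contents are invented for the example:

```python
def evaluate(ops, addr_table):
    """Evaluate a tiny subset of DWARF expression operations.

    ops: list of tuples such as ("DW_OP_addrx", N), ("DW_OP_constu", M),
    ("DW_OP_plus",). Only these three operations are handled.
    """
    stack = []
    for op, *operands in ops:
        if op == "DW_OP_addrx":
            # Push the address stored at the given .debug_addr index.
            stack.append(addr_table[operands[0]])
        elif op == "DW_OP_constu":
            # Push an unsigned constant.
            stack.append(operands[0])
        elif op == "DW_OP_plus":
            # Pop two values and push their sum.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unsupported operation: {op}")
    return stack[-1]


addr_table = [0x401000]  # hypothetical: entry 0 is the function's entry point
expr = [("DW_OP_addrx", 0), ("DW_OP_constu", 0x2C), ("DW_OP_plus",)]
result = evaluate(expr, addr_table)
print(hex(result))
```

A dedicated addrx+offset form would carry the same two numbers (N and
M) without the consumer needing an expression evaluator for what is
otherwise a plain address attribute.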

> This solution doesn't save space in .debug_addr, but it does
> reduce the number of relocations.  Ideally .debug_addr would
> require only one relocation per function.
> We can debate whether the addend should be signed or unsigned,
> and whether the indirect entries should be a separate subtable,
> but I wanted to float the idea here before I wrote it up as a
> proposal.

I'd be fairly in favor of unsigned. Generally LLVM already picks the
first address used by DWARF in any ELF section as the address to put
in the pool, and tries to express everything else relative to that.
(So, e.g., if you have DW_AT_ranges on a DW_TAG_lexical_block in your
function, the rnglist for that will set a base address of the start of
the function (say, assuming function sections) and use offset pairs
relative to that.)
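
Roughly, a consumer resolves such a rnglist like this. This is a sketch
over symbolic entries, not a parser for the real section encoding; it
handles only the three entry kinds named in the code, and it treats a
missing base address as an error where a real consumer would fall back
to the CU's base address:

```python
def resolve_rnglist(entries, addr_table):
    """Turn a simplified range list into absolute (start, end) pairs."""
    base = None
    ranges = []
    for kind, *operands in entries:
        if kind == "DW_RLE_base_addressx":
            # Base address comes from the pool: one relocated entry.
            base = addr_table[operands[0]]
        elif kind == "DW_RLE_offset_pair":
            if base is None:
                raise ValueError("offset pair before any base address")
            start, end = operands
            # Offsets are unsigned and relative to the current base.
            ranges.append((base + start, base + end))
        elif kind == "DW_RLE_end_of_list":
            break
        else:
            raise ValueError(f"unsupported entry kind: {kind}")
    return ranges


addr_table = [0x401000]  # hypothetical: start of the enclosing function
rnglist = [
    ("DW_RLE_base_addressx", 0),
    ("DW_RLE_offset_pair", 0x10, 0x24),
    ("DW_RLE_offset_pair", 0x40, 0x58),
    ("DW_RLE_end_of_list",),
]
resolved_ranges = resolve_rnglist(rnglist, addr_table)
print([(hex(lo), hex(hi)) for lo, hi in resolved_ranges])
```

Since the base is the lowest address in the section, every offset comes
out non-negative, which is why an unsigned addend would be enough for
this style of emission.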

> Alternatively, the indirect sub-table could be encoded with
> ULEB/SLEB pairs, but that makes it hard to find them by index.
> They could be found by a direct reference, but that requires a
> relocation from .debug_info to .debug_addr, so we haven't saved
> any relocations that way.
> If there are obvious flaws I can't see, or someone is inspired
> to come up with another solution, please let me know!  Otherwise
> I'll write it up as a formal proposal probably later this week.
> Thanks,
> --paulr
