[Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

Mon Oct 16 16:12:26 GMT 2023

On Mon, Oct 16, 2023 at 8:57 AM Alexander Yermolovich <ayermolo@meta.com>
wrote:

> For background llvm discussion on how to implement it:
>
> https://discourse.llvm.org/t/debuginfo-dwarfv5-lld-debug-names-with-fdebug-type-sections/73445
>
> Thanks for explaining the issue, and proposing spec change. 🙂
> The question I have: are non-bit-identical TUs with the same hash a
> fundamental issue that needs to be addressed somehow in the next version of
> the spec? If we could have a guarantee of bit-identity, that should simplify
> things quite a bit. The linker can just follow the same path as for
> functions. The compiler can generate a symbol name unique to the type unit
> hash. So, when the linker COMDAT-folds TU sections, entries in the TU list
> will point to the correct address and no special logic is needed for the
> tombstone. I guess there is a hashing mechanism in the DWARF spec, but LLVM
> is not using it. Should we go back to it, and is it enough?
>

The hashing mechanism in the spec doesn't guarantee bit-identicality, I
believe. It's structural equivalence, not bit identity. For example, whether
you produce the main type DIE followed by an int DIE that the main type
needs, or emit the int DIE first followed by the main type DIE, both hash to
the same value, because you start from the type DIE and hash outwards to
whatever it can reach, and that walk is structural: int is int, no matter
what offset it's at. For a bunch of reasons this is preferable.

(yes, clang takes this further and hashes based on the C++ ODR - which is
off-spec, but workable in our experience)
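
A minimal sketch of why a structural hash does not imply bit-identical
units. The DIE model (a dict of offset -> tag, name, referenced offsets) and
the use of SHA-256 are assumptions made for illustration; this is not the
signature algorithm from the DWARF spec, nor what clang does.

```python
import hashlib

def structural_hash(dies, root):
    """Hash a type by walking outward from its root DIE, ignoring offsets.

    `dies` is a hypothetical model: {offset: (tag, name, [referenced offsets])}.
    """
    h = hashlib.sha256()
    visited = {}  # offset -> visit order, so repeats and cycles hash stably

    def walk(off):
        if off in visited:
            # Refer back by visit order, never by the raw offset.
            h.update(b"ref%d" % visited[off])
            return
        visited[off] = len(visited)
        tag, name, refs = dies[off]
        h.update(tag.encode() + b"\0" + name.encode() + b"\0")
        for r in refs:
            walk(r)

    walk(root)
    return h.hexdigest()

# The same type in two layouts: struct DIE first, or int DIE first.
layout_a = {0x10: ("DW_TAG_structure_type", "S", [0x20]),
            0x20: ("DW_TAG_base_type", "int", [])}
layout_b = {0x10: ("DW_TAG_base_type", "int", []),
            0x18: ("DW_TAG_structure_type", "S", [0x10])}

same_hash = structural_hash(layout_a, 0x10) == structural_hash(layout_b, 0x18)
same_layout = layout_a == layout_b
```

Here same_hash is true while same_layout is false: a TU-relative offset
recorded against one layout would not be valid in the other.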

Another direction we could go: if (I think?) the only thing in a type unit
that can be referenced is the type itself, then perhaps we could modify how
types defined in type units are referenced.

If only the type can be referenced in a type unit, we could emit a
.debug_names entry without a DW_IDX_die_offset - just the DW_IDX_type_unit
- and the consumer can use the type_offset field in the type unit's header
to find the exact type DIE.
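
A rough sketch of what the consumer side of that could look like, assuming
32-bit DWARF v5, little-endian data, and an index entry modelled as a plain
dict. The function name, the entry layout and the sample bytes are
hypothetical; only the type unit header field order comes from DWARF v5.

```python
import struct

DW_UT_type = 0x02  # unit_type value for a type unit in DWARF v5

def type_die_offset(debug_info: bytes, tu_offset: int) -> int:
    """Return the .debug_info offset of the type DIE of the TU at tu_offset."""
    # 32-bit DWARF v5 type unit header: unit_length, version, unit_type,
    # address_size, debug_abbrev_offset, type_signature, type_offset.
    (unit_length, version, unit_type, _address_size,
     _abbrev_offset, _signature, type_offset) = struct.unpack_from(
        "<IHBBIQI", debug_info, tu_offset)
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is not handled in this sketch")
    if version != 5 or unit_type != DW_UT_type:
        raise ValueError("not a DWARF v5 type unit")
    # type_offset is relative to the start of the type unit header.
    return tu_offset + type_offset

# Fabricated sample: 8 bytes of padding, then one TU whose type DIE
# immediately follows the 24-byte header.
header = struct.pack("<IHBBIQI", 21, 5, DW_UT_type, 8, 0,
                     0x1122334455667788, 24)
debug_info = b"\0" * 8 + header + b"\x01"

local_tu_list = [8]                 # offsets of type units in .debug_info
entry = {"DW_IDX_type_unit": 0}     # no DW_IDX_die_offset attribute
resolved = type_die_offset(debug_info,
                           local_tu_list[entry["DW_IDX_type_unit"]])  # 8 + 24
```

With this shape, the entry stays valid whichever equivalent copy of the TU
the linker keeps, since nothing in the entry depends on that copy's layout.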

Are there any other things that could be referenced within a type unit?

>
> Alex
>
> ------------------------------
> *From:* David Blaikie <dblaikie@gmail.com>
> *Sent:* Monday, September 25, 2023 9:02 AM
> *To:* Alexander Yermolovich <ayermolo@meta.com>
> *Cc:* dwarf-discuss@lists.dwarfstd.org <dwarf-discuss@lists.dwarfstd.org>
> *Subject:* Re: [Dwarf-discuss] [DWARF5] .debug_names +
> fdebug-types-sections
>
>
>
> On Fri, Sep 15, 2023 at 2:45 PM Alexander Yermolovich via
> Dwarf-discuss <dwarf-discuss@lists.dwarfstd.org> wrote:
> >
> > Hello
> >
> > I am trying to enable the debug names acceleration table with
> > fdebug-types-sections in LLVM. One part I am not sure about is the local
> > TU list. It contains offsets into the .debug_info section. All the
> > entries carry an index that points into the local TU list, and the DIE
> > offsets within entries are relative to the TU.
> >
> > The linker de-duplicates type units using COMDAT, so the final result
> > will have fewer type units. As a result, the local type unit list will
> > be invalid, and all the entries that point to that TU will not be valid
> > either. Even if the linker is modified so that, when it de-duplicates
> > type sections, the local type units somehow get the right offset, that
> > still leaves all the duplicate entries.
> > Am I missing something, or will the linker, specifically LLD, need to be
> > aware of the context of .debug_names sections when it de-duplicates
> > type sections?
> >
> >
> > It seems to me that, to fully support it, .debug_names needs to be
> > created by a post-build tool (or by the linker...).
> >
> > Thanks.
>
> While DWARF consumers will benefit from a content-aware linking of
> .debug_names (using one hash table is more efficient than probing
> hundreds/thousands of small hash tables), I don't believe the spec
> as-is requires that for correctness.
>
> In the case of type units, I'd expect behavior somewhat similar to how
> linkers behave with inline functions - if the two copies of the
> function are identical, it's possible that the linker will resolve all
> relocations to the function to the single copy that remains after
> linking (so two CUs would both describe the inline function "f1" and
> both descriptions would have the same start address/length, the two
> CUs' CU-level DW_AT_ranges would overlap/both contain that function's
> addresses - and neither would use the tombstone address). So in that
> case, all the duplicate index entries would remain valid (their
> TU-relative offsets would be correct - since the TUs were bit-wise
> identical, so the offsets still point to the same things).
>
> In the case where a producer produces equivalent but not
> bitwise-identical TUs, the linker will choose one, drop the rest, and
> use the tombstone value to resolve the relocation used in the local
> TUs offset list. A consumer should ignore any entries that reference a
> tombstone offset in the local TU list (& probably wouldn't hurt to use
> the same code and ignore any tombstoned CUs too - I can't immediately
> think of a situation/reason that'd happen, but seems like a good
> general idea)
>
> If a consumer does a semantic aware merge of the indexes, then it
> should discard (rather than tombstoning) the index entries that
> reference dead TUs and the dead TUs in the local TU list itself, and
> also discard any duplicate index entries and duplicate elements in the
> local TU list.
>
> We could document the use of the tombstone in this context.
>
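
The tombstone handling and the semantic merge described in the quoted
message can be sketched roughly as below. The tombstone value, the entry
shape (name, tu_index, die_offset) and both function names are assumptions
for illustration; the thread does not fix any of them.

```python
TOMBSTONE = 0xFFFFFFFF  # assumed value; the thread does not specify one

def live_entries(local_tu_list, entries):
    """Ignore index entries whose local TU slot holds the tombstone."""
    return [e for e in entries
            if local_tu_list[e["tu_index"]] != TOMBSTONE]

def merge_indexes(indexes):
    """Merge (local_tu_list, entries) pairs into one de-duplicated index."""
    merged_tus, tu_slot = [], {}      # surviving TU offsets, offset -> slot
    merged_entries, seen = [], set()
    for local_tu_list, entries in indexes:
        for e in live_entries(local_tu_list, entries):
            tu_off = local_tu_list[e["tu_index"]]
            if tu_off not in tu_slot:
                # First time this surviving TU is seen: give it one slot.
                tu_slot[tu_off] = len(merged_tus)
                merged_tus.append(tu_off)
            slot = tu_slot[tu_off]
            key = (e["name"], slot, e["die_offset"])
            if key not in seen:       # discard duplicate index entries
                seen.add(key)
                merged_entries.append({"name": e["name"],
                                       "tu_index": slot,
                                       "die_offset": e["die_offset"]})
    return merged_tus, merged_entries

# Three per-object indexes describing the same type "Foo":
index_a = ([0x40], [{"name": "Foo", "tu_index": 0, "die_offset": 0x18}])
# Equivalent but not bit-identical copy: dropped, relocation tombstoned.
index_b = ([TOMBSTONE], [{"name": "Foo", "tu_index": 0, "die_offset": 0x1C}])
# Bit-identical copy: relocation resolved to the surviving TU.
index_c = ([0x40], [{"name": "Foo", "tu_index": 0, "die_offset": 0x18}])

merged_tus, merged_entries = merge_indexes([index_a, index_b, index_c])
```

A consumer that does no merging would only need live_entries; the merge
additionally collapses the duplicate TU list elements and duplicate entries.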