[Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

Mon Oct 16 21:21:05 GMT 2023

> On Oct 16, 2023, at 9:12 AM, David Blaikie via Dwarf-discuss <dwarf-discuss@lists.dwarfstd.org> wrote:
> 
> 
> 
> On Mon, Oct 16, 2023 at 8:57 AM Alexander Yermolovich <ayermolo@meta.com> wrote:
>> For background llvm discussion on how to implement it: 
>> https://discourse.llvm.org/t/debuginfo-dwarfv5-lld-debug-names-with-fdebug-type-sections/73445
>> 
>> Thanks for explaining the issue, and proposing spec change. 🙂
>> My question: are non-bit-identical TUs with the same hash a fundamental issue that needs to be addressed in the next version of the spec? If we could have a guarantee that TUs with the same hash are bit-identical, that would simplify things quite a bit. The linker could follow the same path as for functions: the compiler can generate a symbol name unique to the type unit hash, so when the linker COMDAT-deduplicates TU sections, the entries in the TU list will point to the correct address and no special tombstone logic is needed. I believe there is a hashing mechanism in the DWARF spec, but LLVM is not using it. Should we go back to it, and would it be enough?
> 
> The hashing mechanism in the spec doesn't guarantee bit-identicality, I believe. It's structural equivalence, not bit identity. For example, whether you produce the main type DIE followed by an int DIE that the main type needs, or you emit the int DIE first, followed by the main type DIE, these hash to the same value, because you start from the type DIE and hash outwards to what it can reach (int is int, no matter what offset it's at). For a bunch of reasons this is preferable.
> 
> (yes, clang takes this further and hashes based on the C++ ODR - which is off-spec, but workable in our experience)
> 
> Another direction we could go: if the only thing in a type unit that can be referenced is the type itself (I think?), then perhaps we could modify how types defined in type units are referenced.
> 
> If only the type can be referenced in a type unit, we could emit a .debug_names entry without a DW_IDX_die_offset - just the DW_IDX_type_unit - and the consumer can use the header of the type unit to find the exact type unit DIE.
> 
> Are there any other things that could be referenced within a type unit?

LLDB will want access to any types contained within the type units. Many classes contain type definitions within the class itself. CUs that want to reference these types can't do so directly, so they have to duplicate the entire declaration context for the type (the containing namespaces and the class itself, with a DW_AT_declaration(true) attribute) and then create a copy of the contained type if it is simple.

For example, every STL class defines all sorts of "iterator", "const_iterator", "reverse_iterator", "size_type", "pointer_type", "reference_type", etc. inside of the class. If no variables from a CU reference these types, then we won't have access to them if we only add the main type unit type to the .debug_names table.

So it is correct that the only thing that can be referenced in a type unit is the main type itself from a DWARF perspective, but it would be a shame if no debugger clients can use any of the extra types in the type units unless they are directly referenced (and duplicated) in a CU. 
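
To make that concrete, here is a minimal Python sketch of a name index built from type units under the two policies: main type only versus every named type in the TU. This is not LLDB or LLVM code; the `build_index` helper, the dictionary layout, and the signature value are all made up for illustration.

```python
# Hypothetical model: each type unit is a dict with a signature, the name of
# its main type, and the names of types nested inside that main type.
def build_index(type_units, main_type_only):
    """Map each indexed name to the signatures of the TUs that define it."""
    index = {}
    for tu in type_units:
        names = [tu["main"]]
        if not main_type_only:
            names += tu["nested"]
        for name in names:
            index.setdefault(name, []).append(tu["signature"])
    return index


vector_tu = {
    "signature": 0x1234,  # made-up type signature
    "main": "vector<int>",
    "nested": ["iterator", "const_iterator", "size_type"],
}

main_only = build_index([vector_tu], main_type_only=True)
full = build_index([vector_tu], main_type_only=False)

print("iterator" in main_only)  # nested type is not reachable by name
print(full["iterator"])         # full index points at the owning TU
```

With the main-type-only policy, a lookup of "iterator" finds nothing unless some CU happens to carry its own copy of the type.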

LLDB notes which CUs and TUs have an entry in the .debug_names table and it will manually index any that didn't have entries. If the .debug_names tables end up only emitting the main type unit type, we will need to manually index each TU to make sure we have access to contained types. 
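
A rough sketch of that fallback decision, in Python. This is only an illustration of the logic described above, not LLDB's actual implementation; the `Unit` class, the `units_to_index_manually` function, and the `tu_entries_complete` flag are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    offset: int         # hypothetical unit offset in .debug_info
    is_type_unit: bool


def units_to_index_manually(units, covered_offsets, tu_entries_complete):
    """Pick the units a consumer would still have to scan DIE by DIE.

    covered_offsets: offsets of units listed in some .debug_names CU/TU list.
    tu_entries_complete: False if producers only index each TU's main type.
    """
    manual = []
    for unit in units:
        covered = unit.offset in covered_offsets
        if unit.is_type_unit and not tu_entries_complete:
            covered = False  # index exists but misses the nested types
        if not covered:
            manual.append(unit)
    return manual


units = [Unit(0x00, False), Unit(0x40, True), Unit(0x90, True)]
covered = {0x00, 0x40}

complete = units_to_index_manually(units, covered, tu_entries_complete=True)
partial = units_to_index_manually(units, covered, tu_entries_complete=False)
print([hex(u.offset) for u in complete])
print([hex(u.offset) for u in partial])
```

In this model, once TU entries are known to be incomplete, every TU lands on the manual list regardless of whether a table names it, which is the cost being argued against here.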

So I would vote to completely index each TU if possible.

>  
>> 
>> Alex
>> 
>> From: David Blaikie <dblaikie@gmail.com>
>> Sent: Monday, September 25, 2023 9:02 AM
>> To: Alexander Yermolovich <ayermolo@meta.com>
>> Cc: dwarf-discuss@lists.dwarfstd.org <dwarf-discuss@lists.dwarfstd.org>
>> Subject: Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections
>>  
>> 
>> 
>> On Fri, Sep 15, 2023 at 2:45 PM Alexander Yermolovich via
>> Dwarf-discuss <dwarf-discuss@lists.dwarfstd.org> wrote:
>> >
>> > Hello
>> >
>> > I am trying to enable the debug names acceleration table with fdebug-types-sections in LLVM. One part I am not sure about is the local TU list. It contains offsets into the .debug_info section. Each entry carries an index that points into the local TU list, and the DIE offsets in the entries are relative to that TU.
>> >
>> > The linker de-duplicates type units using COMDAT, so the final result will have fewer type units. As a result, the local type unit list will be invalid, and all the entries that point to those TUs will not be valid either. Even if the linker is modified so that, when it de-duplicates type sections, the local type units somehow get the right offset, that still leaves all the duplicate entries.
>> > Am I missing something, or will the linker (specifically LLD) need to be aware of the contents of the .debug_names sections when it de-duplicates type sections?
>> >
>> >
>> > It seems to me that to fully support it, .debug_names needs to be created by a post-build tool (or by the linker...).
>> >
>> > Thanks.
>> 
>> While DWARF consumers will benefit from a content-aware linking of
>> .debug_names (using one hash table is more efficient than probing
>> hundreds/thousands of small hash tables), I don't believe the spec
>> as-is requires that for correctness.
>> 
>> In the case of type units, I'd expect behavior somewhat similar to how
>> linkers behave with inline functions - if the two copies of the
>> function are identical, it's possible that the linker will resolve all
>> relocations to the function to the single copy that remains after
>> linking (so two CUs would both describe the inline function "f1" and
>> both descriptions would have the same start address/length, the two
>> CUs' CU-level DW_AT_ranges would overlap/both contain that function's
>> addresses - and neither would use the tombstone address). So in that
>> case, all the duplicate index entries would remain valid (their
>> TU-relative offsets would be correct - since the TUs were bit-wise
>> identical, so the offsets still point to the same things).
>> 
>> In the case where a producer produces equivalent but not
>> bitwise-identical TUs, the linker will choose one, drop the rest, and
>> use the tombstone value to resolve the relocation used in the local
>> TUs offset list. A consumer should ignore any entries that reference a
>> tombstone offset in the local TU list (& probably wouldn't hurt to use
>> the same code and ignore any tombstoned CUs too - I can't immediately
>> think of a situation/reason that'd happen, but seems like a good
>> general idea)
>> 
>> If a consumer does a semantic aware merge of the indexes, then it
>> should discard (rather than tombstoning) the index entries that
>> reference dead TUs and the dead TUs in the local TU list itself, and
>> also discard any duplicate index entries and duplicate elements in the
>> local TU list.
>> 
>> We could document the use of the tombstone in this context.
> -- 
> Dwarf-discuss mailing list
> Dwarf-discuss@lists.dwarfstd.org
> https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss
