[Dwarf-Discuss] Fission + cross-CU references (ref_addr)

Thu May 4 19:46:12 GMT 2017

I think David is correct, that we did not consider LTO and assumed a .dwo file would have a single compilation unit in the .debug_info section.  It seems to me not hard to fix, but my idea would require an extension to the package-file index and I don't see provision in the package-file index for vendor extensions (another oversight?).

In a split-DWARF scenario producing multiple CUs, it's clear that each split-full unit in the .dwo file would need a corresponding skeleton unit in the .o file, with matching unique DWO IDs.  The v5 spec basically already says that.  With multiple split-full units in the same .debug_info section, DW_FORM_ref_addr can support cross-CU references within the section; the producer can supply the correct offset within the section without needing any relocations.

How to describe this in the package file?  I'd leave DW_SECT_INFO meaning what it does now: describing the base and size of the individual unit.  I'd add a new "section identifier" DW_SECT_INFO_FILE or whatever, which describes the base and size of the entire .debug_info section contributed by the .dwo *file* that the unit came from.  This allows a consumer to find each individual unit by DWO ID, as today, and the extra _FILE column describes the base-and-size to use when interpreting a DW_FORM_ref_addr from that unit.  For any .dwo file that contains only one unit, DW_SECT_INFO and DW_SECT_INFO_FILE would have the same values.  The tool that creates the package file can omit DW_SECT_INFO_FILE from the index if every input .dwo file has only one unit.
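As a rough illustration of the idea, here is a minimal Python sketch of both sides: a packaging step that records the extra column, and a consumer that uses it to translate a DW_FORM_ref_addr value. Everything in it is hypothetical (the `Row` type, `pack`, `resolve_ref_addr`, and the DW_SECT_INFO_FILE column itself, which is only proposed here); the unit sizes are chosen to reproduce the INFO ranges in the dwp listing quoted further down.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Row:
    """One index row; only the columns relevant to this sketch."""
    dwo_id: int
    info: Tuple[int, int]       # DW_SECT_INFO: (base, size) of this unit
    info_file: Tuple[int, int]  # proposed DW_SECT_INFO_FILE: (base, size) of
                                # the whole .debug_info of the source .dwo file


def pack(dwo_files: List[List[Tuple[int, int]]]) -> List[Row]:
    """Lay out units back to back, as a packaging tool might.

    dwo_files holds one inner list per input .dwo file; each entry is a
    (dwo_id, unit_size) pair in the order the units appear in that file.
    """
    rows = []
    cursor = 0
    for units in dwo_files:
        file_base = cursor
        file_size = sum(size for _, size in units)
        for dwo_id, size in units:
            rows.append(Row(dwo_id, (cursor, size), (file_base, file_size)))
            cursor += size
    return rows


def resolve_ref_addr(row: Row, ref_addr: int) -> int:
    """Translate a ref_addr (relative to the original .dwo's .debug_info)
    into an offset in the package file's .debug_info section."""
    base, size = row.info_file
    if not 0 <= ref_addr < size:
        raise ValueError("ref_addr falls outside the .dwo file's contribution")
    return base + ref_addr


rows = pack([
    [(0x66f4e160661d2687, 0x2D)],                              # standalone compile
    [(0x7bd765349b7e7631, 0x38), (0x32dd6d7121dd1d9a, 0x33)],  # LTO compile
])
```

For the single-unit file the two columns come out identical, as described above, and a reference of 0x0b made from the second LTO unit resolves to 0x38 in the package, inside the first LTO unit.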

This solution avoids the problem of the *consumer* having to scan the .debug_info contribution to find the units; that work can be done once up front by the packaging tool.

Section identifiers are 32 bits wide, and the defined values are just 1-8; surely we can allocate some for vendor extensions!  And then it's no problem to have tools produce the new column for the index.  Consumers will just ignore section identifiers that they don't recognize, same as any other part of DWARF.

Would that address the problem?
--paulr

From: Dwarf-Discuss [mailto:dwarf-discuss-bounces@lists.dwarfstd.org] On Behalf Of David Blaikie
Sent: Tuesday, May 02, 2017 12:10 PM
To: dwarf-discuss at lists.dwarfstd.org
Subject: [Dwarf-Discuss] Fission + cross-CU references (ref_addr)

I've recently been trying to resolve the use of Fission in LLVM's ThinLTO mode (though this would apply to plain LTO too).

One of the things that happens here is that cross-CU DIE references (DW_FORM_ref_addr) are used to describe inlining a function in one CU into another CU.

This format has been implemented in LLVM and GCC for ~years and seems to work well outside of Fission.

So the question is: what to do with Fission?

It seemed to me that a good representation would be to produce multiple CUs into a single DWO file, which GDB can't yet consume, but I'm working on patches to help there. DW_FORM_ref_addr would not use any ELF relocation, but be assumed to be "relative to the chunk of debug_info it was in" (within the .dwo file).

But what about DWP files? Currently binutils dwp produces records like this:

(this dwp contains 3 CUs, two from one LTO compile, and one from a standalone compile linked in for comparison):

Index Signature          INFO     ABBR     LINE     STR_OFF
----- ------------------ -------- -------- -------- --------
    2 0x7bd765349b7e7631 [2d, 65) [38, ae) [11, 22) [14, 3c)
    8 0x66f4e160661d2687 [00, 2d) [00, 38) [00, 11) [00, 14)
   11 0x32dd6d7121dd1d9a [65, 98) [38, ae) [11, 22) [14, 3c)

So the ABBR/LINE/STR_OFF sections are kept as-is (no analysis is done to find which portions of the dwo file are used by which CUs, etc), but the INFO section is fragmented on the CU boundaries. Fragmenting the TYPES section on the TU boundaries is necessary/useful for deduplication of types, but this fragmenting of the CU makes it impossible (I think) to use ref_addr in a dwp file.
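To make the problem concrete, here is a small Python sketch using the INFO ranges from the listing above. The reference itself is made up for illustration: it assumes unit 0x32dd6d7121dd1d9a contains a DW_FORM_ref_addr of 0x0b that points at a DIE in the other LTO unit, and that the two LTO units sat in the original .dwo in the same order as in the dwp. The helper names are hypothetical.

```python
# Per-unit INFO ranges [start, end) copied from the dwp index listing.
INFO = {
    0x7bd765349b7e7631: (0x2D, 0x65),  # LTO unit
    0x66f4e160661d2687: (0x00, 0x2D),  # standalone unit
    0x32dd6d7121dd1d9a: (0x65, 0x98),  # LTO unit
}


def naive_translate(referring_dwo_id, ref_addr):
    """Treat ref_addr as relative to the referring unit's own INFO range,
    which is the only base the current index provides."""
    start, _end = INFO[referring_dwo_id]
    return start + ref_addr


def owning_unit(dwp_offset):
    """Return the DWO ID of the unit whose INFO range covers dwp_offset."""
    for dwo_id, (start, end) in INFO.items():
        if start <= dwp_offset < end:
            return dwo_id
    return None


# Assumed reference: made from 0x32dd..., meant for a DIE in 0x7bd7...
naive = naive_translate(0x32dd6d7121dd1d9a, 0x0B)
# The base of the original .dwo's contribution is 0x2d, but no column says so.
intended = 0x2D + 0x0B
```

Under these assumptions the naive translation lands at 0x70, which is still inside the referring unit, while the intended target is at 0x38 in the other LTO unit. Nothing in the index as it stands tells the consumer to use 0x2d as the base.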

If this fragmenting were not done - consumers (GDB, etc) would need to change to account for this - searching through the INFO range to find the CU matching the signature, rather than knowing it starts at the start of the INFO range. This could have a noticeable performance impact especially in a full LTO build (where /all/ the CUs were in the same .dwo - so the index would be entirely unhelpful, I think).
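The search a consumer would have to do can be sketched as a linear walk over unit headers. This Python sketch assumes 32-bit DWARF v5 little-endian headers, where a split compile unit header carries the DWO ID directly (unit_length, version, unit_type, address_size, debug_abbrev_offset, dwo_id). Pre-v5 producers carry the ID in an attribute on the unit DIE instead, which this sketch does not handle. `make_unit` and `find_unit_by_dwo_id` are illustrative names, and the input is synthetic.

```python
import struct

DW_UT_split_compile = 0x05
HEADER_AFTER_LENGTH = "<HBBIQ"  # version, unit_type, address_size,
                                # debug_abbrev_offset, dwo_id (16 bytes)


def make_unit(dwo_id, body=b""):
    """Build a synthetic v5 split compile unit, for demonstration only."""
    rest = struct.pack(HEADER_AFTER_LENGTH, 5, DW_UT_split_compile, 8, 0, dwo_id)
    rest += body
    return struct.pack("<I", len(rest)) + rest


def find_unit_by_dwo_id(info, base, size, dwo_id):
    """Walk unit headers in info[base:base+size]; return the offset of the
    unit with the given DWO ID, or None if no unit matches."""
    offset = base
    end = base + size
    while offset + 4 + struct.calcsize(HEADER_AFTER_LENGTH) <= end:
        (unit_length,) = struct.unpack_from("<I", info, offset)
        if unit_length == 0xFFFFFFFF:
            raise NotImplementedError("64-bit DWARF is not handled in this sketch")
        version, unit_type, _addr_size, _abbrev_off, unit_id = struct.unpack_from(
            HEADER_AFTER_LENGTH, info, offset + 4)
        if version != 5:
            raise NotImplementedError("only DWARF v5 unit headers are handled")
        if unit_type == DW_UT_split_compile and unit_id == dwo_id:
            return offset
        offset += 4 + unit_length  # unit_length excludes its own 4 bytes
    return None


info = make_unit(0x11, b"\0" * 3) + make_unit(0x22) + make_unit(0x33, b"\0")
```

Each lookup costs time proportional to the number of units that precede the match, which is the performance concern raised above for a full LTO build where every CU lives in one .dwo.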

Does all this sound right/sane - anyone have ideas/perspectives/thoughts on how this should work?
