[Dwarf-Discuss] Fission + cross-CU references (ref_addr)

Thu May 4 20:27:15 GMT 2017

On Thu, May 4, 2017 at 12:46 PM Robinson, Paul <paul.robinson at sony.com>
wrote:

> I think David is correct, that we did not consider LTO and assumed a .dwo
> file would have a single compilation unit in the .debug_info section.  It
> seems to me not hard to fix, but my idea would require an extension to the
> package-file index and I don't see provision in the package-file index for
> vendor extensions (another oversight?).
>

(jumping the gun a bit) - that said, an extension to the set of valid
columns is backwards compatible-ish (though an existing consumer might error
on it, I suppose). But, yeah, this might be a good opportunity to revisit & add
versioning and a vendor extension point, maybe?

>  In a split-DWARF scenario producing multiple CUs, it's clear that each
> split-full unit in the .dwo file would need a corresponding skeleton unit
> in the .o file,
>

Eric was tossing around an idea that would diverge from that - having a
single skeleton for the whole DWO file (where the DWO file would contain
multiple CUs - though the specifics there we hadn't hashed out - maybe
every CU having the same DWO ID or the like), which could reduce the size
of the debug info in object files in these situations. (LLVM produces a
single debug_addr/debug_ranges/etc in Fission anyway - so every CU in a
fission object file would include the same addr_base, ranges_base, the same
abbrev offset, etc anyway)

But that'd probably be rather invasive a change to Fission? Not sure.

> with matching unique DWO IDs.  The v5 spec basically already says that.
> With multiple split-full units in the same .debug_info section, then
> DW_FORM_ref_addr can support cross-CU references within the section; the
> producer can supply the correct offset within the section without needing
> any relocations.
>

Yep

> How to describe this in the package file?  I'd leave DW_SECT_INFO meaning
> what it does now, describing the base and size of the individual unit.  I'd
> add a new "section identifier" DW_SECT_INFO_FILE or whatever, which
> describes the base and size of the entire .debug_info section contributed
> by the .dwo *file* that the unit came from.  This allows a consumer to find
> each individual unit by DWO ID, as today, and the extra _FILE column
> describes the base-and-size to use when interpreting a DW_FORM_ref_addr
> from that unit.  For any .dwo file that contains only one unit,
> DW_SECT_INFO and DW_SECT_INFO_FILE would have the same values.  The tool
> that creates the package file can omit DW_SECT_INFO_FILE from the index if
> every input .dwo file has only one unit.
>

Ah - that's a smart idea I hadn't considered & makes a lot of sense. This
sounds like it could address my concern/idea around type units too... (will
detail that a bit more later)
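
To make the proposal concrete, here is a minimal Python sketch of how a
consumer might resolve a DW_FORM_ref_addr through the extra column. Everything
named here is an assumption for illustration: DW_SECT_INFO_FILE is only the
name floated above (no section identifier has been allocated), and
Contribution, IndexRow and resolve_ref_addr are hypothetical. The offsets are
borrowed from the dwp dump in the original message quoted further down,
assuming the two LTO CUs sit back to back in their .dwo file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contribution:
    offset: int  # start of the range within the .dwp's .debug_info.dwo
    size: int    # length of the range in bytes


@dataclass(frozen=True)
class IndexRow:
    dwo_id: int
    info: Contribution                         # DW_SECT_INFO: this unit only
    info_file: Optional[Contribution] = None   # hypothetical DW_SECT_INFO_FILE


def resolve_ref_addr(row: IndexRow, ref_addr: int) -> int:
    """Map a ref_addr value (an offset within the original .dwo file's
    .debug_info) to an offset within the .dwp's .debug_info.dwo section."""
    # If the packager omitted the _FILE column, the .dwo had a single unit
    # and the per-unit range doubles as the per-file range.
    base = row.info_file if row.info_file is not None else row.info
    if not 0 <= ref_addr < base.size:
        raise ValueError("ref_addr points outside the .dwo contribution")
    return base.offset + ref_addr


# The two LTO CUs at [2d, 65) and [65, 98) share one file range [2d, 98).
lto_row = IndexRow(
    dwo_id=0x7BD765349B7E7631,
    info=Contribution(offset=0x2D, size=0x38),
    info_file=Contribution(offset=0x2D, size=0x6B),
)
# A reference from the first CU to offset 0x40 of the original .dwo, which
# falls inside the second CU (that one starts at 0x38 in the .dwo).
target = resolve_ref_addr(lto_row, 0x40)
```

The point of the sketch is that the consumer never has to scan for unit
boundaries: lookup by DWO ID still uses the per-unit column, and only the base
used for ref_addr changes.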

> This solution avoids the problem of the *consumer* having to scan the
> .debug_info contribution to find the units; that work can be done once up
> front by the packaging tool.
>

Yep - I don't have a good sense of how expensive such a scan is, but the
index is relatively small, I think. It would be good to measure what the index
looks like for LLVM's ThinLTO, which will cause many more CUs to exist:
every primary compilation that imports a few functions from other
CUs will get a separate CU for each CU it imports from.
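
For a rough feel of what the consumer-side scan would involve, here is a
minimal Python sketch that walks unit headers in an INFO range looking for a
DWO ID. It assumes 32-bit, little-endian DWARF v5 unit headers, where a split
full compile unit (DW_UT_split_compile) carries the dwo_id in the header;
pre-v5 Fission output keeps the ID in a DW_AT_GNU_dwo_id attribute instead, so
a real scan there would also have to decode the first DIE.
find_unit_by_dwo_id and make_unit are hypothetical names, and make_unit only
fabricates headers for experimenting.

```python
import struct

DW_UT_SPLIT_COMPILE = 0x05  # DWARF v5 unit type for a split full compile unit
HEADER_SIZE = 20            # length(4) version(2) type(1) addr_size(1)
                            # abbrev_offset(4) dwo_id(8)


def find_unit_by_dwo_id(info: bytes, start: int, size: int, dwo_id: int):
    """Linearly scan info[start:start+size] and return the section offset of
    the unit whose header carries dwo_id, or None if there is no such unit."""
    pos, end = start, start + size
    while pos + HEADER_SIZE <= end:
        unit_length, version, unit_type = struct.unpack_from("<IHB", info, pos)
        if unit_length >= 0xFFFFFFF0:
            raise ValueError("64-bit DWARF is not handled in this sketch")
        if version == 5 and unit_type == DW_UT_SPLIT_COMPILE:
            (found,) = struct.unpack_from("<Q", info, pos + 12)
            if found == dwo_id:
                return pos
        # unit_length does not count the 4-byte length field itself.
        pos += 4 + unit_length
    return None


def make_unit(dwo_id: int, payload: bytes) -> bytes:
    """Build a fake v5 split compile unit: a header followed by opaque bytes."""
    body = struct.pack("<HBBIQ", 5, DW_UT_SPLIT_COMPILE, 8, 0, dwo_id) + payload
    return struct.pack("<I", len(body)) + body


blob = make_unit(0x11, b"\x00" * 3) + make_unit(0x22, b"\x00" * 5)
second = find_unit_by_dwo_id(blob, 0, len(blob), 0x22)
```

The cost is one header read per unit in the range, so it grows with the number
of CUs per .dwo, which is exactly the quantity ThinLTO would inflate.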

> Section identifiers are 32 bits wide, and the defined values are just 1-8;
> surely we can allocate some for vendor extensions!
>

Seems legit.

> And then it's no problem to have tools produce the new column for the
> index.  Consumers will just ignore section identifiers that they don't
> recognize, same as any other part of DWARF.
>

For sure.

> Would that address the problem?
>

Sounds like it.

So here's some extra wrinkles/ideas:

Wrinkle 1: I think binutils DWP currently drops duplicate units (units with
the same DWO ID). With this change, that wouldn't be possible - or at least
all the /bytes/ of the DWO would have to be imported regardless, and in a
contiguous chunk, so that cross-CU references would resolve correctly (if
you had a DWO with 3 CUs in it, the middle of which turned out to be
duplicate & was dropped, then the offset from a DIE in CU 1 to a DIE in CU
3 (now 2) would be broken). I think that's probably OK - such a DWP can
only have one entry in the cu_index for that signature - but it can't drop
the bytes anymore... *shrug*
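
A tiny Python sketch of why the bytes can't be dropped: ref_addr values are
computed by the producer against the original .dwo layout, so removing a unit
from the middle shifts everything after it. The unit names and sizes are made
up, and layout is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List, Tuple


def layout(units: List[Tuple[str, int]],
           dropped: FrozenSet[str] = frozenset()) -> Dict[str, int]:
    """Return the start offset of each kept unit when the units are packed
    back to back in order, skipping any unit named in dropped."""
    offsets: Dict[str, int] = {}
    pos = 0
    for name, size in units:
        if name in dropped:
            continue
        offsets[name] = pos
        pos += size
    return offsets


units = [("cu1", 0x30), ("cu2", 0x40), ("cu3", 0x50)]  # made-up sizes

before = layout(units)                            # as written to the .dwo
after = layout(units, dropped=frozenset({"cu2"}))  # duplicate cu2 removed

# A DIE 0x10 bytes into cu3, referenced from cu1 via ref_addr. The producer
# baked in the original offset, with no relocation to fix it up later.
ref_addr = before["cu3"] + 0x10
actual = after["cu3"] + 0x10
stale = ref_addr != actual  # True: the reference no longer lands on the DIE
```

Keeping the dropped unit's bytes in place (and simply not giving it a second
cu_index entry) leaves before and after identical, which is the trade-off
described above.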

Wrinkle 2: Type units go in the debug_info section, but you really do want
to be able to drop duplicate type units when creating a DWP (that being the
point of type units). So maybe require that DWO files have all the CUs
first, then the TUs? and the INFO_FILE range only applies to the range over
the CUs? This hurts/walks back the unification of CUs and TUs by special
casing, unfortunately... - other ideas?

Idea: I've been wondering about the idea of not putting types in type units
if the producer knows there won't be duplicates (for example Clang's (&
GCC's) vtable-based optimization - if a type has a vtable, only put the
type definition where the vtable is emitted - well, if the key function is
strong, then you know it's going in exactly one place... so why add the
overhead of a type unit?). But that makes it awkward for types in type
units that want to refer to these ununited types. A simple implementation
could produce a declaration of the ununited type in the united type, but
that's some overhead - an alternative would be to use ref_addr to refer to
the ununited type in the CU - /assuming/ that ref_addr always refers to the
debug_info section, not the debug_types section - or in the case of DWP,
assuming that ref_addr is resolved relative to the new INFO_FILE range
you're proposing (with my amendment above that it only apply to the
(required to be) contiguous range of CUs).

- Dave

>
> --paulr
>
>
>
> *From:* Dwarf-Discuss [mailto:dwarf-discuss-bounces at lists.dwarfstd.org] *On
> Behalf Of *David Blaikie
> *Sent:* Tuesday, May 02, 2017 12:10 PM
> *To:* dwarf-discuss at lists.dwarfstd.org
> *Subject:* [Dwarf-Discuss] Fission + cross-CU references (ref_addr)
>
>
>
> I've recently been trying to resolve the use of Fission in LLVM's ThinLTO
> mode (though this would apply to plain LTO too).
>
>
>
> One of the things that happens here is that cross-CU DIE references
> (DW_FORM_ref_addr) are used to describe inlining a function in one CU into
> another CU.
>
>
>
> This format has been implemented in LLVM and GCC for ~years and seems to
> work well outside of Fission.
>
>
>
> So the question is: what to do with Fission?
>
>
>
> It seemed to me that a good representation would be to produce multiple
> CUs into a single DWO file, which GDB can't yet consume, but I'm working on
> patches to help there. DW_FORM_ref_addr would not use any ELF relocation,
> but be assumed to be "relative to the chunk of debug_info it was in"
> (within the .dwo file)
>
>
>
> But what about DWP files? Currently binutils dwp produces records like
> this:
>
>
>
> (this dwp contains 3 CUs, two from one LTO compile, and one from a
> standalone compile linked in for comparison):
>
>
>
> Index Signature          INFO     ABBR     LINE     STR_OFF
> ----- ------------------ -------- -------- -------- --------
>     2 0x7bd765349b7e7631 [2d, 65) [38, ae) [11, 22) [14, 3c)
>     8 0x66f4e160661d2687 [00, 2d) [00, 38) [00, 11) [00, 14)
>    11 0x32dd6d7121dd1d9a [65, 98) [38, ae) [11, 22) [14, 3c)
>
>
>
> So the ABBR/LINE/STR_OFF sections are kept as-is (no analysis is done to
> find which portions of the dwo file are used by which CUs, etc), but the
> INFO section is fragmented on the CU boundaries. Fragmenting the TYPES
> section on the TU boundaries is necessary/useful for deduplication of
> types, but this fragmenting of the CU makes it impossible (I think) to use
> ref_addr in a dwp file.
>
>
>
> If this fragmenting were not done - consumers (GDB, etc) would need to
> change to account for this - searching through the INFO range to find the
> CU matching the signature, rather than knowing it starts at the start of
> the INFO range. This could have a noticeable performance impact especially
> in a full LTO build (where /all/ the CUs were in the same .dwo - so the
> index would be entirely unhelpful, I think).
>
>
>
> Does all this sound right/sane - anyone have ideas/perspectives/thoughts
> on how this should work?
>
>
>