[Dwarf-Discuss] Fission + cross-CU references (ref_addr)

David Blaikie dblaikie at gmail.com
Thu May 4 19:43:04 PDT 2017


On Thu, May 4, 2017 at 6:22 PM Robinson, Paul <paul.robinson at sony.com>
wrote:

> I think it's pretty safe to say:
>
> -        A reference into a TU from a CU or a different TU is invariably
> by ref_sig8, never by section offset.
>
> -        A reference into a CU from another CU has to be by ref_addr; in
> a .o file this can use a relocation, in a .dwo file it has to be from
> inside the same .debug_info contribution.
>
> -        A reference into a CU from a TU is not allowed, even if the TU
> lives in the same .debug_info contribution.
>
> I don't have my hands on words in the document that say these things, but
> I am quite sure that's the intent.  It's not important whether object-file
> mechanics would allow you to do the things that aren't allowed above.
>

Fair points, all - I'd be curious to see the wording, for sure. From what I
could read it certainly seemed implied/assumed, but not explicit.

I was mentioning this mostly by way of opportunistically suggesting "it
might be a thing that would be good to think explicitly about, possibly
allow, and design this new stuff around the possibility, perhaps".
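
To make the three rules quoted above concrete, here is a minimal Python sketch of them as a lookup. It is only an illustration of the intent as stated in this thread, not wording from the DWARF document; the names `Unit` and `allowed_form` and the two boolean parameters are hypothetical.

```python
from enum import Enum
from typing import Optional


class Unit(Enum):
    CU = "compile unit"
    TU = "type unit"


def allowed_form(src: Unit, dst: Unit, same_contribution: bool,
                 split: bool) -> Optional[str]:
    """Return the form permitted for a cross-unit reference, or None.

    `split` is True for a .dwo file (no relocations available) and
    False for an ordinary .o file. All names here are illustrative.
    """
    if dst is Unit.TU:
        # References into a TU go by signature, never by section offset.
        return "DW_FORM_ref_sig8"
    if src is Unit.TU:
        # TU -> CU is not allowed, even inside one contribution.
        return None
    # CU -> CU uses ref_addr. A .o file can rely on a relocation; a .dwo
    # file cannot, so both CUs must share one .debug_info contribution.
    if split and not same_contribution:
        return None
    return "DW_FORM_ref_addr"
```

For example, `allowed_form(Unit.CU, Unit.CU, same_contribution=False, split=True)` yields `None`, which is the case the rest of this thread is trying to design around.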


>
>
> The rest of what I'm suggesting all follows (reiterating for clarity):
>
> -        If a .dwo file has multiple split-full CUs, they each have a
> unique DWO ID (so the index can describe them individually).
>
> -        Therefore, the corresponding .o has a distinct corresponding
> skeleton CU for each split-full CU.
>
> -        Cross-CU references within a .dwo file are by DW_FORM_ref_addr
> and the related CUs must be in the same .debug_info contribution.
>
> -        Split-full CUs without cross-CU references can be in separate
> .debug_info contributions within the .dwo file.
>
Adding semantics to "contributions" in the .dwo file seems like a big step;
I don't think contributions carried any meaning there before.

It would mandate the use of (at least 2) sections when using Fission+type
units, reducing some compression opportunity. (yeah, in theory,
implementation detail - a platform could agree to some other format where,
say, .debug_info.dwo starts with an int specifying how long the CU prefix
is, after which there are type units)
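
As a rough sketch of that hypothetical alternative format (not anything a real platform or the DWARF standard defines), assume the section begins with a 4-byte little-endian length for the CU prefix, followed by the CU bytes and then the type units. The function name `split_cu_prefix` and the field width are assumptions made purely for illustration.

```python
import struct


def split_cu_prefix(section: bytes):
    """Split a hypothetical .debug_info.dwo payload into (cu_bytes, tu_bytes).

    Assumed layout (illustrative only): a 4-byte little-endian length N,
    then N bytes of compile units, then the type units.
    """
    if len(section) < 4:
        raise ValueError("section too short to hold the length prefix")
    (cu_len,) = struct.unpack_from("<I", section, 0)
    body = section[4:]
    if cu_len > len(body):
        raise ValueError("CU prefix length exceeds section size")
    return body[:cu_len], body[cu_len:]


# Toy payload: three bytes standing in for the CUs, six for the TUs.
payload = struct.pack("<I", 3) + b"CU1" + b"TUaTUb"
cus, tus = split_cu_prefix(payload)
```

With a layout like this, a single section could be compressed as a whole while the packager still knows which prefix must be kept indivisible.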

> -        A packager should look for multiple CUs in a .debug_info
> contribution, be willing to create an index entry for each one, and not
> split up the contribution even if one or more of the CUs has already been
> included from elsewhere.
>
But only for CUs, not TUs, right? So as long as the producer used a
separate section chunk/"contribution" for the TUs (even in the v5 TU/CU
unification into the debug_info section) then the packager could continue
to fragment a chunk containing only TUs, but if it contained any CUs that
chunk would be indivisible.
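
A minimal sketch of that packager rule, assuming the packager has already parsed each contribution into a list of unit records (the `UnitInfo` type and `is_divisible` name are hypothetical):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitInfo:
    kind: str   # "CU" or "TU"
    ident: int  # DWO ID for a CU, type signature for a TU


def is_divisible(contribution: List[UnitInfo]) -> bool:
    """A chunk may be fragmented only if it contains no CUs at all."""
    return all(unit.kind == "TU" for unit in contribution)


tu_only = [UnitInfo("TU", 0x1111), UnitInfo("TU", 0x2222)]
mixed = [UnitInfo("CU", 0xAAAA), UnitInfo("TU", 0x1111)]
```

Here `is_divisible(tu_only)` is true, so duplicate TUs could be dropped individually by signature, while `is_divisible(mixed)` is false and the chunk would have to be copied as a unit.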

> -        A packager can drop an entire .debug_info contribution if *all*
> of the CUs in that contribution have been included from elsewhere.  (This
> trivially covers the one-CU-per-contribution case.)
>
> -        The package index should get a new column to describe the entire
> .debug_info contribution containing the CU, so that consumers can know how
> to resolve DW_FORM_ref_addr.
>
>
>
> You're probably still thinking of wrinkles I haven't addressed; let me
> know.
>

Not much, really - that about covers it. I think it just gets a little hairy
in the spots mentioned above.
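
For illustration, the last two quoted rules (dropping a contribution, and using a per-contribution index column to resolve DW_FORM_ref_addr) might look like the sketch below. This is one reading of the proposal under stated assumptions: that a ref_addr value in a .dwo file is an offset from the start of its original .debug_info contribution, and that the proposed column records where that contribution landed in the package. The function names and parameters are hypothetical.

```python
from typing import Iterable, Set


def can_drop(cu_dwo_ids: Iterable[int], already_included: Set[int]) -> bool:
    """Drop a contribution only if *all* of its CUs came in from elsewhere.

    A contribution with no CUs is not handled here (assumption: TU-only
    chunks are deduplicated separately, by signature).
    """
    ids = list(cu_dwo_ids)
    return bool(ids) and all(dwo_id in already_included for dwo_id in ids)


def resolve_ref_addr(ref_addr: int, contribution_base: int,
                     contribution_size: int) -> int:
    """Map a ref_addr value to an offset in the packaged .debug_info.

    Assumes ref_addr is relative to the start of the original
    contribution, whose placement in the package is given by the
    proposed new index column (base and size).
    """
    if not 0 <= ref_addr < contribution_size:
        raise ValueError("ref_addr points outside its contribution")
    return contribution_base + ref_addr
```

The one-CU-per-contribution case falls out of `can_drop` directly: a single already-seen DWO ID makes the whole contribution droppable, while a contribution holding one seen and one unseen CU must be kept whole.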

- Dave


> --paulr
>
>
>
> *From:* David Blaikie [mailto:dblaikie at gmail.com]
> *Sent:* Thursday, May 04, 2017 5:30 PM
> *To:* Robinson, Paul; dwarf-discuss at lists.dwarfstd.org; Eric Christopher
> *Subject:* Re: [Dwarf-Discuss] Fission + cross-CU references (ref_addr)
>
>
>
>
>
> On Thu, May 4, 2017 at 5:05 PM Robinson, Paul <paul.robinson at sony.com>
> wrote:
>
> Skeleton units are pretty small; it's a 20-byte header, plus values for
> the compile_unit DIE, which is spec'd to have no children.  I would not be
> concerned about space there.  And having unique DWO IDs per unit seems
> pretty useful.
>
> I tend to agree, though - what sort of uses do you have in mind?
>
>
>
> A unique DWO ID per unit lets each DWO unit have a distinct entry in the
> index… saves the consumer the trouble of having to read the .debug_info
> section to find the units.
>
>
> Yep
>
>
>   If you want to require consumers to do more work, you can make DWO IDs
> be per-file instead of per-unit, and then there's no need for an INFO_FILE
> column because the INFO column would necessarily have to cover the entire
> .debug_info section from that file.
>
>
> Yep - time/space tradeoff, and I'd probably err on the side of time myself
> (by having separate skeletons and cu_index entries for each CU) as you've
> suggested. Just floating the other as an alternative since it did come up.
>
>
>
>
> In non-split DWARF, type units are spec'd to have their own object-file
> section contributions, separate from the compile unit(s);
>
> That's sort of an implementation detail though, isn't it? DWARF just talks
> about bytes in sections (type units go in the debug_types section (or, now,
> the debug_info section)) and, yeah, you can use comdat groups and separate
> chunks of debug_types sections to deduplicate them, but I don't think DWARF
> requires/speaks about that, does it?
>
> Actually DWARF 5 Appendix E does describe this; not as a required tactic,
> but a way to achieve the useful effect of deduplicating type units etc.  So
> yes it was overstating the case to say they are _spec'd_ to have their own
> contributions, but that would be what a producer would normally do.
>
>
> Ah, cool - thanks for the pointer about where the wording is.
>
>
>
>
> that's what lets type units have a COMDAT key and be uniqued by the
> linker, even though all those separate contributions have the same section
> name.  (In ELF, you have multiple section headers with the same section
> name.)  Surely the DWO file could be (is?) done the same way, with each
> type unit in its own contribution to the .debug_info section?
>
> Funny story about that...
>
> Heh.  Which way works better (types all together, or each in their own
> contribution) depends on whether your packager wants to deduplicate in a
> linker-like way, based on COMDATs,
>
>
> I think neither GCC nor Clang used COMDATs in DWO files - but GCC still
> put them in separate sections (sections with the same name... ) - a weird
> beast to me, but apparently it's a thing that works.
>
> But yeah, for non-Fission, COMDATs seem solid, though do represent a
> limitation on compressibility, etc.
>
>
> or in a purpose-built way, by looking at the type-unit signatures
> directly.  DWARF doesn't say you have to do one or the other, which
> provides implementation flexibility to the toolchain.  If your packager is
> willing to look through TUs for signatures and deduplicate that way, then
> you can stuff all the TUs into one section contribution and get better
> compression.  Quality of Implementation, as we like to say.
>
>
> To be sure!
>
>
>
>
> still there would need to be a special case where the TU's debug_info
> chunk would have an INFO_FILE contribution that represented the CU chunk. So
> a DWP tool would have to special case the info chunk that contained the CUs
> (& would have to require that there be only one if there are to be TU->CU
> references. I suppose if TU->CU references aren't supported
>
>
>
> I don't think TU->CU references are permitted.
>
>
> For now, with Fission, I agree.
>
> I think without Fission you could certainly use ref_addr to refer to
> something in a CU from a TU - /maybe/ even from a TU to another TU but I
> don't think so (not sure if the linker would do the right thing about
> reachability, etc - and if your TUs differed in layout, which they can even
> in Clang, that wouldn't work out well if it picked a different TU and
> either null'd out the ref_addr, or made it refer to the same offset in a
> different copy of the type (I don't think any linker/reloc construct would
> really result in this latter situation))
>
>
> Certainly you could not have a v4 split TU referencing a CU, that would be
> impossible.
>
>
>
> v4 didn't have split things (Fission being a v5 feature, I think), did it?
> What's the distinction you're drawing there?
>
>
>
>   (Without relocations, you can't use DW_FORM_ref_addr to point from
> .debug_types to .debug_info; and DW_FORM_ref_sig8 is only for references to
> other type units.)  While you could engineer the possibility in v5, because
> type units have moved back into .debug_info and in principle you could
> arrange for DW_FORM_ref_addr to do that, I am morally certain there was no
> intent to allow that.
>
>
> Right, I doubt there was any intent - but as we're choosing some new
> representations, etc, I'm wondering if it's something to think about.
>
> Even without the TU->XU reference question, the TU/CU unification still
> means that a DWP creation tool would have to special case the CUs in some way,
> or require the TUs to be placed in separate sections as GCC does it. (then
> it could treat each unit section as an indivisible blob)
>
> Maybe this fits into quality of implementation - but I think the presence
> of cross-unit references makes this a bit more of a matter for the standard
> as to how these groups are defined, where cross-CU references are resolved
> relative to, how can type units be dropped (or not), etc.
>
> I'm sort of leaning towards "ref_addr offsets are resolved relative to the
> widest range of CUs in a single section that contains the referring DIE" -
> though that is a bit of a mouthful/awkward thing to implement.
>
> - Dave
>
>
> --paulr
>
>