[Dwarf-Discuss] Fission + cross-CU references (ref_addr)

Fri May 5 01:22:08 GMT 2017

I think it's pretty safe to say:

-        A reference into a TU from a CU or a different TU is invariably by ref_sig8, never by section offset.

-        A reference into a CU from another CU has to be by ref_addr; in a .o file this can use a relocation, in a .dwo file it has to be from inside the same .debug_info contribution.

-        A reference into a CU from a TU is not allowed, even if the TU lives in the same .debug_info contribution.
I don't have my hands on words in the document that say these things, but I am quite sure that's the intent.  It's not important whether object-file mechanics would allow you to do the things that aren't allowed above.
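
To make the three rules above concrete, here is a minimal sketch. The function name and the "CU"/"TU" strings are invented for illustration and do not come from the DWARF document; it only restates the intent described above.

```python
def reference_form(source_kind, target_kind):
    """Return the form a cross-unit reference should use, or None if disallowed.

    source_kind / target_kind are the hypothetical strings "CU" or "TU".
    """
    if target_kind == "TU":
        # Into a TU, from a CU or another TU: always by signature.
        return "DW_FORM_ref_sig8"
    if target_kind == "CU" and source_kind == "CU":
        # CU to CU: by section offset (relocated in a .o, same contribution in a .dwo).
        return "DW_FORM_ref_addr"
    # TU to CU: not allowed, even within one .debug_info contribution.
    return None
```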

The rest of what I'm suggesting all follows (reiterating for clarity):

-        If a .dwo file has multiple split-full CUs, they each have a unique DWO ID (so the index can describe them individually).

-        Therefore, the corresponding .o has a distinct corresponding skeleton CU for each split-full CU.

-        Cross-CU references within a .dwo file are by DW_FORM_ref_addr and the related CUs must be in the same .debug_info contribution.

-        Split-full CUs without cross-CU references can be in separate .debug_info contributions within the .dwo file.

-        A packager should look for multiple CUs in a .debug_info contribution, be willing to create an index entry for each one, and not split up the contribution even if one or more of the CUs has already been included from elsewhere.

-        A packager can drop an entire .debug_info contribution if *all* of the CUs in that contribution have been included from elsewhere.  (This trivially covers the one-CU-per-contribution case.)

-        The package index should get a new column to describe the entire .debug_info contribution containing the CU, so that consumers can know how to resolve DW_FORM_ref_addr.
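
The packager behavior in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any existing DWP tool: the `Unit` type, the `pack` function, and the index tuple layout (with its extra contribution offset/size pair standing in for the proposed new column) are all hypothetical, and it assumes the first copy of a CU seen is the one that gets indexed.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    dwo_id: int  # unique per split-full CU
    size: int    # bytes this CU occupies in its contribution


def pack(contributions):
    """contributions: list of lists of Unit, one inner list per .debug_info contribution.

    Returns (kept, index), where index maps
    dwo_id -> (cu_offset, cu_size, contrib_offset, contrib_size).
    """
    index = {}
    kept = []
    out_offset = 0
    for contrib in contributions:
        # Drop the whole contribution only if *all* of its CUs are already packaged.
        if all(u.dwo_id in index for u in contrib):
            continue
        contrib_size = sum(u.size for u in contrib)
        cu_offset = out_offset
        for u in contrib:
            # Never split the contribution; a CU seen earlier keeps its first index entry.
            index.setdefault(u.dwo_id, (cu_offset, u.size, out_offset, contrib_size))
            cu_offset += u.size
        kept.append(contrib)
        out_offset += contrib_size
    return kept, index
```

The contribution offset and size recorded per CU are what a consumer would need in order to resolve DW_FORM_ref_addr relative to the right base.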

You're probably still thinking of wrinkles I haven't addressed; let me know.
--paulr

From: David Blaikie [mailto:dblaikie@gmail.com]
Sent: Thursday, May 04, 2017 5:30 PM
To: Robinson, Paul; dwarf-discuss at lists.dwarfstd.org; Eric Christopher
Subject: Re: [Dwarf-Discuss] Fission + cross-CU references (ref_addr)

On Thu, May 4, 2017 at 5:05 PM Robinson, Paul <paul.robinson at sony.com> wrote:
Skeleton units are pretty small; it's a 20-byte header, plus values for the compile_unit DIE, which is spec'd to have no children.  I would not be concerned about space there.  And having unique DWO IDs per unit seems pretty useful.
I tend to agree, though - what sort of uses do you have in mind?
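
For reference, the 20-byte figure can be checked with a small sketch of the v5 skeleton unit header. It assumes the 32-bit DWARF format and little-endian byte order; the helper name `pack_skeleton_header` is made up for illustration.

```python
import struct

DW_UT_skeleton = 0x04  # unit type code for a skeleton unit in DWARF v5

# unit_length (4), version (2), unit_type (1), address_size (1),
# debug_abbrev_offset (4), dwo_id (8); "<" means little-endian, no padding.
SKELETON_HEADER = struct.Struct("<IHBBIQ")


def pack_skeleton_header(unit_length, address_size, abbrev_offset, dwo_id):
    """Encode a skeleton unit header (32-bit DWARF format assumed)."""
    return SKELETON_HEADER.pack(
        unit_length, 5, DW_UT_skeleton, address_size, abbrev_offset, dwo_id
    )
```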

A unique DWO ID per unit lets each DWO unit have a distinct entry in the index, which saves the consumer the trouble of having to read the .debug_info section to find the units.

Yep

  If you want to require consumers to do more work, you can make DWO IDs be per-file instead of per-unit, and then there's no need for an INFO_FILE column because the INFO column would necessarily have to cover the entire .debug_info section from that file.

Yep - time/space tradeoff, and I'd probably err on the side of time myself (by having separate skeletons and cu_index entries for each CU) as you've suggested. Just floating the other as an alternative since it did come up.

In non-split DWARF, type units are spec'd to have their own object-file section contributions, separate from the compile unit(s);
That's sort of an implementation detail though, isn't it? DWARF just talks about bytes in sections (type units go in the debug_types section (or, now, the debug_info section)) and, yeah, you can use comdat groups and separate chunks of debug_types sections to deduplicate them, but I don't think DWARF requires/speaks about that, does it?
Actually DWARF 5 Appendix E does describe this; not as a required tactic, but a way to achieve the useful effect of deduplicating type units etc.  So yes it was overstating the case to say they are _spec'd_ to have their own contributions, but that would be what a producer would normally do.

Ah, cool - thanks for the pointer about where the wording is.

that's what lets type units have a COMDAT key and be uniqued by the linker, even though all those separate contributions have the same section name.  (In ELF, you have multiple section headers with the same section name.)  Surely the DWO file could be (is?) done the same way, with each type unit in its own contribution to the .debug_info section?
Funny story about that...
Heh.  Which way works better (types all together, or each in their own contribution) depends on whether your packager wants to deduplicate in a linker-like way, based on COMDATs,

I think neither GCC nor Clang used COMDATs in DWO files - but GCC still put them in separate sections (sections with the same name... ) - a weird beast to me, but apparently it's a thing that works.

But yeah, for non-Fission, COMDATs seem solid, though do represent a limitation on compressibility, etc.

or in a purpose-built way, by looking at the type-unit signatures directly.  DWARF doesn't say you have to do one or the other, which provides implementation flexibility to the toolchain.  If your packager is willing to look through TUs for signatures and deduplicate that way, then you can stuff all the TUs into one section contribution and get better compression.  Quality of Implementation, as we like to say.
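
The purpose-built approach might look something like this sketch. It assumes the packager has already parsed each TU far enough to know its 8-byte signature; the function name and the (signature, data) pair representation are hypothetical.

```python
def dedup_type_units(type_units):
    """Keep the first type unit seen for each signature.

    type_units: iterable of (signature, data) pairs, where signature is the
    TU's 64-bit type signature and data is the unit's raw bytes.
    """
    seen = set()
    kept = []
    for signature, data in type_units:
        if signature in seen:
            continue  # duplicate of a TU that is already packaged
        seen.add(signature)
        kept.append((signature, data))
    return kept
```

This works regardless of whether the TUs arrived in one section contribution or many, which is what makes the all-in-one-contribution layout viable.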

To be sure!

still there would need to be a special case where the TU's debug_info chunk would have an INFO_FILE contribution that represented the CU chunk. So a DWP tool would have to special case the info chunk that contained the CUs (& would have to require that there be only one if there are to be TU->CU references). I suppose if TU->CU references aren't supported

I don't think TU->CU references are permitted.

For now, with Fission, I agree.

I think without Fission you could certainly use ref_addr to refer to something in a CU from a TU - /maybe/ even from a TU to another TU but I don't think so (not sure if the linker would do the right thing about reachability, etc - and if your TUs differed in layout, which they can even in Clang, that wouldn't work out well if it picked a different TU and either null'd out the ref_addr, or made it refer to the same offset in a different copy of the type (I don't think any linker/reloc construct would really result in this latter situation))

Certainly you could not have a v4 split TU referencing a CU, that would be impossible.

v4 didn't have split things (Fission being a v5 feature, I think), did it? What's the distinction you're drawing there?

  (Without relocations, you can't use DW_FORM_ref_addr to point from .debug_types to .debug_info; and DW_FORM_ref_sig8 is only for references to other type units.)  While you could engineer the possibility in v5, because type units have moved back into .debug_info and in principle you could arrange for DW_FORM_ref_addr to do that, I am morally certain there was no intent to allow that.

Right, I doubt there was any intent - but as we're choosing some new representations, etc, I'm wondering if it's something to think about.

Even without the TU->XU reference question, the TU/CU unification still means that a DWP creation tool would have to special case the CUs in some way, or require the TUs to be placed in separate sections as GCC does it. (then it could treat each unit section as an indivisible blob)

Maybe this fits into quality of implementation - but I think the presence of cross-unit references makes this a bit more of a matter for the standard as to how these groups are defined, where cross-CU references are resolved relative to, how can type units be dropped (or not), etc.

I'm sort of leaning towards "ref_addr offsets are resolved relative to the widest range of CUs in a single section that contains the referring DIE" - though that is a bit of a mouthful/awkward thing to implement.
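
On the consumer side, that rule might reduce to something like the sketch below. It assumes the package index gives the consumer the offset and size of the contribution containing the referring DIE (the proposed extra column); nothing here is existing DWP behavior, and the function name is invented.

```python
def resolve_ref_addr(ref_value, contrib_offset, contrib_size):
    """Map a DW_FORM_ref_addr value to an offset in the packaged .debug_info.

    ref_value is taken as relative to the start of the original contribution
    that contains the referring DIE.
    """
    if not 0 <= ref_value < contrib_size:
        raise ValueError("ref_addr points outside the referring contribution")
    return contrib_offset + ref_value
```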

- Dave

--paulr