[Dwarf-Discuss] Fission + cross-CU references (ref_addr)

Fri May 5 00:30:11 GMT 2017

On Thu, May 4, 2017 at 5:05 PM Robinson, Paul <paul.robinson at sony.com>
wrote:

> Skeleton units are pretty small; it's a 20-byte header, plus values for
> the compile_unit DIE, which is spec'd to have no children.  I would not be
> concerned about space there.  And having unique DWO IDs per unit seems
> pretty useful.
>
> I tend to agree, though - what sort of uses do you have in mind?
>
>
>
> A unique DWO ID per unit lets each DWO unit have a distinct entry in the
> index; saves the consumer the trouble of having to read the .debug_info
> section to find the units.
>

Yep
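
To make the "distinct entry in the index" point concrete, here is a minimal
sketch of the lookup a per-unit DWO ID allows. It assumes a simplified
in-memory table; `CuIndex`, `Contribution`, and the sample IDs are
hypothetical names for illustration, not the on-disk cu_index layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Contribution:
    """Byte range of one unit inside the packaged .debug_info section."""
    offset: int
    size: int


class CuIndex:
    """Toy stand-in for a cu_index: one row per DWO ID (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[int, Contribution] = {}

    def add(self, dwo_id: int, offset: int, size: int) -> None:
        # A per-unit ID must be unique, otherwise the row would be ambiguous.
        if dwo_id in self._rows:
            raise ValueError(f"duplicate DWO ID {dwo_id:#x}")
        self._rows[dwo_id] = Contribution(offset, size)

    def find(self, dwo_id: int) -> Optional[Contribution]:
        # Direct lookup: no need to walk .debug_info to locate the unit.
        return self._rows.get(dwo_id)


index = CuIndex()
index.add(0x1111, offset=0, size=120)
index.add(0x2222, offset=120, size=80)
hit = index.find(0x2222)
```

With per-file IDs instead, a consumer would get back one range covering the
whole file and would still have to scan unit headers inside it.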

>   If you want to require consumers to do more work, you can make DWO IDs
> be per-file instead of per-unit, and then there's no need for an INFO_FILE
> column because the INFO column would necessarily have to cover the entire
> .debug_info section from that file.
>

Yep - time/space tradeoff, and I'd probably err on the side of time myself
(by having separate skeletons and cu_index entries for each CU) as you've
suggested. Just floating the other as an alternative since it did come up.

>
>
> In non-split DWARF, type units are spec'd to have their own object-file
> section contributions, separate from the compile unit(s);
>
> That's sort of an implementation detail though, isn't it? DWARF just talks
> about bytes in sections (type units go in the debug_types section (or, now,
> the debug_info section)) and, yeah, you can use comdat groups and separate
> chunks of debug_types sections to deduplicate them, but I don't think DWARF
> requires/speaks about that, does it?
>
> Actually DWARF 5 Appendix E does describe this; not as a required tactic,
> but a way to achieve the useful effect of deduplicating type units etc.  So
> yes it was overstating the case to say they are _spec'd_ to have their own
> contributions, but that would be what a producer would normally do.
>

Ah, cool - thanks for the pointer about where the wording is.

>
>
> that's what lets type units have a COMDAT key and be uniqued by the
> linker, even though all those separate contributions have the same section
> name.  (In ELF, you have multiple section headers with the same section
> name.)  Surely the DWO file could be (is?) done the same way, with each
> type unit in its own contribution to the .debug_info section?
>
> Funny story about that...
>
> Heh.  Which way works better (types all together, or each in their own
> contribution) depends on whether your packager wants to deduplicate in a
> linker-like way, based on COMDATs,
>

I think neither GCC nor Clang used COMDATs in DWO files - but GCC still put
them in separate sections (sections with the same name... ) - a weird beast
to me, but apparently it's a thing that works.

But yeah, for non-Fission, COMDATs seem solid, though do represent a
limitation on compressibility, etc.

> or in a purpose-built way, by looking at the type-unit signatures
> directly.  DWARF doesn't say you have to do one or the other, which
> provides implementation flexibility to the toolchain.  If your packager is
> willing to look through TUs for signatures and deduplicate that way, then
> you can stuff all the TUs into one section contribution and get better
> compression.  Quality of Implementation, as we like to say.
>

To be sure!
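
For what the "purpose-built" route might look like, a rough sketch follows.
It assumes the packager has already parsed each TU into a (signature, bytes)
pair; `dedup_type_units` and the sample data are hypothetical, and real
tools would read the signature from the type unit header.

```python
from typing import Iterable, List, Tuple

# (8-byte type signature as an int, raw bytes of the type unit)
TypeUnit = Tuple[int, bytes]


def dedup_type_units(units: Iterable[TypeUnit]) -> List[TypeUnit]:
    """Keep the first type unit seen for each signature, preserving order."""
    seen = set()
    kept: List[TypeUnit] = []
    for signature, data in units:
        if signature in seen:
            continue  # a copy with this signature was already kept
        seen.add(signature)
        kept.append((signature, data))
    return kept


inputs = [(0xA, b"tu-a"), (0xB, b"tu-b"), (0xA, b"tu-a-copy")]
packed = dedup_type_units(inputs)
```

Because this keys on the signature rather than on COMDAT groups, the
producer is free to put all TUs in one section contribution.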

>
>
> still there would need to be a special case where the TU's debug_info
> chunk would have an INFO_FILE contribution that represented the CU chunk. So
> a DWP tool would have to special case the info chunk that contained the CUs
> (& would have to require that there be only one if there are to be TU->CU
> references. I suppose if TU->CU references aren't supported
>
>
>
> I don't think TU->CU references are permitted.
>

For now, with Fission, I agree.

I think without Fission you could certainly use ref_addr to refer to
something in a CU from a TU - /maybe/ even from a TU to another TU, but I
don't think so. I'm not sure the linker would do the right thing about
reachability, etc. And if your TUs differed in layout, which they can even
in Clang, that wouldn't work out well if it picked a different TU and
either null'd out the ref_addr or made it refer to the same offset in a
different copy of the type (I don't think any linker/reloc construct would
really result in this latter situation).

> Certainly you could not have a v4 split TU referencing a CU, that would be
> impossible.
>

v4 didn't have split things (Fission being a v5 feature, I think), did it?
What's the distinction you're drawing there?

> (Without relocations, you can't use DW_FORM_ref_addr to point from
> .debug_types to .debug_info; and DW_FORM_ref_sig8 is only for references to
> other type units.)  While you could engineer the possibility in v5, because
> type units have moved back into .debug_info and in principle you could
> arrange for DW_FORM_ref_addr to do that, I am morally certain there was no
> intent to allow that.
>

Right, I doubt there was any intent - but as we're choosing some new
representations, etc, I'm wondering if it's something to think about.

Even without the TU->XU reference question, the TU/CU unification still
means that a DWP creation tool would have to special case the CUs in some
way, or require the TUs to be placed in separate sections as GCC does (then
it could treat each unit section as an indivisible blob).

Maybe this fits into quality of implementation - but I think the presence
of cross-unit references makes this a bit more of a matter for the standard
as to how these groups are defined, what cross-CU references are resolved
relative to, how type units can be dropped (or not), etc.

I'm sort of leaning towards "ref_addr offsets are resolved relative to the
widest range of CUs in a single section that contains the referring DIE" -
though that is a bit of a mouthful/awkward thing to implement.

- Dave

> --paulr
>