[Dwarf-Discuss] Fission + cross-CU references (ref_addr)

Thu May 4 21:41:27 GMT 2017

On Thu, May 4, 2017 at 2:26 PM Robinson, Paul <paul.robinson at sony.com>
wrote:

>  In a split-DWARF scenario producing multiple CUs, it's clear that each
> split-full unit in the .dwo file would need a corresponding skeleton unit
> in the .o file,
>
> Eric was tossing around an idea that would diverge from that - having a
> single skeleton for the whole DWO file (where the DWO file would contain
> multiple CUs - though the specifics there we hadn't hashed out - maybe
> every CU having the same DWO ID or the like), which could reduce the size
> of the debug info in object files in these situations. (LLVM produces a
> single debug_addr/debug_ranges/etc in Fission anyway - so every CU in a
> fission object file would include the same addr_base, ranges_base, the same
> abbrev offset, etc anyway)
>
> Skeleton units are pretty small; it's a 20-byte header, plus values for
> the compile_unit DIE, which is spec'd to have no children.  I would not be
> concerned about space there.  And having unique DWO IDs per unit seems
> pretty useful.
>

I tend to agree, though - what sort of uses do you have in mind?

I don't see any particular benefit to having separate skeletons/DWO IDs for
each CU in a single DWO file. The (GNU) pubnames & pubtypes that are used
to produce a gdb_index might have to be modified a bit as they're probably
CU-relative currently, but would have to be "DWO info chunk" relative
instead, which might require a fair bit of work. & having them per CU does
reduce the workload for the consumer a little - consumer doesn't have to
build a table of CU ranges for a DWO info chunk and then find which CU
range the offset falls in to figure out which CU to deserialize.
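
To make that consumer-side cost concrete, here is a rough sketch of the
lookup a consumer would need if a DWO info chunk held several CUs behind a
single skeleton. It assumes DWARF32 (a 4-byte unit_length that does not
count itself); build_cu_table, find_cu and the sizes are made up for
illustration and are not taken from any real consumer.

```python
import bisect

INITIAL_LENGTH_SIZE = 4  # DWARF32: unit_length is 4 bytes and excludes itself


def build_cu_table(units):
    """units: (start_offset, unit_length) pairs read from the unit headers.

    Returns a sorted list of (start, end) half-open ranges.
    """
    return sorted((start, start + INITIAL_LENGTH_SIZE + length)
                  for start, length in units)


def find_cu(table, die_offset):
    """Return the index of the CU whose range contains die_offset, or None."""
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, die_offset) - 1
    if i >= 0 and die_offset < table[i][1]:
        return i
    return None


# Three hypothetical CUs laid out back to back in one DWO info chunk:
# [0, 100), [100, 150), [150, 350).
table = build_cu_table([(0, 96), (100, 46), (150, 196)])
middle_cu = find_cu(table, 120)
```

With one skeleton/DWO ID per CU the consumer gets the CU directly from the
index and can skip this table entirely.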

Other ideas/uses/issues to consider?

>  Wrinkle 1: I think binutils DWP currently drops duplicate units (units
> with the same DWO ID). With this change, that wouldn't be possible - or at
> least all the /bytes/ of the DWO would have to be imported regardless, and
> in a contiguous chunk, so that cross-CU references would resolve correctly
> (if you had a DWO with 3 CUs in it, the middle of which turned out to be
> duplicate & was dropped, then the offset from a DIE in CU 1 to a DIE in CU
> 3 (now 2) would be broken). I think that's probably OK - such a DWP can
> only have one entry in the cu_index for that signature - but it can't drop
> the bytes anymore... *shrug*
>
>
> That's true, each .debug_info section that had at least one retained
> split-full unit would have to be brought in completely.
>

*nod* (good clarification - as you say, a whole DWO could be dropped, but
if any CU in it is retained, all the CUs must be)
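
A toy illustration of why the bytes have to stay contiguous, with made-up
CU sizes and a hypothetical layout() helper. It models a cross-CU reference
as an offset from the start of the DWO's info contribution (ref_addr
style), which is an assumption of this sketch rather than anything a real
DWP tool does verbatim:

```python
def layout(cus):
    """Map CU name -> start offset when the CUs are concatenated in order."""
    offsets, pos = {}, 0
    for name, size in cus:
        offsets[name] = pos
        pos += size
    return offsets


# Hypothetical DWO with three CUs of 100, 50 and 200 bytes.
cus = [("cu1", 100), ("cu2", 50), ("cu3", 200)]

before = layout(cus)
# A DIE in cu1 refers to a DIE 16 bytes into cu3.
encoded_ref = before["cu3"] + 16

# The packager drops cu2 as a duplicate and closes the gap.
after = layout([cu for cu in cus if cu[0] != "cu2"])
actual_target = after["cu3"] + 16

# The encoded offset no longer lands on the intended DIE.
reference_is_stale = encoded_ref != actual_target
```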

>
>
> Wrinkle 2: Type units go in the debug_info section, but you really do want
> to be able to drop duplicate type units when creating a DWP (that being the
> point of type units). So maybe require that DWO files have all the CUs
> first, then the TUs? and the INFO_FILE range only applies to the range over
> the CUs? This hurts/walks back the unification of CUs and TUs by special
> casing, unfortunately... - other ideas?
>
> In non-split DWARF, type units are spec'd to have their own object-file
> section contributions, separate from the compile unit(s);
>

That's sort of an implementation detail though, isn't it? DWARF just talks
about bytes in sections (type units go in the debug_types section (or, now,
the debug_info section)) and, yeah, you can use comdat groups and separate
chunks of debug_types sections to deduplicate them, but I don't think DWARF
requires/speaks about that, does it?

> that's what lets type units have a COMDAT key and be uniqued by the
> linker, even though all those separate contributions have the same section
> name.  (In ELF, you have multiple section headers with the same section
> name.)  Surely the DWO file could be (is?) done the same way, with each
> type unit in its own contribution to the .debug_info section?
>

Funny story about that...

Back when I implemented Clang/LLVM's type unit support, then implemented
the Fission+type unit support - when I read the Fission spec/proposal and
it said "type units don't need to be in comdat groups" I did the natural
thing, removed the comdating, and LLVM naturally produced a single
debug_types section containing all the types.

This broke the binutils dwp tool at the time - since it was still expecting
separate debug_types sections, one for each type. That was fixed and we
carried on...

At some point I noticed that LLVM's .dwo files not only were smaller than
GCC's for other reasons, but they got /much/ smaller when
compress-debug-sections was enabled. LLVM's debug info was somehow more
compressible than GCC's.

It wasn't until 2-3 years later, when I was implementing the llvm-dwp
prototype that it struck me: By placing each type in a separate debug_types
section, each type was compressed separately - not nearly as efficient as
compressing them all together.

So one of the reasons GCC's debug info was bigger than Clang's was because
it was less compressible due to these boundaries between types. (it could,
arguably, make the DWP tool more efficient by allowing it to uncompress
only a single type at a time - rather than having to uncompress the whole
debug_types section in one go, though)
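
The effect is easy to reproduce with plain zlib. This is only a sketch: the
"type units" are synthetic byte strings invented for the example (mostly
shared boilerplate plus a little unique data), and it ignores the
per-section header that real compressed ELF sections carry, so it shows the
direction of the effect rather than real-world numbers.

```python
import zlib


def make_type_unit(i):
    # Hypothetical stand-in for a type unit: repeated common content
    # followed by a small unique part.
    common = b"DW_TAG_structure_type;DW_AT_name;DW_AT_byte_size;DW_AT_decl_file;"
    return common * 4 + b"type_%04d;" % i


units = [make_type_unit(i) for i in range(200)]

# One compressed stream per type (one section per type unit).
separate = sum(len(zlib.compress(u)) for u in units)

# One compressed stream for everything (a single section).
together = len(zlib.compress(b"".join(units)))

print("separate:", separate, "together:", together)
```

Each separate stream has to pay for its own framing and cannot refer back
to content that appeared in another type, which is where the difference
comes from.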

>   Then the packager can incorporate or drop each of those contributions
> independently.  Being separate .debug_info lumps, the CUs are already on
> their own, and this Wrinkle goes away.
>

Not sure I quite follow this - now in a .dwo file there will be some number
of debug_info chunks - arguably one for all the CUs and one for each TU
(notwithstanding the suboptimality of the compressibility of such a
representation) - still there would need to be a special case where the
TU's debug_info chunk would have an INFO_FILE contribution that represented
the CU chunk. So a DWP tool would have to special case the info chunk that
contained the CUs (& would have to require that there be only one if there
are to be TU->CU references). I suppose if TU->CU references aren't
supported then you could actually emit chunks of CUs at whatever
granularity you like - that could then allow some multi-CU dwos to still
have CUs linked in independently, rather than all having to be pulled in
together, if any are used (LLVM probably wouldn't use this, except at some
weird -O0 LTO mode (for that whole program devirt/analysis mode, maybe) I
guess))

- Dave

>
>
> --paulr
>
>
>