[Dwarf-Discuss] Including the DWO name in the CU hash

Wed May 31 20:28:51 GMT 2017

Ping?

I did end up hacking around this by hashing the DWO name into the CU
hash if LLVM's producing more than one CU. It's not perfect (really it's
more about the ThinLTO importing stage - normal LTO doesn't need this sort
of mangling) but it suffices for now:
http://llvm.org/viewvc/llvm-project?rev=304119&view=rev
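
For anyone skimming, a rough sketch of the workaround's shape. This is
illustrative Python, not the code from the revision above: cu_hash, its
parameters, and the use of truncated SHA-256 as the hash are all assumptions
made for the example.

```python
import hashlib


def cu_hash(cu_bytes: bytes, dwo_name: str, num_cus: int) -> int:
    """Hypothetical 64-bit CU hash.

    The DWO name is mixed in only when the compilation produces more than
    one CU (the ThinLTO-import case); otherwise the hash depends on the CU
    contents alone.
    """
    h = hashlib.sha256()  # stand-in hash, chosen only for the sketch
    h.update(cu_bytes)
    if num_cus > 1:
        # A NUL separator keeps the name from running into the CU contents.
        h.update(b"\0" + dwo_name.encode("utf-8"))
    # Truncate to 64 bits, the width of a DWO ID.
    return int.from_bytes(h.digest()[:8], "little")


# The same stripped-down CU imported into two different compilations:
stub = b"abstract definition of an inline function"
single_a = cu_hash(stub, "a.dwo", num_cus=1)
single_b = cu_hash(stub, "b.dwo", num_cus=1)
multi_a = cu_hash(stub, "a.dwo", num_cus=2)
multi_b = cu_hash(stub, "b.dwo", num_cus=2)
```

With a single CU the name is ignored, so single_a and single_b match; with
more than one CU the differing DWO names keep the two clones apart.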

On Fri, May 19, 2017 at 3:51 PM David Blaikie <dblaikie at gmail.com> wrote:

> some context:
>
> 1) A little while ago, I added the dwo_name to the dwo CU to improve
> diagnostic quality on CU hash collisions during a dwp action (previously
> it could only report which input files (possibly other DWP files) contained
> the duplicates/collision which could be very manual to track back to the
> original input DWO files - having the original DWO names in the diagnostic
> made it relatively easy to track down).
>
> 2) LLVM's new ThinLTO presents a high chance of duplicate DWO CUs - it
> does this by creating effectively "new" CUs containing a stripped down
> version of an existing CU - containing only a handful of functions that may
> be relevant to optimizing some other CU. (imagine two CUs both using a
> single inline function from a 3rd CU - the 3rd CU's inline function and the
> basic CU itself are imported into the compilation steps of the other two CUs
> - so in the end you get two DWO files, each with two CUs, where one CU
> contains only an abstract definition of the inline function).
>
> My initial thinking here was that I could cross-pollinate the CU hash from
> each CU within a single compilation, since the primary CU would have enough
> uniqueness (hash all the CUs, then cross-hash them).
>
> But then I realized the CUs should already be unique because they include
> the dwo_name which will be different between the two stripped down CU
> clones. But the dwo_name isn't included in the hash - so I prototyped
> including it & it does what you'd expect.
>
> Extra wrinkle: Once the dwo_name is in the hash, then it defeats my
> original motivation for including it in the DWO CU in the first place: such
> CUs will never collide, so the name would never be useful for diagnostic
> quality.
>
> Should I drop the dwo_name from the DWO CU and manually/explicitly include
> it in the hash? Does cross pollination sound better? Should I only do
> either of these when dealing with more than one CU in a DWO? (in which case
> the diagnostic improvement would still be valid - it catches some
> interesting cases, but they're not /very/ interesting like major bugs (&
> DWO ID collisions do have some false positives too, which hashing the
> dwo_name would fix), etc... and the mechanism wasn't built for bug catching
> in any case)
>
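
The cross-pollination alternative from the quoted message ("hash all the
CUs, then cross-hash them") could look roughly like the sketch below. It is
illustrative Python only: the helper names, the truncated SHA-256 hash, and
the way the per-CU hashes are combined are assumptions, not anything LLVM
does.

```python
import hashlib
from typing import List


def _h64(data: bytes) -> int:
    """Stand-in 64-bit hash (truncated SHA-256), used only for this sketch."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def cross_hash(cu_blobs: List[bytes]) -> List[int]:
    """Hypothetical cross-pollination of the CU hashes in one compilation.

    Step 1: hash every CU on its own.
    Step 2: re-hash each CU's hash together with the hashes of all CUs in
    the same compilation, so the primary CU's uniqueness carries over to
    the imported, stripped-down CUs.
    """
    base = [_h64(blob) for blob in cu_blobs]
    everything = b"".join(x.to_bytes(8, "little") for x in base)
    return [_h64(x.to_bytes(8, "little") + everything) for x in base]


# Two compilations that both import the same stripped-down CU:
stub = b"abstract definition of an inline function"
comp_a = cross_hash([b"primary CU of a.cpp", stub])
comp_b = cross_hash([b"primary CU of b.cpp", stub])
```

The imported CU ends up with a different hash in each compilation (comp_a[1]
vs comp_b[1]) without the dwo_name being involved, at the cost of every CU's
hash now depending on its neighbours.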