[Dwarf-Discuss] Including the DWO name in the CU hash

Tue Jun 27 18:47:06 GMT 2017

ping, in case anyone's got some ideas here

On Wed, May 31, 2017 at 1:28 PM David Blaikie <dblaikie at gmail.com> wrote:

> Ping?
>
> I did end up hacking around this by hashing the DWO name into the CU
> hash if LLVM's producing more than one CU. It's not perfect (really it's
> more about the ThinLTO importing stage - normal LTO doesn't need this sort
> of mangling) but it suffices for now:
> http://llvm.org/viewvc/llvm-project?rev=304119&view=rev
>
>
> On Fri, May 19, 2017 at 3:51 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>> some context:
>>
>> 1) A little while ago, I added the dwo_name to the dwo CU to improve
>> diagnostic quality on CU hash collisions during a dwp action. Previously
>> it could only report which input files (possibly other DWP files) contained
>> the duplicates/collision, which could be very manual to track back to the
>> original input DWO files. Having the original DWO names in the diagnostic
>> made it relatively easy to track down.
>>
>> 2) LLVM's new ThinLTO presents a high chance of duplicate DWO CUs - it
>> does this by creating effectively "new" CUs containing a stripped down
>> version of an existing CU - containing only a handful of functions that may
>> be relevant to optimizing some other CU. (Imagine two CUs both using a
>> single inline function from a 3rd CU - the 3rd CU's inline function and the
>> basic CU itself are imported into the compilation steps of the other two CUs
>> - so in the end you get two DWO files, each with two CUs, where one CU
>> contains only an abstract definition of the inline function.)
>>
>> My initial thinking here was that I could cross-pollinate the CU hash
>> from each CU within a single compilation, since the primary CU would have
>> enough uniqueness (hash all the CUs, then cross-hash them).
>>
>> But then I realized the CUs should already be unique because they include
>> the dwo_name which will be different between the two stripped down CU
>> clones. But the dwo_name isn't included in the hash - so I prototyped
>> including it & it does what you'd expect.
>>
>> Extra wrinkle: Once the dwo_name is in the hash, then it defeats my
>> original motivation for including it in the DWO CU in the first place: such
>> CUs will never collide, so the name would never be useful for diagnostic
>> quality.
>>
>> Should I drop the dwo_name from the DWO CU and manually/explicitly
>> include it in the hash? Does cross pollination sound better? Should I only
>> do either of these when dealing with more than one CU in a DWO? (In which
>> case the diagnostic improvement would still be valid - it catches some
>> interesting cases, but they're not /very/ interesting, like major bugs, and
>> DWO ID collisions do have some false positives too, which hashing the
>> dwo_id would fix. The mechanism wasn't built for bug catching in any
>> case.)
>>
>
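
For concreteness, here is a rough Python sketch of the two options from the
quoted mail: mixing the dwo_name into the CU hash only when a DWO holds more
than one CU, and cross-pollinating the hashes of all CUs in one compilation.
Everything in it is an assumption made for illustration: the function names,
the byte strings standing in for CU contents, and the use of 8 bytes of an MD5
digest as the 64-bit ID. It is not how LLVM actually hashes DIEs.

```python
import hashlib


def _low64(digest):
    # Take 8 bytes of the MD5 digest as a 64-bit ID (illustrative choice).
    return int.from_bytes(digest[-8:], "little")


def cu_hash(cu_bytes):
    """Plain CU hash: depends only on the CU's hashed contents."""
    return _low64(hashlib.md5(cu_bytes).digest())


def cu_hash_with_name(cu_bytes, dwo_name, num_cus):
    """Mix the DWO name in only when the DWO holds more than one CU."""
    h = hashlib.md5(cu_bytes)
    if num_cus > 1:
        h.update(dwo_name.encode("utf-8"))
    return _low64(h.digest())


def cross_hashed(cus):
    """Cross-pollination: hash every CU, then rehash each with all hashes."""
    base = [hashlib.md5(cu).digest() for cu in cus]
    everything = b"".join(base)
    return [_low64(hashlib.md5(b + everything).digest()) for b in base]


# Two DWO files, each importing the same stripped-down clone of a third CU.
clone = b"abstract definition of inline f()"
a_dwo = [b"primary CU of a.cpp", clone]
b_dwo = [b"primary CU of b.cpp", clone]

# Identical contents give identical plain hashes, so the clones collide.
print(cu_hash(a_dwo[1]) == cu_hash(b_dwo[1]))  # True

# Either scheme is meant to tell the two clones apart.
print(cu_hash_with_name(clone, "a.dwo", 2) == cu_hash_with_name(clone, "b.dwo", 2))
print(cross_hashed(a_dwo)[1] == cross_hashed(b_dwo)[1])
```

With a single CU per DWO, cu_hash_with_name falls back to the plain hash, so
a collision there would still be reported with the dwo_name in the diagnostic.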