[Dwarf-Discuss] debug_aranges use and overhead

Thu Feb 24 22:37:07 GMT 2022

On Thu, Feb 24, 2022 at 2:24 PM Samy Al Bahra <sbahra at repnop.org> wrote:

> Hi David
>
> I implemented some optimizations in the form of a specialized parser for
> fast DW_AT_ranges scanning, and performance is now comparable to lazy
> evaluation through .debug_aranges (only marginally worse, assuming the
> buffer cache is warmed up). We've since shipped with these optimizations.
> I have to do some work in the same code base in March and will run a
> comparison then and share numbers here, including after dropping buffers.
> If you would benefit from having them sooner, let me know and I'll make it
> happen.
>

No rush - I just came across the thread and was curious whether there were
any updates/closure/lessons to factor in, etc. I'm glad to hear you ended up
with fairly similar performance - that matches my expectation that there
aren't hidden scalability issues here. But yeah, I'm curious to hear more
if/when you have more to share.

- Dave

>
> On Thu, Feb 24, 2022 at 3:44 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>> Hey Samy - curious if you ever happened to end up getting further details
>> here.
>>
>> On Fri, Apr 9, 2021 at 1:05 PM Samy Al Bahra <sbahra at repnop.org> wrote:
>>
>>> Thanks for the detailed response David.
>>>
>>> On Fri, Apr 9, 2021 at 2:52 PM David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>> I'm not suggesting scanning all of .debug_info - only the CU DIE for
>>>> DW_AT_ranges or high/low_pc, then skip to the next CU DIE (via the
>>>> unit header's next unit offset).
>>>>
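As an illustration of the "skip to the next CU" step described above, here is a minimal sketch that walks the unit headers of a raw .debug_info buffer using only the unit_length field. It is not the parser discussed in this thread: the function name `iter_unit_headers` and its parameters are hypothetical, and decoding the CU DIE's DW_AT_ranges or low/high_pc (which also needs .debug_abbrev) is deliberately left out.

```python
import struct


def iter_unit_headers(debug_info: bytes, little_endian: bool = True):
    """Yield (unit_offset, version, is_64bit, next_unit_offset) for each unit.

    Hypothetical helper: it only reads unit_length and version, then jumps
    to the next unit without touching any DIE below the header.
    """
    end = "<" if little_endian else ">"
    offset = 0
    while offset < len(debug_info):
        (length,) = struct.unpack_from(end + "I", debug_info, offset)
        if length == 0xFFFFFFFF:
            # 64-bit DWARF: escape value followed by an 8-byte length.
            (length,) = struct.unpack_from(end + "Q", debug_info, offset + 4)
            length_size = 12
            is_64bit = True
        else:
            length_size = 4
            is_64bit = False
        if length < 2:
            raise ValueError(f"unit at {offset:#x} is too short for a version field")
        # The 2-byte version immediately follows the length field.
        (version,) = struct.unpack_from(end + "H", debug_info, offset + length_size)
        # unit_length counts the bytes after the length field itself.
        next_offset = offset + length_size + length
        yield offset, version, is_64bit, next_offset
        offset = next_offset
```

A reader built this way touches a few bytes per unit plus whatever it takes to decode the one CU DIE, which is the cost being compared against .debug_aranges in this thread.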
>>>
>>>> It sounded like CU ranges couldn't be used to build such an index at
>>>> all, and that your code used quite a different strategy in the absence
>>>> of aranges, rather than building the index from the CU ranges? Building
>>>> it from the CU ranges would be somewhat slower, I'm sure, but I wouldn't
>>>> have thought it so fundamentally slower (and I'm trying to understand
>>>> whether it is, and why) that it wouldn't be the next fallback, rather
>>>> than skipping the index entirely or employing some fundamentally
>>>> different approach.
>>>
>>>
>>> This is still significantly less dense than aranges, and involves more
>>> disk I/O and memory pressure. Let me see what optimizations I can
>>> implement here and get back to you with the results and what I came up
>>> with. This would be a better basis for an apples-to-apples comparison.
>>>
>>>
>>>>
>>>> If you mean building ranges from all the DIEs deep inside a CU - yeah,
>>>> that's going to be fundamentally slower in a bunch of ways, and I could
>>>> see that necessitating a totally different approach, or the index not
>>>> making sense anymore (though I'd still like to understand it) - but I'm
>>>> especially curious about the case where the CU DIE itself does have
>>>> comprehensive address range information.
>>>>
>>>
>>> Will report back on this.
>>>
>>>
>>>>
>>>> - Dave
>>>>
>>>> >
>>>> >>
>>>> >>
>>>> >>>
>>>> >>> (+ complexities Greg mentions later in the thread). In cases where
>>>> >>> we lack this, we use our own persistent cache, which introduces
>>>> >>> unnecessary complexity. Now I am considering going as far as adding a
>>>> >>> multi-threaded indexer for cases where a persistent cache / build system
>>>> >>> modifications aren't an option (work to begin in the next week or two).
>>>> >>>
>>>> >>> .debug_aranges would provide a lot of value to our users.
>>>> >>>
>>>> >>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss
>>>> >>> <dwarf-discuss at lists.dwarfstd.org> wrote:
>>>> >>>>
>>>> >>>> On Thu, Mar 11, 2021 at 5:48 AM <paul.robinson at sony.com> wrote:
>>>> >>>>>
>>>> >>>>> Hopefully not to side-track things too much... maybe wants its own
>>>> >>>>> thread, if there's more to debate here.
>>>> >>>>
>>>> >>>>
>>>> >>>> Yeah, how about we spin it off into another thread (done here)
>>>> >>>>
>>>> >>>>>
>>>> >>>>> >> For the case you suggested where it would be useful to keep the
>>>> >>>>> >> range list for the CU in the .o file, I think .debug_aranges is
>>>> >>>>> >> what you're looking for.
>>>> >>>>> >
>>>> >>>>> > aranges has been off by default in LLVM for a while - it adds a lot
>>>> >>>>> > of overhead (doesn't have all the nice rnglist encodings for
>>>> >>>>> > instance - nor can it use debug_addr, and if it did it'd still be
>>>> >>>>> > duplicate with the CU ranges wherever they were).
>>>> >>>>>
>>>> >>>>> Did you want to file an issue to improve how .debug_aranges works?
>>>> >>>>
>>>> >>>>
>>>> >>>> I don't currently understand the value it provides, and I at least
>>>> >>>> don't have a use case for it, so I'm not sure I'd be the best person to
>>>> >>>> advocate/drive that work.
>>>> >>>>
>>>> >>>>> Complaining that it duplicates CU ranges is missing the point, though;
>>>> >>>>> it's an index, like .debug_names, of course it duplicates other info.
>>>> >>>>> If you want to suggest an improved index, like we did with
>>>> >>>>> .debug_names, that would be great too.
>>>> >>>>
>>>> >>>>
>>>> >>>> .debug_names is quite different though - it collects information from
>>>> >>>> across the DIE tree - information that is expensive to otherwise gather
>>>> >>>> (walking the whole DIE tree).
>>>> >>>>
>>>> >>>> .debug_aranges is not like that for most producers (producers that do
>>>> >>>> include the address ranges on the CU DIE) - the data is readily
>>>> >>>> available immediately on the CU. That does involve reading some of
>>>> >>>> .debug_abbrev, and interpreting a handful of attributes - but at least
>>>> >>>> for the use cases I'm aware of, that overhead isn't worth the size
>>>> >>>> increase.
>>>> >>>>
>>>> >>>> Do you have numbers on the benefits of .debug_aranges compared to
>>>> >>>> parsing the ranges from CU DIEs?
>>>> >>>>
>>>> >>>> (one possible issue: the CU doesn't /have/ to contain low/high/ranges
>>>> >>>> if its children DIEs contain addresses - having that as a guarantee, or
>>>> >>>> some preferred way of encoding zero length (high/low of 0 would be
>>>> >>>> acceptable, I guess) would be nice & make it cheap to skip over CUs
>>>> >>>> that don't have any address ranges)
>>>> >>>>
>>>> >>>> Roughly, a modern debug_aranges to me would look something like:
>>>> >>>>
>>>> >>>> <length>
>>>> >>>> <version>
>>>> >>>> <CU sec_offset>
>>>> >>>> <addr_base>
>>>> >>>> <rnglist sec_offset>
>>>> >>>>
>>>> >>>> So it could fully re-use the rnglist encoding. If this was going to be
>>>> >>>> as compact as possible, it'd need to be configurable which encodings it
>>>> >>>> uses - ranges vs. high/low, addrx vs. addr - at which point it'd
>>>> >>>> probably look like a small DIE with an inline abbrev (similar to the
>>>> >>>> way DWARFv5 encodes the file and directory entries now, and how
>>>> >>>> debug_names is self-describing) - at which point it looks to me a lot
>>>> >>>> like parsing the CU DIEs.
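To make the five-field layout sketched above concrete, here is one possible fixed-width encoding of it. This is purely illustrative and not a standard or proposed format: the class name `ArangesEntry`, the field widths (4-byte length and offsets, 2-byte version, i.e. 32-bit DWARF), little-endian byte order, and the version number used in the usage lines are all assumptions.

```python
import struct
from dataclasses import dataclass

# Assumed layout: length, version, CU offset, addr_base, rnglist offset.
_ENTRY_FMT = "<IHIII"
_LENGTH_FIELD_SIZE = 4


@dataclass
class ArangesEntry:
    """Hypothetical record pointing a CU at a range list in .debug_rnglists."""

    version: int
    cu_offset: int       # offset of the CU in .debug_info
    addr_base: int       # base into .debug_addr for addrx-style encodings
    rnglist_offset: int  # offset of the range list describing the CU

    def pack(self) -> bytes:
        # As with other DWARF headers, length excludes the length field itself.
        length = struct.calcsize(_ENTRY_FMT) - _LENGTH_FIELD_SIZE
        return struct.pack(_ENTRY_FMT, length, self.version, self.cu_offset,
                           self.addr_base, self.rnglist_offset)

    @classmethod
    def unpack(cls, data: bytes, offset: int = 0):
        """Return (entry, offset_of_next_entry)."""
        length, version, cu_offset, addr_base, rnglist_offset = struct.unpack_from(
            _ENTRY_FMT, data, offset)
        entry = cls(version, cu_offset, addr_base, rnglist_offset)
        return entry, offset + _LENGTH_FIELD_SIZE + length


# Usage: round-trip one entry (the version value is a placeholder).
entry = ArangesEntry(version=6, cu_offset=0x100, addr_base=8, rnglist_offset=0x40)
raw = entry.pack()
decoded, next_offset = ArangesEntry.unpack(raw)
```

With fixed widths the record is 18 bytes per CU. Making the encodings configurable, as the message goes on to discuss, is what would push the format toward a self-describing abbrev.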
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Samy Al Bahra [http://repnop.org]