[Dwarf-Discuss] debug_aranges use and overhead

Fri Apr 9 18:12:55 GMT 2021

Responses inline.

On Fri, Mar 19, 2021 at 9:59 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Fri, Mar 19, 2021 at 9:34 AM Samy Al Bahra <sbahra at repnop.org> wrote:
>

[...]

>> This is quite old (excuse the formatting) but numbers are here:
>> https://engineering.backtrace.io/2014-09-15-bt-lightweight-backtrace-tool/
>> , search for "Chromium".  This is something other debuggers can take
>> advantage of if they run in a non-interactive / batch mode (think bulk
>> processing of millions - billions of dumps a month)
>>
>
> "This is something... " - what is "this" you're referring to there? Lazy
> loading? Yeah, for sure. Why do you restrict/suggest that a highly lazy
> approach would only be suitable for non-interactive/batch execution?
>

"This is quite old": "this" refers to the blog post.

"This is something other debuggers can take advantage of": "this" refers to
lazy loading. Lazy loading is more effective for automated analysis tools
than for interactive debuggers, which more often than not don't benefit from
lazy evaluation because users expect auto-complete for types, variables,
etc. Of course, it is still useful for non-blocking loads of debug data,
especially if you implement job cancellation (allowing commands to be
executed concurrently while loading is still in progress).
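
To make the non-blocking load with cancellation concrete, here is a minimal
Python sketch. It is an illustration only, not how any particular debugger
does it: the class name BackgroundLoader, the load_unit callback, and the
idea of loading "units" one at a time are all assumptions made for the
example.

```python
import threading


class BackgroundLoader:
    """Load debug-info units on a worker thread; hypothetical sketch."""

    def __init__(self, units, load_unit):
        self._units = list(units)
        self._load_unit = load_unit      # hypothetical callback: unit -> parsed data
        self._cancel = threading.Event()
        self._lock = threading.Lock()
        self._loaded = {}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        for unit in self._units:
            # Check for cancellation between units so a cancel takes effect
            # as soon as the unit currently being loaded is finished.
            if self._cancel.is_set():
                return
            result = self._load_unit(unit)
            with self._lock:
                self._loaded[unit] = result

    def cancel(self):
        """Request cancellation; does not block."""
        self._cancel.set()

    def wait(self):
        """Block until the worker has finished or observed the cancel."""
        self._thread.join()

    def get(self, unit):
        """Return the parsed unit if it is already loaded, else None."""
        with self._lock:
            return self._loaded.get(unit)
```

Commands issued while the worker is running can call get() and fall back to
an on-demand load when it returns None, which is what keeps the session
responsive while the bulk load continues.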

[...]

>
>
>> I'm also happy to run benchmarks for you with and without .debug_aranges
>> on top of our debugger if it'll be useful.
>>
>
> Yeah, I'd certainly be curious if you have a chance! Though it may depend
> a bit on what your implementation does in the absence of .debug_aranges.
>

I'll get back to you on this shortly!

>
>
>> One of the crucial optimizations we made is incremental indexing on top
>> of .debug_aranges based on PC values
>>
>
> Could you explain that in more detail - and why that approach can't be
> used with CU ranges?
>

.debug_aranges is significantly smaller and faster to load than scanning
all of .debug_info.
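
A rough Python sketch of what PC-based incremental indexing on top of
.debug_aranges could look like. This is not the implementation discussed in
this thread. It assumes 32-bit DWARF, little-endian data, segment_size == 0,
non-overlapping ranges, and tuples aligned relative to the start of each
set; PcIndex and the load_cu callback are hypothetical names.

```python
import bisect
import struct


def parse_aranges_set(data, offset=0):
    """Parse one 32-bit DWARF .debug_aranges set.

    Returns (cu_offset, [(start, length), ...], offset_of_next_set).
    """
    # Header: unit_length(4) version(2) debug_info_offset(4)
    #         address_size(1) segment_size(1) = 12 bytes.
    unit_length, _version, cu_offset, addr_size, _seg_size = struct.unpack_from(
        "<IHIBB", data, offset
    )
    end = offset + 4 + unit_length
    tuple_size = 2 * addr_size
    # The first tuple starts at a multiple of the tuple size.
    pos = offset + ((12 + tuple_size - 1) // tuple_size) * tuple_size
    fmt = "<QQ" if addr_size == 8 else "<II"
    ranges = []
    while pos + tuple_size <= end:
        start, length = struct.unpack_from(fmt, data, pos)
        pos += tuple_size
        if start == 0 and length == 0:  # terminating tuple
            break
        ranges.append((start, length))
    return cu_offset, ranges, end


class PcIndex:
    """Map a PC to its CU, parsing each CU only on first use."""

    def __init__(self, aranges_bytes, load_cu):
        self._load_cu = load_cu  # hypothetical callback: cu_offset -> parsed CU
        self._cache = {}
        entries = []
        offset = 0
        while offset < len(aranges_bytes):
            cu_offset, ranges, offset = parse_aranges_set(aranges_bytes, offset)
            entries.extend((s, s + n, cu_offset) for s, n in ranges)
        entries.sort()
        self._entries = entries
        self._starts = [e[0] for e in entries]

    def lookup(self, pc):
        # Find the last range whose start is <= pc, then check its end.
        i = bisect.bisect_right(self._starts, pc) - 1
        if i < 0:
            return None
        _start, end, cu_offset = self._entries[i]
        if pc >= end:
            return None
        if cu_offset not in self._cache:
            self._cache[cu_offset] = self._load_cu(cu_offset)
        return self._cache[cu_offset]
```

Only the small .debug_aranges section is read up front; a CU in .debug_info
is parsed the first time a PC from a backtrace lands in it, and never if no
PC does.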

>
>
>> (+ complexities Greg mentions later in the thread). In cases where we
>> lack this, we use our own persistent cache which introduces unnecessary
>> complexity. Now I am considering going as far as adding a multi-threaded
>> indexer for cases where a persistent cache / build system modifications
>> aren't an option (work to begin in the next week or two).
>>
>> .debug_aranges would provide a lot of value to our users.
>>
>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss <
>> dwarf-discuss at lists.dwarfstd.org> wrote:
>>
>>> On Thu, Mar 11, 2021 at 5:48 AM <paul.robinson at sony.com> wrote:
>>>
>>>> Hopefully not to side-track things too much... maybe wants its own
>>>> thread, if there's more to debate here.
>>>>
>>>
>>> Yeah, how about we spin it off into another thread (done here)
>>>
>>>
>>>> >> For the case you suggested where it would be useful to keep the range
>>>> >> list for the CU in the .o file, I think .debug_aranges is what you're
>>>> >> looking for.
>>>> >
>>>> > aranges has been off by default in LLVM for a while - it adds a lot of
>>>> > overhead (doesn't have all the nice rnglist encodings for instance -
>>>> > nor can it use debug_addr, and if it did it'd still be duplicate with
>>>> > the CU ranges wherever they were).
>>>>
>>>> Did you want to file an issue to improve how .debug_aranges works?
>>>>
>>>
>>> I don't currently understand the value it provides, and I at least don't
>>> have a use case for it, so I'm not sure I'd be the best person to
>>> advocate/drive that work.
>>>
>>>> Complaining that it duplicates CU ranges is missing the point, though;
>>>> it's an index, like .debug_names, of course it duplicates other info.
>>>> If you want to suggest an improved index, like we did with .debug_names,
>>>> that would be great too.
>>>>
>>>
>>> .debug_names is quite different though - it collects information from
>>> across the DIE tree - information that is expensive to otherwise gather
>>> (walking the whole DIE tree).
>>>
>>> .debug_aranges is not like that for most producers (producers that do
>>> include the address ranges on the CU DIE) - the data is readily available
>>> immediately on the CU. That does involve reading some of .debug_abbrev, and
>>> interpreting a handful of attributes - but at least for the use cases I'm
>>> aware of, that overhead isn't worth the size increase.
>>>
>>> Do you have numbers on the benefits of .debug_aranges compared to
>>> parsing the ranges from CU DIEs?
>>>
>>> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if
>>> its children DIEs contain addresses - having that as a guarantee, or some
>>> preferred way of encoding zero length (high/low of 0 would be acceptable, I
>>> guess) would be nice & make it cheap to skip over CUs that don't have any
>>> address ranges)
>>>
>>> Roughly, a modern debug_aranges to me would look something like:
>>>
>>> <length>
>>> <version>
>>> <CU sec_offset>
>>> <addr_base>
>>> <rnglist sec_offset>
>>>
>>> So it could fully re-use the rnglist encoding. If this was going to be
>>> as compact as possible, it'd need to be configurable which encodings it
>>> uses - ranges V high/low, addrx V addr - at which point it'd probably look
>>> like a small DIE with an inline abbrev (similar to the way DWARFv5 encodes
>>> the file and directory entries now, and how debug_names is self-describing)
>>> - at which point it looks to me a lot like parsing the CU DIEs.
>>>
>>> _______________________________________________
>>> Dwarf-Discuss mailing list
>>> Dwarf-Discuss at lists.dwarfstd.org
>>> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>>>
>>
>>
>> --
>> Samy Al Bahra [http://repnop.org]
>>
>

-- 
Samy Al Bahra [http://repnop.org]