[Dwarf-Discuss] debug_aranges use and overhead

Fri Mar 12 00:58:13 GMT 2021

On Thu, Mar 11, 2021 at 4:29 PM Greg Clayton <clayborg at gmail.com> wrote:

>
>
> On Mar 11, 2021, at 1:12 PM, Paul Robinson via Dwarf-Discuss <
> dwarf-discuss at lists.dwarfstd.org> wrote:
>
> Tom Russell could perhaps speak to this better, but my understanding is
> that our debugger guys like having .debug_aranges, because parsing the CU
> DIE does take that extra effort.  I am unfamiliar with their code so I have
> to take their word on it.  But I can certainly imagine that probing
> hundreds to thousands of CUs in order to collect range information with
> lengthy range lists would be more expensive than running through a
> comparatively compact .debug_aranges list.  If Tom tells me I'm wrong,
> well, wouldn't be the first time.
>
>
> We will use them if they are there, but one interesting issue that we ran
> into with LLDB is some compile units might be in .debug_aranges because the
> compiler made a .debug_aranges section in the .o file, but others might
> not. So we had to add code to LLDB to figure out which compile units have
> any entries in the .debug_aranges section, and read the DW_AT_ranges from
> the DW_TAG_compile_unit if it exists, and if it doesn't, manually index the
> DWARF to create one on the fly each time.
>
>
> One thing we have encountered (see issue 210113.1) is that when we've done
> dead-stripping, .debug_aranges entries (one per function, typically,
> because -ffunction-sections) can end up pointing to nothing.  In our
> proprietary linker I believe we compress/rewrite .debug_aranges to minimize
> the number of entries, which by coincidence ends up producing a conforming
> aranges list; LLD doesn't do that, which means it produces a non-conforming
> list (with zero-length entries), hence the issue.
>
> I'll have to think about what a "modern" .debug_aranges might want to look
> like.
>
>
> A big issue with any of the DWARF sections is we are subject to making the
> contents work with linkers that just want to concatenate + relocate. This
> often leads to information being kept around when dead stripping occurs
> because anything that is dead stripped will just have its address zero'ed
> out or -1'ed out, but this bogus info is still in the data.
>

Yeah, we talked some last year about formalizing this more into the -1
tombstone - I thought maybe Paul had proposed that for standardization,
though at a glance I don't see the proposal. It's probably somewhere there.
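
To make the dead-stripping problem concrete, here is a minimal sketch of the
kind of cleanup a consumer (or a content-aware linker) could apply to a list
of aranges tuples: drop zero-length and tombstoned entries, then coalesce what
is left. The function name clean_aranges and its parameters are hypothetical,
the all-ones tombstone follows the -1 convention mentioned above, and none of
this is taken from LLDB, LLD, or any proprietary linker.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start address, length in bytes)


def clean_aranges(entries: List[Range], addr_size: int = 8,
                  treat_zero_as_dead: bool = False) -> List[Range]:
    """Drop dead entries and merge touching/overlapping ranges (sketch)."""
    # Assumption: a dead-stripped address is rewritten to all-ones (-1).
    tombstone = (1 << (8 * addr_size)) - 1
    live = []
    for start, length in entries:
        if length == 0 or start == tombstone:
            continue  # zero-length or tombstoned: points at nothing
        if treat_zero_as_dead and start == 0:
            continue  # some linkers zero the address instead of using -1
        live.append((start, start + length))
    live.sort()

    merged: List[Tuple[int, int]] = []
    for lo, hi in live:
        if merged and lo <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return [(lo, hi - lo) for lo, hi in merged]
```

Whether address 0 should count as dead depends on the target (it can be a
valid address in some environments), which is why it is an opt-in flag here.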

> If we don't need a format that can simply be concatenated and relocated,
> the GSYM format, which is open sourced in llvm.org already, might be good
> inspiration for a .debug_aranges successor section that has very efficient
> lookups. The GSYM format could actually be used as is by adding only a new
> DIE offset InfoType.
>
> Besides ".debug_names", all other DWARF accelerator tables are really just
> random indexes that must be linearly scanned or pre-indexed prior to being
> used because of the concatenate + relocate style that is used for these
> DWARF sections. It would be great if any future accelerator tables are "map
> into memory and use as is" kind of tables like ".debug_names" and the
> ".apple_XXX" name accelerator tables.
>

Ah, fair point - one could come up with a rather different structure if it
were designed for fast on-disk queries. Though then, as with .debug_names
(which I don't think any linker can link today, for instance), you'd
probably /really/ want it to be linked in a content-aware manner, because
probing separate per-CU lookup tables (even if they're better designed for
that) probably doesn't gain you a lot.
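
As a rough illustration of why a single linked table matters, here is a
sketch of a lookup over one globally sorted address table, which needs only a
binary search per query. The AddressIndex class and its (start, end, CU
offset) row layout are hypothetical and assume non-overlapping ranges; this is
not the GSYM or .debug_names format.

```python
import bisect
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, int, int]  # (start, end_exclusive, cu_offset)


class AddressIndex:
    """One sorted table for the whole binary (sketch, not a real format)."""

    def __init__(self, rows: Iterable[Row]) -> None:
        # Sorting is the step a content-aware linker would do once at link
        # time; a concatenate + relocate linker leaves per-CU fragments that
        # the consumer must scan or sort itself.
        self._rows: List[Row] = sorted(rows)
        self._starts = [row[0] for row in self._rows]

    def lookup(self, addr: int) -> Optional[int]:
        """Return the CU offset covering addr, or None if nothing covers it."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, end, cu_offset = self._rows[i]
        return cu_offset if addr < end else None
```

With per-CU tables the consumer would instead have to probe each table in
turn, so the cost grows with the number of CUs rather than with the logarithm
of the number of ranges.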

- Dave

>
>
> Thanks,
> --paulr
>
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Thursday, March 11, 2021 3:48 PM
> *To:* Robinson, Paul <paul.robinson at sony.com>
> *Cc:* Cary Coutant <ccoutant at gmail.com>; DWARF Discuss <
> dwarf-discuss at lists.dwarfstd.org>
> *Subject:* debug_aranges use and overhead
>
> On Thu, Mar 11, 2021 at 5:48 AM <paul.robinson at sony.com> wrote:
>
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>
>
> Yeah, how about we spin it off into another thread (done here)
>
>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>
>
> I don't currently understand the value it provides, and I at least don't
> have a use case for it, so I'm not sure I'd be the best person to
> advocate/drive that work.
>
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>
>
> .debug_names is quite different though - it collects information from
> across the DIE tree - information that is expensive to otherwise gather
> (walking the whole DIE tree).
>
> .debug_aranges is not like that for most producers (producers that do
> include the address ranges on the CU DIE) - the data is readily available
> immediately on the CU. That does involve reading some of .debug_abbrev, and
> interpreting a handful of attributes - but at least for the use cases I'm
> aware of, that overhead isn't worth the size increase.
>
> Do you have numbers on the benefits of .debug_aranges compared to parsing
> the ranges from CU DIEs?
>
> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if
> its children DIEs contain addresses - having that as a guarantee, or some
> preferred way of encoding zero length (high/low of 0 would be acceptable, I
> guess) would be nice & make it cheap to skip over CUs that don't have any
> address ranges)
>
> Roughly, a modern debug_aranges to me would look something like:
>
> <length>
> <version>
> <CU sec_offset>
> <addr_base>
> <rnglist sec_offset>
>
> So it could fully re-use the rnglist encoding. If this was going to be as
> compact as possible, it'd need to be configurable which encodings it uses -
> ranges vs. high/low, addrx vs. addr - at which point it'd probably look like a
> small DIE with an inline abbrev (similar to the way DWARFv5 encodes the
> file and directory entries now, and how debug_names is self-describing) -
> at which point it looks to me a lot like parsing the CU DIEs.
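
For illustration only, the five-field layout sketched above could be
serialized as follows. The field widths (32-bit DWARF offsets, a 2-byte
version, little-endian) and the names pack_entry/unpack_entry are assumptions
made for this example; the message does not specify them, and this fixed
layout ignores the configurable-encoding variant discussed afterwards.

```python
import struct

# Assumed widths: u32 length, u16 version, then three u32 section offsets.
# "<" selects little-endian with no padding between fields.
HEADER = struct.Struct("<IHIII")


def pack_entry(version: int, cu_offset: int, addr_base: int,
               rnglist_offset: int) -> bytes:
    """Serialize one hypothetical entry; length excludes the length field."""
    length = HEADER.size - 4
    return HEADER.pack(length, version, cu_offset, addr_base, rnglist_offset)


def unpack_entry(data: bytes) -> dict:
    """Parse one hypothetical entry from the start of data."""
    length, version, cu_offset, addr_base, rnglist_offset = \
        HEADER.unpack_from(data)
    return {
        "length": length,
        "version": version,
        "cu_offset": cu_offset,
        "addr_base": addr_base,
        "rnglist_offset": rnglist_offset,
    }
```

Under these assumptions each CU costs a fixed 18 bytes in the index, and the
actual ranges live once in the range list section instead of being repeated.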
>