[Dwarf-Discuss] Interaction between aranges and unit proposals

Wed Apr 2 10:18:49 GMT 2014

Hi Eric,

On Tue, 2014-04-01 at 16:51 -0700, Eric Christopher wrote:
> On Tue, Apr 1, 2014 at 4:38 AM, Mark Wielaard <mjw at redhat.com> wrote:
> > Is there a way to reconcile these proposals so they keep the benefit of
> > both (quick/complete address scan without having to load/parse bulk data
> > and simplifying the DWARF data structures by combining various units in
> > one section)?
> >
> Absolutely a fan. Knowing what various consumers need is going to be
> key for any tables to speed up access.

So for the .debug_aranges table the two proposals try to make it
possible for a consumer to quickly create a table of address ranges that
describes which parts of .debug_info might need to be read when an
address is encountered, without having to actually read any of
the .debug_info/.debug_abbrev data at all (if possible). There are two
reasons this currently cannot be done.

First, producers often just skip generating an aranges entry for units
that don't cover any addresses, so you don't know whether the entry was
just not generated in the first place or the unit really is empty. That
is what issue 100430.2 tries to address. GCC was changed to follow this
recommendation.

Secondly, you sadly cannot be sure that all producers follow the
previous recommendation (it is deemed a quality of service matter
whether an aranges entry is generated for a CU), so if you have a module
that combines the output of various producers you need a way to check
that they all really produced aranges entries for all the units. That is
what issue 100430.1 tries to address. By adding a unit length field,
like other tables have, you can just scan the aranges headers, check
that there are no gaps of uncovered .debug_info data, and not have to
even try to load the .debug_info/.debug_abbrev data in that case. Of
course, if you do find a gap you still need to read in and scan through
all the unit data itself, but at least you know you are doing it on
purpose and only for those modules that were generated by producers that
don't generate aranges for all units. GDB noticed this really matters
for larger programs with lots of modules: just having to map in and scan
through all the .debug sections you might not need creates a big
(startup) delay.
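
To make the gap check concrete, here is a rough sketch in Python. It
assumes the consumer has already pulled one (offset, length) pair out of
each aranges header, where the offset is the unit's position in
.debug_info and the length is the unit length field proposed in issue
100430.1, taken here to count every byte of the unit. The function name
and the tuple layout are made up for illustration; nothing here is an
existing API.

```python
def find_uncovered_gaps(units, debug_info_size):
    """Return the [start, end) byte ranges of .debug_info that no
    aranges header accounts for.

    units: iterable of (debug_info_offset, unit_length) pairs, one per
           aranges header (hypothetical layout, see the assumptions above).
    debug_info_size: total size in bytes of the .debug_info section.
    """
    gaps = []
    covered_up_to = 0
    # Walk the units in section order and record every hole between them.
    for offset, length in sorted(units):
        if offset > covered_up_to:
            gaps.append((covered_up_to, offset))
        covered_up_to = max(covered_up_to, offset + length)
    # Anything after the last covered unit is also uncovered.
    if covered_up_to < debug_info_size:
        gaps.append((covered_up_to, debug_info_size))
    return gaps
```

An empty result means the aranges table can be trusted as complete for
that module; a non-empty one tells the consumer exactly which part of
.debug_info it has to fall back to scanning.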

> > One way might be to reverse the last proposal. Instead of removing the
> > aranges for type units (which did indeed not make much sense in the
> > split .debug_info/.debug_type approach), add an empty aranges header if
> > a type unit appears in .debug_info in the way of the second proposal for
> > address-less CUs.
> >
> We could do this, but I think adding one for every type unit would be
> a bit wasteful. Since type units are going to have a flag in the
> header would it be possible for you to notice that when looking
> through the units? I'm not sure how you know that you have complete
> coverage so I'm just throwing out words here, could you provide a bit
> of a description of how this works for me if you don't mind?

You are right. It certainly is a trade-off. The goal is to not have to
read any of the unit data if at all possible. With the type units
separate in .debug_types that was easy.

Maybe the solution is to have an alternate .debug_aranges header just
for empty units that is as small as possible? Or reuse the existing
header fields as a "flag"? Maybe keep the proposed header format of
issue 100430.1, but if address_size and segment_size are both zero then
no address range descriptors are added and that header signals a
"no-address" unit?
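
As a sketch of how a consumer might treat that last idea (Python again,
and purely hypothetical: the class and field names are mine, and
debug_info_length stands in for the length field proposed in issue
100430.1):

```python
from dataclasses import dataclass


@dataclass
class ArangesHeader:
    # Field names are illustrative only, not taken from any spec text.
    debug_info_offset: int
    debug_info_length: int  # assumed: the length field from issue 100430.1
    address_size: int
    segment_size: int

    def is_no_address_unit(self) -> bool:
        # Both sizes zero would mean "this unit covers no addresses",
        # so no address range descriptors follow the header.
        return self.address_size == 0 and self.segment_size == 0
```

Such a header would still contribute its offset and length to the
coverage check, so a type unit in .debug_info would not show up as a gap
even though it has no ranges.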

Cheers,

Mark