[Dwarf-Discuss] string reduction techniques

David Blaikie dblaikie@gmail.com
Mon Nov 1 20:06:33 GMT 2021


Hey Todd,

Just some details regarding the string reduction strategies I'm pursuing to
address DWARF32 overflowing .debug_str.dwo/.debug_str_offsets.dwo sections
in some large binaries at Google.

So the extreme cases I'm dealing with are predominantly C++ expression
templates (in TensorFlow and Eigen) - these produce types with very large
DW_AT_names (e.g. "f1<int>") and DW_AT_linkage_names (e.g. "_Z2f1IiEvv"), but
with many more template parameters, none of which are ever user-written;
they are all deduced.

So the main fix I'm pursuing (roughly called "simplified template names")
is to omit template parameter lists from the DW_AT_names of templates in most
cases, allowing the consumer to reconstruct the full name itself, recursively,
from the DW_TAG_template_*_parameter child DIEs. Further discussion and
details here:
https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg/m/-dhJ0hO1AAAJ

In terms of how this affects scaling factors, it means that adding an
additional template instantiation of existing types would add no new data
to .debug_str (e.g. going from a program with "t1<int>" to "t1<t1<int>>"
would add no new entries to .debug_str). Not all names can be readily
reconstructed, so I'm opting those out of the feature, but we could have a
deeper discussion about how to handle them if we wanted to make this a
full-fledged/robust feature (maybe one the DWARF spec suggests/encourages).
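
To make the scaling argument concrete, here is a rough sketch of the
consumer-side reconstruction. The Die class is a hypothetical stand-in for a
type DIE with its template parameter children, not any real DWARF library's
API, and it only handles type parameters (no value parameters, packs, or the
names that can't be rebuilt):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Die:
    """Hypothetical model of a type DIE (an assumption for illustration)."""
    name: str  # the simplified DW_AT_name, e.g. "t1" rather than "t1<int>"
    template_params: List["Die"] = field(default_factory=list)


def full_name(die: Die) -> str:
    """Rebuild the full name recursively from the template parameters."""
    if not die.template_params:
        return die.name
    args = ", ".join(full_name(p) for p in die.template_params)
    return f"{die.name}<{args}>"


def simplified_strings(die: Die, out: Optional[Set[str]] = None) -> Set[str]:
    """Distinct strings needed when names omit template parameter lists."""
    out = set() if out is None else out
    out.add(die.name)
    for p in die.template_params:
        simplified_strings(p, out)
    return out


def full_strings(die: Die, out: Optional[Set[str]] = None) -> Set[str]:
    """Distinct strings needed when every DIE carries its full name."""
    out = set() if out is None else out
    out.add(full_name(die))
    for p in die.template_params:
        full_strings(p, out)
    return out


int_ = Die("int")
t1_int = Die("t1", [int_])
t1_t1_int = Die("t1", [t1_int])

print(full_name(t1_t1_int))                    # t1<t1<int>>
print(sorted(simplified_strings(t1_int)))      # ['int', 't1']
print(sorted(simplified_strings(t1_t1_int)))   # ['int', 't1']
print(sorted(full_strings(t1_t1_int)))         # ['int', 't1<int>', 't1<t1<int>>']
```

With simplified names the string set is the same for "t1<int>" and
"t1<t1<int>>", whereas with full names each new instantiation adds a new
(and longer) string.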

GDB seems to handle this sort of debug info OK - I guess someone did real
work to support that at some point (so maybe some other debugger already
generates DWARF like this).


The other half, though, is DW_AT_linkage_names - and in theory similar
rebuilding could be done, but that'd require baking a lot of
implementation knowledge into the DWARF consumer, which is what DWARF is
meant to help avoid... so I'm unsure what the right solution is there just
now, but there are a few ideas I'm still kicking around. At least linkage
names have less redundancy than DW_AT_names: within a single name they avoid
repetition, so "t1<t1<int>, t1<int>>" only ends up with a single description
of "t1<int>" instead of the two you get with the DW_AT_name. So they do
scale a bit better already.
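
As a rough illustration of that redundancy difference, here is a toy
back-reference encoding. To be clear, this is NOT the real mangling scheme
(the "$N" syntax and the function names are made up for this sketch); it only
mimics the idea that a repeated component inside one name is spelled once and
referred back to afterwards:

```python
# A type is modelled as (name, args), with args a tuple of types.
INT = ("int", ())


def full_name(t):
    """Spell out every template argument in full, like a DW_AT_name."""
    name, args = t
    if not args:
        return name
    return name + "<" + ", ".join(full_name(a) for a in args) + ">"


def backref_encode(t, seen=None):
    """Toy encoding: a repeated template instantiation becomes "$N"."""
    seen = {} if seen is None else seen
    if t in seen:
        return f"${seen[t]}"
    name, args = t
    if not args:
        return name
    text = name + "<" + ", ".join(backref_encode(a, seen) for a in args) + ">"
    seen[t] = len(seen)  # number instantiations in the order they complete
    return text


t1_int = ("t1", (INT,))
pair = ("t1", (t1_int, t1_int))
print(full_name(pair))       # t1<t1<int>, t1<int>>
print(backref_encode(pair))  # t1<t1<int>, $0>

# Nesting the same shape repeatedly: the full name roughly doubles per
# level, while the toy encoding grows by a fixed amount per level.
t = INT
for depth in range(1, 6):
    t = ("t1", (t, t))
    print(depth, len(full_name(t)), len(backref_encode(t)))
```

This only removes repetition within a single name; it does nothing about the
same component appearing across many different names.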

Happy to discuss these ideas specifically, or their impact on .debug_str
growth in more detail, any time (here, video chat, Discord, etc).

- Dave