[Dwarf-Discuss] string reduction techniques

Tue Nov 9 05:52:38 GMT 2021

> On Nov 7, 2021, at 12:36 PM, Todd Allen <todd.allen at concurrent-rt.com> wrote:
> 
> Just spitballing an idea here, but would there be value in a new DW_FORM (or
> two) that referenced the names from .strtab or .dynstr, instead of .debug_str?
> It would only work if the symbols already were there, but I would expect that
> for many/most/all(?) functions defined in the compilation unit.  It does
> somewhat relegate this to being Someone Else's Problem, but given that the
> .strtab already has the problem of zillions of these huge symbols, maybe that's
> not so bad?
> 
> Maybe, if that's too onerous for tools that need to manipulate .strtab, it could
> reference them indirectly through a .debug_strtab_offsets section similar to
> .debug_str_offsets.

Interesting idea! One issue: if someone strips the binary, local symbols whose mangled names the DWARF refers to could be removed, leaving the DW_FORM values pointing at invalid offsets.
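
To make the stripping concern concrete, here is a rough Python sketch. It models a string table as NUL-separated bytes, the way ELF .strtab is laid out, and a hypothetical DW_FORM that stores a raw offset into it. The helper names (build_strtab, read_str) and the two symbol names are made up for illustration; this is not the behavior of any particular strip tool.

```python
def build_strtab(names):
    """Build an ELF-style string table: a leading NUL, then NUL-terminated names.
    Returns the table bytes and a dict mapping name -> offset."""
    table = bytearray(b"\0")
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name.encode() + b"\0"
    return bytes(table), offsets


def read_str(table, offset):
    """Read the NUL-terminated string at offset, or None if out of range."""
    if offset >= len(table):
        return None
    end = table.index(b"\0", offset)
    return table[offset:end].decode()


# Before stripping: one local (static) symbol followed by one global symbol.
full, offs = build_strtab(["_ZL6helperi", "_Z3fooi"])
dwarf_ref = offs["_Z3fooi"]  # offset stored by the hypothetical DW_FORM

# After stripping locals the table is rebuilt without them, so later
# strings move, but the DWARF still holds the old offset.
stripped, _ = build_strtab(["_Z3fooi"])

before = read_str(full, dwarf_ref)
after = read_str(stripped, dwarf_ref)
```

Here the stale offset falls past the end of the rebuilt table. Going through an offsets table in the style of .debug_str_offsets would presumably confine the fix-up to one section, but whatever rewrites .strtab would still have to update it, and it does not help if the string itself is gone.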

> 
> On Tue, Nov 02, 2021 at 10:09:16AM -0700, Dwarf Discussion wrote:
>> On Mon, Nov 1, 2021 at 7:14 PM Greg Clayton via Dwarf-Discuss
>> <[1]dwarf-discuss at lists.dwarfstd.org> wrote:
>> 
>>   LLDB also uses mangled names. The clang compiler is our expression
>>   parser and it always tries to resolve symbols during compilation/JIT and
>>   it supplies mangled names when looking for functions to resolve when it
>>   JITs code up. It is nice to be able to do quick name lookups using these
>>   mangled names to find the address of the function. That being said, we
>>   could work around it. Not sure how easy that would be though as mangled
>>   names can end up demangling to the same name with some loss of
>>   information and it would be important to be able to find the right in
>>   charge or out of charge constructor when the compiler asks for a
>>   specific symbol using the mangled name. We have more uses of mangled
>>   names but most of them relate to parsing the symbol tables, so removing
>>   them from DWARF wouldn't affect those areas.
>>   I wonder if there is a way to have a DW_AT_partial_linkage_name that
>>   relies on the decl context of a DIE. Like if you have a class "foo" in
>>   the global namespace it could have a DW_AT_partial_linkage_name with the
>>   value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
>>   could have another partial linkage name "3bari" that could be put
>>   together with the parent "_Z3foo" for a function like:
>>   void foo::bar(int);
>>   Since many mangled names often start with the same prefix it might help
>>   reduce the string table size.
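
A rough Python sketch of how a consumer might put these pieces together, assuming plain concatenation from the outermost decl context inward. The Die class and the attribute it carries are hypothetical stand-ins; DW_AT_partial_linkage_name is only the idea floated above and is not part of DWARF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    """Toy stand-in for a DWARF DIE; only the fields this sketch needs."""
    tag: str
    partial_linkage_name: str  # hypothetical DW_AT_partial_linkage_name value
    parent: Optional["Die"] = None


def full_linkage_name(die):
    """Concatenate partial names from the outermost decl context inward."""
    parts = []
    while die is not None:
        parts.append(die.partial_linkage_name)
        die = die.parent
    return "".join(reversed(parts))


foo = Die("DW_TAG_class_type", "_Z3foo")
bar = Die("DW_TAG_subprogram", "3bari", parent=foo)
baz = Die("DW_TAG_subprogram", "3bazv", parent=foo)  # second member, made up

names = [full_linkage_name(d) for d in (bar, baz)]
shared = len(foo.partial_linkage_name)  # prefix bytes stored once, not per member
```

Note that the Itanium mangling of foo::bar(int) is actually _ZN3foo3barEi, with the nested name wrapped in N...E, so straight concatenation of "_Z3foo" and "3bari" does not reproduce it. The split points, or the assembly rule, would have to be aware of the mangling scheme.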
>> 
>> It's a thought - though I'm not sure how much that would really generalize
>> across different mangling schemes that use different mechanisms for
>> backreferences, etc. Or whether the return type should be included (it's
>> included for function templates in itanium mangling, for instance -
>> presumably also in MSVC mangling, but maybe some manglings include it even
>> in non-templates? I'm not sure) - since the partial linkage name for a
>> type would be context-insensitive (since it'd be attached to the type
>> rather than any use of the type) it'd be up to the consumer to fix that
>> up, eg:
>> 
>> [2]https://godbolt.org/z/TqYjeevqx
>> Itanium:
>>   f1<>(): _Z2f1IJEEvv
>>   f1<t1, t1>(): _Z2f1IJ2t1S0_S0_EEvv
>> MSVC:
>>   f1<>(): ??$f1@$$V@@YAXXZ
>>   f1<t1, t1>(): ??$f1@Ut1@@U1@U1@@@YAXXZ
>> I'm not sure how much less a consumer would know about mangling if it had
>> to know about how to assemble these things, insert backrefs, insert empty
>> list markers, etc - without having to know how to mangle a specific user
>> defined type or name, like "3foo" versus "@Ut1@"?
>> 
>>     On Nov 1, 2021, at 6:52 PM, Daniel Berlin via Dwarf-Discuss
>>     <[3]dwarf-discuss at lists.dwarfstd.org> wrote:
>>     Finally, a question I know the answer to!
>>     It brings us all the way back to when I was the C++ maintainer for
>>     GDB, which is the most ancient of history.
>>     Unfortunately, this is a trip to a horrible place.
>>     I actually spent a lot of time trying to make it so we didn't need
>>     linkage names, because, even then, they took up a *lot* of space.
>>     On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss
>>     <[4]dwarf-discuss at lists.dwarfstd.org> wrote:
>> 
>>>> I can't be sure about this exponential growth.  I don't have the data to
>>>> back it up.  But I will say, when we created DWARF64, I was skeptical that
>>>> it would be needed during my career.  And yet here we are...
>>> 
>>> Yep, still got mixed feelings about DWARF64 - partly the pieces that we're
>>> seeing with the need for some solutions for mixed DWARF32/64, etc, make it
>>> feel like maybe it's still got a bit of "settling in" to do. And I'm still
>>> rather hopeful we might be able to reduce the overheads enough to avoid
>>> widespread use of DWARF64 - but it's not a sure thing by any means.
>> 
>>       Agreed. I'd like to explore as many avenues as we can to eliminate
>>       the need for DWARF64.
>> 
>>>> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
>>>> debugger almost never uses it.  (There is one use to detect "GNU indirect"
>>>> functions.)  I wonder if it would be possible to avoid them if you provided
>>>> enough info about the template parameters, if the debugger had its own name
>>>> mangler.  I had to write one for our debugger a couple years ago, and it
>>>> definitely was a persnickety beast.  But doable with enough information.
>>>> Mind you, I'm not sure there is enough information to do it perfectly with
>>>> the state of DWARF & gcc right now.
>>> 
>>> Yeah, that was/is certainly my first pass - the way I've done the DW_AT_name
>>> one is to have a feature in clang that produces the short name "t1" but then
>>> also embeds the template argument list in the name (like this:
>>> "_STNt1|<int>") - then llvm-dwarfdump will detect this prefix, split up the
>>> name, rebuild the original name as it would if it'd been given only the
>>> simple name ("t1") and compare it to the one from clang. Then I can run this
>>> over large programs and check everything round-trips correctly & in clang,
>>> classify any names we can't roundtrip so they get emitted in full rather
>>> than shortened.
>>> We could do something similar with linkage names - good to know there's
>>> some prior art in your work there.
>>> 
>>> I wouldn't be averse to considering what it'd take to make DWARF robust
>>> enough to always roundtrip simple and linkage names in this way - I don't
>>> think it'd take a /lot/ of extra DWARF content.
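
The round-trip check described above for the "_STN" names can be sketched roughly as follows in Python. The only thing taken from the message is the example encoding "_STNt1|<int>"; the function names and the way template parameters are passed in (a plain list of type names standing in for what a consumer would read from the DIE's children) are assumptions, not how llvm-dwarfdump is actually written.

```python
PREFIX = "_STN"


def split_stn(name):
    """Split '_STN<simple>|<args>' into (simple, args); None if not in that form."""
    if not name.startswith(PREFIX):
        return None
    simple, sep, args = name[len(PREFIX):].partition("|")
    if not sep:
        return None
    return simple, args


def rebuild(simple, template_params):
    """Rebuild the full name from the simple name plus template parameter
    type names, as a consumer would if it had only the simple name."""
    return simple + "<" + ", ".join(template_params) + ">"


def round_trips(name, template_params):
    """True if the rebuilt name matches the one the producer embedded."""
    parts = split_stn(name)
    if parts is None:
        return False
    simple, args = parts
    return rebuild(simple, template_params) == simple + args


ok = round_trips("_STNt1|<int>", ["int"])
bad = round_trips("_STNt1|<int>", ["long"])
```

Any name for which the check fails would be the kind the producer classifies as not simplifiable and emits in full.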
>> 
>>       Fuzzy memory here, but as I recall, GCC didn't generate linkage
>>       names (or only did in some very specific cases) until the LTO folks
>>       convinced us they needed it in order to relate profile data back to
>>       the source. Perhaps if we came up with a better way of doing that,
>>       we could eliminate the linkage names.
>> 
>>     No, see, that's a mildly reasonable answer.
>>     If you go far enough back, the linkage names exist for a few reasons:
>>     1. Because the debug info wasn't always good enough, and so GDB used
>>     to demangle the linkage names and parse them using a hacked up C++-ish
>>     parser for type info.
>>     2. Even when it didn't, it decoded linkage names to detect things like
>>     destructors/constructors, etc.
>>     3. Because it used them to do remangling properly and to try to generate
>>     method signatures to look up (and for #1).
>>     4. Because it was used to do symbol lookup in the ELF/etc. symbol
>>     tables for static things/etc.
>>     5. Because it saved space in STABS to do #1 (they predate DWARF by
>>     far).
>>     If you check out gdb source code, circa 2001, and search for things
>>     like check_stub_method, and follow all the things it calls (like
>>     gdb_mangle_name), you can learn the history of linkage names (and
>>     probably throw up in your mouth a little).
>>     If you do a case insensitive search for things like "physname" and
>>     "phys_name", you'll see all the places it used to use the linkage
>>     names.
>>     I spent a lot of time abstracting out things like the
>>     constructor/destructor name testing, vptr name finding, etc, so that
>>     someone later might have a chance to get rid of linkage names (it was
>>     also necessary because of the gcc 2.95->3.0 ABI change).
>> 
>>     _______________________________________________
>>     Dwarf-Discuss mailing list
>>     [5]Dwarf-Discuss at lists.dwarfstd.org
>>     [6]http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>> 
>>   _______________________________________________
>>   Dwarf-Discuss mailing list
>>   [7]Dwarf-Discuss at lists.dwarfstd.org
>>   [8]http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>> 
>> References
>> 
>> Visible links
>> 1. mailto:dwarf-discuss at lists.dwarfstd.org
>> 2. https://godbolt.org/z/TqYjeevqx
>> 3. mailto:dwarf-discuss at lists.dwarfstd.org
>> 4. mailto:dwarf-discuss at lists.dwarfstd.org
>> 5. mailto:Dwarf-Discuss at lists.dwarfstd.org
>> 6. http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>> 7. mailto:Dwarf-Discuss at lists.dwarfstd.org
>> 8. http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
> 
> 
> 
> -- 
> Todd Allen
> Concurrent Real-Time