[Dwarf-Discuss] string reduction techniques
Greg Clayton
clayborg@gmail.com
Tue Nov 9 05:52:38 GMT 2021
> On Nov 7, 2021, at 12:36 PM, Todd Allen <todd.allen at concurrent-rt.com> wrote:
>
> Just spitballing an idea here, but would there be value in a new DW_FORM (or
> two) that referenced the names from .strtab or .dynstr, instead of .debug_str?
> It would only work if the symbols already were there, but I would expect that
> for many/most/all(?) functions defined in the compilation unit. It does
> somewhat relegate this to being Someone Else's Problem, but given that the
> .strtab already has the problem of zillions of these huge symbols, maybe that's
> not so bad?
>
> Maybe, if that's too onerous for tools that need to manipulate .strtab, it could
> reference them indirectly through a .debug_strtab_offsets section similar to
> .debug_str_offsets.
Interesting idea! One issue: if someone strips the binary, the local symbols whose mangled names the DWARF refers to could be removed, leaving the DW_FORM values pointing at invalid offsets.
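
To make the stripping concern concrete, here is a minimal sketch in Python. It models a string table as NUL-terminated strings and assumes that stripping rebuilds the table compactly without the local symbols. The symbol names, the helpers build_strtab and read_str, and the idea of DWARF holding raw .strtab offsets are all assumptions for illustration, not the behavior of any particular tool.

```python
def build_strtab(names):
    """Build an ELF-style string table; return (table_bytes, {name: offset})."""
    table = bytearray(b"\0")  # index 0 is the empty string by convention
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name.encode("ascii") + b"\0"
    return bytes(table), offsets


def read_str(table, offset):
    """Read the NUL-terminated string at offset, or None if out of range."""
    if offset >= len(table):
        return None
    end = table.index(b"\0", offset)
    return table[offset:end].decode("ascii")


# Hypothetical symbols: one local (static) function between two globals.
symbols = ["main", "_ZL6helperv", "_ZN3foo3barEi"]
strtab, offsets = build_strtab(symbols)

# Hypothetical DWARF attributes that store raw .strtab offsets.
dwarf_refs = dict(offsets)

# Assumed strip behavior: rebuild the table without the local symbol.
stripped, _ = build_strtab([s for s in symbols if not s.startswith("_ZL")])

for name, off in dwarf_refs.items():
    before = read_str(strtab, off)
    after = read_str(stripped, off)
    print(f"{name}: offset {off}, before={before!r}, after strip={after!r}")
```

In this model the stale offsets do not fail loudly: they resolve to a different string or to the tail of one. An indirection table along the lines of the proposed .debug_strtab_offsets would give a tool one place to rewrite offsets, but it could not help for names that were removed outright.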
>
> On Tue, Nov 02, 2021 at 10:09:16AM -0700, Dwarf Discussion wrote:
>> On Mon, Nov 1, 2021 at 7:14 PM Greg Clayton via Dwarf-Discuss
>> <[1]dwarf-discuss at lists.dwarfstd.org> wrote:
>>
>> LLDB also uses mangled names. The clang compiler is our expression
>> parser and it always tries to resolve symbols during compilation/JIT and
>> it supplies mangled names when looking for functions to resolve when it
>> JITs code up. It is nice to be able to do quick name lookups using these
>> mangled names to find the address of the function. That being said, we
>> could work around it. Not sure how easy that would be though as mangled
>> names can end up demangling to the same name with some loss of
>> information and it would be important to be able to find the right
>> in-charge or not-in-charge constructor when the compiler asks for a
>> specific symbol using the mangled name. We have more uses of mangled
>> names but most of them relate to parsing the symbol tables, so removing
>> them from DWARF wouldn't affect those areas.
>> I wonder if there is a way to have a DW_AT_partial_linkage_name that
>> relies on the decl context of a DIE. Like if you have a class "foo" in
>> the global namespace it could have a DW_AT_partial_linkage_name with the
>> value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
>> could have another partial linkage name "3bari" that
>> could be put together with the parent "_Z3foo" for a function like:
>> void foo::bar(int);
>> Since many mangled names often start with the same prefix it might help
>> reduce the string table size.
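
A rough sketch of how a consumer might put such fragments together for the Itanium scheme, in Python. DW_AT_partial_linkage_name is only the hypothetical attribute floated above, and the helper names source_name and assemble are made up for illustration. Note that plain concatenation of "_Z3foo" and "3bari" would give "_Z3foo3bari", whereas Itanium compilers emit _ZN3foo3barEi for foo::bar(int), so the sketch stores bare fragments per DIE and has the consumer add the nesting markers. It ignores substitutions, templates and qualifiers entirely.

```python
def source_name(identifier):
    """Itanium <source-name>: decimal length followed by the identifier."""
    return f"{len(identifier)}{identifier}"


def assemble(scopes, name, params):
    """Join per-DIE fragments into a full Itanium-style linkage name.

    scopes: fragments from enclosing DIEs, outermost first (e.g. ["3foo"])
    name:   fragment stored on the function DIE (e.g. "3bar")
    params: already-encoded parameter types ("i" for int, "v" for none)
    """
    if scopes:
        # Nested names are wrapped in N ... E before the parameter types.
        return "_ZN" + "".join(scopes) + name + "E" + params
    return "_Z" + name + params


# void foo::bar(int);
print(assemble([source_name("foo")], source_name("bar"), "i"))
# void f();
print(assemble([], source_name("f"), "v"))
```

Each class fragment such as "3foo" would be stored once and shared by every member function, which is where any string table savings would come from.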
>>
>> It's a thought - though I'm not sure how much that would really generalize
>> across different mangling schemes that use different mechanisms for
>> backreferences, etc. Or whether the return type should be included (it's
>> included for function templates in itanium mangling, for instance -
>> presumably also in MSVC mangling, but maybe some manglings include it even
>> in non-templates? I'm not sure) - since the partial linkage name for a
>> type would be context-insensitive (since it'd be attached to the type
>> rather than any use of the type) it'd be up to the consumer to fix that
>> up, eg:
>>
>> [2]https://godbolt.org/z/TqYjeevqx
>> Itanium:
>> f1<>(): _Z2f1IJEEvv
>> f1<t1, t1>(): _Z2f1IJ2t1S0_S0_EEvv
>> MSVC:
>> f1<>(): ??$f1@$$V@@YAXXZ
>> f1<t1, t1>(): ??$f1@Ut1@@U1@U1@@@YAXXZ
>> I'm not sure how much less a consumer would know about mangling if it had
>> to know about how to assemble these things, insert backrefs, insert empty
>> list markers, etc - without having to know how to mangle a specific user
>> defined type or name, like "3foo" versus "@Ut1@"?
>>
>> On Nov 1, 2021, at 6:52 PM, Daniel Berlin via Dwarf-Discuss
>> <[3]dwarf-discuss at lists.dwarfstd.org> wrote:
>> Finally, a question I know the answer to!
>> It brings us all the way back to when I was the C++ maintainer for
>> GDB, which is the most ancient of history.
>> Unfortunately, this is a trip to a horrible place.
>> I actually spent a lot of time trying to make it so we didn't need
>> linkage names, because, even then, they took up a *lot* of space.
>> On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss
>> <[4]dwarf-discuss at lists.dwarfstd.org> wrote:
>>
>>>> I can't be sure about this exponential growth. I don't have the data to back it
>>>> up. But I will say, when we created DWARF64, I was skeptical that it would be
>>>> needed during my career. And yet here we are...
>>>
>>> Yep, still got mixed feelings about DWARF64 - partly the pieces
>>> that we're seeing with the need for some solutions for mixed
>>> DWARF32/64, etc, make it feel like maybe it's got a bit of
>>> "settling in" to do. And I'm still rather hopeful we might be able
>>> to reduce the overheads enough to avoid widespread use of DWARF64 -
>>> but it's not a sure thing by any means.
>>
>> Agreed. I'd like to explore as many avenues as we can to eliminate the
>> need for DWARF64.
>>
>>>> Honestly, I've never been sure why gcc generates DW_AT_linkage_name. Our
>>>> debugger almost never uses it. (There is one use to detect "GNU indirect"
>>>> functions.) I wonder if it would be possible to avoid them if you provided
>>>> enough info about the template parameters, if the debugger had its own name
>>>> mangler. I had to write one for our debugger a couple years ago, and it
>>>> definitely was a persnickety beast. But doable with enough information. Mind
>>>> you, I'm not sure there is enough information to do it perfectly with the state
>>>> of DWARF & gcc right now.
>>>
>>> Yeah, that was/is certainly my first pass - the way I've done the
>>> DW_AT_name one is to have a feature in clang that produces the short
>>> name "t1" but then also embeds the template argument list in the
>>> name (like this: "_STNt1|<int>") - then llvm-dwarfdump will detect
>>> this prefix, split up the name, rebuild the original name as it
>>> would if it'd been given only the simple name ("t1") and compare it
>>> to the one from clang. Then I can run this over large programs and
>>> check everything round-trips correctly & in clang, classify any
>>> names we can't roundtrip so they get emitted in full rather than
>>> shortened.
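
The round-trip check described above can be sketched roughly as follows, in Python. Only the "_STN" prefix and the "|" separator come from the example in the text. The helpers split_stn, rebuild and round_trips are hypothetical, and the rebuild step here just joins parameter type names, which is far simpler than what a real consumer would have to do from the template parameter DIEs.

```python
STN_PREFIX = "_STN"


def split_stn(name):
    """Split '_STNt1|<int>' into ('t1', '<int>'); None if not in that form."""
    if not name.startswith(STN_PREFIX):
        return None
    base, sep, args = name[len(STN_PREFIX):].partition("|")
    if not sep:
        return None
    return base, args


def rebuild(base, template_params):
    """Rebuild the full name from the simple name plus parameter type names.

    template_params stands in for what a consumer would read from the
    template parameter DIEs; here it is just a list of strings.
    """
    return base + "<" + ", ".join(template_params) + ">"


def round_trips(name, template_params):
    """True if the rebuilt name matches the one the compiler embedded."""
    parts = split_stn(name)
    if parts is None:
        return False
    base, embedded_args = parts
    return rebuild(base, template_params) == base + embedded_args


print(round_trips("_STNt1|<int>", ["int"]))   # names agree
print(round_trips("_STNt1|<int>", ["long"]))  # mismatch: emit in full
```

A name that fails the check would be classified as not safe to shorten and emitted in full.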
>>> We could do something similar with linkage names - nice to know
>>> there's some prior art in your work there.
>>>
>>> I wouldn't be averse to considering what it'd take to make DWARF
>>> robust enough to always roundtrip simple and linkage names in this
>>> way - I don't think it'd take a /lot/ of extra DWARF content.
>>
>> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
>> (or only did in some very specific cases) until the LTO folks
>> convinced us they needed it in order to relate profile data back to
>> the source. Perhaps if we came up with a better way of doing that, we
>> could eliminate the linkage names.
>>
>> No, see, that's a mildly reasonable answer.
>> If you go far enough back, the linkage names exist for a few reasons:
>> 1. Because the debug info wasn't always good enough, and so GDB used
>> to demangle the linkage names and parse them using a hacked up C++-ish
>> parser for type info.
>> 2. Even when it didn't, it decoded linkage names to detect things like
>> destructors/constructors, etc.
>> 3. Because it used it to do remangling properly and try to generate
>> method signatures to look up (and for #1)
>> 4. Because it was used to do symbol lookup in the ELF/etc symbol
>> tables for static things/etc.
>> 5. Because it saved space in STABS to do #1 (they predate DWARF by
>> far).
>> If you check out the gdb source code, circa 2001, and search for things
>> like check_stub_method, and follow all the things it calls (like
>> gdb_mangle_name), you can learn the history of linkage names (and
>> probably throw up in your mouth a little).
>> If you do a case insensitive search for things like "physname" and
>> "phys_name", you'll see all the places it used to use the linkage
>> names.
>> I spent a lot of time abstracting out things like the
>> constructor/destructor name testing, vptr name finding, etc, so that
>> someone later might have a chance to get rid of linkage names (it was
>> also necessary because of the gcc 2.95->3.0 ABI change).
>>
>> _______________________________________________
>> Dwarf-Discuss mailing list
>> [5]Dwarf-Discuss at lists.dwarfstd.org
>> [6]http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>>
>> References
>>
>> Visible links
>> 1. mailto:dwarf-discuss at lists.dwarfstd.org
>> 2. https://godbolt.org/z/TqYjeevqx
>> 3. mailto:dwarf-discuss at lists.dwarfstd.org
>> 4. mailto:dwarf-discuss at lists.dwarfstd.org
>> 5. mailto:Dwarf-Discuss at lists.dwarfstd.org
>> 6. http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>> 7. mailto:Dwarf-Discuss at lists.dwarfstd.org
>> 8. http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>
>
> --
> Todd Allen
> Concurrent Real-Time