[Dwarf-Discuss] string reduction techniques

Daniel Berlin dberlin@dberlin.org
Tue Nov 2 02:22:35 GMT 2021

On Mon, Nov 1, 2021 at 10:14 PM Greg Clayton <clayborg at gmail.com> wrote:

> LLDB also uses mangled names. The clang compiler is our expression parser
> and it always tries to resolve symbols during compilation/JIT and it
> supplies mangled names when looking for functions to resolve when it JITs
> code up.

GDB was nearly the same.

> It is nice to be able to do quick name lookups using these mangled names
> to find the address of the function.

Yep - GDB also required them to be able to do binary search for the name of
the function -> address mapping (the "minimal symbol" table).
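
As a rough illustration of that kind of lookup (a minimal sketch, not GDB's actual minimal symbol code): keep a table sorted by linkage name and binary-search it. The MinimalSymbol struct and both function names here are made up for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one "minimal symbol": a linkage name and an address.
struct MinimalSymbol {
    std::string linkage_name;
    std::uint64_t address;
};

// Sort once by linkage name so later lookups can use binary search.
void sort_by_name(std::vector<MinimalSymbol>& table) {
    std::sort(table.begin(), table.end(),
              [](const MinimalSymbol& a, const MinimalSymbol& b) {
                  return a.linkage_name < b.linkage_name;
              });
}

// O(log n) exact-match lookup in a table previously sorted by sort_by_name.
std::optional<std::uint64_t> lookup_address(const std::vector<MinimalSymbol>& table,
                                            std::string_view name) {
    auto it = std::lower_bound(table.begin(), table.end(), name,
                               [](const MinimalSymbol& sym, std::string_view key) {
                                   return std::string_view(sym.linkage_name) < key;
                               });
    if (it != table.end() && it->linkage_name == name) return it->address;
    return std::nullopt;
}
```

The point is only that an exact mangled name gives a unique, cheap key; a demangled name would not, since several symbols can share one.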

> That being said, we could work around it. Not sure how easy that would be
> though as mangled names can end up demangling to the same name with some
> loss of information and it would be important to be able to find the right
> in charge or out of charge constructor when the compiler asks for a
> specific symbol using the mangled name.

Yes - we felt the same way at the time. We could resolve the symbol table
speed issue in a variety of ways if we had to, but you'd end up with quite
an interface to get all the info being extracted from the linkage names
passed along to the right places to find the right symbols without them.
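
To make the loss of information concrete, here is a small sketch using abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang on Itanium-ABI platforms). The demangle() wrapper is just a helper for this example. In the Itanium mangling, C1 is the complete object ("in charge") constructor and C2 is the base object ("not in charge") constructor.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium C++ ABI name; returns an empty string on failure.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle mallocs the result.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && out) ? out : "";
    std::free(out);  // free(nullptr) is a no-op
    return result;
}
```

Calling demangle("_ZN3fooC1Ev") and demangle("_ZN3fooC2Ev") yields the same text, foo::foo(), so the demangled name alone cannot tell the debugger which of the two symbols the compiler asked for.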

> We have more uses of mangled names but most of them relate to parsing the
> symbol tables, so removing them from DWARF wouldn't affect those areas.
> I wonder if there is a way to have a DW_AT_partial_linkage_name that
> relies on the decl context of a DIE. Like if you have a class "foo" in the
> global namespace it could have a DW_AT_partial_linkage_name with the
> value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
> could have another partial linkage name "3bari" that
> could be put together with the parent "_Z3foo" for a function like:
> void foo::bar(int);
> Since many mangled names often start with the same prefix it might help
> reduce the string table size.

This is similar to how gdb constructed mangled names for certain things, so
it certainly is doable.
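
A minimal sketch of how a consumer might reassemble such names, assuming the proposed (not existing) DW_AT_partial_linkage_name attribute and a made-up, simplified Die struct: walk up the parent chain and concatenate the pieces from the outermost DIE inward.

```cpp
#include <string>

// Hypothetical, simplified DIE: only the fields this sketch needs.
struct Die {
    const Die* parent;                 // enclosing DIE, or nullptr at the top
    std::string partial_linkage_name;  // proposed attribute; empty if absent
};

// Concatenate the partial names from the outermost ancestor down to `die`.
std::string compose_linkage_name(const Die& die) {
    std::string prefix =
        die.parent ? compose_linkage_name(*die.parent) : std::string();
    return prefix + die.partial_linkage_name;
}
```

With the pieces from the example above, plain concatenation gives "_Z3foo3bari". The real Itanium mangling of void foo::bar(int) is _ZN3foo3barEi, so an actual scheme would have to account for the N...E nested-name markers rather than just joining strings.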

> On Nov 1, 2021, at 6:52 PM, Daniel Berlin via Dwarf-Discuss <
> dwarf-discuss at lists.dwarfstd.org> wrote:
> Finally, a question I know the answer to!
> It brings us all the way back to when I was the C++ maintainer for GDB,
> which is the most ancient of history.
> Unfortunately, this is a trip to a horrible place.
> I actually spent a lot of time trying to make it so we didn't need linkage
> names, because, even then, they took up a *lot* of space.
> On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss <
> dwarf-discuss at lists.dwarfstd.org> wrote:
>> >> I can't be sure about this exponential growth.  I don't have the data to back it
>> >> up.  But I will say, when we created DWARF64, I was skeptical that it would be
>> >> needed during my career.  And yet here we are...
>> >
>> > Yep, still got mixed feelings about DWARF64 - partly the pieces that
>> > we're seeing with the need for some solutions for mixed DWARF32/64, etc.,
>> > make it feel like maybe it's got a bit of "settling in" to do. And I'm
>> > still rather hopeful we might be able to reduce the overheads enough to
>> > avoid widespread use of DWARF64 - but it's not a sure thing by any means.
>> Agreed. I'd like to explore as many avenues as we can to eliminate the
>> need for DWARF64.
>> >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
>> >> debugger almost never uses it.  (There is one use to detect "GNU indirect"
>> >> functions.)  I wonder if it would be possible to avoid them if you provided
>> >> enough info about the template parameters, if the debugger had its own name
>> >> mangler.  I had to write one for our debugger a couple years ago, and it
>> >> definitely was a persnickety beast.  But doable with enough information.  Mind
>> >> you, I'm not sure there is enough information to do it perfectly with the state
>> >> of DWARF & gcc right now.
>> >
>> > Yeah, that was/is certainly my first pass - the way I've done the
>> > DW_AT_name one is to have a feature in clang that produces the short name
>> > "t1" but then also embeds the template argument list in the name (like
>> > this: "_STNt1|<int>") - then llvm-dwarfdump will detect this prefix, split
>> > up the name, rebuild the original name as it would if it'd been given only
>> > the simple name ("t1") and compare it to the one from clang. Then I can run
>> > this over large programs and check everything round-trips correctly & in
>> > clang, classify any names we can't roundtrip so they get emitted in full
>> > rather than shortened.
>> > We could do something similar with linkage names - nice to know
>> > there's some prior art in your work there.
>> >
>> > I wouldn't be averse to considering what it'd take to make DWARF robust
>> > enough to always roundtrip simple and linkage names in this way - I don't
>> > think it'd take a /lot/ of extra DWARF content.
>> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
>> (or only did in some very specific cases) until the LTO folks
>> convinced us they needed it in order to relate profile data back to
>> the source. Perhaps if we came up with a better way of doing that, we
>> could eliminate the linkage names.
> No, see, that's a mildly reasonable answer.
> If you go far enough back, the linkage names exist for a few reasons:
> 1. Because the debug info wasn't always good enough, and so GDB used to
> demangle the linkage names and parse them using a hacked up C++-ish parser
> for type info.
> 2. Even when it didn't, it decoded linkage names to detect things like
> destructors/constructors, etc.
> 3. Because it used them to do remangling properly and to try to generate
> method signatures to look up (and for #1).
> 4. Because they were used to do symbol lookup in the ELF/etc. symbol tables
> for static things, etc.
> 5. Because it saved space in STABS to do #1 (they predate DWARF by far).
> If you check out the gdb source code, circa 2001, and search for things like
> check_stub_method, and follow all the things it calls (like
> gdb_mangle_name), you can learn the history of linkage names (and probably
> throw up in your mouth a little).
> If you do a case-insensitive search for things like "physname" and
> "phys_name", you'll see all the places it used to use the linkage names.
> I spent a lot of time abstracting out things like the
> constructor/destructor name testing, vptr name finding, etc, so that
> someone later might have a chance to get rid of linkage names (it was also
> necessary because of the gcc 2.95->3.0 ABI change).