[Dwarf-Discuss] string reduction techniques

Tue Nov 2 17:09:16 GMT 2021

On Mon, Nov 1, 2021 at 7:14 PM Greg Clayton via Dwarf-Discuss <
dwarf-discuss at lists.dwarfstd.org> wrote:

> LLDB also uses mangled names. The clang compiler is our expression parser
> and it always tries to resolve symbols during compilation/JIT and it
> supplies mangled names when looking for functions to resolve when it JITs
> code up. It is nice to be able to do quick name lookups using these mangled
> names to find the address of the function. That being said, we could work
> around it. Not sure how easy that would be though as mangled names can end
> up demangling to the same name with some loss of information and it would
> be important to be able to find the right in charge or out of charge
> constructor when the compiler asks for a specific symbol using the mangled
> name. We have more uses of mangled names but most of them relate to parsing
> the symbol tables, so removing them from DWARF wouldn't affect those areas.
>
> I wonder if there is a way to have a DW_AT_partial_linkage_name that
> relies on the decl context of a DIE. Like if you have a class "foo" in the
> global namespace it could have a DW_AT_partial_linkage_name with the
> value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
> could have another partial linkage name "3bari" that could be put
> together with the parent "_Z3foo" for a function like:
>
> void foo::bar(int);
>
> Since many mangled names often start with the same prefix it might help
> reduce the string table size.
>
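
To make sure I'm reading the proposal right, here is a minimal sketch of how
a consumer might put the pieces together. DW_AT_partial_linkage_name is only
the attribute proposed above, not something in DWARF today, and the Die class
and assemble_linkage_name helper are hypothetical names made up for this
illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    """Toy stand-in for a DWARF DIE (hypothetical, for illustration only)."""
    tag: str
    partial_linkage_name: str        # the proposed DW_AT_partial_linkage_name
    parent: Optional["Die"] = None   # enclosing decl context, if any


def assemble_linkage_name(die):
    """Concatenate partial linkage names from the outermost context inward."""
    pieces = []
    node = die
    while node is not None:
        pieces.append(node.partial_linkage_name)
        node = node.parent
    return "".join(reversed(pieces))


foo = Die("DW_TAG_class_type", "_Z3foo")
bar = Die("DW_TAG_subprogram", "3bari", parent=foo)
print(assemble_linkage_name(bar))  # _Z3foo3bari
```

Note that plain concatenation gives "_Z3foo3bari", whereas the Itanium
mangling of foo::bar(int) is "_ZN3foo3barEi", so even in this small case the
consumer would have to know to wrap the nested name in N...E itself.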

It's a thought - though I'm not sure how much that would really generalize
across different mangling schemes that use different mechanisms for
backreferences, etc. Or whether the return type should be included (it's
included for function templates in Itanium mangling, for instance -
presumably also in MSVC mangling, but maybe some manglings include it even
in non-templates? I'm not sure). Since the partial linkage name for a type
would be context-insensitive (it'd be attached to the type rather than to
any use of the type), it'd be up to the consumer to fix that up, e.g.:

https://godbolt.org/z/TqYjeevqx
Itanium:
  f1<>(): _Z2f1IJEEvv
  f1<t1, t1>(): _Z2f1IJ*2t1S0_S0_*EEvv
MSVC:
  f1<>(): ??$f1@$$V@@YAXXZ
  f1<t1, t1>(): ??$f1*@Ut1@@U1@U1@*@@YAXXZ

I'm not sure how much less a consumer would know about mangling if it had
to know about how to assemble these things, insert backrefs, insert empty
list markers, etc - without having to know how to mangle a specific user
defined type or name, like "3foo" versus "@Ut1@"?
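
As a rough illustration of the fix-up work that would be left to the
consumer, here is a sketch for just the Itanium shape shown above
(template<class...> void f1()). It assumes every template argument is a class
type whose partial linkage name is a plain source name like "2t1"; seq_id and
mangle_pack_call are hypothetical helpers, and this is nowhere near a full
mangler:

```python
def seq_id(index):
    """Itanium substitution token for the index-th candidate: S_, S0_, S1_, ..."""
    if index == 0:
        return "S_"
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    n = index - 1
    out = ""
    while True:
        out = digits[n % 36] + out   # sequence ids are base 36
        n //= 36
        if n == 0:
            break
    return "S" + out + "_"


def mangle_pack_call(func_name, arg_type_partials):
    """Assemble _Z <name> I J <args> E E v v from per-type partial names."""
    slots = {}       # partial type name -> substitution index
    next_index = 1   # index 0 (S_) is taken by the template name itself
    parts = []
    for partial in arg_type_partials:
        if partial in slots:
            # Repeated type: emit a back-reference instead of the name.
            parts.append(seq_id(slots[partial]))
        else:
            slots[partial] = next_index
            next_index += 1
            parts.append(partial)
    # "J ... E" is the argument pack; an empty pack still needs the markers.
    name = f"{len(func_name)}{func_name}"
    return "_Z" + name + "IJ" + "".join(parts) + "EEvv"


print(mangle_pack_call("f1", []))  # _Z2f1IJEEvv
```

Even with "2t1" handed to it, the consumer still has to carry the
substitution numbering, the pack markers and the return type placement, and
none of that would carry over to the MSVC scheme.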

>
> On Nov 1, 2021, at 6:52 PM, Daniel Berlin via Dwarf-Discuss <
> dwarf-discuss at lists.dwarfstd.org> wrote:
>
> Finally, a question I know the answer to!
>
> It brings us all the way back to when I was the C++ maintainer for GDB,
> which is the most ancient of history.
> Unfortunately, this is a trip to a horrible place.
> I actually spent a lot of time trying to make it so we didn't need linkage
> names, because, even then, they took up a *lot* of space.
>
> On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss <
> dwarf-discuss at lists.dwarfstd.org> wrote:
>
>> >> I can't be sure about this exponential growth.  I don't have the data
>> >> to back it up.  But I will say, when we created DWARF64, I was skeptical
>> >> that it would be needed during my career.  And yet here we are...
>> >
>> > Yep, still got mixed feelings about DWARF64 - partly the pieces that
>> > we're seeing with the need for some solutions for mixed DWARF32/64, etc,
>> > make it feel like maybe it's got a bit of "settling in" to do. And I'm
>> > still rather hopeful we might be able to reduce the overheads enough to
>> > avoid widespread use of DWARF64 - but it's not a sure thing by any means.
>>
>> Agreed. I'd like to explore as many avenues as we can to eliminate the
>> need for DWARF64.
>>
>>
>> >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.
>> >> Our debugger almost never uses it.  (There is one use to detect "GNU
>> >> indirect" functions.)  I wonder if it would be possible to avoid them if
>> >> you provided enough info about the template parameters, if the debugger
>> >> had its own name mangler.  I had to write one for our debugger a couple
>> >> years ago, and it definitely was a persnickety beast.  But doable with
>> >> enough information.  Mind you, I'm not sure there is enough information
>> >> to do it perfectly with the state of DWARF & gcc right now.
>> >
>> > Yeah, that was/is certainly my first pass - the way I've done the
>> > DW_AT_name one is to have a feature in clang that produces the short name
>> > "t1" but then also embeds the template argument list in the name (like
>> > this: "_STNt1|<int>") - then llvm-dwarfdump will detect this prefix, split
>> > up the name, rebuild the original name as it would if it'd been given only
>> > the simple name ("t1") and compare it to the one from clang. Then I can run
>> > this over large programs and check everything round-trips correctly & in
>> > clang, classify any names we can't roundtrip so they get emitted in full
>> > rather than shortened.
>> > We could do something similar with linkage names - nice to know
>> > there's some prior art in your work there.
>> >
>> > I wouldn't be averse to considering what it'd take to make DWARF robust
>> > enough to always roundtrip simple and linkage names in this way - I don't
>> > think it'd take a /lot/ of extra DWARF content.
>>
>> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
>> (or only did in some very specific cases) until the LTO folks
>> convinced us they needed it in order to relate profile data back to
>> the source. Perhaps if we came up with a better way of doing that, we
>> could eliminate the linkage names.
>>
>
> No, see, that's a mildly reasonable answer.
> If you go far enough back, the linkage names exist for a few reasons:
> 1. Because the debug info wasn't always good enough, and so GDB used to
> demangle the linkage names and parse them using a hacked up C++-ish parser
> for type info.
> 2. Even when it didn't, it decoded linkage names to detect things like
> destructors/constructors, etc.
> 3. Because it used it to do remangling properly and try to generate method
> signatures to look up (and for #1).
> 4. Because it was used to do symbol lookup in the ELF/etc symbol tables
> for static things/etc.
> 5. Because it saved space in STABS to do #1 (they predate DWARF by far).
>
> If you check out the gdb source code, circa 2001, and search for things like
> check_stub_method, and follow all the things it calls (like
> gdb_mangle_name), you can learn the history of linkage names (and probably
> throw up in your mouth a little).
> If you do a case-insensitive search for things like "physname" and
> "phys_name", you'll see all the places it used to use the linkage names.
> I spent a lot of time abstracting out things like the
> constructor/destructor name testing, vptr name finding, etc, so that
> someone later might have a chance to get rid of linkage names (it was also
> necessary because of the gcc 2.95->3.0 ABI change).
>
>
>
>> _______________________________________________
> Dwarf-Discuss mailing list
> Dwarf-Discuss at lists.dwarfstd.org
> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>