[Dwarf-Discuss] string reduction techniques

Tue Nov 2 01:52:08 GMT 2021

Finally, a question I know the answer to!

It brings us all the way back to when I was the C++ maintainer for GDB,
which is the most ancient of history.
Unfortunately, this is a trip to a horrible place.
I actually spent a lot of time trying to make it so we didn't need linkage
names, because, even then, they took up a *lot* of space.

On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss <
dwarf-discuss at lists.dwarfstd.org> wrote:

> >> I can't be sure about this exponential growth.  I don't have the data
> >> to back it up.  But I will say, when we created DWARF64, I was skeptical
> >> that it would be needed during my career.  And yet here we are...
> >
> > Yep, still got mixed feelings about DWARF64 - partly the pieces that
> > we're seeing with the need for some solutions for mixed DWARF32/64, etc,
> > make it feel like maybe it's got a bit of "settling in" to do. And I'm
> > still rather hopeful we might be able to reduce the overheads enough to
> > avoid widespread use of DWARF64 - but it's not a sure thing by any means.
>
> Agreed. I'd like to explore as many avenues as we can to eliminate the
> need for DWARF64.
>
>
> >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.
> >> Our debugger almost never uses it.  (There is one use to detect "GNU
> >> indirect" functions.)  I wonder if it would be possible to avoid them if
> >> you provided enough info about the template parameters, if the debugger
> >> had its own name mangler.  I had to write one for our debugger a couple
> >> years ago, and it definitely was a persnickety beast.  But doable with
> >> enough information.  Mind you, I'm not sure there is enough information
> >> to do it perfectly with the state of DWARF & gcc right now.
> >
> > Yeah, that was/is certainly my first pass - the way I've done the
> > DW_AT_name one is to have a feature in clang that produces the short name
> > "t1" but then also embeds the template argument list in the name (like
> > this: "_STNt1|<int>") - then llvm-dwarfdump will detect this prefix, split
> > up the name, rebuild the original name as it would if it'd been given only
> > the simple name ("t1") and compare it to the one from clang. Then I can
> > run this over large programs and check everything round-trips correctly &
> > in clang, classify any names we can't roundtrip so they get emitted in
> > full rather than shortened.
> > We could do something similar with linkage names - since to know there's
> > some prior art in your work there.
> >
> > I wouldn't be averse to considering what it'd take to make DWARF robust
> > enough to always roundtrip simple and linkage names in this way - I don't
> > think it'd take a /lot/ of extra DWARF content.
>
> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
> (or only did in some very specific cases) until the LTO folks
> convinced us they needed it in order to relate profile data back to
> the source. Perhaps if we came up with a better way of doing that, we
> could eliminate the linkage names.
>

No, see, that's a mildly reasonable answer.
If you go far enough back, the linkage names exist for a few reasons:
1. Because the debug info wasn't always good enough, and so GDB used to
demangle the linkage names and parse them using a hacked up C++-ish parser
for type info.
2. Even when it didn't, it decoded linkage names to detect things like
destructors/constructors, etc.
3. Because it used them to do remangling properly and to try to generate
method signatures to look up (and for #1).
4. Because they were used to do symbol lookup in the ELF/etc. symbol tables
for static things, etc.
5. Because it saved space in STABS to do #1 (they predate DWARF by far).

If you check out the GDB source code, circa 2001, and search for things like
check_stub_method, and follow all the things it calls (like
gdb_mangle_name), you can learn the history of linkage names (and probably
throw up in your mouth a little).
If you do a case-insensitive search for things like "physname" and
"phys_name", you'll see all the places GDB used to use the linkage names.
I spent a lot of time abstracting out things like the
constructor/destructor name testing, vptr name finding, etc., so that
someone later might have a chance to get rid of linkage names (it was also
necessary because of the gcc 2.95->3.0 ABI change).
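
To make the "_STN" round-trip check from the quoted text above concrete,
here is a minimal Python sketch. It is not llvm-dwarfdump's code: the
function names are hypothetical, and the template arguments are assumed to
arrive as a plain list of already-printed strings (in practice they would
have to be rebuilt from the DWARF template parameter entries). Splitting on
the first "|" is also an assumption that breaks for names such as operator|.

```python
STN_PREFIX = "_STN"  # prefix from the quoted example "_STNt1|<int>"


def split_stn_name(name):
    """Split '_STNt1|<int>' into ('t1', '<int>').

    Returns None when the name does not use the prefixed form.
    """
    if not name.startswith(STN_PREFIX):
        return None
    simple, sep, args = name[len(STN_PREFIX):].partition("|")
    if not sep:
        return None
    return simple, args


def rebuild_name(simple, template_args):
    """Rebuild a full name from the simple name plus argument strings."""
    return simple + "<" + ", ".join(template_args) + ">"


def round_trips(name, template_args):
    """Compare the rebuilt name with the one the compiler embedded.

    Returns None when there is nothing to check (no prefix present).
    """
    parts = split_stn_name(name)
    if parts is None:
        return None
    simple, embedded_args = parts
    return rebuild_name(simple, template_args) == simple + embedded_args
```

A name for which round_trips() returns False is one that, per the quoted
description, would be classified as not safe to shorten and emitted in full.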
