[Dwarf-Discuss] string reduction techniques

Sun Nov 7 20:36:13 GMT 2021

Just spitballing an idea here, but would there be value in a new DW_FORM (or
two) that referenced the names from .strtab or .dynstr, instead of .debug_str?
It would only work if the symbols already were there, but I would expect that
for many/most/all(?) functions defined in the compilation unit.  It does
somewhat relegate this to being Someone Else's Problem, but given that the
.strtab already has the problem of zillions of these huge symbols, maybe that's
not so bad?

Maybe, if that's too onerous for tools that need to manipulate .strtab, it could
reference them indirectly through a .debug_strtab_offsets section similar to
.debug_str_offsets.
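
A minimal sketch of the indirect variant, only to make the lookup path concrete. Everything here is hypothetical, because no such form exists today. The form name "DW_FORM_strtabx", the helper functions, and the section layout are all assumptions made for illustration. The layout is a bare array of 4-byte little-endian offsets, with none of the header or base-offset attribute that DWARF 5 uses for .debug_str_offsets. The string table is built in memory; a real consumer would read .strtab from the ELF file.

```python
import struct
from typing import List, Tuple


def build_strtab(names: List[str]) -> Tuple[bytes, List[int]]:
    """Build an ELF-style string table (it starts with a NUL byte) and
    return it together with the offset of each name."""
    table = bytearray(b"\0")
    offsets = []
    for name in names:
        offsets.append(len(table))
        table += name.encode("utf-8") + b"\0"
    return bytes(table), offsets


def build_strtab_offsets(offsets: List[int]) -> bytes:
    """Hypothetical .debug_strtab_offsets: a bare array of 4-byte
    little-endian offsets into .strtab (no header)."""
    return b"".join(struct.pack("<I", off) for off in offsets)


def read_cstring(section: bytes, offset: int) -> str:
    """Read a NUL-terminated string starting at `offset`."""
    end = section.index(b"\0", offset)
    return section[offset:end].decode("utf-8")


def resolve_strtabx(strtab: bytes, strtab_offsets: bytes, index: int) -> str:
    """Resolve a value of the hypothetical DW_FORM_strtabx:
    index -> offset (via .debug_strtab_offsets) -> name (in .strtab)."""
    (offset,) = struct.unpack_from("<I", strtab_offsets, index * 4)
    return read_cstring(strtab, offset)


strtab, offsets = build_strtab(["_ZN3foo3barEi", "main"])
strtab_offsets = build_strtab_offsets(offsets)
print(resolve_strtabx(strtab, strtab_offsets, 0))  # _ZN3foo3barEi
```

Presumably the appeal of the indirection is that a tool rewriting .strtab would only have to patch the offsets array, not every DIE that names a symbol.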

On Tue, Nov 02, 2021 at 10:09:16AM -0700, Dwarf Discussion wrote:
>    On Mon, Nov 1, 2021 at 7:14 PM Greg Clayton via Dwarf-Discuss
>    <[1]dwarf-discuss at lists.dwarfstd.org> wrote:
> 
>      LLDB also uses mangled names. The clang compiler is our expression
>      parser and it always tries to resolve symbols during compilation/JIT and
>      it supplies mangled names when looking for functions to resolve when it
>      JITs code up. It is nice to be able to do quick name lookups using these
>      mangled names to find the address of the function. That being said, we
>      could work around it. Not sure how easy that would be though as mangled
>      names can end up demangling to the same name with some loss of
>      information and it would be important to be able to find the right
>      in-charge or not-in-charge constructor when the compiler asks for a
>      specific symbol using the mangled name. We have more uses of mangled
>      names but most of them relate to parsing the symbol tables, so removing
>      them from DWARF wouldn't affect those areas.
>      I wonder if there is a way to have a DW_AT_partial_linkage_name that
>      relies on the decl context of a DIE. Like if you have a class "foo" in
>      the global namespace it could have a DW_AT_partial_linkage_name with the
>      value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
>      could have another partial linkage name "3bari" that
>      could be put together with the parent "_Z3foo" for a function like:
>      void foo::bar(int);
>      Since many mangled names often start with the same prefix it might help
>      reduce the string table size.
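
To make the proposal concrete, here is a sketch of how a consumer might assemble a name from the proposed (hypothetical) DW_AT_partial_linkage_name values on a DIE and its ancestors. It also gives a rough string-byte count for a class with three methods. The Die class and the extra methods baz() and qux(double) are made up for illustration. Plain concatenation yields "_Z3foo3bari". The Itanium C++ ABI, however, mangles void foo::bar(int) as "_ZN3foo3barEi", wrapping nested names in N...E. A real scheme would therefore need mangling-aware assembly rather than simple concatenation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    """Toy DIE: a tag, the proposed (hypothetical) attribute, and a parent link."""
    tag: str
    partial_linkage_name: Optional[str] = None
    parent: Optional["Die"] = None


def assemble_linkage_name(die: Die) -> str:
    """Concatenate partial linkage names from the outermost ancestor down to `die`."""
    parts = []
    node = die
    while node is not None:
        if node.partial_linkage_name is not None:
            parts.append(node.partial_linkage_name)
        node = node.parent
    return "".join(reversed(parts))


cu = Die("DW_TAG_compile_unit")
foo = Die("DW_TAG_class_type", "_Z3foo", cu)
methods = {
    # partial name on the subprogram DIE -> actual Itanium mangled name
    "3bari": "_ZN3foo3barEi",  # void foo::bar(int)
    "3bazv": "_ZN3foo3bazEv",  # void foo::baz()
    "3quxd": "_ZN3foo3quxEd",  # void foo::qux(double)
}
subprograms = [Die("DW_TAG_subprogram", p, foo) for p in methods]

# Naive concatenation does not reproduce the real mangling (no N...E wrapping).
naive = assemble_linkage_name(subprograms[0])
print(naive, "vs", methods["3bari"])  # _Z3foo3bari vs _ZN3foo3barEi

# NUL-terminated string bytes: full names vs. one shared prefix plus suffixes.
full_bytes = sum(len(n) + 1 for n in methods.values())
partial_bytes = len(foo.partial_linkage_name) + 1 + sum(len(p) + 1 for p in methods)
print(full_bytes, partial_bytes)  # 42 25
```

The byte counts ignore the fact that .debug_str already deduplicates identical strings, and they ignore any per-attribute overhead. Treat them as an illustration of prefix sharing, not a measurement.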
> 
>    It's a thought - though I'm not sure how much that would really generalize
>    across different mangling schemes that use different mechanisms for
>    backreferences, etc. Or whether the return type should be included (it's
>    included for function templates in Itanium mangling, for instance -
>    presumably also in MSVC mangling, but maybe some manglings include it even
>    in non-templates? I'm not sure) - since the partial linkage name for a
>    type would be context-insensitive (since it'd be attached to the type
>    rather than any use of the type) it'd be up to the consumer to fix that
>    up, e.g.:
> 
>    [2]https://godbolt.org/z/TqYjeevqx
>    Itanium:
>      f1<>(): _Z2f1IJEEvv
>      f1<t1, t1>(): _Z2f1IJ2t1S0_S0_EEvv
>    MSVC:
>      f1<>(): ??$f1@$$V@@YAXXZ
>      f1<t1, t1>(): ??$f1@Ut1@@U1@U1@@@YAXXZ
>    I'm not sure how much less a consumer would need to know about mangling
>    if it still had to know how to assemble these things, insert backrefs,
>    insert empty list markers, etc - even if it didn't have to know how to
>    mangle a specific user defined type or name, like "3foo" versus "@Ut1@".
> 
>        On Nov 1, 2021, at 6:52 PM, Daniel Berlin via Dwarf-Discuss
>        <[3]dwarf-discuss at lists.dwarfstd.org> wrote:
>        Finally, a question I know the answer to!
>        It brings us all the way back to when I was the C++ maintainer for
>        GDB, which is the most ancient of history.
>        Unfortunately, this is a trip to a horrible place.
>        I actually spent a lot of time trying to make it so we didn't need
>        linkage names, because, even then, they took up a *lot* of space.
>        On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss
>        <[4]dwarf-discuss at lists.dwarfstd.org> wrote:
> 
>          >> I can't be sure about this exponential growth.  I don't have the
>          >> data to back it up.  But I will say, when we created DWARF64, I
>          >> was skeptical that it would be needed during my career.  And yet
>          >> here we are...
>          >
>          > Yep, still got mixed feelings about DWARF64 - partly the pieces
>          > that we're seeing with the need for some solutions for mixed
>          > DWARF32/64, etc., make it feel like maybe it's still got a bit of
>          > "settling in" to do. And I'm still rather hopeful we might be able
>          > to reduce the overheads enough to avoid widespread use of DWARF64 -
>          > but it's not a sure thing by any means.
> 
>          Agreed. I'd like to explore as many avenues as we can to eliminate
>          the need for DWARF64.
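
Some context on why string size matters here. In the 32-bit DWARF format, section offsets such as those stored by DW_FORM_strp are 4 bytes. A .debug_str larger than 4 GiB therefore cannot be addressed without moving to DWARF64. Each unit header says which format it uses through its initial length field. A value of 0xffffffff means an 8-byte length follows and offsets are 8 bytes wide, and values 0xfffffff0 through 0xfffffffe are reserved. The sketch below decodes that field. It assumes little-endian data, and the function names are illustrative.

```python
import struct
from typing import Tuple


def read_initial_length(data: bytes, offset: int = 0) -> Tuple[int, int, int]:
    """Decode a DWARF initial length field (little-endian assumed).

    Returns (unit_length, offset_size, bytes_consumed)."""
    (length32,) = struct.unpack_from("<I", data, offset)
    if length32 == 0xFFFFFFFF:
        # DWARF64: escape value followed by the real 8-byte length.
        (length64,) = struct.unpack_from("<Q", data, offset + 4)
        return length64, 8, 12
    if length32 >= 0xFFFFFFF0:
        raise ValueError(f"reserved initial length value {length32:#x}")
    # DWARF32: the 4-byte value is the length itself.
    return length32, 4, 4


def addressable_string_bytes(offset_size: int) -> int:
    """How many bytes of .debug_str a DW_FORM_strp offset of this size can reach."""
    return 2 ** (8 * offset_size)


print(read_initial_length(struct.pack("<I", 0x1234)))                     # (4660, 4, 4)
print(read_initial_length(struct.pack("<IQ", 0xFFFFFFFF, 0x1_0000_0000)))  # (4294967296, 8, 12)
print(addressable_string_bytes(4) // 2**30, "GiB")                         # 4 GiB
```

Keeping the string data under that 4 GiB ceiling is one way to stay in DWARF32.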
> 
>          >> Honestly, I've never been sure why gcc generates
>          >> DW_AT_linkage_name.  Our debugger almost never uses it.  (There
>          >> is one use to detect "GNU indirect" functions.)  I wonder if it
>          >> would be possible to avoid them if you provided enough info about
>          >> the template parameters, if the debugger had its own name
>          >> mangler.  I had to write one for our debugger a couple years ago,
>          >> and it definitely was a persnickety beast.  But doable with enough
>          >> information.  Mind you, I'm not sure there is enough information
>          >> to do it perfectly with the state of DWARF & gcc right now.
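
To give a flavor of what a debugger-side mangler has to reconstruct, here is a deliberately tiny sketch of Itanium C++ ABI mangling. It covers only plain-named, non-template functions (not constructors, destructors, or operators) whose parameters are all builtin types. It ignores substitutions, cv-qualified member functions, std:: abbreviations, and most types. The function names and the builtin-type table are illustrative. Even this subset needs the full scope chain and the parameter types, which is the kind of information DWARF would have to carry reliably.

```python
from typing import List

# Itanium C++ ABI codes for a few builtin types (a small subset).
BUILTIN_CODES = {
    "bool": "b", "char": "c", "int": "i", "unsigned int": "j",
    "long": "l", "float": "f", "double": "d",
}


def source_name(identifier: str) -> str:
    """<source-name> ::= <length> <identifier>"""
    return f"{len(identifier)}{identifier}"


def mangle_function(scope: List[str], name: str, params: List[str]) -> str:
    """Mangle a plain-named, non-template function without cv-qualifiers whose
    parameters are all builtin types. `scope` lists enclosing namespaces/classes."""
    if scope and scope[0] == "std":
        raise ValueError("std:: uses special abbreviations; not handled here")
    names = "".join(source_name(s) for s in scope + [name])
    # Names with enclosing scopes are wrapped as a <nested-name>: N ... E.
    encoding = f"N{names}E" if scope else names
    # An empty parameter list is encoded as 'v' (void).
    param_codes = "".join(BUILTIN_CODES[p] for p in params) or "v"
    return "_Z" + encoding + param_codes


print(mangle_function(["foo"], "bar", ["int"]))                  # _ZN3foo3barEi
print(mangle_function([], "f", []))                              # _Z1fv
print(mangle_function(["ns", "foo"], "baz", ["double", "int"]))  # _ZN2ns3foo3bazEdi
```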
>          >
>          > Yeah, that was/is certainly my first pass - the way I've done the
>          > DW_AT_name one is to have a feature in clang that produces the
>          > short name "t1" but then also embeds the template argument list in
>          > the name (like this: "_STNt1|<int>") - then llvm-dwarfdump will
>          > detect this prefix, split up the name, rebuild the original name
>          > as it would if it'd been given only the simple name ("t1") and
>          > compare it to the one from clang. Then I can run this over large
>          > programs and check that everything round-trips correctly and, in
>          > clang, classify any names we can't round-trip so they get emitted
>          > in full rather than shortened.
>          > We could do something similar with linkage names - nice to know
>          > there's some prior art in your work there.
>          >
>          > I wouldn't be averse to considering what it'd take to make DWARF
>          > robust enough to always round-trip simple and linkage names in
>          > this way - I don't think it'd take a /lot/ of extra DWARF content.
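
The round-trip check described above can be sketched as follows. Only the "_STN" prefix, the "|" separator, and the "t1"/"<int>" example come from the message. The function names and the printing rule (arguments joined with ", ") are simplified assumptions, and real C++ name printing has many more cases. This is not the actual clang or llvm-dwarfdump code.

```python
from typing import List, Optional, Tuple

# Prefix and separator as described in the thread; the exact encoding used
# by clang/llvm-dwarfdump may differ.
PREFIX = "_STN"
SEPARATOR = "|"


def split_encoded_name(encoded: str) -> Optional[Tuple[str, str]]:
    """Split '_STN<simple>|<template-args>' into (simple, template-args)."""
    if not encoded.startswith(PREFIX):
        return None
    simple, sep, args = encoded[len(PREFIX):].partition(SEPARATOR)
    if not sep:
        return None
    return simple, args


def rebuild_name(simple: str, template_args: List[str]) -> str:
    """Rebuild a full name from the simple name and the template argument
    names a consumer would read from the DIE's template parameter children.
    Simplified, assumed printing rule: arguments joined with ', '."""
    if not template_args:
        return simple
    return f"{simple}<{', '.join(template_args)}>"


def round_trips(encoded: str, template_args: List[str]) -> bool:
    """True if the name rebuilt from DWARF matches the one the compiler embedded."""
    parts = split_encoded_name(encoded)
    if parts is None:
        return False
    simple, embedded_args = parts
    return rebuild_name(simple, template_args) == simple + embedded_args


print(round_trips("_STNt1|<int>", ["int"]))   # True
print(round_trips("_STNt1|<int>", ["long"]))  # False -> emit the full name instead
```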
> 
>          Fuzzy memory here, but as I recall, GCC didn't generate linkage
>          names (or only did in some very specific cases) until the LTO folks
>          convinced us they needed it in order to relate profile data back to
>          the source. Perhaps if we came up with a better way of doing that,
>          we could eliminate the linkage names.
> 
>        No, see, that's a mildly reasonable answer.
>        If you go far enough back, the linkage names existed for a few reasons:
>        1. Because the debug info wasn't always good enough, and so GDB used
>        to demangle the linkage names and parse them using a hacked-up C++-ish
>        parser for type info.
>        2. Even when it didn't, it decoded linkage names to detect things like
>        destructors/constructors, etc.
>        3. Because it used it to do remangling properly and try to generate
>        method signatures to look up (and for #1).
>        4. Because it was used to do symbol lookup in the ELF/etc. symbol
>        tables for static things/etc.
>        5. Because it saved space in STABS to do #1 (they predate DWARF by
>        far).
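
Item 2 above, and the in-charge vs. not-in-charge constructor distinction Greg Clayton raised earlier in the thread, both come down to reading the constructor/destructor code out of the mangled name. Under the Itanium C++ ABI, foo::foo() has separate complete-object and base-object constructor symbols, "_ZN3fooC1Ev" and "_ZN3fooC2Ev", and both demangle to the same text. Here is a small sketch that recognizes those codes in simple nested names. The function and table names are illustrative, and it deliberately gives up on anything involving substitutions, templates, or inheriting constructors.

```python
import re
from typing import List, Optional, Tuple

# Itanium C++ ABI constructor/destructor codes (inheriting-constructor
# forms such as CI1/CI2 are not handled in this sketch).
CTOR_DTOR_KINDS = {
    "C1": "complete object constructor (in-charge)",
    "C2": "base object constructor (not-in-charge)",
    "C3": "complete object allocating constructor",
    "D0": "deleting destructor",
    "D1": "complete object destructor (in-charge)",
    "D2": "base object destructor (not-in-charge)",
}
_DIGITS = re.compile(r"\d+")


def classify_ctor_dtor(mangled: str) -> Optional[Tuple[List[str], str]]:
    """For a simple '_ZN<len><name>...<C?|D?>E...' symbol, return the
    enclosing names and the ctor/dtor kind; otherwise return None.

    Names using substitutions, templates, cv-qualifiers, etc. also yield None."""
    if not mangled.startswith("_ZN"):
        return None
    pos = 3
    scope: List[str] = []
    while pos < len(mangled):
        m = _DIGITS.match(mangled, pos)
        if m:
            # <source-name>: decimal length followed by that many characters.
            start = m.end()
            length = int(m.group())
            scope.append(mangled[start:start + length])
            pos = start + length
            continue
        code = mangled[pos:pos + 2]
        if code in CTOR_DTOR_KINDS and mangled[pos + 2:pos + 3] == "E":
            return scope, CTOR_DTOR_KINDS[code]
        return None
    return None


for sym in ("_ZN3fooC1Ev", "_ZN3fooC2Ev", "_ZN2ns3fooD0Ev", "_ZN3foo3barEi"):
    print(sym, classify_ctor_dtor(sym))
```

A consumer working only from demangled text would see "foo::foo()" for both constructor symbols. That loss of information is why the mangled name still matters for this lookup.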
>        If you check out the GDB source code circa 2001 and search for things
>        like check_stub_method, and follow all the things it calls (like
>        gdb_mangle_name), you can learn the history of linkage names (and
>        probably throw up in your mouth a little).
>        If you do a case-insensitive search for things like "physname" and
>        "phys_name", you'll see all the places it used to use the linkage
>        names.
>        I spent a lot of time abstracting out things like the
>        constructor/destructor name testing, vptr name finding, etc, so that
>        someone later might have a chance to get rid of linkage names (it was
>        also necessary because of the gcc 2.95->3.0 ABI change).
> 
>        _______________________________________________
>        Dwarf-Discuss mailing list
>        [5]Dwarf-Discuss at lists.dwarfstd.org
>        [6]http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
> 
> References
> 
>    Visible links
>    1. mailto:dwarf-discuss at lists.dwarfstd.org
>    2. https://godbolt.org/z/TqYjeevqx
>    3. mailto:dwarf-discuss at lists.dwarfstd.org
>    4. mailto:dwarf-discuss at lists.dwarfstd.org
>    5. mailto:Dwarf-Discuss at lists.dwarfstd.org
>    6. http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org

-- 
Todd Allen
Concurrent Real-Time