[Dwarf-Discuss] imported_unit and reference identity

Tue Jan 20 17:54:43 GMT 2009

The short answer is:
If you want to combine type information that is in different parts
of the visible scope tree, then you really should disable that
combining wherever there are direct references to the type information.

I think most of the discussion in Appendix E relates to
combining type information that is actually the exact
same type, but it would otherwise be recorded multiple
times in the executable.  In this sense, A::foo and B::foo are
not the exact same type.  They have the same shape, but
they are not the same type.
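
To make "same shape, not the same type" concrete, here is a minimal C++17
sketch. It spells out the two structs directly instead of including foo.cc
twice, and the which() helper is a hypothetical name added only for
illustration; it is not part of the original example.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Stand-ins for the two inclusions of foo.cc in the quoted example.
namespace A { struct foo { int x, y; }; }
namespace B { struct foo { int x, y; }; }

// Same shape: identical size and identical member offsets.
static_assert(sizeof(A::foo) == sizeof(B::foo));
static_assert(offsetof(A::foo, x) == offsetof(B::foo, x));
static_assert(offsetof(A::foo, y) == offsetof(B::foo, y));

// Not the same type, as far as the language is concerned.
static_assert(!std::is_same_v<A::foo, B::foo>);

// Hypothetical overload set: the choice depends on type identity,
// not on layout.
constexpr std::string_view which(const A::foo&) { return "A::foo"; }
constexpr std::string_view which(const B::foo&) { return "B::foo"; }

static_assert(which(A::foo{}) == "A::foo");
static_assert(which(B::foo{}) == "B::foo");

B::foo var;  // as in main.cc; which(var) selects the B::foo overload
```

A consumer that resolved the type of var to A::foo would get the layout
right but would pick the wrong overload here.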

--chris

Roland McGrath wrote:
> Appendix E talks about particular ways of producing DWARF data using
> imported_unit and partial_unit.  But that is not part of the spec.  The
> descriptions of imported_unit and partial_unit themselves are part of the
> spec.  I'd like to talk about what we can agree on for what a DWARF
> encoding using imported_unit should mean (or the range of what it could
> mean) as a proper understanding of that format as it lies in a DWARF file,
> independent of the particular means of producing that file.
> 
> Consider a trivial C++ example:
> 
> main.cc:
> 	namespace A {
> 	#include "foo.cc"
> 	};
> 	namespace B {
> 	#include "foo.cc"
> 	};
> 	B::foo var;
> foo.cc:
> 	struct foo { int x, y; };
> 
> This yields the DIE tree (omitting lots of the irrelevant detail):
> 
> #a10: compile_unit{name="main.cc", ...}
> #a20:	namespace{name="A", decl_file=["main.cc"], decl_line=1}
> #a30:		structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
> 			...
> #a40:	namespace{name="B", decl_file=["main.cc"], decl_line=4}
> #a50:		structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
> 			...
> #a60:	variable{name="var", location=..., type=#a50}
> 
> Using imported_unit to reduce the duplication, this would become:
> 
> #b10: partial_unit
> #b20:	structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
> 		...
> <new CU header>
> #b30: compile_unit{name="main.cc", ...}
> #b40:	namespace{name="A", decl_file=["main.cc"], decl_line=1}
> #b50:		imported_unit{import=#b10}
> #b60:	namespace{name="B", decl_file=["main.cc"], decl_line=4}
> #b70:		imported_unit{import=#b10}
> #b80:	variable{name="var", location=..., type=#b20}
> 
> This is a "correct" transformation using imported_unit in the way that
> partial and imported units are described in the spec.  Appendix E mentions
> using partial_unit in this way particularly.
> 
> It's almost a reversible transformation.  But if I were transforming it
> back to the exploded form with no imported_unit tags, does "var" get
> type=#a30 or type=#a50?  How could I tell?
> 
> Put another way, if a debugger wants to show the user "the type of var",
> what would it do?  In the first form (no imported_unit), it presumably
> traverses the CU down to #a50 to build up the scoped name "B::foo".  In the
> second form (using imported_unit), it can do the same thing: following the
> spec, it treats each imported_unit as if the children of #b10 were grafted
> in place of the imported_unit; then that traversal hits #b20 twice, once in
> lieu of #b50 and called "A::foo", then once in lieu of #b70 and called
> "B::foo".  Which one is it?
> 
> In this example, the user won't have much trouble coping with the wrong
> answer.  The real example is probably some dismal horror in template
> instantiation where reporting an identical type from a different scope
> might be extremely confusing.  Even in this trivial example, the wrong
> answer can not only confuse a user asking, "What type is it?" but might
> also confuse a debugger's evaluation of overload choices in an expression
> and the like (not that such things aren't already fraught with other
> confusions, but the point here is that DIE identity might in practice
> matter in technical ways beyond simple human preferences).
> 
> The two subtrees are authentically identical, so they are interchangeable
> for purposes of knowing how to access a variable of that type, find its
> fields, etc.  But each DIE has a distinct identity that is significant to
> the semantics of debugging.  The only way we encode that identity is by the
> DIE's position in the file, so two DIEs' identities are merged by using
> imported_unit to reduce the duplication in the encoding.
> 
> A conservative answer would be to only ever generate imported_unit where no
> such ambiguities are possible (e.g., only move top-level children of
> compile_unit or a few similar constraints).  That might drastically limit
> the opportunities for reducing duplicate information.  It's clearly not
> what Appendix E contemplates.
> 
> What do people think?
> 
> 
> Thanks,
> Roland
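
The ambiguity in the quoted message can be reproduced with a small model.
The sketch below is not a DWARF reader: Die, DieTable, scoped_names,
names_of, and example are hypothetical names, and the table holds only the
tags, names, and labels of the #b10..#b80 tree above. It grafts the children
of the imported unit in place of each imported_unit and records every scoped
name under which a given DIE is reached.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified DIE record keyed by its label.
struct Die {
    std::string tag;
    std::string name;                   // empty when the DIE has no name
    std::string import;                 // target label, imported_unit only
    std::vector<std::string> children;  // labels of child DIEs
};

using DieTable = std::map<std::string, Die>;

// Walk the tree from `label`, treating each imported_unit as if the
// children of the unit it imports were grafted in its place, and collect
// every scoped name under which `target` is reached.
void scoped_names(const DieTable& dies, const std::string& label,
                  const std::string& target, const std::string& scope,
                  std::vector<std::string>& out) {
    const Die& die = dies.at(label);
    if (die.tag == "imported_unit") {
        for (const auto& child : dies.at(die.import).children)
            scoped_names(dies, child, target, scope, out);
        return;
    }
    std::string here = scope;
    // Unit DIEs do not contribute a component to the scoped name.
    if (die.tag != "compile_unit" && die.tag != "partial_unit" &&
        !die.name.empty())
        here = scope.empty() ? die.name : scope + "::" + die.name;
    if (label == target) out.push_back(here);
    for (const auto& child : die.children)
        scoped_names(dies, child, target, here, out);
}

std::vector<std::string> names_of(const DieTable& dies,
                                  const std::string& root,
                                  const std::string& target) {
    std::vector<std::string> out;
    scoped_names(dies, root, target, "", out);
    return out;
}

// The second DIE tree from the quoted message.
DieTable example() {
    return {
        {"#b10", {"partial_unit", "", "", {"#b20"}}},
        {"#b20", {"structure_type", "foo", "", {}}},
        {"#b30", {"compile_unit", "main.cc", "", {"#b40", "#b60", "#b80"}}},
        {"#b40", {"namespace", "A", "", {"#b50"}}},
        {"#b50", {"imported_unit", "", "#b10", {}}},
        {"#b60", {"namespace", "B", "", {"#b70"}}},
        {"#b70", {"imported_unit", "", "#b10", {}}},
        {"#b80", {"variable", "var", "", {}}},
    };
}
```

Calling names_of(example(), "#b30", "#b20") yields two entries, "A::foo"
and "B::foo", for the single DIE that var's type attribute refers to.
Nothing in the table says which of the two was meant, which is the
question the message raises.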