[Dwarf-Discuss] imported_unit and reference identity

Tue Jan 20 03:23:27 GMT 2009

Appendix E talks about particular ways of producing DWARF data using
imported_unit and partial_unit.  But that is not part of the spec.  The
descriptions of imported_unit and partial_unit themselves are part of the
spec.  What I'd like to pin down is what a DWARF encoding that uses
imported_unit should mean (or the range of things it could mean), taken
purely as a format as it sits in a DWARF file, independent of the
particular means of producing that file.

Consider a trivial C++ example:

main.cc:
	namespace A {
	#include "foo.cc"
	}
	namespace B {
	#include "foo.cc"
	}
	B::foo var;
foo.cc:
	struct foo { int x, y; };

This yields the DIE tree (omitting lots of the irrelevant detail):

#a10: compile_unit{name="main.cc", ...}
#a20:	namespace{name="A", decl_file=["main.cc"], decl_line=1}
#a30:		structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
			...
#a40:	namespace{name="B", decl_file=["main.cc"], decl_line=4}
#a50:		structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
			...
#a60:	variable{name="var", location=..., type=#a50}

Using imported_unit to reduce the duplication, this would become:

#b10: partial_unit
#b20:	structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
		...
<new CU header>
#b30: compile_unit{name="main.cc", ...}
#b40:	namespace{name="A", decl_file=["main.cc"], decl_line=1}
#b50:		imported_unit{import=#b10}
#b60:	namespace{name="B", decl_file=["main.cc"], decl_line=4}
#b70:		imported_unit{import=#b10}
#b80:	variable{name="var", location=..., type=#b20}

This is a "correct" transformation using imported_unit in the way that
partial and imported units are described in the spec.  Appendix E mentions
using partial_unit in this way particularly.

It's almost a reversible transformation.  But if I were transforming it
back to the exploded form with no imported_unit tags, does "var" get
type=#a30 or type=#a50?  How could I tell?

Put another way, if a debugger wants to show the user "the type of var",
what would it do?  In the first form (no imported_unit), it presumably
traverses the CU down to #a50 to build up the scoped name "B::foo".  In the
second form (using imported_unit), it can do the same thing: following the
spec, it treats each imported_unit as if the children of #b10 were grafted
in place of the imported_unit; then that traversal hits #b20 twice, once in
lieu of #b50 and called "A::foo", then once in lieu of #b70 and called
"B::foo".  Which one is it?
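
To make the ambiguity concrete, here is a minimal Python sketch of that
traversal.  It is not taken from any real DWARF consumer: the Die class,
the scoped_names helper, and the use of the "#b.." labels in place of file
offsets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Die:
    ident: str                      # label such as "#b20"; stands in for a file offset
    tag: str
    name: Optional[str] = None
    children: List["Die"] = field(default_factory=list)
    imports: Optional[str] = None   # imported_unit only: label of the imported unit


def scoped_names(root: Die, units: Dict[str, Die], target: str) -> List[str]:
    """Collect every scoped name under which `target` is reached from `root`,
    treating each imported_unit as if the imported unit's children were
    grafted in its place.  Assumes the import graph has no cycles."""
    found: List[str] = []

    def walk(die: Die, scope: List[str]) -> None:
        for child in die.children:
            if child.tag == "imported_unit":
                # Graft: visit the imported unit's children in the current scope.
                walk(units[child.imports], scope)
                continue
            path = scope + [child.name] if child.name else scope
            if child.ident == target:
                found.append("::".join(path))
            walk(child, path)

    walk(root, [])
    return found


# The second DIE tree from the example above.
b20 = Die("#b20", "structure_type", "foo")
b10 = Die("#b10", "partial_unit", children=[b20])
b30 = Die("#b30", "compile_unit", "main.cc", children=[
    Die("#b40", "namespace", "A", [Die("#b50", "imported_unit", imports="#b10")]),
    Die("#b60", "namespace", "B", [Die("#b70", "imported_unit", imports="#b10")]),
    Die("#b80", "variable", "var"),   # its type attribute refers to "#b20"
])
units = {"#b10": b10}

names = scoped_names(b30, units, "#b20")
print(names)
```

Both names come back for the single DIE #b20, and nothing in the reference
type=#b20 says which one was intended.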

In this example, the user won't have much trouble coping with the wrong
answer.  The real example is probably some dismal horror in template
instantiation where reporting an identical type from a different scope
might be extremely confusing.  Even in this trivial example, the wrong
answer can do more than confuse a user asking, "What type is it?"  It
might also confuse a debugger's evaluation of overload choices in an
expression and the like.  Such things are already fraught with other
confusions, but the point here is that DIE identity might in practice
matter in technical ways beyond simple human preferences.

The two subtrees are authentically identical, so they are interchangeable
for purposes of knowing how to access a variable of that type, find its
fields, etc.  But each DIE has a distinct identity that is significant to
the semantics of debugging.  The only way we encode that identity is by the
DIE's position in the file, so using imported_unit to reduce duplication in
the encoding merges what were two DIEs' identities into one.

A conservative answer would be to only ever generate imported_unit where no
such ambiguities are possible (e.g., only move top-level children of
compile_unit or a few similar constraints).  That might drastically limit
the opportunities for reducing duplicate information.  It's clearly not
what Appendix E contemplates.
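
One way a producer or checker might test for such a constraint is sketched
below in Python.  The criterion used here (a unit is suspect if it is
grafted in at more than one place within a single CU) is only my assumed
reading of "no such ambiguities"; it is not something the spec states, and
the Die class and helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Die:
    ident: str                      # label such as "#b10"; stands in for a file offset
    tag: str
    name: Optional[str] = None
    children: List["Die"] = field(default_factory=list)
    imports: Optional[str] = None   # imported_unit only: label of the imported unit


def import_counts(root: Die, units: Dict[str, Die]) -> Counter:
    """Count how many times each unit is grafted into the tree under `root`,
    following nested imports.  Assumes the import graph has no cycles."""
    counts: Counter = Counter()

    def walk(die: Die) -> None:
        for child in die.children:
            if child.tag == "imported_unit":
                counts[child.imports] += 1
                walk(units[child.imports])
            else:
                walk(child)

    walk(root)
    return counts


def ambiguous_units(root: Die, units: Dict[str, Die]) -> List[str]:
    """Units grafted in more than once: a reference to any DIE inside them
    cannot say which graft it means (assumed criterion, see above)."""
    return sorted(u for u, n in import_counts(root, units).items() if n > 1)


partial = Die("#b10", "partial_unit",
              children=[Die("#b20", "structure_type", "foo")])
units = {"#b10": partial}

# The CU from the example: #b10 is imported under both A and B.
main_cu = Die("#b30", "compile_unit", "main.cc", children=[
    Die("#b40", "namespace", "A", [Die("#b50", "imported_unit", imports="#b10")]),
    Die("#b60", "namespace", "B", [Die("#b70", "imported_unit", imports="#b10")]),
])

# A hypothetical CU that imports the same unit exactly once.
other_cu = Die("#c10", "compile_unit", "other.cc", children=[
    Die("#c20", "imported_unit", imports="#b10"),
])

flagged_main = ambiguous_units(main_cu, units)
flagged_other = ambiguous_units(other_cu, units)
print(flagged_main, flagged_other)
```

Under this criterion the main.cc encoding above would be rejected while
the single-import CU passes, which shows how much sharing such a rule
would give up.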

What do people think?

Thanks,
Roland