[Dwarf-Discuss] imported_unit and reference identity

Tue Jan 20 20:40:49 GMT 2009

Hmmm, this is most awkward. I think there are a couple of intertwined 
issues.

1) What does it mean for an imported unit to "logically belong at the 
place of the imported unit entry" (DIE)? The intent here is that there 
is logically a *copy* of the imported unit that occurs at the place of 
the imported unit DIE.

2) Is one partial unit for a given source file always equivalent to 
every other partial unit for that same source file? Clearly the answer 
is no. Conditional compilation and other context can easily cause the 
same source to compile differently. As one entertaining example, consider:

     foox.cc:
         #if defined(foox)
             struct foo { int a, b; };
         #else
             struct foo { int x, y; };
         #endif
         #define foox 1

Using foox.cc instead of foo.cc in the original example makes it clear 
that there cannot be just one partial unit for the included source file.

3) How does one reference into a (particular copy of a) partial unit? 
Here, I think, is the real deficiency in DWARF as specified. The defined 
ways to reference from one unit into another are DW_FORM_ref_addr and, 
for a type placed in a type unit, DW_FORM_ref_sig8. But as Roland has 
noted, neither provides any means to disambiguate based on the 
context of the item referenced.
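
To make points 1 and 3 concrete, here is a minimal C++ sketch of the 
"logical copy" reading, applied to the main.cc/foo.cc example in the 
quoted message. The Die struct, collect_structs and names_in_example 
are made-up names for illustration only; this is a toy in-memory model, 
not a real DWARF reader and not anything the spec defines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, much-simplified model of a DIE (illustration only).
struct Die {
    std::string tag;            // e.g. "namespace", "structure_type"
    std::string name;           // empty when the DIE has no name
    std::vector<Die> children;
    const Die* imported;        // non-null only on imported_unit entries
};

// Walk the tree as if every imported_unit were replaced by a copy of the
// children of the unit it imports, and record the scoped name of each
// structure_type that the walk reaches.
void collect_structs(const Die& die, const std::string& scope,
                     std::vector<std::string>& out) {
    for (const Die& child : die.children) {
        if (child.tag == "imported_unit") {
            assert(child.imported != nullptr);
            // Logical copy: the imported children appear here, in the
            // scope of the importing entry.
            collect_structs(*child.imported, scope, out);
        } else if (child.tag == "structure_type") {
            out.push_back(scope + child.name);
        } else if (child.tag == "namespace") {
            collect_structs(child, scope + child.name + "::", out);
        }
    }
}

// Builds the second DIE tree from the quoted message (#b10..#b70) and
// returns the scoped names produced by the walk.
std::vector<std::string> names_in_example() {
    const Die partial{"partial_unit", "",
                      {Die{"structure_type", "foo", {}, nullptr}}, nullptr};
    const Die cu{"compile_unit", "main.cc",
                 {Die{"namespace", "A",
                      {Die{"imported_unit", "", {}, &partial}}, nullptr},
                  Die{"namespace", "B",
                      {Die{"imported_unit", "", {}, &partial}}, nullptr}},
                 nullptr};
    std::vector<std::string> out;
    collect_structs(cu, "", out);
    return out;
}
```

Calling names_in_example() yields "A::foo" and then "B::foo", both 
reached through the single stored entry #b20. A plain reference to #b20 
therefore cannot say which of the two logical copies is meant.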

A quick off-the-cuff suggestion might be to introduce a new form
DW_FORM_unit_addr which takes two parameters:
  a) a pointer to the DW_TAG_imported_unit DIE that provides the context 
for the referenced entity
  b) a reference to the entity within the imported unit.

Of course, the cascading effects of multiple layers of nesting also 
need to be considered; this casual suggestion has not been thought 
through to that extent. But hopefully it will stimulate discussion and 
ideas...
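
As a rough illustration of how a consumer might use such a two-part 
reference, here is a small C++ sketch. Everything in it is assumed for 
the sake of the example: the flat DIE table with parent indices, the 
UnitRef struct standing in for a DW_FORM_unit_addr value, and the 
functions example_dies and scoped_name. It handles only one level of 
import, so the nesting question above is left open.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flat DIE table; parent is an index, -1 for a unit root.
struct Die {
    std::string tag;
    std::string name;
    int parent;
};

// Stand-in for a value of the suggested DW_FORM_unit_addr form.
struct UnitRef {
    int context;  // (a) the imported_unit DIE that provides the context
    int target;   // (b) the referenced DIE within the imported unit
};

// The second DIE tree from the quoted message, one row per DIE.
std::vector<Die> example_dies() {
    return {
        {"partial_unit", "", -1},       // 0: #b10
        {"structure_type", "foo", 0},   // 1: #b20
        {"compile_unit", "main.cc", -1},// 2: #b30
        {"namespace", "A", 2},          // 3: #b40
        {"imported_unit", "", 3},       // 4: #b50
        {"namespace", "B", 2},          // 5: #b60
        {"imported_unit", "", 5},       // 6: #b70
        {"variable", "var", 2},         // 7: #b80
    };
}

// Prepend the names of all namespaces enclosing the DIE at index start.
static void prepend_scopes(const std::vector<Die>& dies, int start,
                           std::string& name) {
    for (int i = dies[start].parent; i != -1; i = dies[i].parent) {
        if (dies[i].tag == "namespace") {
            name = dies[i].name + "::" + name;
        }
    }
}

// Scoped name of the target as seen through the given import context.
std::string scoped_name(const std::vector<Die>& dies, UnitRef ref) {
    assert(dies[ref.context].tag == "imported_unit");
    std::string name = dies[ref.target].name;
    prepend_scopes(dies, ref.target, name);   // scopes inside the imported unit
    prepend_scopes(dies, ref.context, name);  // scopes around the import
    return name;
}
```

With the table above, the pair (#b70, #b20), i.e. UnitRef{6, 1}, resolves 
to "B::foo", while (#b50, #b20), i.e. UnitRef{4, 1}, resolves to 
"A::foo". The type of "var" is then no longer ambiguous.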

Ron

---------------------
Roland McGrath wrote:
> Appendix E talks about particular ways of producing DWARF data using
> imported_unit and partial_unit.  But that is not part of the spec.  The
> descriptions of imported_unit and partial_unit themselves are part of the
> spec.  I'd like to talk about what we can agree on for what a DWARF
> encoding using imported_unit should mean (or the range of what it could
> mean) as a proper understanding of that format as it lies in a DWARF file,
> independent of the particular means of producing that file.
> 
> Consider a trivial C++ example:
> 
> main.cc:
> 	namespace A {
> 	#include "foo.cc"
> 	};
> 	namespace B {
> 	#include "foo.cc"
> 	};
> 	B::foo var;
> foo.cc:
> 	struct foo { int x, y; };
> 
> This yields the DIE tree (omitting lots of the irrelevant detail):
> 
> #a10: compile_unit{name="main.cc", ...}
> #a20:	namespace{name="A", decl_file=["main.cc"], decl_line=1}
> #a30:		structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
> 			...
> #a40:	namespace{name="B", decl_file=["main.cc"], decl_line=4}
> #a50:		structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
> 			...
> #a60:	variable{name="var", location=..., type=#a50}
> 
> Using imported_unit to reduce the duplication, this would become:
> 
> #b10: partial_unit
> #b20:	structure_type{name="foo", decl_file=["foo.cc"], decl_line=1}
> 		...
> <new CU header>
> #b30: compile_unit{name="main.cc", ...}
> #b40:	namespace{name="A", decl_file=["main.cc"], decl_line=1}
> #b50:		imported_unit{import=#b10}
> #b60:	namespace{name="B", decl_file=["main.cc"], decl_line=4}
> #b70:		imported_unit{import=#b10}
> #b80:	variable{name="var", location=..., type=#b20}
> 
> This is a "correct" transformation using imported_unit in the way that
> partial and imported units are described in the spec.  Appendix E mentions
> using partial_unit in this way particularly.
> 
> It's almost a reversible transformation.  But if I were transforming it
> back to the exploded form with no imported_unit tags, does "var" get
> type=#a30 or type=#a50?  How could I tell?
> 
> Put another way, if a debugger wants to show the user "the type of var",
> what would it do?  In the first form (no imported_unit), it presumably
> traverses the CU down to #a50 to build up the scoped name "B::foo".  In the
> second form (using imported_unit), it can do the same thing: following the
> spec, it treats each imported_unit as if the children of #b10 were grafted
> in place of the imported_unit; then that traversal hits #b20 twice, once in
> lieu of #b50 and called "A::foo", then once in lieu of #b70 and called
> "B::foo".  Which one is it?
> 
> In this example, the user won't have much trouble coping with the wrong
> answer.  The real example is probably some dismal horror in template
> instantiation where reporting an identical type from a different scope
> might be extremely confusing.  Even in this trivial example, the wrong
> answer can not only confuse a user asking, "What type is it?" but might
> also confuse a debugger's evaluation of overload choices in an expression
> and the like (not that such things aren't already fraught with other
> confusions, but the point here is that DIE identity might in practice
> matter in technical ways beyond simple human preferences).
> 
> The two subtrees are authentically identical, so they are interchangeable
> for purposes of knowing how to access a variable of that type, find its
> fields, etc.  But each DIE has a distinct identity that is significant to
> the semantics of debugging.  The only way we encode that identity is by the
> DIE's position in the file, so two DIEs' identities are merged by using
> imported_unit to reduce the duplication in the encoding.
> 
> A conservative answer would be to only ever generate imported_unit where no
> such ambiguities are possible (e.g., only move top-level children of
> compile_unit or a few similar constraints).  That might drastically limit
> the opportunities for reducing duplicate information.  It's clearly not
> what Appendix E contemplates.
> 
> What do people think?
> 
> 
> Thanks,
> Roland