[Dwarf-Discuss] DWARF Issue 071227.1

Fri Apr 11 19:52:10 GMT 2008

Re: http://dwarfstd.org/ShowIssue.php?issue=071227.1

I see Bishop/Blandy elaborated on the earlier DW_OP_value extension
proposal.  Excellent, I like it.

I have a few concerns, though:

- smaller-than-word objects can't be handled efficiently on big-endian
machines.  For example, if the size of a variable is a single byte,
then something like:

  DW_OP_lit1
  DW_OP_value 1

will (unless I misunderstood something about the proposal) give me,
on a big-endian machine, the address of a *byte* holding 0 rather
than 1.  To get the address of the '1', I'd have to add N-1 to the
result of DW_OP_value, where N is the number of bytes in a word.
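A small sketch of that arithmetic, assuming a 32-bit target (N = 4) and that DW_OP_value places the value in one target word in the target's byte order, as the proposal appears to imply. `word_bytes` and `low_byte_offset` are illustrative names, not part of any proposal:

```python
import struct

WORD_SIZE = 4  # assumption: 32-bit target, so N = 4 bytes per word


def word_bytes(value: int, big_endian: bool) -> bytes:
    """Lay out `value` in one target word, in the target's byte order."""
    return struct.pack((">" if big_endian else "<") + "I", value)


def low_byte_offset(big_endian: bool) -> int:
    """Offset from the start of the word to its least significant byte."""
    return WORD_SIZE - 1 if big_endian else 0


for big_endian in (False, True):
    word = word_bytes(1, big_endian)
    # word[0] is what a naive 1-byte read at the returned address would see.
    print("big" if big_endian else "little", word[0], word[low_byte_offset(big_endian)])
# little 1 1
# big 0 1
```

On the big-endian layout the byte at the start of the word is 0, and the '1' sits at offset N-1.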

To make the encoding more compact, I propose that DW_OP_value accept a
number of bytes, rather than a number of words.  It is then up to the
debug information consumer to compute the number of words.

Now, since a specified number of bytes is probably only useful for
smaller-than-word sizes, maybe it would make sense for DW_OP_value to
treat positive arguments as a number of words and negative arguments
as a number of bytes, or some such.  In any case, the argument should
never need more than one byte to encode.

- the encoding of DW_OP_value's argument is not specified.  I assume
it was meant as an unsigned byte, because anything more than that is
excessive.  However, given the proposal above, it might have to be
signed.
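A sketch of how a consumer might decode such a one-byte argument, assuming a signed (two's complement) byte where positive values count words and negative values count bytes, on a 32-bit target. None of this is specified yet, and the function names are hypothetical:

```python
WORD_SIZE = 4  # assumption: 32-bit target


def decode_value_arg(raw: int) -> int:
    """Decode the hypothetical 1-byte DW_OP_value operand into a size in bytes.

    Assumed encoding: signed byte; positive = number of words,
    negative = number of bytes (so one byte covers 1..127 words or 1..128 bytes).
    """
    if not 0 <= raw <= 0xFF:
        raise ValueError("operand must fit in one byte")
    signed = raw - 0x100 if raw >= 0x80 else raw  # two's-complement sign extension
    if signed > 0:
        return signed * WORD_SIZE
    if signed < 0:
        return -signed
    raise ValueError("a zero-sized value makes no sense")


def words_needed(size_in_bytes: int) -> int:
    """Words the consumer must set aside for the value, rounding up."""
    return -(-size_in_bytes // WORD_SIZE)


print(decode_value_arg(0x01))                # 4: one word
print(decode_value_arg(0xFF))                # 1: one byte (-1)
print(words_needed(decode_value_arg(0xFB)))  # 2: five bytes need two words
```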

- AFAIK it is not possible to carry data over across DW_OP_piece or
DW_OP_bit_piece.  It might be convenient to be able to perform some
complex computation, turn its result into a value with DW_OP_value
(say, a double-precision floating-point operation, using a DWARF FP
library :-), and then hand over multiple pieces of that value (say, to
fix the word endianness of doubles, as required on some machines).
This specific example probably doesn't make much sense, because the
DW_OP_value would probably be computed with the correct word
endianness in the first place, but I'm sure other cases of
non-contiguous or out-of-order values that need correction could
benefit from this.  I'm not sure it's worth taking into account,
though.  It's just an idea I thought I'd put forward, although the
only way I can see to introduce such a thing would be to extend
DW_OP_*piece themselves so that they leave the address they are given
on top of the stack.
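To make the idea concrete, here is a sketch of the hypothetical semantics, assuming DW_OP_piece could copy slices out of a value left on the stack (it cannot today) and a target that stores the two 4-byte words of a double in swapped order. `take_pieces` is an illustrative name:

```python
import struct


def take_pieces(value: bytes, pieces) -> bytes:
    """Build a composite from (offset, size) slices of one computed value.

    Hypothetical semantics: each DW_OP_piece copies its bytes out of the value
    left on the stack by DW_OP_value instead of consuming it.
    """
    out = bytearray()
    for offset, size in pieces:
        if offset < 0 or offset + size > len(value):
            raise ValueError("piece lies outside the computed value")
        out += value[offset:offset + size]
    return bytes(out)


# Result of the "complex computation", in ordinary little-endian layout.
computed = struct.pack("<d", 1.5)
# Hand it over high word first, as a target with word-swapped doubles stores it.
word_swapped = take_pieces(computed, [(4, 4), (0, 4)])
```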

- I don't understand the difference between the proposed DW_OP_fetch
and the existing DW_OP_deref.  What am I missing?  AFAICT, it would
only be useful to introduce DW_OP_fetch if its argument were encoded
like that of DW_OP_value, so that one could fetch multiple words or
smaller-than-word objects.
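The distinction drawn above, sketched for a 32-bit little-endian target: DW_OP_deref always reads exactly one address-sized word, whereas a DW_OP_fetch taking a size operand (hypothetical here, as are the helper names) could read smaller or larger objects:

```python
WORD_SIZE = 4  # assumption: 32-bit little-endian target


def op_deref(memory: bytes, addr: int) -> int:
    """DW_OP_deref: pop an address, push the address-sized word stored there."""
    return int.from_bytes(memory[addr:addr + WORD_SIZE], "little")


def op_fetch(memory: bytes, addr: int, size_in_bytes: int) -> bytes:
    """Hypothetical DW_OP_fetch: read an object of any size, smaller or larger than a word."""
    if addr < 0 or addr + size_in_bytes > len(memory):
        raise IndexError("fetch outside target memory")
    return memory[addr:addr + size_in_bytes]


memory = bytes(range(16))
print(hex(op_deref(memory, 0)))     # 0x3020100: always exactly one word
print(op_fetch(memory, 3, 1))       # b'\x03': a single byte
print(len(op_fetch(memory, 0, 8)))  # 8: two words at once
```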

- DW_OP_value, being a new opcode, is not backward-compatible: debug
information consumers that encounter it would have to discard the
entire expression.  In general, this is not such a big deal, since the
expression will typically be as simple as 'compute, compute, compute,
DW_OP_value', and discarding that location list entry loses nothing
that an older consumer could have used anyway.

However, when there are multiple pieces involved and only some of them
use DW_OP_value, I'd add a backward-compatibility recommendation:
encode the pieces that use DW_OP_value in location list entries
separate from those that don't use this extension, so that debug
information consumers can still get at the backward-compatible pieces.

(This separation for backward compatibility was the motivation for the
proposal of DW_*AT*_value in this list, IIRC.  However, I now see
benefits in reusing DW_AT_location, and the backward compatibility
issue can be addressed by simply using separate loclist entries.)
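A minimal sketch of how a pre-extension consumer benefits from that recommendation, assuming it drops any loclist entry containing an opcode it doesn't know and keeps the rest. The opcode set, addresses and tuple encoding of operations are illustrative:

```python
# Opcodes a pre-extension consumer understands (an illustrative subset).
KNOWN_OPS = {"DW_OP_addr", "DW_OP_fbreg", "DW_OP_lit1", "DW_OP_piece", "DW_OP_reg1"}


def usable_entries(loclist):
    """Drop every entry that uses an unknown opcode; keep the others intact."""
    return [
        (start, end, ops)
        for start, end, ops in loclist
        if all(name in KNOWN_OPS for name, *_ in ops)
    ]


# Hypothetical variable: one entry describes the piece held in reg1, a
# separate entry carries the piece that needs DW_OP_value.
loclist = [
    (0x10, 0x40, [("DW_OP_reg1",), ("DW_OP_piece", 4)]),
    (0x10, 0x40, [("DW_OP_lit1",), ("DW_OP_value", 1), ("DW_OP_piece", 4)]),
]
print(usable_entries(loclist))  # only the DW_OP_reg1 entry survives
```

Had both pieces been encoded in one entry, the old consumer would have discarded the reg1 piece as well.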

- It occurs to me that it would be useful to be able to refer to the
value of other variables in debug information.  It appears to me that
DW_OP_call* can be used for this purpose.  However, I don't quite see
how this works when a variable is encoded in multiple location pieces,
or when multiple location list entries match for the called variable.
Am I making some fundamental mistake in my understanding of how these
opcodes are supposed to be used?

Consider that I have a location list entry for a word-sized variable
that, throughout its range, can be computed by adding, say, the
constant 1 to another live variable.  I'd like to encode this entry
like this:

/* x can be computed as other + 1: */
  DW_OP_call_ref <other>
  DW_OP_deref
  DW_OP_plus_uconst 1
  DW_OP_value 1
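For the simple case this expression is meant to handle, here is a toy evaluator. It assumes a 32-bit little-endian target, that other's DW_AT_location is a single DW_OP_addr entry (so DW_OP_call_ref merely pushes that address), and the proposed reading of `DW_OP_value 1` as "the top stack word is the value". `Target`, `evaluate` and the tuple encoding of operations are illustrative only:

```python
from dataclasses import dataclass, field

WORD_SIZE = 4
MASK = (1 << (8 * WORD_SIZE)) - 1  # DWARF stack entries are address-sized


@dataclass
class Target:
    memory: bytearray
    # Hypothetical: maps a DIE reference to the one address its location
    # expression (a lone DW_OP_addr) would push when run via DW_OP_call_ref.
    die_address: dict = field(default_factory=dict)


def evaluate(ops, target):
    stack = []
    for op, *args in ops:
        if op == "DW_OP_call_ref":
            stack.append(target.die_address[args[0]])
        elif op == "DW_OP_deref":
            addr = stack.pop()
            stack.append(int.from_bytes(target.memory[addr:addr + WORD_SIZE], "little"))
        elif op == "DW_OP_plus_uconst":
            stack.append((stack.pop() + args[0]) & MASK)
        elif op == "DW_OP_value":
            if args[0] != 1:
                raise NotImplementedError("sketch handles single-word values only")
            # Proposed semantics: the top of the stack *is* the variable's value.
            return ("value", stack.pop())
        else:
            raise NotImplementedError(op)
    # Without DW_OP_value, the result is an address, as in existing DWARF.
    return ("address", stack.pop())


memory = bytearray(16)
memory[8:12] = (41).to_bytes(WORD_SIZE, "little")  # other == 41, stored at address 8
target = Target(memory, {"other": 8})
x_expr = [("DW_OP_call_ref", "other"), ("DW_OP_deref",),
          ("DW_OP_plus_uconst", 1), ("DW_OP_value", 1)]
print(evaluate(x_expr, target))  # ('value', 42)
```

The questions below are about what `DW_OP_call_ref` should push once other's location is anything richer than a single address.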

but then, what if DW_AT_location for the other variable is like this
(let's pretend this makes sense):

...
L1..end: DW_OP_addr <where>
L2..L4: DW_OP_reg1
L2..L4: DW_OP_lit1 DW_OP_value 1
L2..end: DW_OP_addr <zero> DW_OP_piece 3 DW_OP_addr <one> DW_OP_piece 1
L3..end: DW_OP_fbreg -16
...

given L1 < L2 < L3 < L4 < end.
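The claim that all five entries match at L3 can be checked mechanically, assuming DWARF's half-open [start, end) location list ranges and mapping the labels to 1..5 in order:

```python
# Assumption: labels ordered L1 < L2 < L3 < L4 < end, mapped to small integers.
L1, L2, L3, L4, END = 1, 2, 3, 4, 5

other_loclist = [
    (L1, END, "DW_OP_addr <where>"),
    (L2, L4, "DW_OP_reg1"),
    (L2, L4, "DW_OP_lit1 DW_OP_value 1"),
    (L2, END, "DW_OP_addr <zero> DW_OP_piece 3 DW_OP_addr <one> DW_OP_piece 1"),
    (L3, END, "DW_OP_fbreg -16"),
]


def matching(loclist, pc):
    """Entries whose half-open [start, end) range contains pc."""
    return [expr for start, end, expr in loclist if start <= pc < end]


for pc in (L1, L2, L3, L4):
    print(pc, len(matching(other_loclist, pc)))
```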

Now, consider that we want to compute the value of the derived
variable at L3.  Given that all 5 listed loclist entries match, how is
DW_OP_call_ref supposed to behave?  And then, if it were to select the
DW_OP_reg1 entry, would DW_OP_deref still work?  And what about the
DW_OP_piece entry?  Is DW_OP_deref supposed to handle that correctly?

And then, even if DW_OP_deref works, what if we had to modify the
address before dereferencing it, say, to access the second 4-byte word
of a 2-word variable that was scalarized by the compiler?

I.e., given:

struct { int a; int b; } other;
/* x can be computed as other.b + 1: */
  DW_OP_call_ref <other>
  DW_OP_plus_uconst 4
  DW_OP_deref
  DW_OP_plus_uconst 1
  DW_OP_value 1

is this supposed to work if other is located like this?

  DW_OP_reg1 DW_OP_piece 4 DW_OP_reg7 DW_OP_piece 4

I guess not.  And if not, it would be great to specify (i) how
DW_OP_*piece and multiple overlapping loclist entries affect the top
of the stack upon returning from a DW_OP_call* (say, merging it all
into a single compound stack entry representing a composite location),
(ii) how address arithmetic and dereferencing are to deal with such
composite locations, and (iii) how the absence of any matching
location list entry is to be represented (say, as another composite
location).  Furthermore, it would be nice to specify that
DW_AT_const_value can make up for an absent DW_AT_location, behaving
like a DW_OP_value-terminated location expression.
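One possible reading of (i) and (ii), sketched under loud assumptions: a composite location is a tuple of pieces, address arithmetic just moves a byte offset within it, and dereferencing flattens the pieces (little-endian, registers contributing their low bytes) before reading. `CompositeLocation`, `Piece` and `deref` are hypothetical names, not anything in the DWARF specification:

```python
from dataclasses import dataclass

REG_SIZE = 8  # assumption: 64-bit registers; a piece uses the register's low bytes


@dataclass(frozen=True)
class Piece:
    kind: str   # "reg" or "mem" (hypothetical representation)
    where: int  # register number or memory address
    size: int   # piece size in bytes


@dataclass(frozen=True)
class CompositeLocation:
    pieces: tuple
    offset: int = 0  # byte offset accumulated by address arithmetic

    def plus_uconst(self, n: int) -> "CompositeLocation":
        return CompositeLocation(self.pieces, self.offset + n)


def deref(loc: CompositeLocation, size: int, regs: dict, memory: bytes) -> int:
    """Read `size` bytes at loc.offset, crossing piece boundaries if needed."""
    data = bytearray()
    for p in loc.pieces:
        if p.kind == "reg":
            data += regs[p.where].to_bytes(REG_SIZE, "little")[:p.size]
        else:
            data += memory[p.where:p.where + p.size]
    if loc.offset + size > len(data):
        raise ValueError("dereference runs past the end of the composite")
    return int.from_bytes(data[loc.offset:loc.offset + size], "little")


# other: DW_OP_reg1 DW_OP_piece 4 DW_OP_reg7 DW_OP_piece 4
other = CompositeLocation((Piece("reg", 1, 4), Piece("reg", 7, 4)))
regs = {1: 10, 7: 41}
x = deref(other.plus_uconst(4), 4, regs, b"") + 1  # other.b + 1
print(x)  # 42
```

Under this reading, the scalarized struct example above would work: adding 4 to the composite lands in the reg7 piece.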

Do these sound like reasonable ideas to propose formally, or are they
already addressed somehow in the current specification, and I just
don't quite get it?

Thanks,

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}