[Dwarf-Discuss] PROPOSAL: Constant expressions in location lists

Sun Jan 6 12:36:35 GMT 2008

I am somewhat confused by your reference to "source-language values".
That's not how I think about them.  I'd say they are target machine
values, i.e. raw byte/bit granularity data, as a value supplied by
DW_AT_const_value is, and as the values whose locations are described by
location expressions are.  No notion of "value" in DWARF deals in
source-language terms.  Rather, the values DWARF describes are target
machine values, bit-for-bit as would appear in memory/registers.  It's
the source-language constructs (subprogram, variable, etc.) and types
DWARF describes that both lead you to those values and tell you how to
interpret them in source-language terms (e.g. via DW_AT_data_member_location).

I am particularly confused by your reference to DIE trees.  These
describe declarations, and do not represent computations in any way I
understand.  They are our generic tree format of course, so you could
certainly define a whole new set of tags to describe expression trees.
But that seems rather far out in left field.  Am I misunderstanding you?

I work on the consumer side of the equation, myself (and I'm habitually
anal about specifications for corner cases).  So my own focus is on what
the format means and how a consumer can interpret it, so as to cover
every expressible claim about the generated code that a compiler might
one day be clever enough both to make and to back up with code that
makes it true.  That doesn't mean I expect any compiler any day
soon to produce any of the very complicated cases.  Indeed, DWARF
explicitly never speaks to expectations about the quality and detail of
the information a compiler produces, just about precisely what claims
it's expressing when it emits certain data in DWARF formats.

It is indeed true that in the general case producing expressions is
tantamount to code generation for DWARF's primitive stack machine.
But let's look at how this will really come up.  The case at hand is a
value cell that the compiler optimized away but still groks the
derivation of.
Usually that's either because it's a compile-time constant, or because
it was CSE'd with another value already on hand at this point in the
computation.  In the latter case, it's most likely a small amount of
arithmetic on some registers or stack slots.  Something really
complicated is not so likely to have been optimized away.  It could
indeed be that something as hairy to compute as division of values
wider than the target word size could be easily CSE'd in this fashion
(or e.g. just any FP arithmetic at all).  Then indeed a really hard-core
producer could emit enormous DWARF expression subroutine libraries to
call from its expressions.  Or it could just punt on these hard cases
(just like it now punts on this whole class of cases that can't be
represented anyway).  The common case is:

	static inline void foo(long *p) { bar(*p); }
	void baz(long s[99], long i) { foo(&s[i]); s[0] = 1; }
	--->
	baz:
		pushq	%rbx
		movq	%rdi, %rbx
		movq	(%rdi,%rsi,8), %rdi
		xorl	%eax, %eax
		call	bar
		movq	$1, (%rbx)
		popq	%rbx
		ret

"p" has been optimized away.  But its location could be given as:

	DW_OP_lit8 DW_OP_reg4 DW_OP_mul DW_OP_reg3 DW_OP_plus DW_OP_value

This one is simple to produce, because it had to be simple enough to
be expressed with the machine's addressing modes to have been
optimized this way.  I don't really know the compiler internals, but I
assume it has RTL now and will one day have some tree form instead.
If it's RTL for an integer value computation without side effects (and
not wider than target word size), then that RTL expression tree
flattens into DWARF ops (usually using a value stack no more than two
deep), in a very straightforward way.
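
To make the flattening concrete, here is a minimal sketch in C.  It
assumes a hypothetical binary expression tree (struct expr), not the
compiler's real RTL, and a toy postfix op list standing in for DWARF
ops.  The names flatten, eval and addr_of_p are invented for this
illustration.  A REG leaf is treated as "push the register's contents",
and the proposed trailing DW_OP_value is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expression tree; an assumption, not GCC's RTL. */
enum kind { CONST, REG, ADD, MUL };

struct expr {
    enum kind kind;
    uint64_t value;               /* CONST: literal; REG: DWARF register number */
    const struct expr *lhs, *rhs; /* operands, used only by ADD and MUL */
};

/* Toy postfix op standing in for a DWARF stack-machine op. */
struct op {
    enum kind kind;
    uint64_t value;
};

/* Post-order walk: emit both operands, then the operator.
   Returns the number of ops written to out (capacity cap). */
size_t flatten(const struct expr *e, struct op *out, size_t cap)
{
    size_t n = 0;
    if (e->kind == ADD || e->kind == MUL) {
        n += flatten(e->lhs, out, cap);
        n += flatten(e->rhs, out + n, cap - n);
    }
    assert(n < cap);
    out[n].kind = e->kind;
    out[n].value = e->value;
    return n + 1;
}

/* Run the ops on a small value stack against a register file.
   Stores the deepest stack depth reached in *max_depth. */
uint64_t eval(const struct op *ops, size_t n, const uint64_t *regs,
              size_t *max_depth)
{
    uint64_t stack[16];
    size_t sp = 0;
    *max_depth = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == ADD || ops[i].kind == MUL) {
            assert(sp >= 2);
            uint64_t b = stack[--sp];
            uint64_t a = stack[--sp];
            stack[sp++] = (ops[i].kind == ADD) ? a + b : a * b;
        } else {
            assert(sp < 16);
            stack[sp++] = (ops[i].kind == REG) ? regs[ops[i].value]
                                               : ops[i].value;
        }
        if (sp > *max_depth)
            *max_depth = sp;
    }
    assert(sp == 1);
    return stack[0];
}

/* The example from this message: 8 * reg4 + reg3, i.e. &s[i]
   with i in %rsi (DWARF reg 4) and s saved in %rbx (DWARF reg 3). */
const struct expr k8        = { CONST, 8, NULL, NULL };
const struct expr reg_i     = { REG,   4, NULL, NULL };
const struct expr reg_s     = { REG,   3, NULL, NULL };
const struct expr scaled    = { MUL,   0, &k8, &reg_i };
const struct expr addr_of_p = { ADD,   0, &scaled, &reg_s };
```

Flattening addr_of_p yields five ops in the same order as the
expression given earlier (literal 8, register 4, multiply, register 3,
add).  Evaluating them with reg3 = 0x1000 and reg4 = 5 gives 0x1028,
and the value stack never gets more than two deep.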

Thanks,
Roland