[Dwarf-Discuss] PROPOSAL: Constant expressions in location lists

Mon Jan 7 03:38:09 GMT 2008

Jim Blandy wrote:
> The problem here is that DWARF needs to be able to describe how to
> compute values of variables that have been optimized out but could be
> computed from the state of the program (at least some of the time).

That is a relatively simple task, one dependent only on the
compiler's ability to describe how to compute the missing value.

> There's no necessary restriction on what types of values a compiler
> might be able to optimize out; in the most general case, the values to
> be computed could be structures, large floating-point types, etc.  So
> DWARF needs to be able to describe how to compute values with
> arbitrary source types.

No, you only need to be able to interpret the instructions given
by the compiler to compute a machine value.  This does not require
that a debugger have a source language interpreter, and in particular,
it doesn't require DWARF to be able to describe arbitrary source types.

> DWARF does have DWARF expressions, but their semantics were
> deliberately restricted to computing address-sized values, to keep the
> specification simple and focussed.  We're now considering using DWARF
> expressions to compute (potentially) every type of value expressible
> in the source language.

This seems to exaggerate the problem.  You don't need to compute
"every value expressible in the source".  Instead you need to compute
every value expressible in the machine.  Machines have a very narrow
range of types, usually two: binary integer and floating point.  And
it may not be necessary to even compute that range -- all you care
about is bit patterns.

> Following this approach, if the compiler had optimized out a
> floating-point value, but knew how to recompute it if desired, and
> wished to express this in DWARF, the compiler would need to produce a
> DWARF expression that would yield the appropriate floating-point
> value, in target format.

That's true, but it doesn't necessarily follow that DWARF expressions
need to perform floating point computations.

Let's say you had the following snippet of code:

     float pi(void) {
        float val = 3.14159;
        return val;
     }

A reasonable compiler would discard 'val', resulting in the debugger
giving a message saying that the value was not available.  To
reconstruct the value of this discarded variable, all you need to
do is generate the bit pattern which has this value: 0x40490fd0
(assuming a 32-bit IEEE float).
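
As a rough sketch of the point above (not taken from any DWARF
producer or consumer), the helpers below show how a tool could move
between a float constant and its raw bit pattern without doing any
floating point arithmetic.  The names float_bits and bits_to_float are
made up for illustration, and the code assumes that 'float' is a
32-bit IEEE 754 value on the machine where it runs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 32-bit pattern of a float.
   Assumes 'float' is IEEE 754 binary32. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy bytes, no conversion */
    return bits;
}

/* Hypothetical helper: reinterpret a 32-bit pattern as a float. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A producer could call float_bits(3.14159f) at compile time and emit
the result as an ordinary integer constant; a consumer only has to
hand those bits back to the user as a float.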

> My thought was that the easiest way to have a debugger compute values
> of potentially arbitrary type would be to use the evaluator for
> source-language expressions that every source-level debugging tool
> already has.  The debuggers have already implemented (say) IEEE
> floating-point arithmetic, if that's what the target uses.  We would
> extend DWARF with tags representing source language expression
> operators (*, +, []) and literals ("foo", 1.0, 'c'), and have
> compilers emit die trees representing the expression the debugger
> should compute to rematerialize the value of the missing variable.
> 
> At first blush, this might seem like a horrible bloat of the DWARF
> spec: would the spec have to describe (say) C's integral type
> promotion rules for binary operators, the proper handling of 0 when it
> appears as the second or third operand of a ?: expression, and so on?
> 
> But I don't think that, at least, is a problem.  It would be
> sufficient for the DWARF standard to say, "Some attributes refer to
> source-language expressions.  DWARF can encode source-language
> expressions for languages X, Y, and Z.  For language X, here is how we
> represent source expressions in DWARF:", and then list the
> correspondence of X operators and DWARF tags.  There's no need for the
> DWARF spec to describe the semantics of the operators at all: as long
> as DWARF could faithfully convey the expression from producer to
> consumer, along with the environment in which it was to be evaluated,
> it would be up to the consumer to properly evaluate the expression.
> As I say, most consumers already have evaluators for source-language
> expressions anyway, and have dealt with the subtleties; we're just
> adding a new way of representing such expressions.

Conveying the environment in which the compilation occurs is the
big problem which you gloss over.  Hiding the complexities of
source language semantics with a bit of hand-waving creates the
problems of unclear specification which DWARF was designed to avoid.

You seem to be under the impression that it is easy to represent all
of the possible variations in translation which a compiler can perform
on a source language.  I'm familiar with compilers which, depending on
user options, can generate significantly different data structures,
requiring significantly different code to evaluate.  It's simply not
a matter of saying "+" means DW_add, because the "+" may be operating
on a wide variety of data types.

Consider a C++ statement "a = b + c".  How do you represent this
if a, b, and c may be various size scalars, vectors, or template
types with user defined operations?

You might say "everyone knows what an int is" but this is exactly
the kind of implicit assumption which caused problems in DWARF
version 1, and which led to the development of the base types to
explicitly describe types.

> At the moment, the DWARF spec uses generic tags like
> DW_TAG_pointer_type for every supported language's pointer type, and
> it's left to the producers and consumers how exactly to fit the
> concepts in the language specification onto the set of tags provided
> by DWARF.  At times this process has needed some clarification, but
> it's worked well enough.  To carry this approach over to expressions,
> DWARF would specify tags like DW_TAG_expr_multiply and
> DW_TAG_expr_dereference, describing each clearly enough to allow the
> authors of Pascal consumers and producers to understand that the
> latter tag is the right thing to use for Pascal's '^' operator, and to
> allow a C developer to recognize it as unary '*'.

There's really no parallel here, although I can see why you might
think there is.  DW_TAG_pointer_type is primarily a source marker.  What
little semantics there may be is embodied in the base type on which it
depends.  There's no great depth of meaning in saying that the value
of a variable with this type is the address of another value.
DW_TAG_pointer_type is a description, not an operation.

Your proposed DW_TAG_expr_multiply operation would need to be defined
for every possible data type and data structure.  Or you would need
to define some scheme which would have the breadth to be able to
describe all of the possible semantics of this operation.  That's
certainly much more complex than you seem to believe.

Essentially, your suggestion is to turn DWARF into a standardized
intermediate representation, describing all of the features of the
source.  That's beyond the scope of a debugging format.

> Perhaps this is "far out in left field".  But using DWARF expressions
> for this would be a pretty serious extension of their
> responsibilities, and in the non-trivial cases would result in exactly
> the sort of ridiculous complexity (I think implementing floating-point
> in DWARF expressions is ridiculous) that you'd expect from that.  We
> would, in practice if not in the letter of the spec, be restricting
> DWARF to supporting the easy cases.

As indicated above, it isn't clear that you need to implement
floating point operations in DWARF expressions.  But even if you
did, this is not by any means a task of "ridiculous complexity".
It would require perhaps a half-dozen well defined operations
which would operate on the top of the expression stack.
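
To give a feel for the size of such an extension, here is a minimal
sketch of a handful of floating point operations acting on the top of
an expression stack.  Everything in it is hypothetical: the opcode
names (FP_ADD and so on), the fp_apply function, and the stack layout
are invented for illustration and are not defined by DWARF.  It also
assumes that the debugger host and the target both use 32-bit IEEE 754
floats, so host arithmetic can stand in for target arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes for illustration; none are defined by DWARF. */
enum fp_op { FP_ADD, FP_SUB, FP_MUL, FP_DIV, FP_NEG, FP_FROM_INT };

/* Reinterpret the low 32 bits of a stack entry as a float. */
float as_float(uint64_t v) {
    uint32_t b = (uint32_t)v;
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* Store a float's bit pattern in a stack entry. */
uint64_t as_bits(float f) {
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* Apply one operation; stack[*depth - 1] is the top of the stack.
   Returns 0 on success, -1 on stack underflow or an unknown opcode. */
int fp_apply(enum fp_op op, uint64_t *stack, size_t *depth) {
    if (op == FP_NEG || op == FP_FROM_INT) {
        /* Unary operations replace the top entry in place. */
        if (*depth < 1) return -1;
        uint64_t *top = &stack[*depth - 1];
        if (op == FP_NEG)
            *top = as_bits(-as_float(*top));
        else
            *top = as_bits((float)(int64_t)*top);
        return 0;
    }

    /* Binary operations pop two entries and push one result. */
    if (*depth < 2) return -1;
    float rhs = as_float(stack[*depth - 1]);
    float lhs = as_float(stack[*depth - 2]);
    float result;
    switch (op) {
    case FP_ADD: result = lhs + rhs; break;
    case FP_SUB: result = lhs - rhs; break;
    case FP_MUL: result = lhs * rhs; break;
    case FP_DIV: result = lhs / rhs; break;
    default:     return -1;
    }
    stack[*depth - 2] = as_bits(result);
    *depth -= 1;
    return 0;
}
```

The stack entries stay plain integers throughout; only these few
operations ever look at them as floating point values.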

-- 
Michael Eager	 eager at eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077