[Dwarf-discuss] DW_OP_piece limits and assumptions

Ron Brender
Thu Apr 28 12:49:14 GMT 2005


Almost a year ago, Bevin Brett, who leads idb development at
Intel, sent me the following design note, with permission to send
it to the DWARF community when it seemed appropriate or possibly
useful.

Since we are now talking about DW_OP_piece, this seems like a
good time to "publish" it.

Note that after some preliminary material, issues and proposals
regarding DW_OP_piece specifically begin in Section 4.3.2.

Also note that it was originally given to me as an .html file.
Without touching the content, I have taken the liberty of
converting it to plaintext for ease of transmission and
archiving.

Ron

======================================================================

    #1           1-JUN-2004 11:47:09.18
From:   SMTP%"bevin.brett@intel.com"
To:     <ron.brender at hp.com>
CC:
Subj:   DW_OP_bit_piece and related issues

Here is the DN discussing the subject in general, and the need for
DW_OP_bit_piece in particular

/Bevin

----------------------------------------------------------------------

Information in these pages is Publicly Presentable so that it
can be shared with the Dwarf-3 committee and anyone else who
happens to be shown it.

1 Status

Work in progress - not yet completed or reviewed

2 Index

  Overview 3
    Problem statement 3.1
  Details 4
    Background Reading 4.1
      Principles Behind Optimizing Compilers 4.1.1
        Choosing Operations and Sequencing 4.1.1.1
        Choosing How To Represent Variables and Values 4.1.1.2
      Debugging Optimized Code 4.1.2
      Optimizing Compilers And Data Layout 4.1.3
        Desirable Possibility - Choose Efficient Representations 4.1.3.1
        Desirable Possibility - Choose Efficient Locations 4.1.3.2
      Can Compilers Really Produce These Effects? 4.1.4
      Debuggers in Optimized Code Can Not Support Assignment 4.1.5
    Theory-Backed Problem Statement 4.2
    Theory-Based Solution 4.3
      Examples 4.3.1
      Describing a Variable's Location (Linux) 4.3.2
        Dwarf Support (Linux) 4.3.2.1
        Add to Dwarf DW_OP_bit_piece (Linux) 4.3.2.2
        Compile-time Constant Pieces 4.3.2.3
        Expressions in the Debugging Information 4.3.2.4
      Describing a Variable's Location (Windows) 4.3.3
      Issues 4.3.4
        Described Arrays 4.3.4.1

3 Overview

3.1 Problem statement

Our goal is to have the debugger correctly show the user the
values of expressions involving variables that the optimizing
compiler has split into pieces, and placed those pieces in
unrelated places.

Highly optimizing compilers place the bits representing the
current value of a variable in different places at different
points in the program's execution, and do not always put them in
contiguous storage.

4 Details

4.1 Background Reading

4.1.1 Principles Behind Optimizing Compilers

The important principle of optimizing compilers is "if it gets a
possible answer, it doesn't matter how it does it". A program
does not specify a single calculation or a precise order of
operations - it specifies a set of possible calculations and
orders of execution and the compiler picks one of them.
Optimizing compilers spend more time to pick one that executes
faster.

4.1.1.1 Choosing Operations and Sequencing

Usually we think of this in terms of choosing and ordering the
instructions. For instance, consider how to do v*v*v*v*v*v*v*v.
Most programming languages do not specify the exact order in
which the multiplication operations must be done, so the compiler
is free to choose between

   1. ((((((((v*v)*v)*v)*v)... )
   2. ((v*v)*(v*v))*((v*v)*(v*v)) 

but any good optimizing compiler is going to choose the second
one.
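
To make the cost difference concrete, here is a minimal C sketch
of the two forms (the function names are invented for
illustration). The first needs seven dependent multiplications;
the second shares the subproducts and needs only three. (For C
floating point a compiler needs explicit permission, such as
GCC's -ffast-math, to reassociate, since rounding can differ; for
integer types it can reassociate freely.)

double pow8_linear(double v)
{
    /* form 1: (((((((v*v)*v)*v)*v)*v)*v)*v - seven serial multiplies */
    return v * v * v * v * v * v * v * v;
}

double pow8_tree(double v)
{
    /* form 2: ((v*v)*(v*v))*((v*v)*(v*v)) with shared
       subexpressions - only three multiplies */
    double v2 = v * v;    /* v^2 */
    double v4 = v2 * v2;  /* v^4 */
    return v4 * v4;       /* v^8 */
}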

4.1.1.2 Choosing How To Represent Variables and Values

But it also happens when the compiler chooses how to represent
and store a variable. For instance, consider the program
fragment...

struct s { float f1; float f2; } v;
v.f1 = 1.2; v.f2 = 5.0;
if (v.f1 * v.f2 == 6.0) puts("exact hit");

There is a very wide range of legal implementations. The program
cannot tell whether the variable actually came into existence,
was stored into, fetched from, etc. Since the program cannot
tell, the compiler is not required to place the two floating-point
numbers in consecutive storage. This will complicate debugging
when the user interrupts execution in the call to puts and
asks to examine v. The compiler did not include this possibility
when it decided how to generate the code, and indeed such a
consideration would have resulted in much slower code.

4.1.2 Debugging Optimized Code

The optimizing compiler assumes that no-one is watching the
executing program and hence only the end results are visible.

However, optimized code DOES contain bugs. Some of the set of
calculations represented by the source may not compute the result
the user intends, so a non-optimized compilation may choose one
that works while an optimized compilation chooses one that does
not. Alternatively, the optimizing compiler may have generated
bad code.

The user needs to identify the problem in the sources and fix it.
In either situation examining the execution of the program can
help in identifying the mistake - but watching the execution is
exactly what the compiler assumed was not happening...

If the compiler creates sufficient debugging information, and if
the debugger uses all this information, and if the user
understands some of the limitations, it is possible to get a very
exact view of the calculation - and a somewhat blurred view of
how this relates to the original program source code.

4.1.3 Optimizing Compilers And Data Layout

Most programming languages have three basic concepts in their
data model

   1. Atomic values, such as pointers, integers, and floating
      point numbers
   2. Sets of component values of various types, such as structs
   3. Vectors of component values all of the same type, such as
      arrays 

Purely within the programming language, and especially within its
simpler portable features, it is impossible to determine where
the compiler has decided to place the bits containing the value.

A simple compiler will simply place the bits for each value into
contiguous addressable memory, aligning each component and the
whole value onto some natural boundary to make accessing the
components more efficient. However optimizing compilers can do
much better than this.

To understand the range of possible behaviors, it is necessary to
first explore some desirable possibilities, and then to show that
optimizing compilers can achieve them.

4.1.3.1 Desirable Possibility - Choose Efficient Representations

Consider the implementation of complex numbers and complex
arithmetic.

Mathematically there are two ways to represent complex numbers -
either as a real and imaginary part, or as an angle and a length.
The optimal choice depends on the operations being performed. The
real/imaginary representation is much more efficient for
addition, while the angle/length representation is much more
efficient for multiplication and magnitude.

If a variable is used in one part of the algorithm for addition,
and another for multiplication, the compiler may use different
representations for each part. If the variable is used for both
purposes in a part of the algorithm, but is invariant there, the
compiler may decide to use both representations at the same time.
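
To make the two representations concrete, here is a hypothetical
C sketch (the type and function names are invented, not part of
any proposal):

#include <math.h>

typedef struct { double re,  im;  } cart_form;   /* cheap to add      */
typedef struct { double len, ang; } polar_form;  /* cheap to multiply */

/* An invariant value could be kept in both forms at once; the
   conversions below show the relationship between them. */
static polar_form to_polar(cart_form c)
{
    polar_form p = { hypot(c.re, c.im), atan2(c.im, c.re) };
    return p;
}

static cart_form to_cart(polar_form p)
{
    cart_form c = { p.len * cos(p.ang), p.len * sin(p.ang) };
    return c;
}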

The Dwarf-3 specification does not have support for describing
lifetimes with different representations. This is a major
oversight - and we should keep in mind the possibility of it
being fixed as we study the other issues. If there are two or
more representations of the same value alive at once, it doesn't
matter which the user is shown. This is just one of many reasons
why debuggers can't support assignment in optimized code.

The rest of this discussion will assume that all lifetimes of a
variable have the same representation.

4.1.3.2 Desirable Possibility - Choose Efficient Locations

If the compiler can see all usage of a variable across part of
the program, then it can choose exactly where it wants the bits
to be stored, as well as exactly what representation to use. For
instance it can place the real and imaginary floating-point
components of the complex number in any of the following
locations

   1. Adjacent memory locations - this is the simplest possible
      way.

   2. Two floating-point registers - this is an obvious alternative,
      since they probably need to be there to do complex arithmetic.

   3. One in a floating point register, and the other in storage -
      this might happen if the code only uses one component.

   4. If the compiler can determine that one or other component is
      unused, or is a constant value, it could place just the used/
      variable component in either a floating-point register or in
      storage.

   5. If the complex number is just being copied into multiple
      destinations, it could use two 32-bit integer registers to
      hold one of the two 64-bit floating-point values, and
      place the other in a floating-point register.

   6. One fourth of the way through the store, it would be correct
      to describe the value as having 32 bits of the first component
      in memory, 32 bits in an integer register, and 64 bits in a
      floating-point register.

   7. There are undoubtedly many other ways to spread the bits
      around... 

Arrays present similar opportunities. Consider a compiler that is
trying to generate excellent code for a struct containing a
single 4-element floating-point array - such as may be used to
implement a 4D vector. If it can see that there is no indexing
into the array in a portion of the code, it could place the array
in 4 floating-point scalar temporaries, and have them get
allocated in various places at various points in the program's
execution.
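
In source-level terms the transformation amounts to something
like the following hypothetical before/after sketch:

struct vec4 { float e[4]; } a;

/* In the region with no variable indexing, the compiler may
   reason as if the user had instead written four independent
   scalars: */
float a_e0, a_e1, a_e2, a_e3;

/* Each scalar then goes through register allocation on its own,
   so at any given instruction each of the four "elements" may be
   in a register, in a stack slot, or nowhere at all. */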

  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  + The important points to realize are                          +
  +                                                              +
  +   1. The value is spread across any of the places (including +
  +      nowhere) that the computer can store bits.              +
  +                                                              +
  +   2. The splitting between the places is arbitrary - it is   +
  +      NOT the component boundary.                             +
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Bit-fields introduce two further complications

   1. The splitting of the pieces is NOT on an addressable boundary
      - either from the point of view of where they come from, or
      where they are placed.

   2. The bit-fields may be expanded out to a natural arithmetic
      size, using either unsigned or signed extension. 
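
For example, both complications show up in a hypothetical
fragment like this:

struct flags {
    unsigned a : 3;   /* say, bits 0..2 of the canonical layout */
    signed   b : 5;   /* bits 3..7 - not separately addressable */
    /* (the exact bit placement is ABI-dependent)               */
} f;

int use(void)
{
    /* The compiler may keep f.b widened to a full, sign-extended
       int in a register, rather than as 5 bits at bit offset 3. */
    return f.b;
}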

4.1.4 Can Compilers Really Produce These Effects?

Yes - and here is how...

   1. The Front End can split structures and arrays into individual
      scalar temporaries which are then optimized independently of
      each other.

   2. The machine-code generation can generate code that assumes an
      infinite number of registers, and then those registers can be
      selected arbitrarily after the code has been scheduled and
      peephole-optimized. 
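
Point 1 is the transformation often called scalar replacement of
aggregates; applied to the earlier struct example it amounts to
(a hypothetical sketch):

struct s { float f1; float f2; } v;   /* as in the earlier fragment */

/* After scalar replacement the compiler works as if it had: */
float v_f1, v_f2;   /* two unrelated temporaries */

/* Each temporary is then scheduled and register-allocated on its
   own (point 2), so v_f1 and v_f2 can end up in arbitrary,
   unrelated registers or stack slots at each point in the code. */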

4.1.5 Debuggers in Optimized Code Can Not Support Assignment

The optimization of code depends heavily on finding all the
places where a variable is modified. This means a debugger will
not be able to act as though an assignment statement to a
variable existed in the user's program where one did not. This is
true even for volatile variables, because even in this case it is
not well-defined exactly where in the program the current
execution point is. For instance, the apparently simple...

extern volatile int v;
int f()
{
    /* ... */
    int sum = 0;
    for (int i = 0; i < 3; i++) sum += v;
    return sum;
}

may actually be implemented as "return 3*v;" unless the
programming language explicitly insists that all the fetches are
done. Or it could be implemented as "int t1=v; int t2=v; int
t3=v; return t1+t2+t3;" and an assignment to the volatile
variable at the site of the first add would have no effect on the
result.

4.2 Theory-Backed Problem Statement

Languages define a canonical layout for variables, and this
layout is visible to user programs via such constructs as pointer
arithmetic, sizeof and offsetof, interactions with code from
other languages, and machine-level programming.

However when a compiler can determine that the layout of a
specific variable is not detectable, it is desirable for the
compiler to choose a layout that generates better code than the
default.

This alternative layout needs to be described to the debugger, so
that the debugger can provide some information about the value of
the variable to the user.

4.3 Theory-Based Solution

The compiler will generate debugging information for structure
and array types assuming that the values are stored in some
canonical layout of consecutive bytes.

The compiler will generate debugging information for each
variable that describes how its bytes are actually placed in
memory, registers, etc. at each instruction address.

The debugger will use the variable's debugging information to
find all the bytes, and use the type's debugging information to
understand these bytes.

4.3.1 Examples

Consider an array that the compiler has spread into four
unrelated registers.

Then, at debug time, the debugger may have to perform indexing
operations - and print the four elements correctly. Clearly the
indexing cannot assume the array is in memory, or in contiguous
registers! It will do this operation by

    * Understanding the type information: that this is an array
      of 4 4-byte elements, spaced 4 bytes apart.

    * Understanding the information about the variable, telling
      where its 16 bytes are.

    * Doing the 4*index offset calculation, and recognizing that
      the result is within the size of the variable.

    * Running this range through the map found in the second step,
      and reading just the needed bytes from the appropriate
      registers.
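
A minimal sketch, in C, of the map and lookup this requires (all
names here are invented for illustration; real consumers of Dwarf
differ in detail):

#include <stddef.h>

/* One piece of a variable's current location: 'size' bytes that
   correspond to canonical offset 'offset', currently held in
   register 'regnum' (or in memory at 'addr' when regnum < 0). */
struct piece {
    size_t        offset;  /* offset within the canonical layout */
    size_t        size;    /* length of this piece in bytes      */
    int           regnum;  /* register number, or -1 for memory  */
    unsigned long addr;    /* memory address when regnum < 0     */
};

/* Find the piece covering canonical offset 'off' - for example
   4*index when indexing an array of 4-byte elements - or NULL
   if that part of the value is not available anywhere. */
static const struct piece *
find_piece(const struct piece *map, size_t n, size_t off)
{
    for (size_t i = 0; i < n; i++)
        if (off >= map[i].offset && off < map[i].offset + map[i].size)
            return &map[i];
    return NULL;
}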

4.3.2 Describing a Variable's Location (Linux)

4.3.2.1 Dwarf Support (Linux)

As described in Section 2.5 of the Dwarf-3 Specification, a
variable can have a location expression, which can contain
DW_OP_piece operators. There are three major issues with the
current definition of this operator, which cause us problems.

   1. There is no way of specifying the sizes of pieces that are
      not byte-sized.

   2. There is no way of specifying where the piece is stored
      if it is not at the start of a register or an addressable
      unit of memory.

   3. There is no way of specifying that the bits are compile-
      time constant and are known but not stored anywhere. 
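
For reference, the existing operator already handles the
byte-aligned cases. A two-double complex value split across two
registers might be described as (register numbers chosen
arbitrarily for illustration):

   DW_OP_reg4
   DW_OP_piece 8     (first 8 bytes - the real part - in register 4)
   DW_OP_reg9
   DW_OP_piece 8     (next 8 bytes - the imaginary part - in register 9)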

4.3.2.2 Add to Dwarf DW_OP_bit_piece (Linux)

We should add another operator, DW_OP_bit_piece, with two
operands - the first being an unsigned LEB128 that gives the size
in bits of the piece, and the second being an unsigned LEB128
that gives the offset in bits within the object whose location is
on the top of the stack. Such an operator would resolve the first
two issues.
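
With the new operator, for instance, 12 bits of a variable held
starting at bit 4 of a register could be described as (the
operands and register number are illustrative):

   DW_OP_reg3
   DW_OP_bit_piece 12 4     (12 bits of the variable, taken starting
                             at bit offset 4 within register 3)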

  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  + Another possibility is to use expressions such as             +
  +                                                               +
  +   DW_OP_reg3                                                  +
  +   DW_OP_lit15                                                 +
  +   DW_OP_shl                                                   +
  +   DW_OP_piece                                                 +
  +                                                               +
  + Note: Dwarf3 disallows combining DW_OP_reg with operations    +
  + other than DW_OP_piece. This would be an extension            +
  + to Dwarf3, and is more complex to describe and implement than +
  + DW_OP_bit_piece - especially the use of masking to establish  +
  + the length and position within the canonical layout.          +
  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

4.3.2.3 Compile-time Constant Pieces

The compiler could put these out into the read-only section of
the program, and address them there. If this is not viable (e.g.
on memory-constrained systems) a proposal needs to be made by
someone for whom this is a real problem.
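
For example, if one 8-byte component of a variable is known to be
the constant 0.0, the compiler could emit those bytes into the
read-only section and describe the variable as (schematically):

   DW_OP_addr <address of the constant in the read-only section>
   DW_OP_piece 8            (the constant component)
   DW_OP_reg7
   DW_OP_piece 8            (the live component, in register 7)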

4.3.2.4 Expressions in the Debugging Information

The offsets of struct elements and the indexing of array elements
are described as expressions in the information for the type. This
information does not understand the non-canonical layout of a
specific variable.

Furthermore these expressions do, sometimes, fetch values from
the object itself - array bounds and strides, values from the
vtable, etc. These fetches assume the variable is laid out
canonically - an assumption that no longer holds.

DW_OP_push_object_address will not push some arbitrary address
onto the Dwarf expression stack; instead it will push an element
that can be used as an operand without ever losing track of
which variable it is in. This is somewhat similar to
DW_OP_xderef - each variable is treated as if it is in its own
address space, and each component has an offset within that
address space.

DW_OP_deref{,_size} will fetch the value from the appropriate
variable, using the debugger's understanding of where this
variable's bytes are currently located.
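
For example, a fragment of a type's expression such as the
following (schematic) then works for a variable in any layout,
because the final deref is resolved through the variable's piece
map rather than through a flat memory address:

   DW_OP_push_object_address   (a "handle" on the variable itself)
   DW_OP_lit8
   DW_OP_plus                  (offset 8 within the canonical layout)
   DW_OP_deref                 (fetched using the debugger's map of
                                where those bytes currently live)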

Similarly, the base address pushed onto the stack before
evaluation of a Location Expression will be encoded as a location
within some variable.

4.3.3 Describing a Variable's Location (Windows)

TBD.

4.3.4 Issues

4.3.4.1 Described Arrays

This proposal runs into special issues when applied to described
arrays.

In this case the variable's debugging information has to deal
with the descriptor and the described data being spread across
the memory and the registers. This issue is unresolved at this
time.

One possibility is to have the location expression describe both
the descriptor and the data, with the canonical layout being the
descriptor immediately preceding the correctly aligned data.

Maintainer	Bevin Brett
Last Modified: 2004-May-30	Bevin Brett




