[Dwarf-Discuss] implementing SFN, support for multiple views per PC

Fri Sep 9 19:42:55 GMT 2016

SFN stands for Statement Frontier Notes, a technique I specified several
years ago and presented at the GCC Summit to allow for finer-grained
location information, so as to enable debug information consumers to
compute/observe multiple views of an underlying program's state at the
same program counter.  More specifically, it would enable debuggers to
single-step over source code lines even though no actual instructions
were generated between the recommended breakpoints for those lines.
For some more details about the idea, please have a look at the paper
and the slides of the presentation at
http://people.redhat.com/aoliva/papers/sfn/

TL;DR: each state in the source program's execution in the target
language's virtual machine amounts to a view of the program state.  The
idea is to assign identifiers to (some) such views, and to add those
view identifiers to the line number table and to location lists.  This
message is about how to accomplish that in a compact and
backward-compatible way.

Although we could have view identifiers global within a CU, if we took
the PC as part of the view identifier, we'd just need a new column in
the line number table to discriminate different views associated with
the same PC.  Depending on
details to be discussed below, we might need an explicit increment
operation, and an explicit reset-to-zero operation.

I don't intend to fundamentally change the format of location lists, so
I'm thinking of an indication that a location list is augmented by view
identifiers.  In the paper, I suggested view pairs corresponding to
address pairs to follow the location list (i.e., we'd have L0b, L0e,
<DWexpr0>, L1b, L1e, <DWexpr1>, ..., Lnb, Lne, <DWexprn>, 0, 0, V0b,
V0e, V1b, V1e, ..., Vnb, Vne) but I'm now reconsidering that detail, for
two reasons: (i) a location list entry is occasionally misinterpreted as
the end of the list, if it happens to be an empty range at the base
address for the list, and (ii) other extensions to location lists might
make sense, and this arrangement would prevent more than one such
extension from following the location list.

So I'm now inclined to use an offset attribute, rather than a boolean
attribute, to indicate the presence of views to augment a location list.
A leb128 offset from the location list address to the view list address
would suggest them to be placed close to each other, but without
mandating any specific relative placement, at a reasonably small cost:
probably one or two bytes in each DIE that refers to a typical
location list.
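
To make the size estimate concrete, here is a minimal Python sketch of a
LEB128 encoder for the offset.  It assumes the offset is signed (the
text above only says "leb128"), since the view list may sit before or
after its location list; the name sleb128 is illustrative only.

```python
def sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (7 payload bits per octet)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> is an arithmetic shift, so the sign is kept
        sign_bit = byte & 0x40
        # Stop once the remaining bits are pure sign extension.
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)  # more octets follow


# Offsets in [-64, 63] take one octet; offsets in [-8192, 8191] take at most two.
sizes = {offset: len(sleb128(offset)) for offset in (-8, 40, -64, 64, 8191, 8192)}
```

Under that assumption, a view list placed within 8 KiB of its location
list costs at most two bytes per referring DIE.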

(A radical departure from current location lists would be to introduce
another location list type with view identifier ranges rather than
address ranges.  That would suggest using global identifiers within a
CU, perhaps even an implicit counter that identifies each explicitly
specified line in the line number table.  View identifiers would then
map to a PC, and lookup in location lists would be somewhat indirect.
That might work, but such location lists would be unusable by existing
consumers, so I'm not inclined to explore this possibility any further.)

One of the challenges is to enable either the compiler or the assembler
to generate line number programs, while only the compiler can generate
location lists (and view lists).

Consider this: the compiler can't always know whether two labels are at
the same address, if they are separated by alignment padding that turns
out to be empty (even between different sections!), or by other
pseudo-instructions or asm statements that don't advance the PC.

So, if we were to mandate that any opcode that changes the PC resets the
view counter, and that any opcode that doesn't increments it after
adding a line to the table, we could end up with out-of-sync view
numbers in location lists: the compiler has to fill in the view numbers
itself, and it could guess wrong about whether the view counter was
reset.  This suggests that any opcode that advances the PC by an offset
that could be computed by the assembler should NOT reset the view
counter, whereas any opcode that requires the compiler to know the exact
offset on its own could do so if the offset is nonzero.
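
The out-of-sync risk can be shown with a small Python sketch.  It models
only the naive rule ("any nonzero PC advance resets the counter, every
row increments it"); the function name and the 8-byte padding are
made-up values for illustration.

```python
def views_under_reset_rule(pc_advances):
    """Return the view number of each row.

    pc_advances[i] is the PC advance applied just before row i is added.
    Naive rule: a nonzero advance resets the view counter to zero, and
    every added row then increments it.
    """
    views = []
    view = 0
    for advance in pc_advances:
        if advance != 0:
            view = 0
        views.append(view)
        view += 1
    return views


# Two rows separated only by alignment padding of unknown size.
compiler_guess = views_under_reset_rule([0, 0])  # compiler assumes empty padding
assembler_result = views_under_reset_rule([0, 8])  # padding was 8 bytes (hypothetical)
```

Here compiler_guess is [0, 1] while assembler_result is [0, 0], so a
location list naming view 1 for the second row would refer to a view
that does not exist.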

Unless we're speaking of VLIW: must the compiler be able to distinguish
between operation advances within the same instruction address, and
those that change the address?  I'm thinking view numbers should advance
rather than reset when we advance to another operation within the same
instruction pack (i.e., without an address change), but I'm not sure
compilers must always keep track of that.  I suspect so, given all the
other complexities of VLIW, but I'm still a bit concerned about getting
compiler and DWARF view numbers out of sync if the compiler advances one
operation expecting us to remain at the same address with a higher view
number, but the next operation happens to imply a different address, in
which case the view counter in the line table would get implicitly
reset.  Thoughts?  Should we even worry about this, considering that
line number tables can be handled by assemblers, and then the compiler
wouldn't have to worry about any of this?

Indeed, once we start using the assembler to deal with view counts and
addresses, things get a lot simpler.  We could still have view counts
handled entirely implicitly, and have the compiler refer to view numbers
of labels in augmented location lists, to denote the view assigned by
the assembler (or computed by the assembler given the implicit
calculations performed as part of the line number program).

(I've considered the possibility of having the compiler explicitly
supply view numbers to the assembler in .loc directives, to then use
them explicitly in location lists, but this seems to make little sense;
the only case in which it might be sensible would be to go back to a PC
for which we've already emitted line number table entries and reset the
view counter so that it doesn't overlap with already-emitted view
numbers.  I don't see that we might ever have to do this: if we're going
back to a PC, even if we've already emitted line entries for it, it must
have been as the end of a sequence, with a just-reset view count, so
starting over at view number zero, at a different sequence,
won't/shouldn't be a problem: it was used as one-past-the-end-of-a-range
before, and it's used as the beginning-of-a-range now.  Am I missing
anything?)

As for how to represent view numbers in augmented location lists...  We
could emit them as a sequence of uleb128-encoded view numbers and be
done with it.  However, we could make them even more compact if we
assumed that we won't have very many views at the same PC very often.
Say, we could allow a pair of view numbers to be encoded in a single
uleb128 octet: shift the second view count left by four and the first
view count left by one, combine them, and set the LSB to indicate that
this number encodes a pair of view numbers whose first element fits in
3 bits.  If the first element doesn't fit, then we just shift it left
by one, leaving the LSB clear, and output that amount as uleb128.  Now,
does this approach make sense, or am I overdoing it?
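
A minimal Python sketch of this compact encoding follows.  It is one
reading of the description, not a settled format: it assumes the view
list is a flat stream of view numbers, that a value with the LSB set
carries two consecutive views, and that a value with the LSB clear
carries one.  All function names are illustrative.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("uleb128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more octets follow
        else:
            out.append(byte)
            return bytes(out)


def encode_views(views):
    """Greedily pack view numbers, pairing whenever the first fits in 3 bits."""
    out = bytearray()
    i = 0
    while i < len(views):
        first = views[i]
        if first < 8 and i + 1 < len(views):
            second = views[i + 1]
            out += uleb128(second << 4 | first << 1 | 1)  # LSB set: a pair
            i += 2
        else:
            out += uleb128(first << 1)  # LSB clear: a single view number
            i += 1
    return bytes(out)


def decode_views(data: bytes):
    """Inverse of encode_views: return the flat list of view numbers."""
    views, pos = [], 0
    while pos < len(data):
        value, shift = 0, 0
        while True:  # read one uleb128 value
            byte = data[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        if value & 1:
            views += [(value >> 1) & 0x7, value >> 4]
        else:
            views.append(value >> 1)
    return views
```

With this reading, two pairs of zero views pack into the two octets
01 01, and the pair (1, 0) packs into the single octet 03.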

To sum it up, here's the design that I'm leaning towards in a smallish
picture:

Source program:

1 int f(int a, int b, int c, int d) {
2   int x = a + b;
3   int y = c * d;
4   x -= y;
5   return x;
6 }

Optimized asm, output by the compiler:

.Ltext:

[...]

f:
.LVU0: .loc 1 1 is_stmt 0 # view 0
        mov     r4 <- *(sp+12)
        mov     r5 <- *(sp+16)
        mov     r2 <- *(sp+4)
        mov     r3 <- *(sp+8)
.LVU1: .loc 1 3 is_stmt 0 # view 0
        mul     r6 <- r4, r5
.LVU2: .loc 1 2 is_stmt 1 # view 0
        add     r7 <- r2, r3
.LVU3: .loc 1 3 is_stmt 1 # view 0
.LVU4: .loc 1 4 is_stmt 1 # view 1
        sub     r1 <- r7, r6
.LVU5: .loc 1 5 is_stmt 1 # view 0
        ret
.LFE0:

[...]

.uleb128 <?> # DW_TAG_variable
.ascii "x\0" #   DW_AT_name x
.byte 1      #   DW_AT_decl_file
.byte 2      #   DW_AT_decl_line
.long ??     #   DW_AT_type
.long .LLST0 #   DW_AT_location
.leb128 .LVST0 - .LLST0  # DW_AT_locviews
.uleb128 <?> # DW_TAG_variable
.ascii "y\0" #   DW_AT_name y
.byte 1      #   DW_AT_decl_file
.byte 3      #   DW_AT_decl_line
.long ??     #   DW_AT_type
.long .LLST1  #   DW_AT_location
.leb128 .LVST1 - .LLST1  # DW_AT_locviews

[...]

.LVST0:  # it could be right before the corresponding LLST
.view .LVU3, .LVU5, .LVU5, .LVU6

.LLST0:
.long .LVU3 - .Ltext, .LVU5 - .Ltext
.byte ... # DW_OP_reg7
.long .LVU5 - .Ltext, .LVU6 - .Ltext
.byte ... # DW_OP_reg1
.long 0, 0

.LLST1:
.long .LVU4 - .Ltext, .LVU6 - .Ltext
.byte ... # DW_OP_reg6
.long 0, 0

.LVST1:  # or it could be right after the corresponding LLST (or anywhere)
.view .LVU4, .LVU6
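
How a consumer might use these augmented lists can be sketched in
Python.  The addresses below are hypothetical (4-byte instructions,
function at address 0), .LVU6 is assumed to coincide with .LFE0 at view
0, and ranges are assumed to be half-open over (PC, view) pairs compared
lexicographically.  None of this is mandated by the text above.

```python
# Hypothetical label addresses and the views assigned to them.
ADDR = {".LVU3": 24, ".LVU4": 24, ".LVU5": 28, ".LVU6": 32}
VIEW = {".LVU3": 0, ".LVU4": 1, ".LVU5": 0, ".LVU6": 0}


def point(label):
    """A program point is the pair (address, view number)."""
    return (ADDR[label], VIEW[label])


# Location lists for x and y, each entry augmented with its view pair.
LOC_X = [
    (point(".LVU3"), point(".LVU5"), "DW_OP_reg7"),
    (point(".LVU5"), point(".LVU6"), "DW_OP_reg1"),
]
LOC_Y = [
    (point(".LVU4"), point(".LVU6"), "DW_OP_reg6"),
]


def lookup(loclist, pc, view):
    """Return the location expression covering (pc, view), or None."""
    here = (pc, view)
    for begin, end, expr in loclist:
        if begin <= here < end:  # tuple comparison is lexicographic
            return expr
    return None


y_before_line_3 = lookup(LOC_Y, 24, 0)  # stopped at .LVU3
y_after_line_3 = lookup(LOC_Y, 24, 1)   # stopped at .LVU4, same PC
```

Both lookups use the same PC, yet y is unavailable at view 0 and lives
in reg6 at view 1, which is the distinction address-only ranges cannot
express.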

Line number program generated by the assembler:

[Line = 1, is_stmt = 0]
XOp2: set PC to <.LVU0>           (resets View)
Copy (View++ = emit a row, and then tentatively increment View for subsequent use)
Spec: advance PC by <.LVU1-.LVU0> (resets View) and Line by 2 (to 3) (View++)
Negate is_stmt (to 1)
Spec: advance PC by <.LVU2-.LVU1> (resets View) and Line by-1 (to 2) (View++)
Spec: advance PC by <.LVU3-.LVU2> (resets View) and Line by 1 (to 3) (View++)
Spec: advance PC by <.LVU4-.LVU3> (View is 1)   and Line by 1 (to 4) (View++)
Spec: advance PC by <.LVU5-.LVU4> (resets View) and Line by 1 (to 5) (View++)
Advance PC by <.LFE0-.LVU5>       (resets View)  (*)
XOp1: End of Sequence

(*) this is DW_LNS_advance_pc, not DW_LNS_fixed_advance_pc; the latter,
to be used by a compiler dealing with view computations internally in
situations of uncertainty about whether the offset is zero, would NOT
reset View.
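
The implicit view handling in this line number program can be modelled
with a short Python sketch.  The opcode names are informal tags, not
DWARF encodings, and the PC advances assume hypothetical 4-byte
instructions; the reset rules are the ones described above (set-address
always resets, special opcodes and advance_pc reset on a nonzero
advance, fixed_advance_pc never resets).

```python
def run_line_program(ops, default_is_stmt=False):
    """Return rows of (pc, line, is_stmt, view) produced by ops."""
    pc, line, is_stmt, view = 0, 1, default_is_stmt, 0
    rows = []

    def emit():
        # A row records the current view; the counter is then bumped
        # tentatively, in case the next row lands on the same PC.
        nonlocal view
        rows.append((pc, line, is_stmt, view))
        view += 1

    for op, *args in ops:
        if op == "set_address":
            pc, view = args[0], 0
        elif op == "copy":
            emit()
        elif op == "special":
            pc_advance, line_advance = args
            if pc_advance:
                view = 0
            pc += pc_advance
            line += line_advance
            emit()
        elif op == "negate_stmt":
            is_stmt = not is_stmt
        elif op == "advance_pc":
            if args[0]:
                view = 0
            pc += args[0]
        elif op == "fixed_advance_pc":
            pc += args[0]  # deliberately leaves view untouched
        elif op == "end_sequence":
            emit()
        else:
            raise ValueError(f"unknown op {op!r}")
    return rows


ROWS = run_line_program([
    ("set_address", 0),   # .LVU0
    ("copy",),            # line 1
    ("special", 16, 2),   # .LVU1, line 3
    ("negate_stmt",),
    ("special", 4, -1),   # .LVU2, line 2
    ("special", 4, 1),    # .LVU3, line 3
    ("special", 0, 1),    # .LVU4, line 4, same PC as .LVU3
    ("special", 4, 1),    # .LVU5, line 5
    ("advance_pc", 4),    # .LFE0
    ("end_sequence",),
])
```

The views of the six rows before the end of the sequence come out as
0, 0, 0, 0, 1, 0, matching the "# view" comments in the asm listing.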

Compact view encodings:

.LVST0:
.uleb128 (0<<4|0<<1|1), (0<<4|0<<1|1)

.LVST1:
.uleb128 (0<<4|1<<1|1)

Can anyone spot any problems with this proposal, particularly WRT the
fully implicit handling of view numbers in line number programs and
their use by compilers?

Is this (view numbers in line number tables, location list augmentation
with view numbers referenced by a new attribute, the compact encoding of
view numbers) something that DWARF might want to adopt in a future
standard (presumably not version 5)?  Are there any amendments that are
deemed necessary right away?

Thanks in advance,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer