[Dwarf-Discuss] Line table "no-op" sequence

Tue Apr 24 15:04:45 GMT 2018

Recently I had a chat with one of the linker developers on my team.
He was trying to work out how to insert what would effectively be a
no-op sequence into the line table.

One reason to do this is if a producer wanted to pad the line table
for a compilation unit, either for alignment purposes or to leave
room for expansion (e.g. to simplify incremental linking).

Another reason is if the linker decides to omit a function (e.g. if
nothing references it, the code can be dead-stripped) then it could
overwrite the related sequence(s) in the line-number program, rather
than remove them and shrink the entire line table.

Arguably you could just increase the length in the header, but then 
a dumper (or other consumer) could become confused by whatever is left 
after the last sequence.  I think the padding needs to make sense to a 
consumer; i.e., syntactically it needs to look like another sequence.

In order to look like a sequence, the padding would have to end with 
an end-sequence extended opcode, which is three bytes. Poking around 
in the spec for something that would effectively behave as a one-byte 
NOP, it looks to me like there are a few standard opcodes that take no 
operands and do not generate rows in the virtual line table:
DW_LNS_negate_stmt
DW_LNS_set_basic_block
DW_LNS_set_prologue_end
DW_LNS_set_epilogue_begin

Using one of the first two has the advantage that they are defined as
of DWARF v2, so the linker doesn't have to pay attention to the DWARF
version of the line table.  DW_LNS_set_basic_block is probably a tiny
bit more efficient than DW_LNS_negate_stmt, as the former writes a flag
while the latter does a read+write.

The requirement to end the padding with an end-sequence does mean that
the padding has to be at least three bytes long, but padding using this
tactic can be any amount larger than that.
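
To make the layout concrete, here is a rough sketch in Python of how a
producer might generate such padding.  The helper name
make_line_table_padding is hypothetical, and the sketch assumes the
line table header's opcode_base is greater than 7, so that byte 0x07 is
still decoded as the standard opcode DW_LNS_set_basic_block rather than
as a special opcode.

```python
DW_LNS_set_basic_block = 0x07   # standard opcode: no operands, emits no row
DW_LNE_end_sequence = 0x01      # extended opcode number

# Extended opcodes are encoded as: 0x00, ULEB128 length, sub-opcode.
# For end_sequence the length is 1 (just the sub-opcode), so 3 bytes total.
END_SEQUENCE = bytes([0x00, 0x01, DW_LNE_end_sequence])


def make_line_table_padding(size: int) -> bytes:
    """Return `size` bytes that parse as one empty line-number sequence.

    Hypothetical helper; assumes opcode_base > 7 in the line table header.
    """
    if size < len(END_SEQUENCE):
        raise ValueError("padding must be at least 3 bytes")
    # Fill with one-byte no-ops, then terminate the sequence.
    nops = bytes([DW_LNS_set_basic_block]) * (size - len(END_SEQUENCE))
    return nops + END_SEQUENCE
```

For example, make_line_table_padding(5) yields the bytes 07 07 00 01 01.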

The specification says that DW_LNE_end_sequence does create a row in
the table, "whose address is that of the byte after the last target 
machine instruction of the sequence."  In general, this opcode can't 
know where the last instruction is, or how long that instruction is, 
therefore normally it would be preceded by some opcode that sets the 
address register.  That is, end-sequence doesn't modify the address 
register before emitting the row.  In the padding scenario nothing sets
the address, so it would still hold its initial value of zero, giving us
a zero-length sequence.  Hopefully this would not confuse any existing
consumers too badly.
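
As an illustration of what a consumer would see, here is a deliberately
minimal Python sketch of the line-number state machine.  It is not a real
decoder: the function name run_padding is hypothetical, it understands
only DW_LNS_set_basic_block and DW_LNE_end_sequence, and it assumes
opcode_base is greater than 7.

```python
def run_padding(program: bytes):
    """Interpret a padding-only line-number program; return emitted rows.

    Hypothetical sketch: handles only DW_LNS_set_basic_block (0x07) and
    DW_LNE_end_sequence (0x00 0x01 0x01); any other byte raises.
    """
    rows = []
    address, basic_block = 0, False   # register values at start of a sequence
    i = 0
    while i < len(program):
        op = program[i]
        if op == 0x07:
            # DW_LNS_set_basic_block: sets a flag, emits no row.
            basic_block = True
            i += 1
        elif op == 0x00 and program[i + 1:i + 3] == b"\x01\x01":
            # DW_LNE_end_sequence: emit a row using the current address,
            # then reset the registers for the next sequence.
            rows.append({"address": address, "end_sequence": True})
            address, basic_block = 0, False
            i += 3
        else:
            raise ValueError(f"unexpected byte {op:#x} at offset {i}")
    return rows
```

Running this over the padding bytes 07 07 00 01 01 produces exactly one
row, at address zero with end_sequence set, which is the zero-length
sequence described above.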

What do people think?  I'm happy to write up a short bit for the wiki
Best Practices page.

(I'll probably be embarrassed to find that this was discussed before
and I've forgotten, but it does seem worth a note on the wiki.)
Thanks,
--paulr