[Dwarf-Discuss] Line table "no-op" sequence

Wed Apr 25 06:15:33 GMT 2018

> Recently I had a chat with one of the linker developers on my team.
> He was trying to work out how to insert what would effectively be a
> no-op sequence into the line table.
>
> One reason to do this is if a producer wanted to pad the line table
> for a compilation unit, either for alignment purposes or to leave
> room for expansion (e.g. to simplify incremental linking).

One technique you haven't mentioned is to stretch out LEB128 numbers
with extra 0x80's. For example, you can represent 0 as a single byte
0x00, or as a string of bytes 0x80 0x80 0x80 0x80 ... 0x00. It's not advisable
to make the string arbitrarily long, as many LEB128 readers will have
sanity checks in them to stop reading after 5 bytes (for 32-bit
readers) or 10 bytes (for 64-bit readers). But you could certainly use
this technique to pad out the final end-sequence opcode to a 4- or
8-byte boundary.
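
A minimal Python sketch of the stretched encoding, for illustration
only. The function names are hypothetical; the width is chosen by the
caller, who should keep it within the 5- or 10-byte reader limits
mentioned above.

```python
def uleb128_padded(value, width):
    """Encode a non-negative integer as ULEB128 in exactly `width` bytes."""
    if value < 0 or width < 1:
        raise ValueError("value must be >= 0 and width >= 1")
    out = bytearray()
    for _ in range(width - 1):
        out.append((value & 0x7F) | 0x80)  # continuation bit set
        value >>= 7
    if value > 0x7F:
        raise ValueError("value does not fit in the requested width")
    out.append(value)  # last byte, continuation bit clear
    return bytes(out)


def uleb128_decode(data):
    """Decode one ULEB128 number; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, index + 1
    raise ValueError("truncated LEB128")
```

Both uleb128_padded(0, 1) and uleb128_padded(0, 4) decode to 0; only
the number of bytes consumed differs.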

When doing an incremental link, gold will pad the .debug_line section
with a dummy line number program of appropriate length (minimum 29
bytes). Here are the relevant comments:

  // Version of the header.  We write a DWARF-3 header because it's smaller
  // and many tools have not yet been updated to understand the DWARF-4 header.

  // Write header fields: unit_length, version, header_length,
  // minimum_instruction_length, default_is_stmt, line_base, line_range,
  // opcode_base, standard_opcode_lengths[], include_directories, filenames.
  // We set the header_length field to cover the entire hole, so the
  // line number program is empty.

  // Some consumers don't check the header_length field, and simply
  // start reading the line number program immediately following the
  // header.  For those consumers, we fill the remainder of the free
  // space with DW_LNS_set_basic_block opcodes.  These are effectively
  // no-ops: the resulting line table program will not create any rows.
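
The layout those comments describe can be sketched in Python as
follows. This is not gold's code: the 32-bit DWARF format, the
little-endian byte order, and the values chosen for
minimum_instruction_length, default_is_stmt, line_base, and line_range
are assumptions made for illustration. The 29-byte minimum falls out of
the field sizes: 4 + 2 + 4, five single-byte fields, 12
standard_opcode_lengths entries, and two empty-list terminators.

```python
import struct

MIN_DUMMY_SIZE = 29
DW_LNS_set_basic_block = 0x07
# Operand counts for the 12 standard opcodes defined by DWARF 3.
STANDARD_OPCODE_LENGTHS = bytes([0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1])


def dummy_line_program(hole_size):
    """Build a DWARF-3 line number program that exactly fills hole_size bytes."""
    if hole_size < MIN_DUMMY_SIZE:
        raise ValueError("hole is smaller than the minimal header")
    unit_length = hole_size - 4      # everything after unit_length itself
    header_length = hole_size - 10   # everything after header_length itself
    header = struct.pack(
        "<IHIBBbBB",
        unit_length,
        3,                                 # version
        header_length,                     # covers the whole hole
        1,                                 # minimum_instruction_length (assumed)
        1,                                 # default_is_stmt (assumed)
        -5,                                # line_base (assumed)
        14,                                # line_range (assumed, nonzero)
        len(STANDARD_OPCODE_LENGTHS) + 1,  # opcode_base
    )
    header += STANDARD_OPCODE_LENGTHS
    header += b"\x00"  # empty include_directories list
    header += b"\x00"  # empty file_names list
    # Filler for consumers that ignore header_length and start decoding
    # opcodes right after the file table.
    filler = bytes([DW_LNS_set_basic_block]) * (hole_size - len(header))
    return header + filler
```

A consumer that honors header_length sees an empty program; one that
does not sees only DW_LNS_set_basic_block opcodes.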

When doing an incremental update, if the replacement .debug_line
contribution is bigger than what it's replacing, I fill the old
contribution with the dummy header, and allocate the space I need from
another padding area. When allocating space out of a padding area, I'm
careful to make sure that the remaining space, if any, is at least 29
bytes, so it can again be filled with a dummy line number program.
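
The allocation rule can be stated as a small check. This is a
hypothetical helper written for illustration, not gold's allocator; the
name split_padding is an assumption.

```python
MIN_DUMMY_SIZE = 29  # smallest hole a dummy line number program can fill


def split_padding(pad_size, needed):
    """Return the bytes left over after taking `needed` bytes from a padding
    area of `pad_size` bytes, or None if the request must go elsewhere."""
    leftover = pad_size - needed
    if leftover < 0:
        return None  # the padding area is too small
    if 0 < leftover < MIN_DUMMY_SIZE:
        return None  # the remainder could not hold a dummy program
    return leftover
```

For example, taking 71 bytes from a 100-byte area leaves exactly 29 and
is accepted, while taking 72 would leave 28 and is refused.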

I use a similar technique to pad the .debug_info and .debug_types
sections. Those are a bit easier, since we can simply pad the actual
data area with zeroes.

> Another reason is if the linker decides to omit a function (e.g. if
> nothing references it, the code can be dead-stripped) then it could
> overwrite the related sequence(s) in the line-number program, rather
> than remove them and shrink the entire line table.

Another thing you can do is "hide" stuff inside an undocumented
extended opcode. Because extended ops always declare their length, you
can make a single extended op cover whatever hole you have (as long as
it's at least 3 bytes). If you use an extended opcode of, say, 0x7f,
which hopefully no one has implemented, any conforming DWARF reader
will simply skip over it without complaint. (I did find one reader at
Google that complained about unknown extended ops, but I was able to
fix that one.) Be wary of playing with the length field for a known
extended op, however, because some readers will simply assume they
know how to parse a known op, and will ignore the explicit length
field (just as they often ignore the standard_opcode_lengths array for
standard opcodes that they know about).
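
A Python sketch of covering a hole with one extended op, for
illustration only. The helper names are hypothetical, and zero is an
arbitrary choice of filler byte. An extended op is a 0x00 byte, a
ULEB128 length, and then that many bytes starting with the sub-opcode.
For hole sizes where the minimal length encoding would be one byte
short, the sketch stretches the length field with a padded LEB128.

```python
HIDDEN_OPCODE = 0x7F  # the unassigned extended opcode suggested above


def uleb128_padded(value, width):
    """Encode value as ULEB128 in exactly `width` bytes."""
    out = bytearray()
    for _ in range(width - 1):
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    if value > 0x7F:
        raise ValueError("value does not fit in the requested width")
    out.append(value)
    return bytes(out)


def uleb128_min_width(value):
    """Number of bytes in the shortest ULEB128 encoding of value."""
    width = 1
    while value > 0x7F:
        value >>= 7
        width += 1
    return width


def hidden_extended_op(hole_size):
    """Return one extended op that occupies exactly hole_size bytes."""
    if hole_size < 3:
        raise ValueError("an extended op needs at least 3 bytes")
    for width in range(1, 6):            # keep the length field to 5 bytes
        length = hole_size - 1 - width   # sub-opcode plus filler
        if length >= 1 and uleb128_min_width(length) <= width:
            return (b"\x00" + uleb128_padded(length, width)
                    + bytes([HIDDEN_OPCODE]) + b"\x00" * (length - 1))
    raise ValueError("hole too large for a 5-byte length field")


def skip_extended_op(data, offset=0):
    """Mimic a reader that skips an extended op using its declared length."""
    assert data[offset] == 0x00
    length, shift, pos = 0, 0, offset + 1
    while True:
        byte = data[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return pos + length
```

The smallest case, hidden_extended_op(3), is the three bytes 00 01 7f.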

When I was prototyping two-level line tables, I used this trick to
hide the actuals table inside the logicals table, so legacy DWARF
readers would simply see the logicals table as the regular line table.
I even hid the extra prologue header fields inside that block -- the
first new field in the extended header was actually the "magic"
extended op that covered the rest of the header plus the actuals table
itself.

> Arguably you could just increase the length in the header, but then
> a dumper (or other consumer) could become confused by whatever is left
> after the last sequence.  I think the padding needs to make sense to a
> consumer; i.e., syntactically it needs to look like another sequence.
>
> In order to look like a sequence, the padding would have to end with
> an end-sequence extended opcode, which is three bytes. Poking around
> in the spec for something that would effectively behave as a one-byte
> NOP, it looks to me like there are a few standard opcodes that take no
> operands and do not generate rows in the virtual line table:
> DW_LNS_negate_stmt
> DW_LNS_set_basic_block
> DW_LNS_set_prologue_end
> DW_LNS_set_epilogue_begin

You can also use DW_LNS_advance_pc with an arbitrary-length LEB128 "0".
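
To illustrate, here is a toy Python interpreter for just the opcodes
discussed in this thread. It is a sketch, not a DWARF reader: it tracks
only the row count and the address, and it assumes a
minimum_instruction_length of 1. The helper nop_advance_pc is
hypothetical; keep its size small enough that the operand stays within
the LEB128 reader limits mentioned earlier.

```python
DW_LNS_copy = 0x01
DW_LNS_advance_pc = 0x02
DW_LNS_negate_stmt = 0x06
DW_LNS_set_basic_block = 0x07
DW_LNS_set_prologue_end = 0x0A
DW_LNS_set_epilogue_begin = 0x0B

NO_OPERAND_FLAG_OPS = {
    DW_LNS_negate_stmt,
    DW_LNS_set_basic_block,
    DW_LNS_set_prologue_end,
    DW_LNS_set_epilogue_begin,
}


def nop_advance_pc(size):
    """DW_LNS_advance_pc with a zero operand stretched to fill `size` bytes."""
    if size < 2:
        raise ValueError("need one opcode byte and at least one operand byte")
    return bytes([DW_LNS_advance_pc]) + b"\x80" * (size - 2) + b"\x00"


def run(program):
    """Interpret the opcode subset above; return (rows_emitted, address)."""
    rows, address, pos = 0, 0, 0
    while pos < len(program):
        op = program[pos]
        pos += 1
        if op == DW_LNS_copy:
            rows += 1                      # the only row-producing op here
        elif op == DW_LNS_advance_pc:
            delta, shift = 0, 0
            while True:                    # decode the ULEB128 operand
                byte = program[pos]
                pos += 1
                delta |= (byte & 0x7F) << shift
                shift += 7
                if not byte & 0x80:
                    break
            address += delta
        elif op in NO_OPERAND_FLAG_OPS:
            pass                           # flags only; no row, no address change
        else:
            raise ValueError("opcode not handled by this sketch")
    return rows, address
```

Running any mix of the four flag opcodes and nop_advance_pc padding
yields zero rows and leaves the address at 0.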

-cary