[Dwarf-Discuss] DWARF and online-compiled programs

Simon Brand simon at codeplay.com
Thu Jun 9 07:12:01 PDT 2016

Hi everyone,

I'm a member of the HSA Foundation (http://www.hsafoundation.com/) tools 
working group. We're currently working on tools and standards for 
debugging heterogeneous systems. There are many new problems which we 
are coming across in designing these solutions and we are hoping to 
involve other standards bodies in our efforts.

I'm writing this email in particular to address the problem of 
referencing source files in DWARF for online-compiled programs. The 
issue is that programming models such as OpenCL can often have source 
generated at runtime, which is compiled online, with its output not 
written to file. This raises an issue for the compiler: in the generated 
DWARF, what should it put as the file name of the compile unit and 
associated line table information?

Common solutions to this problem include generating some temporary 
source file name and having a contract with the debugger to get the 
source somehow and write it out to that file. Since OpenCL and friends 
generally have quite small source files, it's actually quite reasonable 
to embed the entire source in the binary, then have the debugger look in 
a known section or address to extract the source. If there were a way to 
express this in DWARF, then runtime-generated source files could work 
without an additional contract between the compiler and debugger.

I have written up a possible way in which this could be specified in the 
standard and am hoping for some advice on how we can develop this idea 
to solve our problem in a standard, uniform manner. I'm completely open 
to other solutions which fulfil our aims, so please feel free to suggest 
alternatives or major changes to this approach.

Changes to compile unit sections:
These changes are pretty simple - I just add the possibility for the 
source to be identified by a DW_AT_location attribute instead of a 
DW_AT_name attribute.

Section 3.1.1:
Replace bullet 2 with this:
A DW_AT_name or DW_AT_location attribute identifying the primary source 
from which the compilation unit was derived. If a DW_AT_name attribute 
is used, its value is a null-terminated string containing the full or 
relative path name of the source file. If a DW_AT_location attribute is 
used, its value is the virtual address of a null-terminated string 
containing the UTF-8 encoded source code.
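
To make the DW_AT_location case concrete, here is a minimal sketch of 
how a debugger might recover the source text once it has the virtual 
address. The function name and its parameters are hypothetical and not 
part of DWARF or any existing debugger API; it assumes the debugger has 
already read the relevant memory into a byte buffer and knows the 
address at which that buffer is mapped.

```python
# Hypothetical helper, for illustration only.
def read_embedded_source(image: bytes, base_address: int, source_address: int) -> str:
    """Return the source stored as a null-terminated UTF-8 string.

    image          -- raw bytes of the mapped region (assumed already read)
    base_address   -- virtual address at which image[0] is mapped
    source_address -- virtual address obtained from DW_AT_location
    """
    offset = source_address - base_address
    if not 0 <= offset < len(image):
        raise ValueError("source address lies outside the image")
    # The proposed wording says the string is null-terminated.
    end = image.find(b"\x00", offset)
    if end == -1:
        raise ValueError("embedded source is not null-terminated")
    return image[offset:end].decode("utf-8")
```

For example, with a buffer mapped at 0x1000 that holds the source two 
bytes in, read_embedded_source(image, 0x1000, 0x1002) would return the 
text up to, but not including, the terminating null byte.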

Figure 42:
Add DW_AT_location to DW_TAG_compile_unit and DW_TAG_partial_unit entries.

Changes to line table sections:
I have based my modifications on issue 140724.1. I don't know if 
this has since been modified, so there may be some inconsistencies.

These changes are a bit more complex, as there is currently the 
assumption that a given .debug_line section will only have a single 
file_name_entry_format. This would not support having a mix of usual 
source files and embedded source in the same program.

One solution would be to add the concept of a 'file name entry set', of 
which there can be more than one in a given header, and each can have 
its own file_name_entry_format. The header would contain a field 
specifying the number of file_name_entry_sets, then fields 17-21 would 
be repeated for each set. Another possibility would be to encode the 
sets in the same file_name_entry_format and file_names fields, but 
specify the sizes of each set. This is not quite as clear, but it seems 
desirable to avoid repeating the fields. I've sketched out the second 
option below.

Field  Field Name                         Value(s)
1      Same as in Version 4               ...
2      version                            5
3      Not present in Version 4           -
4      Not present in Version 4           -
5-12   Same as in Version 4               ...
13     directory_entry_format_count       1
14     directory_entry_format             DW_LNCT_path, DW_FORM_string
15     directories_count                  <n+1>
16     directories                        <n+1>*<null terminated string>
17     file_name_entry_set_count          2
18     file_name_entry_format_set_counts  4,2
19     file_name_entry_format             DW_LNCT_path, DW_FORM_string,
                                          DW_LNCT_directory_index, DW_FORM_udata,
                                          DW_LNCT_timestamp, DW_FORM_udata,
                                          DW_LNCT_size, DW_FORM_udata,
                                          DW_LNCT_location, DW_FORM_exprloc,
                                          DW_LNCT_size, DW_FORM_udata
20     file_name_set_count                <m>, <n>
21     file_names                         <m>*{<null terminated string>,
                                          <index>, <timestamp>, <size>},
                                          <n>*{<source location>, <size>}
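
To show how a consumer might walk fields 18-21 of this layout, here is a 
rough sketch. It is only an illustration under several assumptions: the 
function names are hypothetical, content types and forms are represented 
by their names as strings rather than their numeric codes, and the 
DW_FORM_exprloc value is returned as raw expression bytes without being 
evaluated.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value (used for DW_FORM_udata)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def read_string(data, pos):
    """Decode an inline null-terminated string (DW_FORM_string)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1

def read_exprloc(data, pos):
    """Return the raw bytes of a DW_FORM_exprloc (ULEB128 length + bytes)."""
    length, pos = read_uleb128(data, pos)
    return data[pos:pos + length], pos + length

READERS = {
    "DW_FORM_string": read_string,
    "DW_FORM_udata": read_uleb128,
    "DW_FORM_exprloc": read_exprloc,
}

# Field 19 from the table: one flat list of (content type, form) pairs.
EXAMPLE_FORMAT = [
    ("DW_LNCT_path", "DW_FORM_string"),
    ("DW_LNCT_directory_index", "DW_FORM_udata"),
    ("DW_LNCT_timestamp", "DW_FORM_udata"),
    ("DW_LNCT_size", "DW_FORM_udata"),
    ("DW_LNCT_location", "DW_FORM_exprloc"),
    ("DW_LNCT_size", "DW_FORM_udata"),
]

def split_formats(formats, set_counts):
    """Split the flat format list into per-set lists using field 18."""
    sets, start = [], 0
    for count in set_counts:
        sets.append(formats[start:start + count])
        start += count
    return sets

def read_file_names(data, pos, formats, set_counts, entry_counts):
    """Decode field 21 given fields 18 (set_counts) and 20 (entry_counts)."""
    files = []
    per_set = split_formats(formats, set_counts)
    for set_format, entries in zip(per_set, entry_counts):
        for _ in range(entries):
            entry = {}
            for content_type, form in set_format:
                entry[content_type], pos = READERS[form](data, pos)
            files.append(entry)
    return files, pos
```

With set counts of 4,2 as in field 18, the first four pairs describe the 
ordinary file entries and the last two describe the embedded-source 
entries, so read_file_names(data, 0, EXAMPLE_FORMAT, [4, 2], [m, n]) 
would yield m path-based entries followed by n location-based ones.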

Section 6.2.4:
Add bullets after "16. directories"
17. file_name_entry_set_count (ubyte)
A count of the number of file name entry sets that occur in the 
following fields. If this field is zero, then the 
file_name_entry_format_set_counts field (see below) must also be zero.

18. file_name_entry_format_set_counts (sequence of ubytes)
A sequence of counts of the number of entry formats for each file name 
entry set.

Add bullet after DW_LNCT_MD5
6. DW_LNCT_location
The component is the virtual address of a null-terminated string in 
memory containing the UTF-8 encoded source code. It is paired with the 
form DW_FORM_exprloc. Only one of DW_LNCT_path and DW_LNCT_location will 
be specified for a given file_name_entry_format.

Append paragraph to bullet 1:
Only one of DW_LNCT_path and DW_LNCT_location will be specified for a 
given file_name_entry_format.

Add paragraph after the first paragraph of bullet 2:
The index is 0 if the source is identified by a virtual address.
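
The two rules above could be checked by a consumer along these lines. 
This is only a sketch: the function names are hypothetical, content 
types are represented by their names as strings, and treating a format 
with neither DW_LNCT_path nor DW_LNCT_location as an error is my own 
assumption rather than something the proposed wording states.

```python
def source_kind(set_format):
    """Classify one file_name_entry_format as "path" or "location".

    set_format is a list of (content type, form) pairs.
    """
    types = {content_type for content_type, _form in set_format}
    has_path = "DW_LNCT_path" in types
    has_location = "DW_LNCT_location" in types
    if has_path and has_location:
        raise ValueError("format specifies both DW_LNCT_path and DW_LNCT_location")
    if has_location:
        return "location"
    if has_path:
        return "path"
    # Assumption: a format must identify the source in one of the two ways.
    raise ValueError("format specifies neither DW_LNCT_path nor DW_LNCT_location")

def directory_index(entry):
    """Return the directory index for a decoded file entry (a dict).

    Entries identified by a virtual address use index 0.
    """
    if "DW_LNCT_location" in entry:
        return 0
    return entry["DW_LNCT_directory_index"]
```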

Table 7.25:
Add DW_LNCT_location 0x6 to the table

The description for DW_LNE_define_file may also need updating, although 
I don't know where to find the current version of this.


Is this something which the committee would consider adopting? I'm happy 
to discuss any feedback you all have on this.


Simon Brand

