[Dwarf-Discuss] DWARF and source text embedding

Michael Eager eager@eagercon.com
Fri Feb 2 01:32:03 GMT 2018

On 02/01/2018 08:07 AM, scott at scottlinder.com wrote:
> Hi John,
> In the case where the files are actually available on disk, and the 
> source is simply being "cached", the attributes are exactly the same. In 
> the case where sources are generated, and so have no true path on disk, 
> I would suggest we might just leave the exact meaning to be 
> implementation defined; the producer can still provide valuable 
> information which will aid in locating where sources originate, such as 
> indicating the OpenCL kernel name. Consumers which are unaware of this 
> extension will simply fail to find the source (as before), while new 
> consumers can at least provide an identifier to distinguish sources.

Implementation-defined generally means that different implementations
will be incompatible.  Incompatible implementations are the antithesis
of a standard.

As a general DWARF principle, there should be no secret understandings
between producer and consumer. There should be no "secret handshake"
such as the one you describe where a producer provides "valuable
information" in some undefined manner usable only by a consumer which
is "in on the secret".  It's not that a different consumer doesn't
implement the extension, it's that a different consumer cannot implement
the extension.

Attributes which have a defined meaning, such as DW_AT_name or
DW_AT_comp_dir, should have a well-defined meaning in all circumstances.

> The remaining attributes (DW_AT_language, DW_AT_producer, etc.) seem 
> pretty naturally orthogonal.
> Regards,
> Scott
> On 2018-01-31 14:40, John DelSignore wrote:
>> Hi Scott,
>> Question: What does the DW_TAG_compile_unit look like for an embedded
>> source file? For example, what does the DW_AT_name and DW_AT_comp_dir
>> look like?
>> Cheers, John D.
>> On 01/31/18 17:05, scott at scottlinder.com wrote:
>>> Hello all,
>>> I am a compiler engineer at AMD, working on tools for debugging
>>> online-compiled programs. The problem I am attempting to solve was
>>> brought up previously in the DWARF Standard issue 161018.1 titled
>>> "DWARF-embedded source for online-compiled programs", and is the
>>> result of runtimes like OpenCL doing online compilation in an
>>> environment where it is not desirable (or even feasible) to write
>>> sources to disk. In these cases, it would be useful to support
>>> embedding the source directly in the resulting DWARF. I would like to
>>> propose a similar solution to the one outlined in the above issue, but
>>> without structural changes to the specification.
>>> ====
>>> Add two new optional fields to the file_names prologue of the line
>>> table.
>>> Section
>>> Add two bullets after "5. DW_LNCT_MD5"
>>> 6. DW_LNCT_has_source
>>>    DW_LNCT_has_source indicates that the value is a boolean which
>>>    affects the interpretation of an accompanying DW_LNCT_source value.
>>>    When present there must be an accompanying DW_LNCT_source value.
>>>    When true, consumers may use the embedded source instead of
>>>    attempting to discover the source on disk. When false, consumers
>>>    will ignore the DW_LNCT_source value. This code point is always
>>>    paired with a flag form (e.g. DW_FORM_flag or DW_FORM_flag_present).
>>> 7. DW_LNCT_source
>>>    DW_LNCT_source indicates that the value is a null-terminated string
>>>    which is the original source text of the file. When present there
>>>    must be an accompanying DW_LNCT_has_source value. The string will
>>>    contain the UTF-8 encoded source text with '\n' line endings. When
>>>    the accompanying DW_LNCT_has_source value is false, the value of
>>>    DW_LNCT_source will be the empty string. This code point is always
>>>    paired with a string form (e.g. DW_FORM_string, DW_FORM_line_strp,
>>>    DW_FORM_strp).
>>> New type codes can be allocated for them in a backwards-compatible
>>> way, or codes for these new content types can be added in the range of
>>> [DW_LNCT_lo_user, DW_LNCT_hi_user] to avoid changing the spec itself.
>>> Table 7.27:
>>> Add DW_LNCT_has_source  0x6
>>> Add DW_LNCT_source      0x7
>>> Any DWARFv5 consumer which is unaware of this extension would continue
>>> to operate as before, ignoring the new fields. Any consumer which is
>>> aware of the extension would know to check DW_LNCT_has_source for each
>>> file_name entry in order to determine whether the embedded source
>>> field (DW_LNCT_source) contains the source text of the corresponding
>>> file.
>>> ====
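
For readers following the thread, here is a minimal sketch of how a
consumer might apply the pairing rules quoted above. It assumes the line
table header has already been decoded into one dict per file_name entry,
keyed by content type code; that dict layout and the function names
embedded_source and source_lines are hypothetical and belong to no real
library. The codes 0x6 and 0x7 are the values proposed in the message,
not part of the published standard.

```python
# DW_LNCT_path is from DWARF 5 Table 7.27. The other two codes are the
# values proposed in this thread and are assumptions, not standard values.
DW_LNCT_path = 0x1
DW_LNCT_has_source = 0x6
DW_LNCT_source = 0x7


def embedded_source(entry):
    """Return the embedded source text for one file entry, or None.

    `entry` is a hypothetical dict mapping content type code to the
    already-decoded value for that file_name entry.
    """
    has_flag = DW_LNCT_has_source in entry
    has_text = DW_LNCT_source in entry
    if not has_flag and not has_text:
        # Producer did not use the extension: look for the file on disk.
        return None
    if has_flag != has_text:
        # The proposal requires the two fields to accompany each other.
        raise ValueError("DW_LNCT_has_source and DW_LNCT_source must be paired")
    if not entry[DW_LNCT_has_source]:
        # Flag is false: the consumer ignores the DW_LNCT_source value.
        return None
    return entry[DW_LNCT_source]


def source_lines(entry):
    """Split embedded source on '\\n', the line ending the proposal requires."""
    text = embedded_source(entry)
    return None if text is None else text.split("\n")
```

A consumer that gets None back would fall back to locating the file
through DW_LNCT_path as it does today, which is the behaviour the
proposal describes for consumers unaware of the extension.
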
>>> My team and I believe this simplifies the design by removing the need
>>> for changes to the compile unit sections, and by avoiding the addition
>>> of multiple file_name_entry_formats in a single program, all without
>>> sacrificing any information. We have a preliminary implementation in
>>> LLVM/Clang, which supports embedding source
>>> (clang -gdwarf-5 -gembed-source) and inspecting it via llvm-dwarfdump
>>> and llvm-objdump (with the -source flag). The patches are available at
>>> https://reviews.llvm.org/D42765 (LLVM) and
>>> https://reviews.llvm.org/D42766 (Clang).
>>> I would like any and all feedback on the design, and want to see about
>>> the possibility of adding the new content type codes outside of the
>>> "user" range (i.e. adding new entries for them in Table 7.27) in the
>>> next version of the specification.
>>> Regards,
>>> Scott Linder
>>> _______________________________________________
>>> Dwarf-Discuss mailing list
>>> Dwarf-Discuss at lists.dwarfstd.org
>>> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org

Michael Eager    eager at eagercon.com
1960 Park Blvd., Palo Alto, CA 94306
