[Dwarf-Discuss] DWARF and source text embedding

scott@scottlinder.com scott
Tue Feb 13 17:40:11 GMT 2018


Michael,

In the case of this proposal, then, I suggest the CU fields
(AT_{name,comp_dir}) retain their exact current definitions. Language
implementations, regardless of whether they might want to support embedding
source, currently use the filesystem. This extension is essentially just
caching source which may become unavailable to the consumer by the time the
program is debugged. This means the producer can put standard values in each
CU field, and also embed source in the line table. If in the future there is
a need to add CU fields or modify existing ones to capture some other
attribute, that can be done in a different proposal.
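
To illustrate the "cache" behaviour described above, here is a minimal sketch
of how a consumer might choose between embedded source and the filesystem. It
is not taken from the LLVM patches: FileEntry and load_source are hypothetical
names, the entry is assumed to be already decoded from the line table, and the
path is assumed to resolve against comp_dir directly (a real DWARF v5 consumer
would also go through the directory table).

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    """Hypothetical, already-decoded file_names entry from the line table."""
    name: str                 # path of the file, as recorded by the producer
    has_source: bool = False  # value of the proposed DW_LNCT_has_source
    source: str = ""          # value of the proposed DW_LNCT_source


def load_source(entry: FileEntry, comp_dir: str) -> Optional[str]:
    """Return the source text for an entry, or None if it cannot be found."""
    # When the flag is true, the embedded text may be used as-is. The flag,
    # not the string, decides this, so an embedded empty file is still valid.
    if entry.has_source:
        return entry.source

    # Otherwise behave as consumers do today and look on disk.
    # Assumption: relative names are resolved against comp_dir.
    path = entry.name
    if not os.path.isabs(path):
        path = os.path.join(comp_dir, path)
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        # Source unavailable, exactly as for a consumer unaware of the extension.
        return None
```

A consumer that does not know about the extension would effectively always
take the second path, which is why the CU fields can keep their current
meaning.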

Scott

On 2018-02-01 17:32, Michael Eager wrote:
> On 02/01/2018 08:07 AM, scott at scottlinder.com wrote:
>> Hi John,
>> 
>> In the case where the files are actually available on disk, and the 
>> source is simply being "cached", the attributes are exactly the same. 
>> In the case where sources are generated, and so have no true path on 
>> disk, I would suggest we might just leave the exact meaning to be 
>> implementation defined; the producer can still provide valuable 
>> information which will aid in locating where sources originate, such 
>> as indicating the OpenCL kernel name. Consumers which are unaware of 
>> this extension will simply fail to find the source (as before), while 
>> new consumers can at least provide an identifier to distinguish 
>> sources.
> 
> Implementation-defined generally means that different implementations
> will be incompatible.  Incompatible implementations are the antithesis
> of a standard.
> 
> As a general DWARF principle, there should be no secret understandings
> between producer and consumer. There should be no "secret handshake"
> such as the one you describe where a producer provides "valuable
> information" in some undefined manner usable only by a consumer which
> is "in on the secret".  It's not that a different consumer doesn't
> implement the extension, it's that a different consumer cannot
> implement the extension.
> 
> Attributes which have a defined meaning, such as AT_name or AT_comp_dir,
> should have a well defined meaning in all circumstances.
> 
>> 
>> The remaining attributes (DW_AT_language, DW_AT_producer, etc.) seem 
>> pretty naturally orthogonal.
>> 
>> Regards,
>> Scott
>> 
>> On 2018-01-31 14:40, John DelSignore wrote:
>>> Hi Scott,
>>> 
>>> Question: What does the DW_TAG_compile_unit look like for an embedded
>>> source file? For example, what does the DW_AT_name and DW_AT_comp_dir
>>> look like?
>>> 
>>> Cheers, John D.
>>> 
>>> 
>>> On 01/31/18 17:05, scott at scottlinder.com wrote:
>>>> Hello all,
>>>> 
>>>> I am a compiler engineer at AMD, working on tools for debugging
>>>> online-compiled programs. The problem I am attempting to solve was brought
>>>> up previously in the DWARF Standard issue 161018.1 titled "DWARF-embedded
>>>> source for online-compiled programs", and is the result of runtimes like
>>>> OpenCL doing online compilation in an environment where it is not
>>>> desirable (or even feasible) to write sources to disk. In these cases, it
>>>> would be useful to support embedding the source directly in the resulting
>>>> DWARF. I would like to propose a similar solution to the one outlined in
>>>> the above issue, but without structural changes to the specification.
>>>> 
>>>> ====
>>>> 
>>>> Add two new optional fields to the file_names prologue of the line table.
>>>> 
>>>> Section 6.2.4.1:
>>>> Add two bullets after "5. DW_LNCT_MD5"
>>>> 6. DW_LNCT_has_source
>>>>    DW_LNCT_has_source indicates that the value is a boolean which affects
>>>>    the interpretation of an accompanying DW_LNCT_source value. When present
>>>>    there must be an accompanying DW_LNCT_source value. When true, consumers
>>>>    may use the embedded source instead of attempting to discover the source
>>>>    on disk. When false, consumers will ignore the DW_LNCT_source value. This
>>>>    code point is always paired with a flag form (e.g. DW_FORM_flag or
>>>>    DW_FORM_flag_present).
>>>> 7. DW_LNCT_source
>>>>    DW_LNCT_source indicates that the value is a null-terminated string
>>>>    which is the original source text of the file. When present there must
>>>>    be an accompanying DW_LNCT_has_source value. The string will contain the
>>>>    UTF-8 encoded source text with '\n' line endings. When the accompanying
>>>>    DW_LNCT_has_source value is false, the value of DW_LNCT_source will be
>>>>    the empty string. This code point is always paired with a string form
>>>>    (e.g. DW_FORM_string, DW_FORM_line_strp, DW_FORM_strp).
>>>> 
>>>> New type codes can be allocated for them in a backwards-compatible way, or
>>>> codes for these new content types can be added in the range of
>>>> [DW_LNCT_lo_user, DW_LNCT_hi_user] to avoid changing the spec itself.
>>>> 
>>>> Table 7.27:
>>>> Add DW_LNCT_has_source  0x6
>>>> Add DW_LNCT_source      0x7
>>>> 
>>>> Any DWARFv5 consumer which is unaware of this extension would continue to
>>>> operate as before, ignoring the new fields. Any consumer which is aware of
>>>> the extension would know to check DW_LNCT_has_source for each file_name
>>>> entry in order to determine whether the embedded source field
>>>> (DW_LNCT_source) contains the source text of the corresponding file.
>>>> 
>>>> ====
>>>> 
>>>> My team and I believe this simplifies the design by removing the need for
>>>> changes to the compile unit sections, and by avoiding the addition of
>>>> multiple file_name_entry_formats in a single program, all without
>>>> sacrificing any information. We have a preliminary implementation in
>>>> LLVM/Clang, which supports embedding source (clang -gdwarf-5
>>>> -gembed-source) and inspecting it via llvm-dwarfdump and llvm-objdump
>>>> (with the -source flag). The patches are available at
>>>> https://reviews.llvm.org/D42765 (LLVM) and
>>>> https://reviews.llvm.org/D42766 (Clang).
>>>> 
>>>> I would like any and all feedback on the design, and want to see about the
>>>> possibility of adding the new content type codes outside of the "user"
>>>> range (i.e. adding new entries for them in Table 7.27) in the next version
>>>> of the specification.
>>>> 
>>>> Regards,
>>>> Scott Linder
>>>> 
>>>> _______________________________________________
>>>> Dwarf-Discuss mailing list
>>>> Dwarf-Discuss at lists.dwarfstd.org
>>>> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>>>> 


