[Dwarf-Discuss] DWARF and online-compiled programs (Simon Brand)

Simon Brand simon@codeplay.com
Thu Jul 21 08:36:06 GMT 2016


Apologies for the delay, but I have updated the original proposal to 
take Pedro's suggestion into account.

Instead of using DW_AT_location and DW_LNCT_location, it uses 
DW_AT_source and DW_LNCT_source so the source is embedded in the DWARF 
and doesn't need to be loaded. These are described below.

=========================================================================
Changes to compile unit sections:
I added a DW_AT_source attribute, which is a string attribute 
identifying the source code. The intention is for implementations to use 
DW_FORM_strp so that the string is held in the .debug_str section and 
referenced from both the compile unit DIE and line table.
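
As a rough illustration of that sharing (not part of the proposal), a
producer could intern the source text once and hand the same .debug_str
offset to both the compile unit DIE and the line table. The
DebugStrTable class and its intern method are hypothetical names used
only for this sketch.

```python
class DebugStrTable:
    """Hypothetical builder for a .debug_str-style string section."""

    def __init__(self):
        self.data = bytearray()   # raw section contents
        self._offsets = {}        # encoded string -> offset in self.data

    def intern(self, text: str) -> int:
        """Return the offset of text, appending it only on first use."""
        encoded = text.encode("utf-8")
        if b"\x00" in encoded:
            raise ValueError("an embedded NUL would truncate the string")
        if encoded not in self._offsets:
            self._offsets[encoded] = len(self.data)
            self.data += encoded + b"\x00"
        return self._offsets[encoded]


table = DebugStrTable()
source = "kernel void k() {}\n"
cu_offset = table.intern(source)    # would back DW_AT_source (DW_FORM_strp)
line_offset = table.intern(source)  # would back DW_LNCT_source (DW_FORM_strp)
```

Both calls return the same offset, so the source text is stored once.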

Section 3.1.1:
Replace bullet 2 with this:
A DW_AT_name or DW_AT_source attribute identifying the primary source
from which the compilation unit was derived. If a DW_AT_name attribute
is used, its value is a null-terminated string containing the full or
relative path name of the source file. If a DW_AT_source attribute is
used, its value is a null-terminated string containing the full contents
of the source code from which the compilation unit was derived. The
source code string is UTF-8 encoded and encodes line endings with `\n`.
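
A minimal sketch of how a producer might prepare such a string,
assuming the compiler holds the source as text. The function name
encode_embedded_source is hypothetical, and rejecting an embedded NUL is
my assumption about how to keep the string null-terminated, not
something the proposal states.

```python
def encode_embedded_source(text: str) -> bytes:
    """Return text as a UTF-8, "\\n"-delimited, null-terminated string."""
    # Normalise "\r\n" first, then any remaining bare "\r".
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    encoded = normalised.encode("utf-8")
    if b"\x00" in encoded:
        # Assumption: a NUL inside the source cannot be represented.
        raise ValueError("source contains a NUL byte")
    return encoded + b"\x00"


payload = encode_embedded_source("int x;\r\nint y;\r")
```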

Figure 2:
Add DW_AT_source, which identifies "Embedded source code".

Figure 20:
Add DW_AT_source, whose class is "string"

Figure 42:
Add DW_AT_source to DW_TAG_compile_unit and DW_TAG_partial_unit entries.

=========================================================================


=========================================================================
Changes to line table sections:
I have based my modifications on issue 140724.1. I don't know if
this has since been modified, so there may be some inconsistencies.

These changes are a bit more complex, as there is currently the 
assumption that a given .debug_line section will only have a single 
file_name_entry_format. This would not support having a mix of usual 
source files and source-in-memory in the same program.

One solution would be to add the concept of a file name entry set, of 
which there can be more than one in a given header, and each can have 
its own file_name_entry_format. The header would contain a field 
specifying the number of file_name_entry_sets, then fields 17-21 would 
be repeated for each set. Another possibility would be to encode the 
sets in the same file_name_entry_format and file_names fields, but 
specify the sizes of each set. This is not quite as clear, but it seems 
desirable to avoid repeating the fields. I've sketched out the second 
option below.

------------------------------------------------------------------------
Field   Field Name                         Value(s)
Number
1       Same as in Version 4               ...
2       version                            5
3       Not present in Version 4           -
4       Not present in Version 4           -
5-12    Same as in Version 4               ...
13      directory_entry_format_count       1
14      directory_entry_format             DW_LNCT_path, DW_FORM_string
15      directories_count                  <n+1>
16      directories                        <n+1>*<null terminated string>
17      file_name_entry_set_count          2
18      file_name_entry_format_set_counts  4,2
19      file_name_entry_format             DW_LNCT_path, DW_FORM_string,
                                           DW_LNCT_directory_index, DW_FORM_udata,
                                           DW_LNCT_timestamp, DW_FORM_udata,
                                           DW_LNCT_size, DW_FORM_udata,
                                           DW_LNCT_source, DW_FORM_strp,
                                           DW_LNCT_size, DW_FORM_udata
20      file_name_set_count                <m>, <n>
21      file_names                         <m>*{<null terminated string>,
                                                <index>, <timestamp>, <size>},
                                           <n>*{<source offset>, <size>}
------------------------------------------------------------------------

Section 6.2.4:
Add bullets after "16. directories"
17. file_name_entry_set_count (ubyte)
A count of the number of file name entry sets that occur in the 
following fields. If this field is zero, then the 
file_name_entry_format_set_sizes field (see below) must also be zero.

18. file_name_entry_format_set_counts (sequence of ubytes)
A sequence of counts of the number of entry formats for each file name 
entry set.
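
To make the second layout concrete, here is a sketch of a consumer
decoding fields 17-21 as laid out in the table. Several points are
assumptions rather than part of the proposal: the format pairs and the
per-set entry counts are read as ULEB128, DW_FORM_strp is a 4-byte
little-endian offset (32-bit DWARF), and decode_file_name_sets is a
hypothetical helper name.

```python
import struct

DW_FORM_string, DW_FORM_strp, DW_FORM_udata = 0x08, 0x0E, 0x0F
# Content codes 1-5 are the existing ones; 6 is the proposed DW_LNCT_source.
DW_LNCT_NAMES = {1: "path", 2: "directory_index", 3: "timestamp",
                 4: "size", 5: "MD5", 6: "source"}


def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_form(buf, pos, form):
    """Read one value of the given form; return (value, next position)."""
    if form == DW_FORM_udata:
        return read_uleb128(buf, pos)
    if form == DW_FORM_string:
        end = buf.index(b"\x00", pos)
        return buf[pos:end].decode("utf-8"), end + 1
    if form == DW_FORM_strp:
        (offset,) = struct.unpack_from("<I", buf, pos)  # assumed 32-bit DWARF
        return offset, pos + 4
    raise ValueError(f"unsupported form {form:#x}")


def decode_file_name_sets(buf, pos=0):
    set_count = buf[pos]                              # field 17 (ubyte)
    pos += 1
    format_counts = list(buf[pos:pos + set_count])    # field 18 (ubytes)
    pos += set_count
    formats = []                                      # field 19, split per set
    for pair_count in format_counts:
        pairs = []
        for _ in range(pair_count):
            code, pos = read_uleb128(buf, pos)
            form, pos = read_uleb128(buf, pos)
            pairs.append((code, form))
        formats.append(pairs)
    entry_counts = []                                 # field 20
    for _ in range(set_count):
        count, pos = read_uleb128(buf, pos)
        entry_counts.append(count)
    sets = []                                         # field 21
    for pairs, count in zip(formats, entry_counts):
        entries = []
        for _ in range(count):
            entry = {}
            for code, form in pairs:
                entry[DW_LNCT_NAMES[code]], pos = read_form(buf, pos, form)
            entries.append(entry)
        sets.append(entries)
    return sets, pos


# The example header from the table, with m = 1 and n = 1.
header = bytes([
    2,                                   # file_name_entry_set_count
    4, 2,                                # file_name_entry_format_set_counts
    1, 0x08, 2, 0x0F, 3, 0x0F, 4, 0x0F,  # format of set 0 (path-based)
    6, 0x0E, 4, 0x0F,                    # format of set 1 (embedded source)
    1, 1,                                # file_name_set_count
]) + b"a.c\x00" + bytes([0, 0, 10]) + struct.pack("<I", 0x20) + bytes([19])
sets, end = decode_file_name_sets(header)
```

Here sets[0] holds the path-based entries and sets[1] the entries whose
source lives in .debug_str.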

Section 6.2.4.1:
Add bullet after "5. DW_LNCT_MD5"
6. DW_LNCT_source
    The component is an offset into the .debug_str section where a 
null-terminated string contains the source code from which the 
compilation unit was derived. The string is UTF-8 encoded and encodes 
line endings using '\n'. Only one of DW_LNCT_path and DW_LNCT_source 
will be specified for a given file_name_entry_format. This content code 
is paired with the form DW_FORM_strp.

Append paragraph to bullet 1:
Only one of DW_LNCT_path and DW_LNCT_source will be specified for a 
given file_name_entry_format.
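
A consumer could check this constraint when it reads each format. The
sketch below only rejects a format that names both codes; whether a
format with neither code is valid is left open here. The function name
source_kind is hypothetical, and 0x6 is the value proposed for
DW_LNCT_source in this message.

```python
DW_LNCT_path = 0x1
DW_LNCT_source = 0x6  # proposed value, not yet part of the standard


def source_kind(content_codes):
    """Return "path", "source", or None for one file_name_entry_format."""
    codes = set(content_codes)
    if DW_LNCT_path in codes and DW_LNCT_source in codes:
        raise ValueError(
            "format specifies both DW_LNCT_path and DW_LNCT_source")
    if DW_LNCT_path in codes:
        return "path"
    if DW_LNCT_source in codes:
        return "source"
    return None


kind = source_kind([DW_LNCT_source, 0x4])  # 0x4 is DW_LNCT_size
```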

Add paragraph after the first paragraph of bullet 2:
The index is 0 if the source is identified by a memory location.

Table 7.25:
Add DW_LNCT_source 0x6 to the table

The description for DW_LNE_define_file will also need updating with 
similar text.

=========================================================================

Thanks,

Simon

On 15/06/16 15:01, Simon Brand wrote:
>
> This approach seems promising; it's probably easier to specify and 
> keeps all the necessary information within DWARF. We'd definitely need 
> to allow the line table and compile unit information to reference the 
> same string so that there's no duplication.
>
> Thanks,
>
> Simon
>
>
> On 13/06/16 11:22, Pedro Alves wrote:
>> On 06/13/2016 09:07 AM, Simon Brand wrote:
>>
>>> Also, the source might not be "in memory", as it might be placed in a
>>> binary segment which is not loaded. The debugger should be interpreting
>>> the location as something like the ELF virtual address to locate the
>>> source in the object file.
>> In that case, wouldn't it be better to instead represent the source
>> as another debug string, with DW_FORM_string, or
>> .debug_str/DW_FORM_strp, or .debug_str_offsets/DW_FORM_strx, etc. ?
>>
>> For example, add a new attribute, e.g., DW_AT_source, and in 3.1.1
>> where DW_AT_name is described:
>>
>>   Compilation unit entries may have the following attributes:
>>   ...
>>   2. A DW_AT_name attribute whose value is a null-terminated string containing
>>   the full or relative path name of the primary source file from which the
>>   compilation unit was derived.
>>
>> Change it to say either DW_AT_name or DW_AT_source.  Something like:
>>
>>   2. Either a DW_AT_name attribute whose value is a null-terminated string
>>   containing the full or relative path name of the primary source file from
>>   which the compilation unit was derived, or a DW_AT_source attribute whose
>>   value is a null-terminated string containing the full contents of the source
>>   code from which the compilation unit was derived.
>>
>> Thanks,
>> Pedro Alves
>>
>
> -- 
> Simon Brand
> Staff Software Engineer
> Codeplay Software Ltd
> Level C, Argyle House, 3 Lady Lawson St, Edinburgh, EH3 9DR
> Tel: 0131 466 0503
> Fax: 0131 557 6600
> Website:http://www.codeplay.com
> Twitter:https://twitter.com/codeplaysoft
>
> This email and any attachments may contain confidential and /or privileged information and is for use by the addressee only. If you are not the intended recipient, please notify Codeplay Software Ltd immediately and delete the message from your computer. You may not copy or forward it, or use or disclose its contents to any other person. Any views or other information in this message which do not relate to our business are not authorized by Codeplay software Ltd, nor does this message form part of any contract unless so stated.
> As internet communications are capable of data corruption Codeplay Software Ltd does not accept any responsibility for any changes made to this message after it was sent. Please note that Codeplay Software Ltd does not accept any liability or responsibility for viruses and it is your responsibility to scan any attachments.
> Company registered in England and Wales, number: 04567874
> Registered office: 81 Linkfield Street, Redhill RH1 6BY
>
>
> _______________________________________________
> Dwarf-Discuss mailing list
> Dwarf-Discuss at lists.dwarfstd.org
> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org

