[Dwarf-Discuss] DWARF on systems where memory is not byte addressable

Thu Jul 26 15:37:36 GMT 2012

On 07/25/2012 10:45 PM, Joeri van Ruth wrote:
> Hello all, I am wondering about how to deal with platforms with word
> memories, by which I mean that the smallest addressable unit in memory
> is (in our current case) 32 bits wide.  This means that at the C level,
>
> 	sizeof(char) == sizeof(short) == sizeof(int) == 1,
>
> so far so good.  However, we are having problems with gdb.  I am aware
> that this may be entirely gdb-specific, but I do note that the standard
> does not spend a lot of words on the issues that arise here; that's
> why I bring this up here.
>
> The standard does not seem to define anywhere how large a byte is
> supposed to be.  Historically, older architectures used 6-, 7-, 8- or
> 9-bit bytes, which is why networking standards tend to
> speak of octets instead.  DWARF seems to assume 8-bit bytes, hence the
> LEB128 encoding, but it does not state so explicitly unless I
> overlooked something.

DWARF does not make any assumptions that a byte on the architecture
being described is any particular size.  It does assume that data files
containing the DWARF debug data are read or written in 8-bit bytes.
LEB128 is a method for encoding arbitrary length integers into a
sequence of 8-bit bytes in the DWARF data.

> A C-oriented view might consider that sizeof(char) == sizeof(int) and,
> as C does not distinguish clearly between byte and char, take a byte
> to be 32 bits wide.  But even that's not always the case, as sometimes
> we see word-oriented platforms which still take the arithmetic size of
> char to be 8 bits, requiring frequent sign- and zero-extension when
> assigning to a char or short variable.

Your point isn't clear.  The C Standard doesn't describe bytes.  The
size of an addressing unit doesn't have any direct relationship with
the size of data which can be accessed.  Many RISC architectures only
permit register-sized memory transfers, while the address unit is the
8-bit byte.  There are some word-addressed architectures which allow
memory references to shorter data sizes.

Word-oriented platforms which have byte-addressable memory seem to be a
self-contradiction.

> However, I assume that if the DWARF standard were explicit about the
> size of a byte, it would define a byte to be 8 bits.

There is a relevant section in the standard:

   2.21 Byte and Bit Sizes

   Many debugging information entries allow either a DW_AT_byte_size attribute
   or a DW_AT_bit_size attribute, whose integer constant value (see Section 2.19)
   specifies an amount of storage. The value of the DW_AT_byte_size attribute is
   interpreted in bytes and the value of the DW_AT_bit_size attribute is
   interpreted in bits.

My understanding is that DW_AT_byte_size is in address units (or storage units)
in the target architecture.

> The problem we see with gdb hinges on the DW_AT_byte_size attribute of
> a type descriptor.  Gdb uses it for at least two purposes:
>
> 	- to perform address arithmetic
>
> 	- to determine the bit size of values
>
> If we set the DW_AT_byte_size of an integer to 1, gdb will do the
> address arithmetic correctly, that is, look for int_array[1] at
> address int_array + 1, not + 4, but if you ask for the value of an int
> variable it will only display the lower 8 bits.
>
> If we set the DW_AT_byte_size of int to 4, which indeed sounds
> more consistent given the name _byte_size, gdb will extract the full
> 32 bits of the value but get the address arithmetic wrong as
> int_array[1] now actually accesses int_array[4].
>
> It seems to me that the proper way would be to fix gdb to take the
> addressing size unit into account as general knowledge of the target
> platform, but I can't believe we're the first to come across this.  I
> wonder if anyone on this list has already faced similar issues and
> what they did about it.

Word-addressed architectures are pretty uncommon and I would not be
surprised if gdb did not handle address calculations correctly.
Gdb has the definition TARGET_CHAR_BIT which defines the number of
bits in a char (aka byte) on the target architecture.  It may not
be taking this into account with address arithmetic or memory access.

The DWARF standard may not be as clear as might be desired with the
description of DW_AT_byte_size.  Feel free to submit a comment or
proposed change at http://dwarfstd.org/Comment.php.

-- 
Michael Eager	 eager at eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077