[Dwarf-Discuss] DWARF on systems where memory is not byte addressable

Thu Jul 26 19:29:35 GMT 2012

________________________________________
From: dwarf-discuss-bounces at lists.dwarfstd.org [dwarf-discuss-bounces at lists.dwarfstd.org] on behalf of Michael Eager [eager@eagercon.com]
Sent: Thursday, July 26, 2012 8:37 AM
To: Joeri van Ruth
Cc: dwarf-discuss at lists.dwarfstd.org
Subject: Re: [Dwarf-Discuss] DWARF on systems where memory is not byte addressable

Joeri van Ruth wrote:
> Hello all, I am wondering about how to deal with platforms with word
> memories, by which I mean that the smallest addressable unit in memory
> is (in our current case) 32 bits wide.  This means that at the C level,
>
>       sizeof(char) == sizeof(short) == sizeof(int) == 1,
>
> so far so good.  However, we are having problems with gdb.  I am aware
> that this may be entirely gdb specific but I do note that the standard
> does not spend a lot of words on the issues that arise here, that's
> why I bring this up here.
>
> The standard does not seem to define anywhere how large a byte is
> supposed to be.  Historically, older architectures used bytes of 6, 7,
> 8, or 9 bits, which is why networking standards tend to speak of
> octets instead.  DWARF seems to assume 8-bit bytes, hence the
> LEB128 encoding, but it does not state so explicitly unless I
> overlooked something.
>
> A C oriented view might consider that sizeof(char) == sizeof(int), and
> as C does not distinguish clearly between byte and char, take a byte
> to be 32 bits wide.  But even that's not always the case as sometimes
> we see word oriented platforms which still take the arithmetic size of
> char to be 8 bits, requiring frequent sign- and zero-extension when
> assigning to a char or short variable.
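To make the sign- and zero-extension point above concrete, here is a
minimal C sketch.  It assumes a machine whose registers and memory
words are 32 bits wide but whose char arithmetic is 8 bits wide, so
every assignment to a char has to narrow the value explicitly.  The
function names are hypothetical and only illustrate the idea; a real
compiler would emit the equivalent instructions inline.

```c
#include <stdint.h>

/* Assumption: a char occupies a full 32-bit word in storage but is
 * defined to hold only 8 significant bits. */

/* Narrow to an unsigned 8-bit char: keep the low 8 bits (zero-extension). */
uint32_t narrow_to_uchar(uint32_t v)
{
    return v & 0xFFu;
}

/* Narrow to a signed 8-bit char: keep the low 8 bits, then replicate
 * bit 7 into the upper 24 bits (sign-extension). */
int32_t narrow_to_schar(uint32_t v)
{
    uint32_t low = v & 0xFFu;
    /* Flipping bit 7 and subtracting 128 maps 0x80..0xFF to -128..-1. */
    return (int32_t)(low ^ 0x80u) - 0x80;
}
```

On a platform that instead treats char as a full 32-bit quantity, neither
step would be needed, which is the trade-off the paragraph above describes.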

Michael Eager wrote:
> Word-oriented platforms which have byte-addressable memory seem to be a
> self-contradiction.

The PDP-10 (my first machine) was a 36-bit word-addressable machine.
It had a "byte pointer" format that could specify an arbitrary byte within a
word. So, there was a hardware-defined bit pattern to specify any given
byte in memory, and instructions that could load and store just that byte.
Nobody ever described the PDP-10 as byte-addressable, but you could
make a pedantic argument for it.
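As a rough illustration of the byte pointer idea described above, here
is a minimal C sketch.  It models a 36-bit word in the low bits of a
uint64_t and uses a simplified pointer made of a word index, a bit
position, and a byte size.  The struct and the names byte_ptr,
load_byte, and store_byte are hypothetical; this is not the actual
PDP-10 bit layout or instruction set, only the same general idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 36u

/* Hypothetical simplified byte pointer (not the hardware encoding). */
typedef struct {
    size_t   word;  /* index of the word that holds the byte       */
    unsigned pos;   /* number of bits to the right of the byte     */
    unsigned size;  /* width of the byte in bits, 1..36            */
} byte_ptr;

/* Mask with the low `size` bits set; size <= 36, so the shift is safe. */
static uint64_t field_mask(unsigned size)
{
    return (UINT64_C(1) << size) - 1;
}

/* Load the byte that the pointer designates. */
uint64_t load_byte(const uint64_t *mem, byte_ptr bp)
{
    assert(bp.size >= 1 && bp.pos + bp.size <= WORD_BITS);
    return (mem[bp.word] >> bp.pos) & field_mask(bp.size);
}

/* Store a byte, leaving every other bit of the word untouched. */
void store_byte(uint64_t *mem, byte_ptr bp, uint64_t value)
{
    assert(bp.size >= 1 && bp.pos + bp.size <= WORD_BITS);
    uint64_t mask = field_mask(bp.size) << bp.pos;
    mem[bp.word] = (mem[bp.word] & ~mask) | ((value << bp.pos) & mask);
}
```

Memory is still indexed by word here; the byte is selected by extra
fields carried alongside the word address, which is why calling such a
machine byte-addressable is only a pedantic argument.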

I remember hearing about a C compiler for the PDP-10 that handled
"char" as bytes addressed with byte pointers, and pointer arithmetic
understood whether it was manipulating a byte pointer or a word address.
Typical practice of that era would have been five 7-bit chars per word.
I don't remember how it handled sizeof().
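For what it's worth, the "five 7-bit chars per word" layout can be
sketched in C as below.  The sketch assumes the characters are packed
left to right in a 36-bit word with the one leftover low-order bit
unused, and it maps a character index to a (word, slot) pair the way
char pointer arithmetic would have to.  The names get_char and put_char
are hypothetical, and I am not claiming this is how that compiler
actually did it.

```c
#include <stddef.h>
#include <stdint.h>

enum { WORD_BITS = 36, PACKED_CHAR_BITS = 7, CHARS_PER_WORD = 5 };

/* Number of bits to the right of slot k, where slot 0 is leftmost.
 * Slot 4 ends at bit position 1, so the lowest bit stays unused. */
static unsigned slot_shift(unsigned k)
{
    return WORD_BITS - PACKED_CHAR_BITS * (k + 1);
}

/* Fetch character number i from a packed array of 36-bit words. */
char get_char(const uint64_t *mem, size_t i)
{
    uint64_t w = mem[i / CHARS_PER_WORD];
    return (char)((w >> slot_shift((unsigned)(i % CHARS_PER_WORD))) & 0x7F);
}

/* Store character number i without disturbing its neighbours. */
void put_char(uint64_t *mem, size_t i, char c)
{
    unsigned shift = slot_shift((unsigned)(i % CHARS_PER_WORD));
    uint64_t mask  = UINT64_C(0x7F) << shift;
    uint64_t *w    = &mem[i / CHARS_PER_WORD];
    *w = (*w & ~mask) | ((uint64_t)(c & 0x7F) << shift);
}
```

Note that stepping from character 4 to character 5 crosses a word
boundary, so "char pointer + 1" is not a simple address increment in
this scheme.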

Making this trip down memory lane relevant to the topic at hand:
This predates DWARF, of course, so sadly I do not have any guidance
about how to use DWARF on a word-addressable machine.
If there's an existing body of software that handles text on this machine,
the typical way it manages text (1 char per word? seems wasteful;
4 chars per word?) might guide some of your choices about how to
think about "char".

--paulr