[Dwarf-Discuss] Segment selectors for Harvard architectures

Thu Mar 19 18:05:16 GMT 2020

This recently came up in the LLVM project.  Harvard architectures
put code and data into separate address spaces, but those spaces
are not explicit; instructions that load/store memory implicitly
use the data space, while things like taking a function address or 
doing indirect branches will implicitly use the code space.  This 
doubles the effective size of memory without consuming an address 
bit, as well as having other secondary benefits like not allowing
self-modifying code.

Nearly all of the DWARF information does not need to distinguish
between code and data spaces, because it's easy to derive that
from context.  Addresses in the line table or a range list will be
code addresses; in .debug_info, addresses of code elements will be
code addresses, while variables will be data addresses. And so on.

This only seems to break down in the .debug_aranges section, which
records both data and code addresses without any context to let a
consumer know which is which.  In a flat-address architecture, no
distinction is needed; in a segmented architecture, there will be
a segment selector as part of any address, and that includes the
.debug_aranges section.  What about for Harvard architectures?
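
To make the ambiguity concrete, here is a small Python sketch.  It is
not taken from any real consumer; the entry layout, CU names, and
addresses are all made up for illustration.  Without a selector, a
lookup for one numeric address can match both a code range and a data
range:

```python
# Hypothetical aranges entries with no selector: (address, length, cu_name).
# The same numeric address is valid in both spaces on a Harvard target.
ARANGES = [
    (0x0100, 0x40, "cu_a"),  # describes a function in the code space
    (0x0100, 0x20, "cu_b"),  # describes a variable in the data space
]

def lookup(address):
    """Return every CU whose range contains the address."""
    return [cu for start, length, cu in ARANGES
            if start <= address < start + length]

# Both CUs match; nothing in the entries says which space was meant.
matches = lookup(0x0110)
```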

What I suggested in the LLVM project is that .debug_aranges would
have a 1-byte segment selector and use some trivial scheme such as
0=code, 1=data to distinguish what kind of address it is.  Other
DWARF sections wouldn't need a selector because they can all use
context to figure it out; this avoids the size overhead of using
segment selectors everywhere else.
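
As a rough sketch of what that could look like, the Python below packs
and unpacks range tuples that carry a 1-byte selector.  The 4-byte
address size, little-endian byte order, and helper names are my
assumptions for illustration, and the set header, padding, and
terminating entry are left out entirely:

```python
import struct

CODE, DATA = 0, 1  # the trivial 0=code, 1=data scheme suggested above

# Assumed tuple layout: 1-byte selector, 4-byte address, 4-byte length,
# little-endian, no padding between fields.
TUPLE = struct.Struct("<BII")

def encode(entries):
    """Pack (selector, address, length) tuples into bytes."""
    return b"".join(TUPLE.pack(sel, addr, length)
                    for sel, addr, length in entries)

def decode(blob):
    """Unpack bytes back into (selector, address, length) tuples."""
    return [TUPLE.unpack_from(blob, off)
            for off in range(0, len(blob), TUPLE.size)]

def covers(entries, selector, address):
    """True if some range in the given space contains the address."""
    return any(sel == selector and addr <= address < addr + length
               for sel, addr, length in entries)

entries = [(CODE, 0x0100, 0x40), (DATA, 0x0100, 0x20)]
blob = encode(entries)
decoded = decode(blob)
```

With the selector present, a consumer can ask about (CODE, 0x0130) and
(DATA, 0x0130) separately and get different answers, at a cost of one
extra byte per tuple.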

Pavel Labath pointed out that this seems inconsistent and might
make consumers unhappy; segment selectors are described as a
characteristic of the target architecture, so having them in one
place and not others might look suspicious.  IMO it's a reasonable 
"permissive" use of the existing DWARF structures, but it seemed
worth asking here.

Does this (segment selector only in .debug_aranges) sound okay?
Should there be non-normative text or a wiki description of this?
Do we want to codify the 0=code 1=data use of segment selectors
for all Harvard architectures (that don't otherwise have explicit
segments) so that this doesn't have to be set by ABI committees?

I'm willing to write up whatever needs writing up, either as a
proposal or as a wiki entry.

Thanks,
--paulr