[Dwarf-Discuss] Segment selectors for Harvard architectures

Mon Mar 23 21:42:00 GMT 2020

Paul,

I haven't needed to contend with this issue.  But as I was looking over the
standard, this was my initial gut reaction too: use the segment selectors.  To
me, this use really does look like a characteristic of the target architecture;
you did open the discussion with "Harvard architectures".

DWARF does permit architectures to specify aspects of their DWARF description,
after all.  I can't recall it ever being done *formally*, but it's been done
informally for every architecture that uses DWARF.  At a bare minimum, register
encodings.  And usually you have to root around in somebody else's source code
to find it.

This one has a slightly higher chance of breaking a consumer, if that consumer
was written not to tolerate the segment selectors.  But I think it would be fair
to put any such blame on the consumer in that case.  If the consumer doesn't die
with a SIGSEGV, it will most likely just ignore the segments, and then it is no
worse off than it is now.
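
To make the "tolerant consumer" point concrete, here is a rough Python sketch of
walking the .debug_aranges tuples when a segment selector is present.  It is only
an illustration under stated assumptions: the function and constant names are
made up for this example, the 0=code/1=data values are the scheme proposed in
this thread rather than anything in the standard, the input is assumed to be
little-endian, and the buffer is assumed to start at the first tuple (after the
set header and any padding).

```python
# Hypothetical selector values from the proposal in this thread
# (0=code, 1=data); they are not defined by the DWARF standard.
SEG_CODE = 0
SEG_DATA = 1


def read_uint(buf, offset, size):
    """Read a little-endian unsigned integer of `size` bytes (assumed byte order)."""
    return int.from_bytes(buf[offset:offset + size], "little")


def parse_aranges_tuples(buf, address_size, segment_selector_size):
    """Yield (kind, address, length) for each tuple before the all-zero terminator.

    `buf` is assumed to begin at the first tuple of one set.  A selector size
    of 0 means the producer emitted no selectors, so the kind is "unknown".
    """
    tuple_size = segment_selector_size + 2 * address_size
    offset = 0
    while offset + tuple_size <= len(buf):
        if segment_selector_size:
            segment = read_uint(buf, offset, segment_selector_size)
        else:
            segment = None
        address = read_uint(buf, offset + segment_selector_size, address_size)
        length = read_uint(
            buf, offset + segment_selector_size + address_size, address_size
        )
        offset += tuple_size

        # A tuple whose fields are all zero terminates the set.
        if not segment and address == 0 and length == 0:
            break

        if segment == SEG_CODE:
            kind = "code"
        elif segment == SEG_DATA:
            kind = "data"
        else:
            # No selector, or a value this consumer does not understand:
            # tolerate it rather than fail.
            kind = "unknown"
        yield kind, address, length


# Made-up sample: 1-byte selector, 2-byte addresses, two ranges, terminator.
sample = (
    bytes([SEG_CODE]) + (0x100).to_bytes(2, "little") + (0x40).to_bytes(2, "little")
    + bytes([SEG_DATA]) + (0x100).to_bytes(2, "little") + (0x10).to_bytes(2, "little")
    + bytes(5)
)
for kind, address, length in parse_aranges_tuples(sample, 2, 1):
    print(kind, hex(address), hex(length))
```

Note that the same numeric address (0x100 in the sample) shows up once as code
and once as data, which is exactly the ambiguity the selector is meant to
resolve.  A consumer that passes a selector size of 0, or that sees a value it
does not recognize, simply reports "unknown" and carries on.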

On Thu, Mar 19, 2020 at 06:05:16PM +0000, Dwarf Discussion wrote:
> This recently came up in the LLVM project.  Harvard architectures
> put code and data into separate address spaces, but those spaces
> are not explicit; instructions that load/store memory implicitly
> use the data space, while things like taking a function address or 
> doing indirect branches will implicitly use the code space.  This 
> doubles the effective size of memory without consuming an address 
> bit, as well as having other secondary benefits like not allowing
> self-modifying code.
> 
> Nearly all of the DWARF information does not need to distinguish
> between code and data address spaces, because it's easy to derive that
> from context.  Addresses in the line table or a range list will be
> code addresses; in .debug_info, addresses of code elements will be
> code addresses, while variables will be data addresses. And so on.
> 
> This only seems to break down in the .debug_aranges section, which
> records both data and code addresses without any context to let a
> consumer know which is what.  In a flat-address architecture, no
> distinction is needed; in a segmented architecture, there will be
> a segment selector as part of any address, and that includes the
> .debug_aranges section.  What about for Harvard architectures?
> 
> What I suggested in the LLVM project is that .debug_aranges would
> have a 1-byte segment selector and use some trivial scheme such as
> 0=code, 1=data to distinguish what kind of address it is.  Other
> DWARF sections wouldn't need a selector because they can all use
> context to figure it out; this avoids the size overhead of using
> segment selectors everywhere else.
> 
> Pavel Labath pointed out that this seems inconsistent and might
> make consumers unhappy; segment selectors are described as a
> characteristic of the target architecture, so having them in one
> place and not others might look suspicious.  IMO it's a reasonable 
> "permissive" use of the existing DWARF structures, but it seemed
> worth asking here.
> 
> Does this (segment selector only in .debug_aranges) sound okay?
> Should there be non-normative text or a wiki description of this?
> Do we want to codify the 0=code 1=data use of segment selectors
> for all Harvard architectures (that don't otherwise have explicit
> segments) so that this doesn't have to be set by ABI committees?
> 
> I'm willing to write up whatever needs writing up, either as a
> proposal or as a wiki entry.
> 
> Thanks,
> --paulr
> 

-- 
Todd Allen
Concurrent Real-Time