[Dwarf-Discuss] question on address spaces

Fri Oct 16 16:29:40 GMT 2015

On 10/05/2015 11:53 PM, Ashutosh Pal wrote:
> Hi Dwarf Experts,
>
> We have processor architectures that have multiple overlapping address spaces. Our compiler
> tool-chain can map data variables (locals and globals) to these address spaces. For example:
>
> int GMEMX array1[100];  // allocates to GMEMX
>
> int GMEMY array2[100];  // allocates to GMEMY
>
> void foo() {
>
>    vint8 vec1 = init_vec();  // allocates to stack on VMEM
>
>    int a = get_element(vec1, 1);   // allocates to stack on SMEM
>
> ..
>
> }
>
> In the above example, there are 4 overlapping address-spaces (GMEMX, GMEMY, VMEM and SMEM), each
> with one variable mapped to it. To uniquely identify the variable inside the debugging
> information, we would like to express the location of variables using a tuple <address-space-id,
> local-address> where the "address-space-id" is an integer uniquely identifying the address-space,
> while "local-address" is an expression that evaluates to an address within the corresponding
> address space.
>
> So, we are looking for ways in which the locations of these variables can be expressed in the
> DWARF information. We found two features in the DWARF 4 specification, but we see drawbacks with both:
>
> 1. DW_AT_segment: This is an attribute that can be specified on top of variable dies. We can use this
> attribute to encode our "address-space-id" and use it in tandem with DW_AT_location to describe the
> required tuple. But, with this we cannot cover those cases where a variable can reside in 2
> address-spaces in its whole life time; for this, we would need an operator describing the
> address-space-id in the location description itself.

DW_AT_segment was designed to represent i386-style addressing where a memory
address was represented by two pieces, a segment or page address and an offset
within the segment.  There are several 16-bit architectures which use similar
schemes to address more physical memory than can be addressed by a register-sized
pointer.  The understanding is that a physical memory address can be arithmetically
computed from the segment and offset.
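
As an illustration of that arithmetic, here is a minimal sketch assuming the
real-mode x86 rule (segment shifted left by four bits, plus the offset).  The
rule is not spelled out in this message, and seg_to_phys is a hypothetical
name used only for this example.

```c
#include <stdint.h>

/* Assumed real-mode x86 translation: the 16-bit segment is shifted left
   by 4 bits and the 16-bit offset is added to it.  The result can need
   up to 21 bits, so it is returned in a 32-bit integer. */
static uint32_t seg_to_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For instance, under this rule the pair [0x1234, 0x0010] translates to the
physical address 0x12350.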

There's no need for a segment to be encoded in a location expression with one
of these segmented architectures, since the locations are physical addresses.

While DW_AT_segment might be used to represent an address space id, an address
space id is a somewhat different concept from a segment base address.  In general,
a physical address cannot be computed from an [ASID, address] pair.

I don't think it is made explicit in the DWARF standard, but aliasing of
addresses, where different [segment, offset] pairs represent the same physical
address, is defined by the architecture, and DWARF has no specific support
for it.  On the i386, it was easy to create different [segment, offset]
representations of the same physical address, but such aliases were just as
easy to identify.  On other [page, offset] architectures, this was less of
an issue.
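
To make the aliasing point concrete, the sketch below assumes the real-mode
x86 translation (physical = segment * 16 + offset) and checks whether two
[segment, offset] pairs name the same physical address.  Both function names
are hypothetical, and nothing like this is part of DWARF itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed real-mode x86 translation: physical = segment * 16 + offset. */
static uint32_t seg_to_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Two [segment, offset] pairs alias when they translate to the same
   physical address. */
static bool seg_alias(uint16_t seg_a, uint16_t off_a,
                      uint16_t seg_b, uint16_t off_b)
{
    return seg_to_phys(seg_a, off_a) == seg_to_phys(seg_b, off_b);
}
```

Here [0x1234, 0x0010] and [0x1235, 0x0000] are aliases, since both translate
to 0x12350.  The check depends entirely on the architecture's translation
rule, which is why it lives outside the debug information.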

> 2. DW_OP_xderef: This is the only operator that allows encoding an address-space-id along with an
> address. But we see 2 problems in expressing the location of say a local variable residing on stack
> at offset 20 on address-space-id 4:
>
> a. {DW_OP_bregx SP 20} {4} DW_OP_xderef: Evaluating this expression returns the value of the local
> variable and NOT its location.
>
> b. The specification also says that the size of the dereferenced value returned by DW_OP_xderef
> should be less than or equal to the size of the address on the target. In our case, however, the
> dereferenced values could be bigger than the address. This further prohibits us from applying
> the DW_OP_stack_value operator to the above expression.

DWARF assumes that memory has unique physical addresses which are computable
and which can be used in a computation, for example, to index into an array.

DW_OP_xderef performs the implementation-defined computation to convert a
[segment, offset] pair into a physical memory address.  As such, the result
is a single value, limited in size to that of a physical memory address.

> Are there other possibilities in the dwarf4 standard that we overlooked for the above scenarios?
> Also, any pointers to where people might have already solved such issues earlier, would be useful to
> have.

One thought is that you expand your concept of address to incorporate
an ASID, rather than looking at the address as a tuple.  For example, if you have a
64-bit address, the top 8 or 16 bits might be the ASID, while the remaining
48 or 56 bits represent an address within that ASID.  As long as you do not
have overflow from the address into the ASID, there should be no trouble in
generating location expressions.  This doesn't address issues related to
aliasing or overlapping address spaces.
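
A minimal sketch of such a packed address, assuming a 64-bit address with
the top 8 bits used for the ASID and the low 56 bits for the local address.
The macro and function names are hypothetical, and the 8/56 split is only
one of the layouts mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define ASID_BITS 8                          /* assumed ASID width */
#define ADDR_BITS (64 - ASID_BITS)           /* 56 bits of local address */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Combine an ASID and a local address into one 64-bit value.  The local
   address must fit in ADDR_BITS so that it cannot spill into the ASID. */
static uint64_t pack_addr(uint8_t asid, uint64_t local)
{
    assert((local & ~ADDR_MASK) == 0);
    return ((uint64_t)asid << ADDR_BITS) | local;
}

/* Recover the ASID from the top bits of a packed address. */
static uint8_t addr_asid(uint64_t packed)
{
    return (uint8_t)(packed >> ADDR_BITS);
}

/* Recover the local address from the low bits of a packed address. */
static uint64_t addr_local(uint64_t packed)
{
    return packed & ADDR_MASK;
}
```

With this layout, local address 20 in address space 4 packs to
0x0400000000000014.  A producer could then form the location of the stack
variable from the question with an ordinary expression such as
DW_OP_bregx SP 20, DW_OP_const8u 0x0400000000000000, DW_OP_or.  This is an
untested illustration that assumes an 8-byte address size, a stack pointer
whose top byte is zero, and a consumer that understands the packing.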

A more comprehensive representation of multiple address spaces, including
the possibility that they might overlap or that a variable might exist in
multiple address spaces at the same time, would seem to require a large number
of changes to DWARF.

-- 
Michael Eager	 eager at eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077