[Dwarf-Discuss] Multiple address space architectures and DW_AT_frame_base

Fri May 20 17:45:50 GMT 2011

John,
    Thanks for the reply. Your implication engine is working well. ;-) OpenCL source, AMD GPU target.
    In between, though, we have an intermediately language called AMD IL (published specs available on-line).
    Rather than answer your many excellent questions, I'll provide a bit more background on my specific problem so you can see what I'm trying to deal with.

    AMD IL describes the private memory space in terms of "indexed temporaries". This is just a syntactic device to defer actual locating of thread-local storage to the AMD IL compiler (yes, there's a compiler between AMD IL and the GPU-specific hardware instruction set. It knows nothing of DWARF or debugging.) Sometimes the OpenCL compiler will put variables in indexed temps (e.g., private arrays), sometimes in regular AMD IL registers (e.g., private scalars and vectors). While the hardware doesn't support the concept of a "stack", obviously the LLVM-based compiler assumes one, so we do our best using indexed temps and our rich register set.
    These indexed temps are referenced in AMD IL source as "x#[N]", where # and N can have very large ranges (at least 64K). # specifies which private memory space. N is the offset in that space. The AMD IL compiler (SC, short for shader compiler) will look at the need for registers throughout the "shader" (aka kernel) and decide if it can allocate general purpose registers to implement the indexed temps, or it has to resort to putting them in memory.
    The current OpenCL compiler for AMD IL puts indexed temps all in a single private memory space... #1. (Most variables end up in general purpose registers, not indexed temps.) The debugger, of course is oblivious to all this, but it uses an API provided by the debug agent to access objects. To access the objects, the debug agent wants to be as general as possible and assume as little as possible, since it wants to support more than just debugging OpenCL. The API it provides allows the debugger to specify both # and N.
    Right now, since the compiler does put all thread local variables in a single private memory space that pretty well mimics a conventional architecture's stack, I'm trying to leverage the DW_AT_frame_base attribute of DW_TAG_subprogram. The "correct" thing for the current compiler to do is to indicate, somehow, that "the stack" is in AMD IL private memory space #1.
    That's my "today" problem. But to allow SC to make better choices about which indexed temps to put in registers and which to put in memory, the OpenCL compiler would have to split variables out from the current monolithic "stack" space in to individual "variable" spaces. Even non-pointer variables would have to indicate which private memory space they reside in.

    In trying to "do the right thing" where specifying DWARF for AMD IL is concerned, I'm trying to allow description of as much of the AMD IL language's capabilities as possible so that the compiler making the decision can pass this information through the debugger to the debug API, where it is really needed. And avoid having to have the debugger or the debug agent assume such things.

    So I think I want a "location" to permit an "op" that specifies the memory space for a pending dereference - including the final implicit one - without mucking with offsets. Something like DW_AT_memory_class but in a location expression. Imagine needing to reference a pointer in one space, apply an index from another space, add an offset from another, to compute the address of an object in yet another space.

    But I'll settle for a solution to just my "today" problem... ;-)

Richard


> -----Original Message-----
> From: John DelSignore [mailto:John.DelSignore at roguewave.com]
> Sent: Friday, May 20, 2011 7:04 AM
> To: Relph, Richard
> Cc: DWARF
> Subject: Re: [Dwarf-Discuss] Multiple address space architectures and
> DW_AT_frame_base
> 
> Hi Richard,
> 
> I have experience with "imaginary" architectures like this :-). A
> couple of comments/questions...
> 
> Relph, Richard wrote:
> > Imagine an architecture with multiple disjoint memory spaces.
> 
> It would be helpful if you could be more specific.
> 
> I have to assume that this is GPU architecture (your email address gave
> it away) where at the very least there is a host memory space and a
> device memory space. The device memory space is further subdivided into
> a discrete number of memory spaces, such as code, global, local,
> constant, parameter, shared, texture, etc. Further, it's not a
> segmented architecture, so there is no segmented addressing (as on the
> x86), and there is exactly one of each kind of memory space.
> 
> What is the language? Is it OpenCL, which has address space qualifiers?
> From the OpenCL quick reference guide:
> 
> "
> Address Space Qualifiers [6.5]
> __global, global __local, local
> __constant, constant __private, private
> "
> 
> > Imagine
> > that the frame base could be in any of them.
> 
> Are you saying the the stack itself could be in more than one memory
> space?  Or is the stack always in, for example, the "local" memory
> space?
> 
> Does the architecture even have stacks or is everything inlined?
> 
> > Other than expanding the
> > size of an address, how can I do this?
> 
> Depends on what is actually going on.
> 
> > I've thought about DW_AT_address_class, but I can't figure out how to
> > associate it with a DW_TAG_subprogram's DW_AT_frame_base location.
> The
> > spec says DW_AT_address_class for DW_TAG_subprogram describes how to
> > access the subprogram itself.
> 
> Maybe it's true, but I find it hard to believe that the stacks
> themselves can be in more than one memory space.
> 
> I can believe that the stack is in a "local" memory space, function
> parameters are in a "parameter" memory space, function local variables
> can be in either local memory or probably more commonly registers, you
> can declare variables in "shared" memory space, you can have pointers
> to "global" memory, etc.
> 
> Is that true or do I have it all wrong?
> 
> > DW_AT_address_class isn't technically allowed in a
> DW_TAG_compile_unit
> > (it will be the same address space across the entire compilation
> unit).
> >
> > A location description doesn't have a mechanism to specify an address
> > class. It would be be implied by the type information associated with
> > the object being described (though if the object shifts address
> spaces,
> > that seems impossible to describe as well.)
> 
> If my above assumptions are true, I think the best way to represent
> this is with address space type qualifiers. For many objects, the
> address space is implied by the kind of object, but can also be
> explicitly expressed through type qualification. For example, assume
> you have the following device function:
> 
> void f(int *p)
> {
>   int *q = p;
> ...
> 
> The debugger can know that "p" is a parameter, so the address space of
> the pointer itself is implicitly "parameter", and the thing the pointer
> points to might be in the "global" address space. As with "const" and
> "volatile" and other kinds of type qualifiers, the compiler could
> introduce address space qualifiers such as "parameter", "global",
> "local", etc. So, "p" is os type "parameter pointer to global int", and
> "q" is of type "local pointer to global int".
> 
> This is exactly how TotalView represents address spaces in CUDA on
> NVidia hardware.
> 
> d1.-1> l
>   87   // Matrix multiplication kernel called by MatrixMul()
>   88   __global__ void MatMulKernel(Matrix A, Matrix B, Matrix C)
>   89   {
>   90     // Block row and column
>   91     int blockRow = blockIdx.y;
>   92@>   int blockCol = blockIdx.x;
>   93     // Each thread block computes one sub-matrix Csub of C
>   94     Matrix Csub = GetSubMatrix(C, blockRow, blockCol);
>   95     // Each thread computes one element of Csub
>   96     // by accumulating results into Cvalue
> d1.-1> dwhat A
> Name: A; Type: @parameter Matrix; Size: 24 bytes; Addr: 0x00000020
> d1.-1> dwhat Csub
> Name: Csub; Type: @local Matrix; Size: 24 bytes; Addr: 0x00fffb20
> d1.-1> dwhat Matrix
> Type name: struct Matrix; Size: 24 bytes; Category: Structure
>     Fields in type:
>     {
>     width        int          (32 bits)
>     height       int          (32 bits)
>     stride       int          (32 bits)
>     elements     float @generic * (64 bits)
>     }
> d1.-1> dwhat blockRow
> Name: blockRow; Type: @register int; Size: 4 bytes; Addr: R0
> d1.-1>
> 
> As you might have observed above, the reason it is important to express
> the address space in the type is for composability. In what address
> space does the "elements" member of Matrix reside? The answer is, in
> the address space an object of that type. So for A.elements is in
> "@parameter" space and Csub.elements is in "@local" space. In both
> cases, Matrix::elements points to "@generic" (aka global) space. Also,
> by using type qualification, casts of arbitrary addresses can be
> constructed:
> 
> d1.-1> p &A
>  &A = 0x00000020 -> (Matrix @parameter)
> d1.-1> p A
>  A = {
>    width = 0x00000002 (2)
>    height = 0x00000002 (2)
>    stride = 0x00000002 (2)
>    elements = 0xfb00000000 -> 0
>  }
> d1.-1> p {*(Matrix @parameter *)0x20}
>  *(Matrix @parameter *)0x20 = {
>    width = 0x00000002 (2)
>    height = 0x00000002 (2)
>    stride = 0x00000002 (2)
>    elements = 0xfb00000000 -> 0
>  }
> d1.-1>
> 
> The debugger can follow the type qualification chain to determine in
> which address space the object/address resides.
> 
> > Unfortunately, there is no type information for DW_AT_frame_base.
> >
> > Any ideas?
> 
> Get a debugger expert involved.
> 
> Hope this helped!
> 
> Cheers, John D.
> 
> 
> > *Richard Relph
> > *MTS | Stream Compute SW | AMD*
> > *o. 408.749.6659
> >
> > 2.jpg
> >
> >
> >