8. Graphics Processing Unit (GPU)

This block handles complex 2D graphics processing. The GPU is connected via a high-performance bus to the internal RAM or to any other memory-mapped peripheral, such as an external PSRAM or O/QSPI flash devices. Synchronization between the CPU and the GPU is accomplished either through interrupts or through polling.

The GPU uses four external interfaces during image processing and rendering: a single-token slave interface for configuration writes and status reads, a read-only master interface for display list reads, a read-only master interface for texel reads, and a read/write interface for the frame buffer pixel data.

The CPU writes the display list, which contains instructions for configuring the GPU registers, to a memory block, from which the display list reader fetches it. Rendering starts by determining the alpha coverage in the pixel selection unit, continues by fetching the required texel in the texture unit, and then calculates the pixel color in the color unit. At this point the GPU reads data from the frame buffer (FB), combines it in the blending unit with the colorized pixels from the color unit, and finally writes the result back to the FB. This render pipeline is shown in Figure 3.

Figure 3 D/AVE 2D Block Diagram
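
As an informal illustration of the last pipeline stages, the following sketch models in plain C how a coverage value from the pixel selection unit could scale a source color before it is blended with a pixel already stored in the FB. It is a software model only, assuming ARGB8888 pixels and a source-over blend; the names mul8 and blend_over are hypothetical and are not part of the GPU driver or a description of the actual hardware logic.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit fractions (0..255 maps to 0.0..1.0) with rounding. */
static uint8_t mul8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Hypothetical software model of the blending step (not driver code):
 * blends an ARGB8888 source color over the ARGB8888 pixel read from the FB.
 * 'coverage' is the per-pixel alpha coverage (0 = outside, 255 = fully inside). */
static uint32_t blend_over(uint32_t src, uint32_t dst, uint8_t coverage)
{
    uint8_t a  = mul8((uint8_t)(src >> 24), coverage);  /* effective alpha */
    uint8_t ia = (uint8_t)(255u - a);                   /* inverse alpha   */
    uint32_t out = 0;

    /* Blend the blue, green and red channels. */
    for (int shift = 0; shift <= 16; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)(mul8(s, a) + mul8(d, ia)) << shift;
    }

    /* Resulting alpha: source alpha plus what remains of the destination. */
    out |= (uint32_t)(a + mul8((uint8_t)(dst >> 24), ia)) << 24;
    return out;
}
```

With full coverage an opaque source replaces the FB pixel, with zero coverage the FB pixel is left unchanged, and intermediate coverage values produce the antialiased edge pixels.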

8.1. HW Features

The GPU is designed for high-quality rendering, and its hardware features target performance-demanding graphics operations. The GPU driver allows direct access to all hardware features. Functionality that is not directly supported by the hardware is not emulated by the driver. The GPU supports the following hardware features:

  • Resolutions up to 2048x2048

  • Max pixel rate 1 pixel / clock

  • Base Clock up to 160MHz

  • Extended Rendering Primitives

    • Lines

      • Simple or with different start/end widths

      • Caps (Butt, Round, Square)

      • Joins (Bevel, Miter, Round)

    • Boxes

    • Circles, Circle Rings & Wedges

    • Triangles & Quadrangles

    • Polylines (simple & with multiple widths)

    • Triangle Lists, Strips & Fans

    • Polygons

    • BLIT

  • Textures up to 2048x1024 (Blending, CLUT, RLE, Color keying)

    • Render Primitives with Texture

    • Stretching

    • Rotation

    • U/V Clamp and Repeat modes

    • Texture Blending

    • CLUT 256x32bit for indexed textures

    • RLE textures

    • Color keying

    • No-, Linear-, Bilinear- Filtering

    • Perspective warping

  • Fast High-Quality Antialiasing

    • With antialiasing control on every edge

    • Blurring effect with over antialiasing setting

  • Subpixel accuracy

  • Patterns and Gradients with Alpha channel on all Primitives

  • No cost HW Clipping

  • 16 Blending modes for RGB and Alpha channels

  • Render lists let the CPU prepare the next frame while rendering is in progress (parallel processing).

  • Flexible Input and Output Formats for Framebuffer and Textures.

Table 1 GPU Input/Output Color Formats

         ≤ 8bit           16bit                32bit
-------  ---------------  -------------------  ------------------
Input    A1, A2, A4, A8,  RGB565,              ARGB8888, RGBA8888
         I1, I2, I4, I8,  ARGB4444, RGBA4444,
         AI44             ARGB1555, RGBA5551
Output   A8               RGB565,              ARGB8888, RGBA8888
                          ARGB4444, RGBA4444

8.2. GPU API

The basic GPU object is called a device. A pointer to the device is passed as the first parameter to all functions. Material settings such as color, texture, and blending are stored in a context. A device holds three contexts at all times: the selected context, which is the active context modified by the material functions; the solid context, which is the source context when rendering interior regions; and the outline context, which is the source context when rendering outlines or shadows. All shapes rendered by the rendering functions use the current context(s). Rendering does not happen immediately; instead, render commands fill a render buffer. Render buffers can be executed fully in parallel with the CPU (without any CPU interaction).

Table 2 GPU Types

Name             Description
---------------  ----------------------------------------------------------------
d2_device        The application uses pointers of this type to hold the address
                 of a device structure without knowing its internal layout.
d2_context       The application uses pointers of this type to hold the address
                 of a context structure without knowing its internal layout.
d2_renderbuffer  The application uses pointers of this type to hold the address
                 of a render buffer structure without knowing its internal
                 layout.
d2_color         The upper 8 bits are ignored but should be set to zero. All
                 colors are passed to the driver in this format regardless of
                 the framebuffer format.
d2_alpha         Alpha information is passed as 8-bit values: 255 represents
                 fully opaque and 0 totally transparent colors.
d2_width         Width is defined as an unsigned 10:4 fixed-point number
                 (4 bits fraction), so the maximum width is 1023 and the
                 smallest nonzero width is 1/16.
d2_point         A point defines the pixel position of a vertex component
                 (e.g. the x coordinate of an endpoint) and is specified as a
                 signed 1:11:4 fixed-point number (1 bit sign, 11 bits integer,
                 4 bits fraction), so the integer range is -2048 to 2047 and
                 the smallest positive value is 1/16.
d2_border        The border type is used only when setting clip borders. In
                 contrast to points, borders do not contain any fractional
                 information (no subpixel clipping) and are simple 11-bit
                 signed integers.
d2_pattern       Patterns are N-bit bitmasks (N is 32 at most, so they are
                 passed as longs).
d2_blitpos       Blitpos defines an integer position in the source bitmap of a
                 blit rendering operation. The allowed range is 0 to 1023.
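
The fixed-point layouts listed in Table 2 can be made concrete with a few small helpers. The following is a minimal sketch, assuming that raw values are held in 32-bit integers and that the full fraction range is usable at the limits; the names FIX4_ONE, to_fix4, to_fix4_frac, point_in_range and width_in_range are hypothetical and independent of the driver's own macros such as D2_FIX4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the 4-bit-fraction layouts of Table 2.
 * They are not driver definitions. */
#define FIX4_FRAC_BITS 4
#define FIX4_ONE       (1 << FIX4_FRAC_BITS)   /* 1.0 is stored as 16 */

/* Whole pixels -> fixed point with 4 fraction bits. */
static int32_t to_fix4(int32_t pixels)
{
    return pixels * FIX4_ONE;
}

/* Whole pixels plus sixteenths -> fixed point (e.g. 2 + 8/16 = 2.5). */
static int32_t to_fix4_frac(int32_t pixels, int32_t sixteenths)
{
    assert(sixteenths >= 0 && sixteenths < FIX4_ONE);
    return pixels * FIX4_ONE + sixteenths;
}

/* Range check for a signed 1:11:4 point: integer part -2048 .. 2047. */
static int point_in_range(int32_t fix4)
{
    return fix4 >= -2048 * FIX4_ONE &&
           fix4 <= 2047 * FIX4_ONE + (FIX4_ONE - 1);
}

/* Range check for an unsigned 10:4 width: integer part 0 .. 1023. */
static int width_in_range(int32_t fix4)
{
    return fix4 >= 0 && fix4 <= 1023 * FIX4_ONE + (FIX4_ONE - 1);
}
```

For example, to_fix4_frac(0, 1) yields the raw value 1, which is the smallest nonzero width or positive coordinate (1/16 of a pixel), and to_fix4(1024) is rejected by width_in_range() because its integer part no longer fits in 10 bits.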

Table 3 GPU Function Categories

Category             Description
-------------------  ------------------------------------------------------------
Basic functions      Driver device management and hardware initialization /
                     shutdown.
Viewport functions   Framebuffer and view specific functions.
Context functions    Modify material settings.
Texture functions    Modify texture mapping settings.
Rendering functions  There is a rendering function for each supported geometric
                     shape.
Blit functions       Blits are special rendering functions to copy one
                     rectangular part of the video memory into another part of
                     the video memory.
Render Buffers       Render buffers (similar in concept to OpenGL display lists)
                     are the main interface between driver and hardware.
Profiling            Performance measurement counter functions.
Utility functions    Triangle mapping and perspective warp operations.

Note

Calling the GPU API functions from interrupt service routines or from different tasks is not recommended.

For further information on the GPU driver API, refer to the DA1470x GPU API Manual.

8.3. Usage

Using the GPU can be split into the following steps:

8.3.1. Initialization of the hardware

Initializing the GPU consists of opening the GPU device and initializing the hardware. The device handle must be kept, since it is a parameter of all GPU-related functions.

// Open the device and initialize the hardware; keep the handle for all later calls.
d2_device *d2_handle = d2_opendevice(0);
d2_inithw(d2_handle, 0);

8.3.2. Frame Buffer Setup Using the Low-Level Driver

Setting up the frame buffer requires its geometry: location, stride, width, height, and color mode.

d2_framebuffer(d2_handle, framebuffer, 640, 640, 480, d2_mode_rgb888);
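
The geometry also determines how much memory the frame buffer occupies. The following sketch estimates that size, assuming the stride is expressed in pixels and using the pixel sizes implied by the format names in Table 1; the fb_format enumeration and the functions fb_bytes_per_pixel and fb_size_bytes are hypothetical and are not the driver's mode constants or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format tags for this sketch; not the driver's mode constants. */
typedef enum { FB_A8, FB_RGB565, FB_ARGB4444, FB_ARGB8888 } fb_format;

/* Bytes per pixel implied by the format names in Table 1. */
static size_t fb_bytes_per_pixel(fb_format fmt)
{
    switch (fmt) {
    case FB_A8:       return 1;
    case FB_RGB565:   return 2;
    case FB_ARGB4444: return 2;
    case FB_ARGB8888: return 4;
    }
    return 0; /* unknown format */
}

/* Memory needed for a frame buffer. Assumes 'stride' is given in pixels
 * and is at least as large as the visible width. */
static size_t fb_size_bytes(size_t stride, size_t width, size_t height,
                            fb_format fmt)
{
    assert(stride >= width);
    return stride * height * fb_bytes_per_pixel(fmt);
}
```

Under these assumptions a 640x480 frame buffer with a stride of 640 pixels needs 614400 bytes in RGB565 and 1228800 bytes in ARGB8888, which is worth checking against the available RAM or PSRAM before choosing a color mode.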

8.3.3. Render Buffer Manipulation

Render buffers can be executed either manually, by sending the render buffer to the hardware and waiting for its execution to finish, or automatically, by letting the driver handle render buffer execution and flipping.

  • Automatic Management:

    Render buffers can be handled automatically by calling the start frame and end frame functions. These functions use two internal render buffers in turn. The internal render buffers can be accessed by d2_getrenderbuffer().

    // Repeat in every frame
    // Start HW render of previous frame. Switch to new frame.
    d2_startframe(d2_handle);
    
    d2_clear(d2_handle, 0x000000);
    d2_rendercircle(d2_handle, D2_FIX4(x0), D2_FIX4(y0), D2_FIX4(r), D2_FIX4(w));
    
    // Close render buffer. Wait for rendering of previous frame to complete.
    d2_endframe(d2_handle);
    
  • Manual Management:

    Manual management requires allocating a render buffer once, selecting it as the target for render commands, and finally executing it. Before the render buffer can be executed again, the application has to wait for the GPU to finish by flushing the frame.

    // Initialize once
    static d2_renderbuffer *renderbuffer;
    renderbuffer = d2_newrenderbuffer(d2_handle, 20, 20);
    
    d2_selectrenderbuffer(d2_handle, renderbuffer);
    
    // Repeat in every frame
    d2_clear(d2_handle, 0x000000);
    d2_rendercircle(d2_handle, D2_FIX4(x0), D2_FIX4(y0), D2_FIX4(r), D2_FIX4(w));
    d2_executerenderbuffer(d2_handle, renderbuffer, 0);
    
    // Wait for current rendering to end.
    d2_flushframe(d2_handle);
    

8.3.4. Context Modification

Context changes are not translated into render buffer commands until a render command is issued. A subset of the context commands is shown in the following code block.

// Set color of color index 0
d2_setcolor(d2_handle, 0, color);

// Change blend mode
d2_setalphablendmodeex(d2_handle, d2_bm_one, d2_bm_zero, d2_blendf_blenddst);

// Set global alpha
d2_setalphamode(d2_handle, d2_am_constant);
d2_setalpha(d2_handle, intens);

8.3.5. Rendering Shapes

Various rendering shapes are supported. The pixel content is controlled by the active context settings.

d2_renderline(d2_handle, D2_FIX4(x0), D2_FIX4(y0),
               D2_FIX4(x1), D2_FIX4(y1),
               D2_FIX4(pen_size), d2_le_exclude_none);

d2_rendercircle(d2_handle, D2_FIX4(x0), D2_FIX4(y0),
                  D2_FIX4(r), D2_FIX4(w));

d2_renderpolygon(d2_handle, points, points_num, d2_le_closed);

d2_renderwedge(d2_handle, D2_FIX4(x0), D2_FIX4(y0),
                  D2_FIX4(rx), D2_FIX4(pen_size),
                  D2_FIX16(nx0), D2_FIX16(ny0),
                  D2_FIX16(nx1), D2_FIX16(ny1),
                  0);

8.3.6. BLIT Operations

BLIT operations are performed using texture mapping and box rendering. The BLIT functions abstract the required settings and the context restoration. The d2_setblitsrc() function only describes the source image geometry. The d2_blitcopy() function performs the actual copy and selects the frame of the source image and the frame of the destination. If the dimensions do not match, the GPU stretches or shrinks the image and finally converts it to the destination format.

d2_setblitsrc(d2_handle, src, pitch, x_size, y_size, format);

d2_blitcopy(d2_handle, srcwidth, srcheight,
               srcx, srcy,
               D2_FIX4(dstwidth), D2_FIX4(dstheight),
               D2_FIX4(dstx), D2_FIX4(dsty),
               flags);
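
Because mismatching dimensions are stretched or shrunk, an application that wants to avoid distortion has to compute a destination frame with the same aspect ratio as the source frame. The following sketch shows one way to do this, following the call above in taking source sizes in whole pixels and producing destination values with 4 fraction bits; the type blit_dest_fix4 and the function fit_blit_dest are hypothetical helpers, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Destination rectangle in fixed point with 4 fraction bits (1.0 == 16). */
typedef struct { int32_t x, y, w, h; } blit_dest_fix4;

/* Hypothetical helper (not part of the driver): scale a src_w x src_h image
 * to the largest size that fits a box_w x box_h area without distortion and
 * centre it. Inputs are whole pixels, outputs are in 1/16 pixel units. */
static blit_dest_fix4 fit_blit_dest(int32_t src_w, int32_t src_h,
                                    int32_t box_w, int32_t box_h)
{
    blit_dest_fix4 d;

    assert(src_w > 0 && src_h > 0 && box_w > 0 && box_h > 0);

    /* Compare box_w/src_w with box_h/src_h by cross-multiplying. */
    if ((int64_t)box_w * src_h <= (int64_t)box_h * src_w) {
        d.w = box_w * 16;                                     /* width-limited  */
        d.h = (int32_t)((int64_t)box_w * 16 * src_h / src_w);
    } else {
        d.h = box_h * 16;                                     /* height-limited */
        d.w = (int32_t)((int64_t)box_h * 16 * src_w / src_h);
    }

    /* Centre the scaled image inside the box. */
    d.x = (box_w * 16 - d.w) / 2;
    d.y = (box_h * 16 - d.h) / 2;
    return d;
}
```

The resulting w, h, x and y values are already in 1/16 pixel units, so in this sketch they would take the place of the D2_FIX4(...) destination arguments rather than being converted a second time.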

8.3.7. Frame Buffer Copy

A frame buffer can be copied into another one by setting the destination as the active frame buffer, declaring the source as the blit source, and blitting the full frame:

d2_framebuffer(d2_handle, write_buffer, XSIZE_PHYS,
               XSIZE_PHYS, YSIZE_PHYS, d2_mode_argb8888);

d2_setblitsrc(d2_handle, read_buffer, XSIZE_PHYS,
               XSIZE_PHYS, YSIZE_PHYS, d2_mode_argb8888);

d2_blitcopy(d2_handle, XSIZE_PHYS, YSIZE_PHYS,
            0, 0,
            (D2_FIX4(XSIZE_PHYS)), (D2_FIX4(YSIZE_PHYS)),
            (D2_FIX4(0)), (D2_FIX4(0)),
            d2_bf_no_blitctxbackup);