8. Graphics Processing Unit (GPU)
This block is used for complex 2D graphics processing. The GPU is connected via a high-performance bus to either the internal RAM or any other memory-mapped peripheral, such as an external PSRAM or O/QSPI flash. Synchronization between the CPU and the GPU is accomplished either through interrupts or through polling.
The GPU uses four external interfaces during image processing and rendering: a single token slave interface for configuration writes and status reads, a read-only master interface for display list reads, a read-only master interface for texel reads, and a read/write interface for the frame buffer pixel data.
The CPU writes the display list into a memory block. The display list contains instructions for configuring the GPU registers and is read by the display list reader. The rendering process starts by determining the alpha coverage in the pixel selection unit, continues by fetching the required texel in the texture unit, and then calculates the pixel color in the color unit. At this point the GPU reads data from the frame buffer (FB), combines it in the blending unit with the colorized pixels from the color unit, and finally writes the result back to the FB. This operation of the render pipeline is shown in Figure 3.

Figure 3 D/AVE 2D Block Diagram
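The final pipeline stage described above, where a colorized pixel is combined with the value read from the FB, can be modeled in software. The following is a minimal sketch of one common blend (source weighted by coverage, destination weighted by the remainder). The function names and the 0x00RRGGBB pixel layout are assumptions made for illustration; the hardware supports 16 blending modes and this sketch does not claim to reproduce any of them exactly.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the GPU driver: blend one 8-bit channel.
 * alpha = 255 keeps the source, alpha = 0 keeps the destination. */
uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    /* Weighted sum with rounding: (src*a + dst*(255-a)) / 255. */
    uint32_t sum = (uint32_t)src * alpha + (uint32_t)dst * (255u - alpha);
    return (uint8_t)((sum + 127u) / 255u);
}

/* Blend a source pixel over a destination pixel, both assumed to be
 * laid out as 0x00RRGGBB. The coverage argument stands in for the alpha
 * coverage produced by the pixel selection stage. */
uint32_t blend_pixel(uint32_t src, uint32_t dst, uint8_t coverage)
{
    uint32_t r = blend_channel((uint8_t)((src >> 16) & 0xFFu),
                               (uint8_t)((dst >> 16) & 0xFFu), coverage);
    uint32_t g = blend_channel((uint8_t)((src >> 8) & 0xFFu),
                               (uint8_t)((dst >> 8) & 0xFFu), coverage);
    uint32_t b = blend_channel((uint8_t)(src & 0xFFu),
                               (uint8_t)(dst & 0xFFu), coverage);
    return (r << 16) | (g << 8) | b;
}
```

With full coverage the source pixel replaces the destination, and with zero coverage the destination is left unchanged, which is the behavior expected at the interior and exterior of an antialiased edge.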
8.1. HW Features
The GPU is designed to support high-quality rendering, and its HW features target performance-demanding graphics operations. The GPU driver allows direct access to all hardware features. Functionality that is not directly supported by the hardware is not offered (emulated) by the driver. The GPU supports the following HW features:
Resolutions up to 2048x2048
Max pixel rate 1 pixel / clock
Base Clock up to 160MHz
Extended Rendering Primitives
Lines
Simple or with different start/end widths
Caps (Butt, Round, Square)
Joins (Bevel, Miter, Round)
Boxes
Circles, Circle Rings & Wedges
Triangles & Quadrangles
Polylines (simple & with multiple widths)
Triangle Lists, Strips & Fans
Polygons
BLIT
Textures up to 2048x1024 (Blending, CLUT, RLE, Color keying)
Render Primitives with Texture
Stretching
Rotation
U/V Clamp and Repeat modes
Texture Blending
CLUT 256x32bit for indexed textures
RLE textures
Color keying
No-, Linear-, Bilinear- Filtering
Perspective warping
Fast High-Quality Antialiasing
With antialiasing control on every edge
Blurring effect with over antialiasing setting
Subpixel accuracy
Patterns and Gradients with Alpha channel on all Primitives
No cost HW Clipping
16 Blending modes for RGB and Alpha channels
Render lists enable CPU preparation of next frame (parallel processing) while rendering in progress.
Flexible Input and Output Formats for Framebuffer and Textures.
Direction | ≤ 8bit | 16bit | 32bit
---|---|---|---
Input | A1, A2, A4, A8, I1, I2, I4, I8, AI44 | RGB565, ARGB4444, RGBA4444, ARGB1555, RGBA5551 | ARGB8888, RGBA8888
Output | A8 | RGB565, ARGB4444, RGBA4444 | ARGB8888, RGBA8888
8.2. GPU API
The basic GPU object is called a device. The device pointer is passed as the first parameter to all functions. Material settings such as color, texture, and blending are stored in a context. A device holds three contexts at all times: the selected context, which is the active context modified by material functions; the solid context, which is the source context when rendering interior regions; and the outline context, which is the source context when rendering outlines or shadows. All shapes rendered by the rendering functions use the current context(s). Rendering does not happen immediately; the rendering functions fill a render buffer instead. The GPU can then execute render buffers fully in parallel with the CPU (without any CPU interaction).
Name | Description
---|---
d2_device | The application uses pointers of this type to hold the address of a device structure without knowing its internal layout.
d2_context | The application uses pointers of this type to hold the address of a context structure without knowing its internal layout.
d2_renderbuffer | The application uses pointers of this type to hold the address of a render buffer structure without knowing its internal layout.
d2_color | 32-bit RGB value. The upper 8 bits are ignored but should be set to zero. All colors are passed to the driver in this format regardless of the framebuffer format.
d2_alpha | Alpha information is passed as 8-bit values, with 255 representing fully opaque and 0 representing totally transparent colors.
d2_width | Width is defined as an unsigned 10:4 fixed point number (4 bits fraction). The maximum integer width is therefore 1023 and the smallest nonzero width is 1/16.
d2_point | Defines the pixel position of a vertex component (e.g. the x coordinate of an endpoint) as a signed 1:11:4 fixed point number (1 bit sign, 11 bits integer, 4 bits fraction). The integer range is therefore -2048 to 2047 and the smallest positive value is 1/16.
d2_border | The border type is used only when setting clip borders. In contrast to points, borders do not contain any fractional information (no subpixel clipping) and are simple 11-bit signed integers.
d2_pattern | Patterns are N-bit bitmasks (N is at most 32, so they are passed as longs).
d2_blitpos | Defines an integer position in the source bitmap of a blit rendering operation. The allowed range is 0 to 1023.
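The fixed point formats of d2_point and d2_width described above can be illustrated with a few conversion helpers. This is a minimal sketch: the helper names are hypothetical and are not part of the driver, which provides its own conversion macros (such as D2_FIX4 used in the Usage section). The sketch only shows the arithmetic and the range limits implied by the 1:11:4 and 10:4 layouts.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* A whole-pixel coordinate fits a signed 1:11:4 point if it lies
 * in the integer range -2048..2047. */
int point_fits(int32_t px)
{
    return px >= -2048 && px <= 2047;
}

/* Convert a whole-pixel coordinate to 1:11:4 fixed point.
 * The caller is expected to check point_fits() first. */
int16_t px_to_point(int32_t px)
{
    return (int16_t)(px * 16);   /* 4 fraction bits => scale by 16 */
}

/* A width fits the unsigned 10:4 format if the integer part is at most
 * 1023 and the fraction is given in sixteenths (0..15). */
int width_fits(uint32_t whole, uint32_t sixteenths)
{
    return whole <= 1023u && sixteenths <= 15u;
}

/* Build a 10:4 width from an integer part and a number of sixteenths.
 * The caller is expected to check width_fits() first. */
uint16_t make_width(uint32_t whole, uint32_t sixteenths)
{
    return (uint16_t)(whole * 16u + sixteenths);
}
```

For example, a pen width of 1.5 pixels is one whole pixel plus 8 sixteenths, which encodes as the raw value 24.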
Category | Description
---|---
Basic functions | Driver device management and hardware initialization / shutdown.
Viewport functions | Framebuffer and view specific functions.
Context functions | Modify material settings.
Texture functions | Modify texture mapping settings.
Rendering functions | There is a rendering function for each supported geometric shape.
Blit functions | Blits are special rendering functions that copy one rectangular part of the video memory into another part of the video memory.
Render Buffers | Render buffers (similar in concept to OpenGL display lists) are the main interface between driver and hardware.
Profiling | Performance measurement counter functions.
Utility functions | Triangle mapping and perspective warp operations.
Note
Calling the GPU API functions from interrupt service routines or from different tasks is not recommended.
For further information regarding the GPU driver API, the user can refer to the DA1470x GPU API Manual.
8.3. Usage
The use of the GPU can be split into the following set of commands:
8.3.1. Initialization of the hardware
Initializing the GPU is a simple process of opening the GPU device and initializing the HW. The device handle has to be maintained since it is a parameter to all GPU related functions.
d2_handle = d2_opendevice(0);
d2_inithw(d2_handle, 0);
8.3.2. Setting up a Frame Buffer using the low-level driver
The setup of the frame buffer requires that its geometry is provided (i.e. location, stride, width, height, color mode).
d2_framebuffer(d2_handle, framebuffer, 640, 640, 480, d2_mode_rgb888);
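Before the frame buffer can be passed to the driver, the application has to reserve memory for it. The following sketch shows how the required size and the position of a pixel follow from the geometry. It assumes that the stride is expressed in pixels and that the application knows the pixel size in bytes for the chosen color mode (for example 2 for RGB565 and 4 for ARGB8888). The helper names are hypothetical and are not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for a frame buffer whose stride
 * (pitch) is given in pixels. */
size_t fb_size_bytes(uint32_t pitch_px, uint32_t height, uint32_t bytes_per_pixel)
{
    return (size_t)pitch_px * height * bytes_per_pixel;
}

/* Hypothetical helper: byte offset of pixel (x, y) inside that buffer.
 * Rows are pitch_px pixels apart, even if the visible width is smaller. */
size_t fb_pixel_offset(uint32_t pitch_px, uint32_t x, uint32_t y,
                       uint32_t bytes_per_pixel)
{
    return ((size_t)y * pitch_px + x) * bytes_per_pixel;
}
```

Under these assumptions, a 640x480 buffer with a stride of 640 pixels needs 614400 bytes at 2 bytes per pixel and 1228800 bytes at 4 bytes per pixel.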
8.3.3. Render Buffer Manipulation
Render buffers can be executed either manually, by sending the render buffer to the hardware and waiting for its execution to finish, or automatically, by letting the driver handle render buffer execution and flipping.
Automatic Management:
Render buffers can be handled automatically by calling the start frame and end frame functions. These functions use two internal render buffers in turn. The internal render buffers can be accessed by d2_getrenderbuffer().
// Repeat in every frame
// Start HW render of previous frame. Switch to new frame.
d2_startframe(d2_handle);
d2_clear(d2_handle, 0x000000);
d2_rendercircle(d2_handle, D2_FIX4(x0), D2_FIX4(y0), D2_FIX4(r), D2_FIX4(w));
// Close render buffer. Wait for rendering of previous frame to complete.
d2_endframe(d2_handle);
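The alternation between the two internal render buffers can be pictured with a small bookkeeping model. This is a conceptual sketch only: the structure and function names are hypothetical, and the driver's real bookkeeping is internal and may differ. The model shows why the CPU can record commands for the next frame while the GPU executes the previous one.

```c
/* Conceptual model only; not the driver's implementation. */
struct frame_ctl {
    int recording;   /* index (0 or 1) of the buffer receiving commands */
    int executing;   /* index of the buffer handed to the GPU, -1 if none */
};

void frame_ctl_init(struct frame_ctl *fc)
{
    fc->recording = 0;
    fc->executing = -1;
}

/* Models the start of a new frame: the buffer recorded last is handed
 * to the GPU and the other buffer becomes the recording target. */
void frame_ctl_flip(struct frame_ctl *fc)
{
    fc->executing = fc->recording;
    fc->recording = 1 - fc->recording;
}
```

In this model the recording and executing indices never refer to the same buffer after a flip, so the CPU never overwrites commands that the GPU is still reading.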
Manual Management:
Manual management requires allocating a render buffer once, selecting it as the target for render commands, and, as a final step, executing it. Before the render buffer can be executed again, the application has to wait for the GPU to finish by flushing the frame.
// Initialize once
static d2_renderbuffer *renderbuffer;
renderbuffer = d2_newrenderbuffer(d2_handle, 20, 20);
d2_selectrenderbuffer(d2_handle, renderbuffer);
// Repeat in every frame
d2_clear(d2_handle, 0x000000);
d2_rendercircle(d2_handle, D2_FIX4(x0), D2_FIX4(y0), D2_FIX4(r), D2_FIX4(w));
d2_executerenderbuffer(d2_handle, renderbuffer, 0);
// Wait for current rendering to end.
d2_flushframe(d2_handle);
8.3.4. Context Modification
Context changes are not translated into render buffer commands until a render command is issued. A subset of the context commands is shown in the following code block.
// Set color of color index 0
d2_setcolor(d2_handle, 0, color);
// Change blend mode
d2_setalphablendmodeex(d2_handle, d2_bm_one, d2_bm_zero, d2_blendf_blenddst);
// Set global alpha
d2_setalphamode(d2_handle, d2_am_constant);
d2_setalpha(d2_handle, intens);
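The color and alpha arguments used above are plain integers, so they can be prepared with ordinary bit arithmetic. The following sketch assumes that a color is laid out as 0x00RRGGBB (the upper 8 bits zero, as the type description requires) and that alpha uses 255 for fully opaque. The channel order and the helper names are assumptions made for illustration and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical helper: pack 8-bit channels as 0x00RRGGBB,
 * leaving the upper 8 bits zero. */
uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Hypothetical helper: map an opacity percentage to an 8-bit alpha
 * value (255 = fully opaque). Inputs above 100 are clamped to 100. */
uint8_t alpha_from_percent(uint32_t percent)
{
    if (percent > 100u) {
        percent = 100u;
    }
    return (uint8_t)((percent * 255u + 50u) / 100u);   /* rounded */
}
```

A value produced by pack_rgb() could then be passed as the color argument, and a value produced by alpha_from_percent() as the global alpha.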
8.3.5. Rendering Shapes
Various rendering shapes are supported. The pixel content is controlled by the active context settings.
d2_renderline(d2_handle, D2_FIX4(x0), D2_FIX4(y0),
              D2_FIX4(x1), D2_FIX4(y1),
              D2_FIX4(pen_size), d2_le_exclude_none);
d2_rendercircle(d2_handle, D2_FIX4(x0), D2_FIX4(y0),
                D2_FIX4(r), D2_FIX4(w));
d2_renderpolygon(d2_handle, points, points_num, d2_le_closed);
d2_renderwedge(d2_handle, D2_FIX4(x0), D2_FIX4(y0),
               D2_FIX4(rx), D2_FIX4(pen_size),
               D2_FIX16(nx0), D2_FIX16(ny0),
               D2_FIX16(nx1), D2_FIX16(ny1),
               0);
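The polygon call above takes a vertex array that the application has to prepare. The following sketch fills such an array for an axis-aligned rectangle. It assumes that the array holds interleaved x,y values in 1:11:4 fixed point and that the vertices are listed clockwise in screen coordinates; both the layout and the helper name are assumptions made for illustration, so the GPU API Manual should be checked for the exact requirements.

```c
#include <stdint.h>

/* Hypothetical helper: write the four corners of a rectangle into pts
 * as interleaved x,y pairs in 1:11:4 fixed point (pixels * 16).
 * Inputs are whole pixels and must keep every corner within -2048..2047.
 * Returns the number of vertices written. */
int rect_to_points(int16_t pts[8], int32_t x, int32_t y, int32_t w, int32_t h)
{
    const int16_t x0 = (int16_t)(x * 16);
    const int16_t y0 = (int16_t)(y * 16);
    const int16_t x1 = (int16_t)((x + w) * 16);
    const int16_t y1 = (int16_t)((y + h) * 16);

    pts[0] = x0; pts[1] = y0;   /* top-left */
    pts[2] = x1; pts[3] = y0;   /* top-right */
    pts[4] = x1; pts[5] = y1;   /* bottom-right */
    pts[6] = x0; pts[7] = y1;   /* bottom-left */
    return 4;
}
```

The returned vertex count corresponds to the points_num argument in the call above, and the filled array to the points argument.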
8.3.6. BLIT Operations
The BLIT operations are performed using texture mapping and box rendering. The BLIT functions provide abstraction in settings and context restoration. The d2_setblitsrc() function only describes the geometry of the source image. The d2_blitcopy() function performs the actual copy and selects the frame of the source image and the frame of the destination. If the dimensions do not match, the GPU stretches or shrinks the image and finally converts it to the destination format.
d2_setblitsrc(d2_handle, src, pitch, x_size, y_size, format);
d2_blitcopy(d2_handle, srcwidth, srcheight,
            srcx, srcy,
            D2_FIX4(dstwidth), D2_FIX4(dstheight),
            D2_FIX4(dstx), D2_FIX4(dsty),
            flags);
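The stretching and shrinking mentioned above amounts to mapping each destination pixel back to a position in the source image. The following sketch shows that mapping for one axis using the nearest source pixel. It is an illustration of the arithmetic only: the helper names are hypothetical, and the hardware's actual sampling depends on the selected filtering mode.

```c
#include <stdint.h>

/* Hypothetical helper: source index sampled for destination index dst_i
 * when src_w source pixels are mapped onto dst_w destination pixels.
 * dst_w must be nonzero. */
uint32_t stretch_src_index(uint32_t dst_i, uint32_t src_w, uint32_t dst_w)
{
    return (uint32_t)(((uint64_t)dst_i * src_w) / dst_w);
}

/* Hypothetical helper: distance travelled in the source per destination
 * pixel, as a 16.16 fixed point number. dst_w must be nonzero. */
uint32_t stretch_step_16_16(uint32_t src_w, uint32_t dst_w)
{
    return (uint32_t)(((uint64_t)src_w << 16) / dst_w);
}
```

A step below 1.0 (0x10000) means the image is enlarged and source pixels are repeated; a step above 1.0 means the image is reduced and source pixels are skipped.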
8.3.7. Frame Buffer Copy
d2_framebuffer(d2_handle, write_buffer, XSIZE_PHYS,
               XSIZE_PHYS, YSIZE_PHYS, d2_mode_argb8888);
d2_setblitsrc(d2_handle, read_buffer, XSIZE_PHYS,
              XSIZE_PHYS, YSIZE_PHYS, d2_mode_argb8888);
d2_blitcopy(d2_handle, XSIZE_PHYS, YSIZE_PHYS,
            0, 0,
            D2_FIX4(XSIZE_PHYS), D2_FIX4(YSIZE_PHYS),
            D2_FIX4(0), D2_FIX4(0),
            d2_bf_no_blitctxbackup);