Debugging a Shared Memory Problem in a multi-core design with virtual hardware
By Marc Serughetti
Embedded.com
(05/20/08, 12:20:00 PM EDT)
With multicore systems becoming the norm, software developers have found the debugging of such systems, using a physical hardware development board, to be very challenging, particularly when it comes to the integration of applications sharing data across multiple cores.

Over the past few years, the virtual hardware platform concept has emerged as a key new capability for software developers to improve their ability to debug software applications.

Virtual platforms are simulations of the device hardware and the environment in which it operates. They represent a new solution for software developers to improve their productivity. The benefits of virtual platforms for software development come from three major areas.

First, they remove the dependency on the physical silicon availability. Second, they provide a far superior solution for debug and analysis. Third, they provide a simplified and more easily sharable environment to the users.

Traditionally, software developers have used three different types of environments for the execution of the software under development: native compilation to the host development system or an OS simulator, reference development boards, or instruction set simulators. Each of them has been used successfully in the context of a simple hardware platform.

However, as hardware platform capabilities are increasing with multi-core support, these approaches are exhibiting some significant limitations, including limited observability and controllability of the hardware, poor representation of the final device hardware, and limited scalability. This article will demonstrate how a virtual platform can be used to debug a shared memory problem on a multi-core platform.

Platform description
The system under consideration is depicted in Figure 1 below. It includes two processor cores and several peripheral elements. One core (ARM926) is used to boot and execute the Linux operating system and a variety of applications. The second core (ARM968) is used to execute an H.264 decoding algorithm.

Figure 1: A shared memory design with two processor cores and several peripheral elements, one used to boot and execute the Linux operating system and a variety of applications and the second to execute an H.264 decoding algorithm.

The platform's peripherals include an interrupt controller, touch screen controller, display controller, ATAPI controller, UART, programmable I/O, timer, clock and memory.

An AHB multi-layer bus is also used to map different address regions to the two bus masters. A model of the platform has been created using SystemC, a standard hardware modeling language, and TLM (Transaction Level Modeling), a standards-based modeling methodology.

In addition to the hardware model, the platform comes with host application programs that enable the interactive I/O user interface in a realistic device environment.

Each of these applications can directly communicate with the platform model and display the desired information. They include a graphical user interface, connectivity to the host memory file system, and a terminal window showing, for example, the Linux boot sequence.

Software development environment
The software development environment provided to the software developer contains:

* The simulation of the hardware platform on which the software can be downloaded and executed
* A virtual platform debugger. Unlike most software debuggers, which only examine the state of the processor, a virtual platform debugger can set breakpoints and watchpoints on every memory element and signal of the entire platform.
* Integration with source code-level debugging software development tools such as gdb and Lauterbach.

Figure 2 below provides an overview of this environment for multi-core debugging, providing a non-intrusive, deterministic and fully controllable development environment.

The virtual platform simulates fast enough that the operating system boot takes only a few seconds and the movie stream is executed at a speed near or faster than real time. This performance demonstrates that SystemC, a C++-based language, scales to the performance requirements of software developers.

Figure 2: A virtual platform multicore debugging environment can be used to resolve possible shared memory problems in the design illustrated in Figure 1.

As the initial version of our software is compiled and downloaded to the virtual platform, we quickly observe through our user interface that the video stream plays for only a short period of time and appears to skip significant sections of the movie being decoded.

Using the Lauterbach software debugger and gdb, each connected to one of the ARM processors, we can quickly confirm that each core is alive, which narrows the potential problem to the H.264 algorithm or the use of the shared memory between the two processors. Since this decoder previously worked properly on a single processor core architecture, we suspect that the problem is in the use of the shared memory.

Shared memory architecture
A circular buffer (Figure 3 below) is being used in this architecture. The ARM926, which reads the video file, passes the data to the ARM968, where the video stream is decoded and sent to the display device. The circular buffer presents another challenge for the software developer.

At any point in time, the ARM926 can write to one portion of the buffer and the ARM968 can read from another portion. If the ARM926 were to write to the wrong portion, it would overwrite data that the ARM968 had not yet read, which would have the effect of skipping part of the movie being decoded. A valid and an invalid access to the circular buffer are depicted below.

Figure 3: A valid and an invalid access to the circular buffer
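
To make the two cases concrete, here is a minimal single-threaded sketch in C of a circular buffer with a checked and an unchecked write. It is not the code running on the platform described in this article: the names `ring_t`, `ring_write`, `ring_read`, `ring_write_unchecked` and the capacity `RING_SIZE` are all assumptions made for illustration, and a real dual-core implementation would also need synchronization on the shared state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical capacity, in slots */

typedef struct {
    uint8_t data[RING_SIZE];
    size_t  head;   /* next slot the producer (ARM926 role) writes */
    size_t  tail;   /* next slot the consumer (ARM968 role) reads  */
    size_t  count;  /* slots written but not yet read              */
} ring_t;

/* Valid access: the write succeeds only while a free slot exists. */
bool ring_write(ring_t *r, uint8_t byte)
{
    if (r->count == RING_SIZE)
        return false;                 /* full: a write would clobber unread data */
    r->data[r->head] = byte;
    r->head = (r->head + 1) % RING_SIZE;
    r->count++;
    return true;
}

/* Consumer side: fails when there is nothing to read. */
bool ring_read(ring_t *r, uint8_t *out)
{
    if (r->count == 0)
        return false;
    *out = r->data[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    return true;
}

/* Invalid access: ignores occupancy, so once the buffer wraps it
 * silently overwrites data the consumer has not read yet. */
void ring_write_unchecked(ring_t *r, uint8_t byte)
{
    r->data[r->head] = byte;
    r->head = (r->head + 1) % RING_SIZE;
}
```

With the unchecked variant, the ninth write into an eight-slot buffer lands on slot 0 again, which is the kind of overwrite that would appear to the viewer as skipped video.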

The challenge facing the software developer is that the watchpoints have to be changed after every read or write, since the memory area in use keeps changing. In addition, watchpoints need to be placed on the circular buffer itself to gain the right visibility into its behavior.

Debugging Capabilities and Tracing of the Problem
The advantage of using a simulated environment such as that shown in Figure 4 below is that the user can have full control of the execution of the hardware, provided that the tool exposes this control. In this case, we are using the capabilities of a virtual platform debugger, which provides visibility and controllability into any memory element and signal.

In addition, the tool shown in Figure 4 provides a scripting capability (based on Tcl) enabling its user to take specific actions when a breakpoint or watchpoint is hit. For the debugging of our problem, a script that automatically updates the watchpoints after each read and write will be used.

It creates a sandbox where the ARM926 is authorized to write and the ARM968 to read. The script will alert the developer and stop the platform execution when a memory violation occurs (i.e., a write or read is done outside the sandbox).
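
The logic of such a script can be sketched in C as a small model, shown here only to illustrate the idea of a moving sandbox. It is not the Tcl script used with the tool: `sandbox_t`, `sandbox_on_write`, `sandbox_on_read` and `BUF_SLOTS` are hypothetical names, and the model assumes strictly sequential, slot-by-slot accesses, which the article does not state.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_SLOTS 8u   /* hypothetical buffer size, in slots */

typedef enum { ACCESS_OK, ACCESS_VIOLATION } access_result_t;

typedef struct {
    size_t write_next;  /* the only slot the writer may touch next */
    size_t read_next;   /* the only slot the reader may touch next */
    size_t unread;      /* slots written and not yet read          */
} sandbox_t;

/* Called for every observed write to the buffer (ARM926 role). */
access_result_t sandbox_on_write(sandbox_t *s, size_t slot)
{
    if (slot >= BUF_SLOTS || slot != s->write_next || s->unread == BUF_SLOTS)
        return ACCESS_VIOLATION;      /* a script would stop the platform here */
    s->write_next = (s->write_next + 1) % BUF_SLOTS;  /* move the write window */
    s->unread++;
    return ACCESS_OK;
}

/* Called for every observed read from the buffer (ARM968 role). */
access_result_t sandbox_on_read(sandbox_t *s, size_t slot)
{
    if (slot >= BUF_SLOTS || slot != s->read_next || s->unread == 0)
        return ACCESS_VIOLATION;      /* reading a slot that holds no new data */
    s->read_next = (s->read_next + 1) % BUF_SLOTS;    /* move the read window */
    s->unread--;
    return ACCESS_OK;
}
```

Each accepted access moves the permitted window by one slot, mirroring how the watchpoints are updated after each read and write; the first access that falls outside the window is reported instead of being allowed to corrupt the stream silently.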

In addition, a very visual and intuitive user interface can be created with the script. The picture below provides the structure of the script and the graphical visualization provided by the virtual platform debugger.

Figure 4: Using a simulated environment provides the developer full control of the execution of the hardware.

As we now execute the software using the script, we hit a watchpoint and can start using our software debugging capabilities provided by the Lauterbach debugger (Figure 5, below) to trace the source of the problem:

* First, we locate the function that was called when the memory access occurred.
* Looking at the stack frame, we identify that the read and write pointers are pointing to the same memory location, and that a write was performed.
* The function stack enables the identification of the calling function whose source code can now be viewed.

Figure 5: The Lauterbach debugger can be used to trace the source of the memory conflict problems.

We quickly identify that the buffer count has been hard-coded rather than being calculated. The problem is fixed and the re-compiled software is downloaded to the virtual platform showing the proper execution of the software.
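
The article does not show the faulty source, so the following C sketch only illustrates the general class of bug: a buffer count that is fixed in the code rather than derived from the read and write positions. All names (`free_slots_hardcoded`, `unread_slots`, `free_slots_calculated`, `BUF_SLOTS`) and the constant 4 are hypothetical, and the calculated version assumes the common convention of keeping one slot empty to tell a full buffer from an empty one.

```c
#include <stddef.h>

#define BUF_SLOTS 8u   /* hypothetical buffer size, in slots */

/* Hypothetical buggy version: reports a fixed amount of free space
 * regardless of where the reader and writer actually are. */
size_t free_slots_hardcoded(size_t write_idx, size_t read_idx)
{
    (void)write_idx;
    (void)read_idx;
    return 4;
}

/* Slots written but not yet read, derived from the two indices. */
size_t unread_slots(size_t write_idx, size_t read_idx)
{
    return (write_idx + BUF_SLOTS - read_idx) % BUF_SLOTS;
}

/* Calculated free space; one slot is kept empty so that
 * write_idx == read_idx always means "empty". */
size_t free_slots_calculated(size_t write_idx, size_t read_idx)
{
    return (BUF_SLOTS - 1u) - unread_slots(write_idx, read_idx);
}
```

With the writer at index 6 and the reader still at index 0, the calculated version reports one free slot while the hard-coded one still reports four, so a writer trusting the fixed value would run over unread data.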

Simplifying the Edit-Debug-Compile Cycle
The debugging example provided above demonstrates the key capability of using a virtual platform to accelerate the edit-debug-compile cycle. The virtual platform allowed us to:

1. Identify the problem earlier and without silicon availability
2. Trace its source by allowing a "watch" to be established inside a memory block (not to be confused with a CPU watchpoint) and by dynamically creating a "sandbox" to catch buffer overflow errors
3. Solve the problem by quickly pinpointing the source code error for correction, leveraging the integration of existing software development tools such as Lauterbach or gdb
4. Validate the solution through the execution of the updated software.

Virtual platform tools provide a powerful solution with fast simulation performance and integration of existing software development tools such as debuggers.

In our specific environment we also have access to a powerful tool acting as a virtual platform debugger, which unlocks the unique capabilities of virtual platforms, including controllability and observability of the software and the platform, as well as a scriptable solution based on a Tcl interface.

The integrated solution provides a non-intrusive debugging environment enabling debugging that could not have been efficiently done with physical hardware.

Marc Serughetti is Vice President Marketing & Business Development at CoWare