New Versions of AMD CodeXL, Bolt and AMD APP SDK

November 13th, 2013

AMD CodeXL is a free set of tools for GPU debugging, GPU profiling, static analysis of OpenCL kernels, and CPU profiling, including support for remote servers.

Bolt is an STL-compatible C++ template library for creating data-parallel applications using C++ (no C++ AMP / OpenCL code required).
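
To illustrate what "STL-compatible" means in practice, here is a minimal sketch of a SAXPY operation written against the standard library only. It is not Bolt code: the assumption is that an STL-compatible library exposes algorithms with the same iterator-based call shape in its own namespace, so that a call like the one below could be retargeted by changing the namespace. The function name `saxpy` is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// SAXPY with a plain STL algorithm: returns a * x[i] + y[i] for every i.
// Assumption (not taken from the announcement): an STL-compatible
// data-parallel library would accept the same begin/end iterators and
// functor, differing only in the namespace of the algorithm.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         std::vector<float> y) {
    assert(x.size() == y.size());
    // Binary transform: reads x and y element-wise, writes back into y.
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
    return y;
}
```

The point of the sketch is the call shape: input range, output iterator and a functor, with no explicit kernel or device code in the caller.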

AMD APP SDK has everything needed to get started with OpenCL and parallel programming. It includes OpenCL samples that are easy to compile, as well as Bolt and other libraries.

Allinea DDT with support for NVIDIA CUDA 5.5 and CUDA on ARM

November 13th, 2013

Allinea DDT is part of Allinea Software’s unified tools platform, which provides a single environment for debugging and profiling parallel and multithreaded applications. It is widely used by computational scientists and scientific programmers to fix software defects in parallel applications running on hybrid GPU clusters and supercomputers. DDT 4.1.1 supports CUDA 5.5, C++11 and the GNU 4.8 compilers. Allinea DDT 4.1.1 also introduces CUDA toolkit debugging support for ARMv7 architectures.

AMD CodeXL: comprehensive developer tool suite for heterogeneous compute

October 9th, 2012

AMD CodeXL is a new unified developer tool suite that enables developers to harness the benefits of CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing. AMD CodeXL is available for free, both as a Visual Studio® extension and a standalone user interface application for Windows® and Linux®.

AMD CodeXL increases developer productivity by helping developers identify programming errors and performance issues in their applications quickly and easily. Developers can now debug, profile and analyze their applications with a full system-wide view on AMD APUs, GPUs and CPUs.

The AMD CodeXL user group (requires registration) allows users to interact with the CodeXL team, provide feedback, get support and participate in beta surveys.

Panoptes: A Binary Translation Framework for CUDA

May 22nd, 2012

Traditional CPU-based computing environments offer a variety of binary instrumentation frameworks. Instrumentation and analysis tools for GPU environments to date have been more limited. Panoptes is a binary instrumentation framework for CUDA that targets the GPU. By exploiting the GPU to run modified kernels, computationally-intensive programs can be run at the native parallelism of the device during analysis. To demonstrate its instrumentation capabilities, we currently implement a memory addressability and validity checker that targets CUDA programs.

Panoptes traces targeted programs by library interposition at runtime.

CUDA 4.1 Released

January 26th, 2012

Today NVIDIA released CUDA 4.1, including a new CUDA Toolkit, SDK, Visual Profiler, Parallel Nsight IDE and NVIDIA device driver.

CUDA 4.1 makes it easier to accelerate scientific research with GPUs, with key features including:

  • a redesigned Visual Profiler with automated performance analysis and expert guidance;
  • a new LLVM-based compiler that generates up to 10% faster code; and
  • 1000+ new imaging and signal processing functions in the NPP library.

The cuSPARSE library included with CUDA 4.1 has a new tridiagonal solver and 2x faster sparse matrix-vector multiplication using the ELL hybrid format, and the cuRAND library included with CUDA 4.1 has two new random number generators.
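
For readers unfamiliar with the ELL layout mentioned above, the following is a minimal CPU sketch of sparse matrix-vector multiplication over an ELL-stored matrix. It is not the cuSPARSE implementation and does not use its API. The struct `EllMatrix`, the function `ell_spmv`, and the use of `-1` as a padding marker are assumptions made for this example. It also covers only the ELL part; a hybrid scheme would additionally keep overflow entries of long rows in a second format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ELL storage: every row is padded to the same number of entries and the
// entries are laid out column-major (entry k of all rows is contiguous).
// On a GPU this lets neighbouring threads (one per row) read neighbouring
// memory locations.
struct EllMatrix {
    std::size_t rows;
    std::size_t max_nnz_per_row;
    std::vector<int> col;     // size rows * max_nnz_per_row, -1 = padding
    std::vector<double> val;  // same size as col
};

// Computes y = A * x for an ELL matrix A.
std::vector<double> ell_spmv(const EllMatrix& a, const std::vector<double>& x) {
    assert(a.col.size() == a.rows * a.max_nnz_per_row);
    assert(a.val.size() == a.col.size());
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t k = 0; k < a.max_nnz_per_row; ++k) {
        for (std::size_t r = 0; r < a.rows; ++r) {
            const std::size_t idx = k * a.rows + r;
            const int c = a.col[idx];
            if (c >= 0) {  // skip padding slots
                y[r] += a.val[idx] * x[static_cast<std::size_t>(c)];
            }
        }
    }
    return y;
}
```

The fixed row width is what makes the format regular; the cost is wasted padding when a few rows are much longer than the rest, which is the case a hybrid layout is meant to address.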

GPU-Ocelot 2.0 Released

February 8th, 2011

Ocelot 2.0.969 brings CUDA 3.2 and Fermi support to a stable release. Ocelot is a BSD-licensed open source implementation of the CUDA runtime, a PTX emulator, and a mid-level PTX compiler.

Here is a feature list for 2.0.969:

  • PTX 2.2 and Fermi device support: Floating point results should be within the ULP limits in the PTX ISA manual. Over 500 unit tests verify that the behaviour matches NVIDIA devices.
  • Four target device types: A functional PTX emulator. A PTX to LLVM to x86/ARM JIT. A PTX to CAL JIT for AMD devices (beta). A PTX to PTX JIT for NVIDIA devices.
  • A full-featured PTX 2.2 IR: An analysis/optimization pass interface over PTX (Control flow graph, dataflow graph, dominator/postdominator trees, structured control tree). Optimizations can be plugged in as modules.
  • Correctness checking tools: A memory checker (detects unaligned and out of bounds accesses). A race detector. An interactive debugger (allows stepping through PTX instructions).
  • An instruction trace analyzer interface: Allows user-defined modules to receive callbacks when PTX instructions are executed. Can be used to compute metrics over applications or perform correctness checks.
  • A CUDA API frontend: Existing CUDA programs can be directly linked against Ocelot. Device pointers can be shared across host threads. Multiple devices can be controlled from the same host thread (cudaSetDevice can be called multiple times).
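
As a rough illustration of the kind of check a memory checker like the one listed above performs, here is a minimal sketch that classifies a single access as valid, unaligned, or out of bounds. It is not Ocelot code. The names `Allocation`, `AccessError` and `check_access` are hypothetical, and the sketch assumes natural alignment (an access of `width` bytes must start at a multiple of `width`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class AccessError { None, Unaligned, OutOfBounds };

// A tracked device allocation: [base, base + size).
struct Allocation {
    std::uintptr_t base;
    std::size_t size;
};

// Classifies one load/store of `width` bytes at address `addr`.
// Assumption: accesses must be naturally aligned to their width.
AccessError check_access(const std::vector<Allocation>& allocs,
                         std::uintptr_t addr, std::size_t width) {
    assert(width > 0);
    if (addr % width != 0) {
        return AccessError::Unaligned;
    }
    for (const Allocation& a : allocs) {
        // The whole access must fit inside a single allocation.
        if (addr >= a.base && width <= a.size &&
            addr - a.base <= a.size - width) {
            return AccessError::None;
        }
    }
    return AccessError::OutOfBounds;
}
```

An emulator can run a check of this shape on every memory instruction because it sees each address before the access happens; the allocation list would be maintained by intercepting allocation and free calls.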

Ocelot is available under a BSD license.

    NVIDIA Parallel Nsight Now Shipping

    July 21st, 2010

    NVIDIA today announced the release of NVIDIA Parallel Nsight software, the industry’s first development environment for GPU-accelerated applications that works with Microsoft Visual Studio. “By adding functionality specifically for GPU Computing developers, Parallel Nsight makes the power of the GPU more accessible than ever before,” said Sanford Russell, GM of GPU Computing at NVIDIA. NVIDIA Parallel Nsight features a CUDA C/C++ debugger and application performance analyzer, and a graphics debugger and inspector. NVIDIA Parallel Nsight supports Windows HPC Server 2008, Windows 7 and Windows Vista.

    CUDA 3.0 toolkit released

    March 20th, 2010

    NVIDIA has released version 3.0 of the CUDA Toolkit, providing developers with tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:

    • Support for the new Fermi architecture, with:
      • Native 64-bit GPU support
      • Multiple Copy Engine support
      • ECC reporting
      • Concurrent Kernel Execution
      • Fermi HW debugging support in cuda-gdb
      • Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler
    • C++ Class Inheritance and Template Inheritance support for increased programmer productivity
    • A new unified interoperability API for Direct3D and OpenGL, with support for:
      • OpenGL texture interop
      • Direct3D 11 interop support
      • CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime such as CUFFT and CUBLAS.

    gDEBugger v5.5: AMD (ATI) GPU Performance Counters Integration

    February 21st, 2010

    Graphic Remedy is proud to announce the release of gDEBugger Version 5.5 for Windows, Linux, Mac OS X and iPhone.

    This version introduces AMD GPU performance counter integration, displaying AMD graphics hardware and driver performance counters in gDEBugger’s Performance Graph and Performance Dashboard views, so that developers can optimize their applications on AMD (ATI) graphics hardware.

    AMD performance counters are available on Windows when using ATI Radeon (TM) HD 2000 series or newer hardware with Catalyst (TM) 9.12 or newer.

    This version also includes a large number of bug fixes and stability improvements.


    gDEBugger for OpenCL – Beta Program

    February 10th, 2010

    Graphic Remedy is proud to announce the upcoming release of gDEBugger for OpenCL on Windows, Mac OS X and Linux. This new product will bring gDEBugger’s advanced Debugging, Profiling and Memory Analysis abilities to the OpenCL developer’s world, helping OpenCL developers find bugs and optimize parallel computing application performance and memory consumption.

    The Beta Program is free to join.

    gDEBugger CL enables OpenCL developers to: