Intel Releases Knights Corner

June 2nd, 2010

At ISC’10, Intel demonstrated their co-processor approach to HPC (formerly known as Larrabee, now codenamed Knights Corner). A prototype of the Intel Many Integrated Core (MIC) architecture with 32 in-order cores, each equipped with a 512-bit-wide vector unit and connected via an on-chip coherent cache, delivered more than half a teraflop of performance for LU decomposition in a live demonstration during a keynote by Kirk Skaugen.

The full press release from ISC’10 is available here.

Australia GPU Users Groups

June 1st, 2010

The Australia GPU Users groups are informal special interest groups founded to bring together GPU users from all fields and experience levels to learn and share their ideas and creations at friendly meetings.  There are currently GPU users groups forming in Brisbane, Sydney, and Perth.

The groups will discuss general GPU computing, including GPGPU, CUDA, OpenCL, DirectCompute, DirectX, OpenGL, and related technologies. Meetings will feature short presentations and informal discussions on a range of subjects, including core fundamentals, hardware architectures, parallel programming, and specific optimisations, along with example applications from industry, science, and multimedia.

Sign up today: the meetings will allow you to meet others who share your interest in GPUs. We are maintaining a list of GPU users groups. If you have a local GPU users group, please tell us about it!

New NVIDIA Research & Certification Programs for CUDA/GPGPU

June 1st, 2010

At the ISC 2010 conference in Hamburg, Germany, this week, NVIDIA announced new programs for the growing CUDA/GPGPU developer community:

  • CUDA Certification Program – Driven by demand for qualified GPGPU engineers, this is the first program to certify expertise in massively parallel programming on GPUs.
  • CUDA Research Centers – Recognizes institutions that embrace GPU Computing across multiple research fields.
  • CUDA Teaching Centers – Recognizes institutions that have integrated GPU Computing techniques into their mainstream computer programming curriculum.

These programs complement the existing CUDA Center of Excellence program, which has recognized 10 premier institutions around the world. More details are available here:

Introductory Tutorial to OpenCL™ for HPC at SAAHPC’10

May 30th, 2010

AMD is offering an introductory tutorial to OpenCL™ that will be held alongside the 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC’10). The tutorial is a “programmer’s introduction” which covers the ideas behind OpenCL™ and their translation to source code.

ATI Stream Profiler v1.3 Released

May 20th, 2010

Advanced Micro Devices (AMD) recently released ATI Stream Profiler version 1.3. ATI Stream Profiler is a Microsoft® Visual Studio® integrated runtime profiler that gathers performance data from the GPU as your OpenCL™ application runs. This information can then be used by developers to discover where the bottlenecks are in their OpenCL™ application and find ways to optimize their application’s performance.

Features of the tool include:

  • Measure the execution time of an OpenCL kernel
  • Query the hardware performance counters on ATI Radeon graphics cards
  • Display the memory traffic from and to GPU
  • Compare multiple runs (sessions) of the same or different programs
  • Store the profile data for each run in a csv file
  • Display the IL and ISA (hardware disassembly) code of the OpenCL kernel

HOOMD-blue 0.9.0 released

May 20th, 2010

HOOMD-blue stands for Highly Optimized Object-oriented Many-particle Dynamics — Blue Edition. It performs general-purpose particle dynamics simulations on a single workstation, taking advantage of  NVIDIA GPUs to attain a level of performance equivalent to dozens of processor cores on a fast cluster.

HOOMD-blue 0.9.0 is a major new release. Highlights include:

  • Support for Fermi generation GPUs
  • Performance enhancements
  • New pair potentials
  • Particle data is now accessible from hoomd scripts
  • Binary format dump files for simulation restarts
  • Numerous small enhancements to enable easily restartable jobs
  • 2D simulations are now possible
  • Integration methods can now be applied to specified groups of particles
  • All IMD commands issued by VMD are now understood
  • and more

HOOMD-blue 0.9.0 is available for download under an open source license.

Scalable HeterOgeneous Computing (SHOC) Benchmark Suite

May 4th, 2010

The Scalable Heterogeneous Computing Benchmark Suite (SHOC) is a collection of benchmark programs testing the performance and stability of systems using computing devices with non-traditional architectures for general-purpose computing, and the software used to program them. Its initial focus is on systems containing Graphics Processing Units (GPUs) and multi-core processors, and on the OpenCL programming standard. It can be used on clusters as well as individual hosts.

(Danalis, A., Marin, G., McCurdy, C., Meredith, J., Roth, P., Spafford, K., Tipparaju, V., Vetter, J. (2010). The Scalable HeterOgeneous Computing (SHOC) Benchmark Suite. Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processors (GPGPU 2010), Mar 2010.)

CUDPP 1.1.1

April 29th, 2010

The CUDA Data Parallel Primitives Library (CUDPP) is a cross-platform, open-source library of data-parallel algorithm primitives such as parallel prefix-sum (“scan”), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables. CUDPP runs on processors that support CUDA.
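
To illustrate how scan serves as a building block for stream compaction, here is a minimal sequential sketch in Python. It is not CUDPP's API or implementation; the function names `exclusive_scan` and `compact` are hypothetical and chosen only for illustration. A GPU library would compute the scan and the scatter step in parallel, but the data flow is the same.

```python
from typing import Callable, List, Sequence


def exclusive_scan(values: Sequence[int]) -> List[int]:
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def compact(items: Sequence[int], keep: Callable[[int], bool]) -> List[int]:
    """Stream compaction built on scan: keep only items where keep(item) is true."""
    # 1 marks an item that survives, 0 marks one that is dropped.
    flags = [1 if keep(x) else 0 for x in items]
    # Scanning the flags gives each surviving item its output index.
    positions = exclusive_scan(flags)
    total = positions[-1] + flags[-1] if flags else 0
    out = [0] * total
    # Scatter step: every iteration writes to a distinct slot, so the
    # iterations are independent and could run as parallel threads.
    for i, x in enumerate(items):
        if flags[i]:
            out[positions[i]] = x
    return out
```

For example, `compact([5, 0, 2, 0, 0, 9], lambda x: x != 0)` scans the flags `[1, 0, 1, 0, 0, 1]` into positions `[0, 1, 1, 2, 2, 2]` and scatters the three non-zero items into a dense output.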

CUDPP release 1.1.1 is a bugfix release with fixes for scan, segmented scan, stream compaction, and radix sort on the NVIDIA Fermi (sm_20) architecture, including GeForce 400 series and Tesla 20 series GPUs.  It also includes improvements and bugfixes for radix sorts on 64-bit OSes, and fixes for 64-bit builds on MS Windows OSes and Apple OS X 10.6 (Snow Leopard).  Change Log.

Programming and Tuning Massively Parallel Systems Summer School

April 26th, 2010

Barcelona Computing Week
BSC/UPC, Barcelona, Spain
July 5-9, 2010

The Programming and Tuning Massively Parallel Systems Summer School (PUMPS) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources like GPU accelerators.


  • Wen-mei Hwu, University of Illinois at Urbana-Champaign
  • David Kirk, NVIDIA Fellow, former Chief Scientist, NVIDIA Corporation


  • Mateo Valero (BSC/UPC)
  • Wen-mei Hwu (UIUC)


CLyther 0.1 Beta Released

April 25th, 2010

GeoSpin has released the first version of CLyther for beta testing. Please visit the CLyther SourceForge website for more information. CLyther enables developers to write GPGPU code entirely in Python with no additional syntax. CLyther’s core driver contains a Python compiler that converts Python functions and types to OpenCL at runtime.

CLyther currently only supports a subset of the Python language definition but adds many new features to OpenCL such as:

  • OpenCL interface similar to PyOpenCL
  • Dynamic compilation of OpenCL code at runtime
  • Fast prototyping of OpenCL code
  • Create OpenCL code using the Python language definition
  • Passing functions as arguments to OpenCL kernels
  • Pure Python emulation mode of kernel functions
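
The idea behind a pure Python emulation mode can be sketched as follows. This is not CLyther's interface; `emulate_kernel` and `saxpy` are hypothetical names used only to show the concept. A kernel is written as a function of one work-item index, and the emulator calls it once per index on the host instead of launching it on a device.

```python
from typing import Callable, List


def emulate_kernel(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Run `kernel` once per work-item index, sequentially, on the host."""
    for gid in range(global_size):
        kernel(gid, *args)


def saxpy(gid: int, a: float, x: List[float], y: List[float], out: List[float]) -> None:
    """Kernel body for a single work-item: out[gid] = a * x[gid] + y[gid]."""
    out[gid] = a * x[gid] + y[gid]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
result = [0.0] * len(x)
# One emulated work-item per element of x.
emulate_kernel(saxpy, len(x), 2.0, x, y, result)
```

Running kernels this way makes it possible to step through kernel logic with ordinary Python debugging tools before compiling the same function for a device.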

