New OpenCL back-end in CAPS HMPP 2.3 hybrid compiler

June 6th, 2010

CAPS has added an OpenCL code generator to the just-released version 2.3 of its HMPP directive-based hybrid compiler. The CUDA back-end generator has also been enhanced with Fermi capabilities, and the release adds support for more native compilers: Intel ifort/icc, GNU gcc/gfortran, and PGI pgcc/pgfort. Developers can therefore use their preferred compiler with HMPP 2.3.

Based on GPU programming and tuning directives, HMPP offers an incremental programming model that allows developers with different levels of expertise to fully exploit GPU hardware accelerators in their legacy code.

Nexiwave 2.0 GPU-accelerated Speech Indexing

June 3rd, 2010

nexiwave.com, a speech indexing cloud service company based in Boston, MA, announces that it has completed the GPU acceleration of its speech indexing service, Nexiwave 2.0. Without sacrificing accuracy, Nexiwave reports a relative speed improvement of over 75% (comparing stock Sphinx4 on a 2.5 GHz, 8-core server with 24 GB of RAM against Sphinx4 on a 2.5 GHz quad-core machine with 4 GB of RAM and an NVIDIA GTX 470 GPU).

GPGPU Wrapper for R Statistical Computing Environment

June 2nd, 2010

Jaideep Singh and Ipseeta Aruni present a GPGPU wrapper for the R statistical computing environment at the R user conference 2010. Their approach is to overload datatypes using R’s simplified wrapper and the SWIG Interface Generator functionality. A full-page summary of the approach is available at the conference web site (PDF link).

Mellanox and NVIDIA introduce GPUDirect Technology

June 2nd, 2010

Mellanox and NVIDIA have teamed up to create a solution that enables data sharing, without expensive memory copies, between CUDA-managed host memory and Mellanox InfiniBand cards. NVIDIA GPUDirect technology allows application and middleware developers to improve performance by up to 30% by providing a shared, RDMA-accessible address space between the GPU and the interconnect.

The full press release is available here.

Intel Releases Knights Corner

June 2nd, 2010

At ISC’10, Intel demonstrated its co-processor approach to HPC (formerly known as Larrabee, now codenamed Knights Corner). A prototype of the Intel Many Integrated Core (MIC) architecture with 32 in-order cores, each equipped with a 512-bit-wide vector unit and connected via an on-chip coherent cache, delivered more than half a teraflop of performance for LU decomposition in a live demonstration during a keynote by Kirk Skaugen.

The full press release from ISC’10 is available here.

Australia GPU Users Groups

June 1st, 2010

The Australia GPU Users groups are informal special interest groups founded to bring together GPU users from all fields and experience levels to learn and share their ideas and creations at friendly meetings.  There are currently GPU users groups forming in Brisbane, Sydney, and Perth.

The groups will discuss general GPU computing, including GPGPU, CUDA, OpenCL, DirectCompute, DirectX, OpenGL and related technologies. The meetings will feature short presentations and informal discussions on a range of subjects: core fundamentals, hardware architectures, parallel programming, specific optimisations, and examples of applications from different fields of industry, science and multimedia.

Sign up today: the meetings will allow you to meet others who share your interest in GPUs.

GPGPU.org is maintaining a list of GPU Users groups.  If you have a local GPU users group, please tell us about it!

New NVIDIA Research & Certification Programs for CUDA/GPGPU

June 1st, 2010

At the ISC 2010 conference in Hamburg, Germany, this week, NVIDIA announced new programs for the growing CUDA/GPGPU developer community:

  • CUDA Certification Program – Driven by demand for qualified GPGPU engineers, this is the first program to certify expertise in massively parallel programming on GPUs.
  • CUDA Research Centers – Recognizes institutions that embrace GPU Computing across multiple research fields.
  • CUDA Teaching Centers – Recognizes institutions that have integrated GPU Computing techniques into their mainstream computer programming curriculum.

These programs complement the existing CUDA Center of Excellence program, which has recognized 10 premier institutions around the world. More details are available here: http://www.nvidia.com/object/io_1275409333119.html

White Paper: “Many-Core Processors Report Ready for Duty”

June 1st, 2010

From a white paper by GE Intelligent Platforms (Link):

This white paper describes how GPGPU technology can allow system designers to fit an unprecedented amount of processing power into a very compact package. For example, it describes four GE Intelligent Platforms 3U VPX boards with a floating point performance of 766 GFLOPS in less than 0.4 cubic feet. With configuration control and lifecycle management from a leading COTS supplier, these technologies are clearly ready for duty.


CFP: First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS’10), Colocated with VLDB 2010

June 1st, 2010

The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Realtime, and XML/RDF Processing) using various processor architectures  (e.g., commodity and specialized Multi-core CPUs, Many-core GPUs, and FPGAs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and multicore programming strategies like OpenCL.

More information and the full call can be found here: http://www.adms-conf.org/

GPU Supercomputer #2 in Top500

May 31st, 2010

The June 2010 Top500 list of the world’s fastest supercomputers was released this week at ISC 2010. The US Jaguar supercomputer (located at the Department of Energy’s Oak Ridge Leadership Computing Facility) retained the top spot in Linpack performance. A Chinese cluster called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVIDIA Tesla C2050 GPUs, is now the fastest in theoretical peak performance at 2.98 PFlop/s and ranks No. 2 with a Linpack performance of 1.271 PFlop/s. This is the highest rank a GPU-accelerated system, or a Chinese system, has ever achieved on the Top500 list.
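
The gap between Nebulae's peak and Linpack figures can be made concrete with a little arithmetic. The sketch below is only an illustration: it uses the two numbers quoted above, and the function and variable names are made up for this example rather than taken from any Top500 tooling.

```python
def linpack_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Return the fraction of theoretical peak achieved on Linpack."""
    if rpeak_pflops <= 0:
        raise ValueError("peak performance must be positive")
    return rmax_pflops / rpeak_pflops


# Figures quoted in the post for Nebulae (June 2010 list).
NEBULAE_RMAX = 1.271   # Linpack performance, PFlop/s
NEBULAE_RPEAK = 2.98   # theoretical peak, PFlop/s

nebulae_efficiency = linpack_efficiency(NEBULAE_RMAX, NEBULAE_RPEAK)
print(f"Nebulae Linpack efficiency: {nebulae_efficiency:.1%}")
# prints "Nebulae Linpack efficiency: 42.7%"
```

In other words, Nebulae sustains a little under 43% of its theoretical peak on Linpack, which is how a system can lead the list in peak performance while ranking second on the benchmark itself.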

For more information, visit www.TOP500.org.
