September 22nd, 2012
The “Ludwig” lattice Boltzmann application is a versatile fluid dynamics code capable of simulating the hydrodynamics of complex fluids (e.g. mixtures, surfactants, liquid crystals, particle suspensions), enabling cutting-edge research in condensed matter physics. On October 3, Dr. Alan Gray from the University of Edinburgh presents a webinar on his team’s experiences in scaling the application on the Cray XK6 hybrid supercomputer. The presentation will cover:
- A review of excellent scaling up to O(1000) GPUs
- Steps taken to maximize performance on each GPU
- Designing the communication to allow efficient usage of many GPUs in parallel, including the overlapping of several stages using CUDA stream functionality
- Advanced functionality, including how to include colloidal particles in the simulation while minimizing data transfer overheads
Register at http://www.gputechconf.com/page/gtc-express-webinar.html.
September 20th, 2012
Recognizing the growing interest and demand from NSF researchers for education on GPU computing, leading centers in NSF’s Extreme Science and Engineering Discovery Environment (XSEDE) program are working together to host a free two-day, hands-on workshop to share tips and best practices for accelerating scientific applications on GPUs using OpenACC. More information: http://blogs.nvidia.com/2012/09/u-s-scientists-nsf-to-host-nationwide-gpu-computing-workshop/
September 4th, 2012
The Vrije Universiteit Brussel, Erasmus Hogeschool Brussel and Lessius Hogeschool have the pleasure to invite you to a symposium on Personal High-Performance Computing. The symposium aims at bringing together academia and industry to discuss recent advances in using accelerators such as GPUs or FPGAs for speeding up computationally intensive applications. We target single systems such as PCs, laptops or processor boards, hence the name ‘personal’ HPC.
Scientists are encouraged to submit abstracts to be presented at the poster session. All information can be found at https://sites.google.com/site/phpc2012bxl.
August 20th, 2012
In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide a single library with decent performance on a variety of platforms. We choose triangular solver (TRSM) and matrix multiplication (GEMM) as representative level 3 BLAS routines to implement in OpenCL. We profile TRSM to get the time distribution of the OpenCL runtime system. We then provide tuned GEMM kernels for both the NVIDIA Tesla C2050 and ATI Radeon 5870, the latest GPUs offered by both companies. We explore the benefits of using the texture cache, the performance ramifications of copying data into images, discrepancies in the OpenCL and CUDA compilers’ optimizations, and other issues that affect the performance. Experimental results show that nearly 50% of peak performance can be obtained in GEMM on both GPUs in OpenCL. We also show that the performance of these kernels is not highly portable. Finally, we propose the use of auto-tuning to better explore these kernels’ parameter space using a search harness.
(Peng Du, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gregory Peterson, Jack Dongarra, “From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming”, Parallel Computing 38(8):391–407, Aug. 2012. [DOI] [early techreport])
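The auto-tuning approach the paper proposes can be illustrated with a minimal sketch: exhaustively search a kernel’s parameter space and keep the configuration with the best measured time. The parameter names and the cost function below are hypothetical stand-ins; a real harness would launch the OpenCL GEMM kernel and time it with profiling events.

```python
# Minimal auto-tuning search harness sketch. The "timer" here is a mock
# cost model standing in for actual kernel timing; it pretends that
# 32x32 tiles with vector width 4 are optimal.

from itertools import product

def mock_kernel_time(tile_m, tile_n, vector_width):
    # Stand-in for a real OpenCL kernel launch + profiling-event timing.
    return abs(tile_m - 32) + abs(tile_n - 32) + abs(vector_width - 4) + 1.0

def autotune(search_space, timer):
    """Try every combination in the search space; return the fastest."""
    best_cfg, best_time = None, float("inf")
    for cfg in product(*search_space.values()):
        t = timer(*cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return dict(zip(search_space.keys(), best_cfg)), best_time

space = {"tile_m": [8, 16, 32, 64],
         "tile_n": [8, 16, 32, 64],
         "vector_width": [1, 2, 4]}
best, t = autotune(space, mock_kernel_time)
print(best)  # {'tile_m': 32, 'tile_n': 32, 'vector_width': 4}
```

In practice the search space grows combinatorially, which is why the paper proposes a dedicated harness rather than hand-tuning each platform.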
August 11th, 2012
The Computing Language Utility (CLU) is a lightweight API designed to help programmers explore, learn, and rapidly prototype programs with OpenCL. This API reduces the complexity associated with initializing OpenCL devices, contexts, kernels, parameters, etc., while preserving the ability to drop down to the lower-level OpenCL API at will when programmers want to get their hands dirty. The CLU release includes an open source implementation along with documentation and samples that demonstrate how to use CLU in real applications. It has been tested on Windows 7 with Visual Studio.
August 10th, 2012
This workshop is concerned with the comparison of high-performance computing systems through performance modeling, benchmarking or the use of tools such as simulators. We are particularly interested in research which reports the ability to measure and make tradeoffs in software/hardware co-design to improve sustained application performance. We are also keen to capture the assessment of future systems, for example through work that ensures continued application scalability through peta- and exa-scale systems.
Read the rest of this entry »
August 9th, 2012
The MOSIX group announces the release of the Virtual OpenCL (VCL) cluster platform version 1.14. This version includes the SuperCL extension that allows micro OpenCL programs to run efficiently on devices of remote nodes. VCL provides an OpenCL platform in which all the cluster devices are seen as if they are located in the hosting node. This platform benefits OpenCL applications that can use many devices concurrently. Applications written for VCL benefit from the reduced programming complexity of a single computer; the availability of shared memory, multi-threading, and lower-granularity parallelism; and concurrent access to devices in many nodes. With SuperCL, a programmable sequence of kernels and/or memory operations can be sent to remote devices in cluster nodes, usually with just a single network round-trip. SuperCL also offers asynchronous communication with the host, to avoid the round-trip waiting time, as well as direct access to distributed file systems. The VCL package can be downloaded from mosix.org.
August 9th, 2012
Graphics Core Next Architecture Overview
The GCN Architecture is designed to push the boundaries of DirectX® 11 gaming, but it is also AMD’s first design specifically engineered for general computing. Equipped with up to 32 compute units (2048 stream processors), each containing a scalar coprocessor, AMD’s 28nm GPUs are more than capable of handling workloads and programming languages traditionally exclusive to the CPU. Coupled with the dramatic rise of GPU-aware programming languages like C++ AMP and OpenCL™, the GCN Architecture is truly the right architecture for the right time. Participate in this webinar to learn how you can take advantage of this new architecture in your GPGPU programs (North America: August 14, 2012, 10AM Pacific Daylight Time; India: August 21, 2012, 5:30PM India Standard Time).
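The figures quoted above follow from simple arithmetic: each GCN compute unit contains 64 stream processors, so 32 compute units give 2048. The clock rate used below (925 MHz, the Radeon HD 7970’s engine clock) is an assumption for illustration; peak single-precision throughput counts 2 FLOPs (one fused multiply-add) per stream processor per cycle.

```python
# Back-of-the-envelope check of the GCN numbers quoted above.
compute_units = 32
sp_per_cu = 64            # stream processors per GCN compute unit
stream_processors = compute_units * sp_per_cu
print(stream_processors)  # 2048

clock_ghz = 0.925         # assumed: Radeon HD 7970 engine clock (925 MHz)
flops_per_sp_per_cycle = 2  # one fused multiply-add
peak_gflops = stream_processors * flops_per_sp_per_cycle * clock_ghz
print(round(peak_gflops))   # 3789
```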
Performance Evaluation of AMD APARAPI Using Real World Applications
Read the rest of this entry »
August 6th, 2012
The fall schedule for Acceleware’s training courses is now available.
- OpenCL: August 21-24, 2012, Houston, TX
- CUDA: October 2-5, 2012, San Jose, CA
- OpenCL: October 16-19, 2012, Calgary, AB
- CUDA: November 6-9, 2012, Houston, TX
- CUDA: December 4-7, 2012, New York, NY – Finance Focus
- AMP: December 11-14, 2012, Chicago, IL
More information: http://www.acceleware.com/training
The 2012 International Workshop on GPU Computing in Clouds (GPU-Cloud 2012) will be held December 3-6, 2012 in Taipei, Taiwan, in conjunction with the 4th International Conference on Cloud Computing Technology and Science. Important Dates:
- Submission Deadline: August 17, 2012
- Authors Notification: September 11, 2012
- Final Manuscript Due: September 28, 2012
- Workshop: December 04, 2012
Submission site: http://www.easychair.org/conferences/?conf=gpucloud2012