This webinar provides an overview of the improved performance analysis tools available in CUDA 6.0 and key optimization strategies for compute-, latency-, and memory-bound problems. It covers techniques for ensuring peak utilization of CUDA cores, improving branching efficiency, using intrinsic functions, and unrolling loops. Optimal access patterns for global and shared memory are presented, including a comparison between the Fermi and Kepler architectures. To view the webinar, go to: http://acceleware.com/blog/webinar-essential-cuda-optimization-techniques
The course on Antenna Synthesis (with elements of GPU computing) is organized in the framework of the European School of Antennas. It will take place at the Partenope Conference Center of the Università di Napoli Federico II, Napoli, Italy, on October 13-17, 2014. The course covers three topics: the two main aspects of antenna synthesis, namely external and internal synthesis, and the numerical and implementation issues of synthesis algorithms on High Performance Computing (HPC) platforms. For details about the course, please see this brochure and http://www.antennasvce.org/Community/Education/Courses?id_folder=533.
Offered in partnership with NVIDIA, this four-day CUDA training course, held in Houston, is designed for programmers in the oil and gas industry who want to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU. Commonly used algorithms such as filtering and FFTs will be used and profiled in the examples. The case study on day 4 focuses on the efficient implementation of a finite difference algorithm, which is highly applicable to reverse time migration. However, a background in oil and gas is not necessary. For more information and to view a copy of the course outline, please visit: http://acceleware.com/training/987
This tutorial by Dan Cyca outlines the shared memory configurations for NVIDIA Fermi and Kepler architectures, and demonstrates how to rewrite kernels to take advantage of the changes in Kepler’s shared memory architecture.
This webinar recording provides an overview of the profiling techniques and tools available to help you optimize your code. It examines NVIDIA’s Visual Profiler and cuobjdump and highlights the various methods available for understanding the performance of a CUDA program. The second part of the session focuses on debugging techniques and the tools available to help identify issues in kernels. The debugging tools provided in CUDA 5.5, including Nsight and cuda-memcheck, are discussed. The webinar recording can be accessed here.
The Virtual School of Computational Science and Engineering is hosting two upcoming webinars.
- Introduction to HOOMD-blue, December 10, 2013, 11:00 EST.
- Using HOOMD-blue for Polymer Simulations and Big Systems, January 21, 2014, 11:00 EST.
More information and registration: http://www.vscse.org/
One of the keys to achieving maximum performance in CUDA is taking advantage of the various memory spaces. Part II of Acceleware’s tutorial has now been published. The tutorial uses a simple encryption kernel to test and compare read-only cache, constant cache and global memory. Read the full tutorial…
This blog post takes a closer look at constant cache and read-only cache. It highlights the differences between the two memory types and the circumstances in which each performs best. Read the whole story here.
Acceleware recently announced several courses:
- CUDA for Finance: December 10 – 13, 2013, New York, NY [Details]
- OpenCL: October 22 – 25, 2013, Houston, TX [Details]
- CUDA: September 24-27, [Details]
- C++ AMP: September 10-13, [Details]