<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GPGPU &#187; Tag: High-Performance Computing :: GPGPU.org</title>
	<atom:link href="http://gpgpu.org/tag/high-performance-computing/feed" rel="self" type="application/rss+xml" />
	<link>http://gpgpu.org</link>
	<description>General-Purpose Computation on Graphics Hardware</description>
	<lastBuildDate>Tue, 22 May 2012 08:44:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>5th Workshop on UnConventional High Performance Computing 2012</title>
		<link>http://gpgpu.org/2012/05/19/5th-workshop-on-unconventional-high-performance-computing-2012</link>
		<comments>http://gpgpu.org/2012/05/19/5th-workshop-on-unconventional-high-performance-computing-2012#comments</comments>
		<pubDate>Sat, 19 May 2012 09:49:18 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Call for Papers]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Workshops]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4705</guid>
		<description><![CDATA[Together with EuroPar-12, the 5th Workshop on UnConventional High Performance Computing 2012 (UCHPC 2012) will take place on August 27/28 at Rhodes Island, Greece. The workshop tries to capture solutions for HPC which are unconventional today but could become conventional and significant tomorrow. While GPGPU is already used a lot in HPC, there still are [...]]]></description>
			<content:encoded><![CDATA[<p>Together with EuroPar-12, the 5th Workshop on UnConventional High Performance Computing 2012 (UCHPC 2012) will take place on August 27/28 on Rhodes Island, Greece. The workshop aims to capture solutions for HPC that are unconventional today but could become conventional and significant tomorrow. While GPGPU is already widely used in HPC, there are still all kinds of open issues around best exploitation and programmer productivity. Submission deadline: June 6, 2012. For more details, see<br />
<a title="UCHPC website" href="http://www.lrr.in.tum.de/~weidendo/uchpc12" target="_blank">http://www.lrr.in.tum.de/~weidendo/uchpc12</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/05/19/5th-workshop-on-unconventional-high-performance-computing-2012/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HOOMD-blue 0.10.0 release</title>
		<link>http://gpgpu.org/2011/12/19/hoomd-blue-0-10-0-release</link>
		<comments>http://gpgpu.org/2011/12/19/hoomd-blue-0-10-0-release#comments</comments>
		<pubDate>Mon, 19 Dec 2011 07:44:41 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Molecular Dynamics]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4279</guid>
		<description><![CDATA[HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://codeblue.umich.edu/hoomd-blue/">HOOMD-blue</a> performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and crystallization of metals.</p>
<p>HOOMD-blue 0.10.0 adds many new features. Highlights include:<span id="more-4279"></span></p>
<ul>
<li>Added <strong>pair.dpdlj</strong> which uses the <span class="caps">DPD </span>thermostat and the Lennard-Jones potential. In previous versions, this could be accomplished by using two pair commands but at the cost of reduced performance.</li>
<li>Additional example scripts are now present in the documentation. The example scripts are cross-linked to the commands that are used in them.</li>
<li>Most dump commands now accept the form: <strong>dump.ext(filename=&quot;filename.ext&quot;)</strong>, which immediately writes out filename.ext.</li>
<li>Specify rigid bodies in <span class="caps">XML </span>input files</li>
<li>Support for simulations that contain rigid body constraints applied to groups of particles in <span class="caps">BDNVT, NVE, NVT, </span>and <span class="caps">NPT </span>ensembles.</li>
<li>Energy minimization of rigid bodies (<strong>integrate.mode_minimize_rigid_fire</strong>)</li>
<li>Existing commands are now rigid-body aware</li>
<li><span class="caps">NVT </span>integration using the Berendsen thermostat (<strong>integrate.berendsen</strong>)</li>
<li>Bonds, angles, dihedrals, and impropers can now be created and deleted with the Python data access <span class="caps">API.</span></li>
<li>and <a href="http://codeblue.umich.edu/hoomd-blue/">more</a></li>
</ul>
<p>HOOMD-blue 0.10.0 is available for <a href="http://codeblue.umich.edu/hoomd-blue/download.html">download</a> under an open source license. Check out the <a href="http://codeblue.umich.edu/hoomd-blue/doc/page_quick_start.html">quick start tutorial</a> to get started, or check out the <a href="http://codeblue.umich.edu/hoomd-blue/doc/index.html">full documentation</a> to see everything it can do.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/12/19/hoomd-blue-0-10-0-release/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures</title>
		<link>http://gpgpu.org/2011/12/14/acceleration-of-wavefront-applications</link>
		<comments>http://gpgpu.org/2011/12/14/acceleration-of-wavefront-applications#comments</comments>
		<pubDate>Wed, 14 Dec 2011 09:26:00 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Linear Algebra]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4264</guid>
		<description><![CDATA[Abstract: In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P).</p>
<p>Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.</p></blockquote>
<p>(Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A. and Jarvis, S.A.: <em>&#8220;On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures&#8221;</em>, The Computer Journal (in press) [<a href="http://dx.doi.org/10.1093/comjnl/bxr073" target="_blank">DOI</a>] [<a href="http://eprints.dcs.warwick.ac.uk/787/" target="_blank">PREPRINT</a>])</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/12/14/acceleration-of-wavefront-applications/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>rCUDA 3.1 Released</title>
		<link>http://gpgpu.org/2011/10/20/rcuda-3-1</link>
		<comments>http://gpgpu.org/2011/10/20/rcuda-3-1#comments</comments>
		<pubDate>Thu, 20 Oct 2011 10:49:26 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4055</guid>
		<description><![CDATA[The new version 3.1 of rCUDA (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, is now available. Release highlights: Fully updated API to CUDA 4.0 (added support for modules &#8220;Peer Device Memory Access&#8221; and &#8220;Unified Addressing&#8221;). Fixed low-level Surface Reference management functions. For further information, please visit the [...]]]></description>
			<content:encoded><![CDATA[<p>The new version 3.1 of rCUDA (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, is now available. Release highlights:</p>
<ul>
<li>Fully updated API to CUDA 4.0 (added support for modules &#8220;Peer Device Memory Access&#8221; and &#8220;Unified Addressing&#8221;).</li>
<li>Fixed low-level Surface Reference management functions.</li>
</ul>
<p>For further information, please visit the rCUDA webpage at <a href="http://www.gap.upv.es/rCUDA" target="_blank">http://www.gap.upv.es/rCUDA</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/10/20/rcuda-3-1/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>CfP: 20th High Performance Computing Symposium 2012</title>
		<link>http://gpgpu.org/2011/10/07/20th-hpc-2012</link>
		<comments>http://gpgpu.org/2011/10/07/20th-hpc-2012#comments</comments>
		<pubDate>Fri, 07 Oct 2011 09:48:52 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Scientific Computing]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4021</guid>
		<description><![CDATA[The 2012 Spring Simulation Multi-conference will feature the 20th High Performance Computing Symposium (HPC 2012), devoted to the impact of high performance computing and communications on computer simulations. Topics of interest include: high performance/large scale application case studies, GPUs for general purpose computations (GPGPU), multicore and many-core computing, power aware computing, large scale visualization and [...]]]></description>
			<content:encoded><![CDATA[<p>The 2012 Spring Simulation Multi-conference will feature the <a title="link to conference" href="http://www.ncsu.edu/itd/hpc/hpc2012/hpc2012.html" target="_blank">20th High Performance Computing Symposium (HPC 2012)</a>, devoted to the impact of high performance computing and communications on computer simulations. Topics of interest include:</p>
<ul>
<li>high performance/large scale application case studies,</li>
<li>GPUs for general purpose computations (GPGPU),</li>
<li>multicore and many-core computing,</li>
<li>power aware computing,</li>
<li>large scale visualization and data management,</li>
<li>tools and environments for coupling parallel codes,</li>
<li>parallel algorithms and architectures,</li>
<li>high performance software tools,</li>
<li>component technologies for high performance computing.</li>
</ul>
<p>Important dates: Paper submission due: December 2, 2011; Notification of acceptance: January 13, 2012; Revised manuscript due: January 27, 2012; Symposium: March 26&#8211;29, 2012.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/10/07/20th-hpc-2012/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>rCUDA 3.0a released</title>
		<link>http://gpgpu.org/2011/07/17/rcuda-3-0a-released</link>
		<comments>http://gpgpu.org/2011/07/17/rcuda-3-0a-released#comments</comments>
		<pubDate>Mon, 18 Jul 2011 00:02:13 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Virtualisation]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=3747</guid>
		<description><![CDATA[A new alpha release of rCUDA 3.0 (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, has been released. Major improvements included in this new version are: Partially updated API to 4.0 Added compatibility support with CUDA 4.0 environment Updated CUBLAS API to 4.0 for the most common CUBLAS routines [...]]]></description>
			<content:encoded><![CDATA[<p>A new alpha release of rCUDA 3.0 (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, has been released. Major improvements included in this new version are:</p>
<ul>
<li>Partially updated API to 4.0</li>
<li>Added compatibility support with CUDA 4.0 environment</li>
<li>Updated CUBLAS API to 4.0 for the most common CUBLAS routines</li>
<li>Fixed some bugs</li>
<li>General performance improvements</li>
</ul>
<p>For further information, please visit the <a href="http://www.gap.upv.es/rCUDA" target="_blank">rCUDA webpage</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/07/17/rcuda-3-0a-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications</title>
		<link>http://gpgpu.org/2011/06/26/checl-checkpointing-process-migration-opencl</link>
		<comments>http://gpgpu.org/2011/06/26/checl-checkpointing-process-migration-opencl#comments</comments>
		<pubDate>Sun, 26 Jun 2011 23:09:04 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=3648</guid>
		<description><![CDATA[Abstract: We propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>We propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process called an API proxy, and the API proxy invokes the API function; two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely checkpointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles, but CheCL handles to keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent checkpointing of OpenCL programs including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL can enable process migration of OpenCL applications among distinct nodes, and among different kinds of compute devices such as a CPU and a GPU.</p></blockquote>
<p>(Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, and Hiroaki Kobayashi: <em>&#8220;CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications&#8221;</em>, Proceedings of International Parallel and Distributed Processing Symposium (IPDPS11), 2011. [<a href="http://www.sc.isc.tohoku.ac.jp/~tacky/papers/htakizawa_ipdps2011.pdf" target="_blank">PDF</a>])</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/06/26/checl-checkpointing-process-migration-opencl/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HOOMD-blue 0.9.2 release</title>
		<link>http://gpgpu.org/2011/04/06/hoomd-blue-0-9-2</link>
		<comments>http://gpgpu.org/2011/04/06/hoomd-blue-0-9-2#comments</comments>
		<pubDate>Wed, 06 Apr 2011 23:39:25 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Molecular Dynamics]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=3452</guid>
		<description><![CDATA[HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://codeblue.umich.edu/hoomd-blue/" target="_blank">HOOMD-blue</a> performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and crystallization of metals.</p>
<p>HOOMD-blue 0.9.2 adds many new features. Highlights include:</p>
<ul>
<li>Long-ranged electrostatics via PPPM</li>
<li>Support for CUDA 3.2 and 4.0</li>
<li>New neighbor list option to exclude by particle diameter (for pair.slj)</li>
<li>New syntax to specify multiple pair coefficients at once</li>
<li>Improved documentation</li>
<li>Significant performance boosts for small simulations</li>
<li>RPM and .deb packaging for CentOS, Fedora, and Ubuntu</li>
<li>and <a href="https://codeblue.umich.edu/hoomd-blue/trac/wiki/ChangeLog" target="_blank">more</a></li>
</ul>
<p>HOOMD-blue 0.9.2 is available for <a href="http://codeblue.umich.edu/hoomd-blue/download.html" target="_blank">download</a> under an open source license. Check out the <a href="http://codeblue.umich.edu/hoomd-blue/doc/page_quick_start.html" target="_blank">quick start tutorial</a> to get started, or check out the <a href="http://codeblue.umich.edu/hoomd-blue/doc/index.html">full documentation</a> to see everything it can do.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/04/06/hoomd-blue-0-9-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AMD Fusion Developer Summit</title>
		<link>http://gpgpu.org/2011/03/29/amd-fusion-developer-summit</link>
		<comments>http://gpgpu.org/2011/03/29/amd-fusion-developer-summit#comments</comments>
		<pubDate>Wed, 30 Mar 2011 00:51:18 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[AMD]]></category>
		<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Heterogeneous Computing]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Tutorials & Courses]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=3368</guid>
		<description><![CDATA[Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. We’ve been working with industry and academia partners to help advance real-world use of these technologies, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gpgpu.org/wp/wp-content/uploads/2011/03/afds_logo.png"><img class="alignright size-medium wp-image-3369" style="background-color: red; padding: 3px;" title="afds_logo" src="http://gpgpu.org/wp/wp-content/uploads/2011/03/afds_logo-300x77.png" alt="" width="300" height="77" /></a>Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate.  We’ve been working with industry and academia partners to help advance real-world use of these technologies, and to understand the opportunities that lie ahead. It’s time to share what we’ve learned so far.</p>
<p>With tutorials, hands-on labs, and sessions that span a range of topics from HPC to multimedia, you’ll have the opportunity to expand your view of what heterogeneous computing currently offers and where it is going. You’ll hear from industry innovators and academic pioneers who are exploring different ways of approaching problems, and utilizing new paradigms in computing to help identify solutions. You’ll meet AMD experts with deep knowledge of hardware architectures and the software techniques that best leverage those platforms. And you’ll connect with other software professionals who share your passion for the future of technology.</p>
<p>Learn more at <a href="http://developer.amd.com/afds" target="_blank">developer.amd.com/afds</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/03/29/amd-fusion-developer-summit/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDA 4.0 Release Aims to Make Parallel Programming Easier</title>
		<link>http://gpgpu.org/2011/03/01/cuda-4-0-release</link>
		<comments>http://gpgpu.org/2011/03/01/cuda-4-0-release#comments</comments>
		<pubDate>Tue, 01 Mar 2011 07:55:01 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Press]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Multi-GPU]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Parallel Algorithms]]></category>
		<category><![CDATA[Parallel Computing]]></category>
		<category><![CDATA[Programming Languages]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=3309</guid>
		<description><![CDATA[Today NVIDIA announced the upcoming 4.0 release of CUDA.  While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn&#8217;t mean there aren&#8217;t a lot of new features.  With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gpgpu.org/wp/wp-content/uploads/2011/01/NVLogo_2D-e1298965986472.jpg"><img class="alignright size-full wp-image-3194" title="NVLogo_2D" src="http://gpgpu.org/wp/wp-content/uploads/2011/01/NVLogo_2D-e1298965986472.jpg" alt="" width="150" height="111" /></a>Today NVIDIA announced the upcoming 4.0 release of CUDA.  While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn&#8217;t mean there aren&#8217;t a lot of new features.  With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool.  Full details follow in the quoted press release below.</p>
<p><span id="more-3309"></span></p>
<blockquote><p>SANTA CLARA, CA &#8212; (Marketwire) &#8212; 02/28/2011 &#8211; NVIDIA today announced the latest version of the NVIDIA® CUDA® Toolkit for developing parallel applications using NVIDIA GPUs.</p>
<p>The NVIDIA CUDA 4.0 Toolkit was designed to make parallel programming easier, and enable more developers to port their applications to GPUs. This has resulted in three main features:</p>
<ul>
<li>NVIDIA GPUDirect™ 2.0 Technology &#8211; Offers support for peer-to-peer communication among GPUs within a single server or workstation. This enables easier and faster multi-GPU programming and application performance.</li>
<li>Unified Virtual Addressing (UVA) &#8211; Provides a single merged-memory address space for the main system memory and the GPU memories, enabling quicker and easier parallel programming.</li>
<li>Thrust C++ Template Performance Primitives Libraries &#8211; Provides a collection of powerful open source C++ parallel algorithms and data structures that ease programming for C++ developers. With Thrust, routines such as parallel sorting are 5X to 100X faster than with Standard Template Library (STL) and Threading Building Blocks (TBB).</li>
</ul>
<p>&#8220;Unified virtual addressing and faster GPU-to-GPU communication makes it easier for developers to take advantage of the parallel computing capability of GPUs,&#8221; said John Stone, senior research programmer, University of Illinois, Urbana-Champaign.</p>
<p>&#8220;Having access to GPU computing through the standard template interface greatly increases productivity for a wide range of tasks, from simple cashflow generation to complex computations with Libor market models, variable annuities or CVA adjustments,&#8221; said Peter Decrem, director of Rates Products at Quantifi. &#8220;The Thrust C++ library has lowered the barrier of entry significantly by taking care of low-level functionality like memory access and allocation, allowing the financial engineer to focus on algorithm development in a GPU-enhanced environment.&#8221;</p>
<p>The CUDA 4.0 architecture release includes a number of other key features and capabilities, including:</p>
<ul>
<li>MPI Integration with CUDA Applications &#8211; Modified MPI implementations automatically move data from and to the GPU memory over InfiniBand when an application does an MPI send or receive call.</li>
<li>Multi-thread Sharing of GPUs &#8211; Multiple CPU host threads can share contexts on a single GPU, making it easier to share a single GPU by multi-threaded applications.</li>
<li>Multi-GPU Sharing by Single CPU Thread &#8211; A single CPU host thread can access all GPUs in a system. Developers can easily coordinate work across multiple GPUs for tasks such as &#8220;halo&#8221; exchange in applications.</li>
<li>New NPP Image and Computer Vision Library &#8211; A rich set of image transformation operations that enable rapid development of imaging and computer vision applications.</li>
<li>New and Improved Capabilities
<ul>
<li>Auto performance analysis in the Visual Profiler</li>
<li>New features in cuda-gdb and added support for MacOS</li>
<li>Added support for C++ features like new/delete and virtual functions</li>
<li>New GPU binary disassembler</li>
</ul>
</li>
</ul>
<p>A release candidate of CUDA Toolkit 4.0 will be available free of charge beginning March 4, 2011, by enrolling in the CUDA Registered Developer Program at: <a href="http://www.nvidia.com/paralleldeveloper" target="_blank">www.nvidia.com/paralleldeveloper</a>. The CUDA Registered Developer Program provides a wealth of tools, resources, and information for parallel application developers to maximize the potential of CUDA.</p>
<p>For more information on the features and capabilities of the CUDA Toolkit and on GPGPU applications, please visit: <a href="http://www.nvidia.com/cuda" target="_blank">www.nvidia.com/cuda</a>.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/03/01/cuda-4-0-release/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

