<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GPGPU &#187; Tag: NVIDIA CUDA :: GPGPU.org</title>
	<atom:link href="http://gpgpu.org/tag/nvidia-cuda/feed" rel="self" type="application/rss+xml" />
	<link>http://gpgpu.org</link>
	<description>General-Purpose Computation on Graphics Hardware</description>
	<lastBuildDate>Tue, 22 May 2012 08:44:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Panoptes:  A Binary Translation Framework for CUDA</title>
		<link>http://gpgpu.org/2012/05/22/panoptes-a-binary-translation-framework-for-cuda</link>
		<comments>http://gpgpu.org/2012/05/22/panoptes-a-binary-translation-framework-for-cuda#comments</comments>
		<pubDate>Tue, 22 May 2012 08:44:05 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Debugging]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Profiling]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4724</guid>
		<description><![CDATA[Traditional CPU-based computing environments offer a variety of binary instrumentation frameworks. Instrumentation and analysis tools for GPU environments to date have been more limited. Panoptes is a binary instrumentation framework for CUDA that targets the GPU. By exploiting the GPU to run modified kernels, computationally-intensive programs can be run at the native parallelism of the [...]]]></description>
			<content:encoded><![CDATA[<p>Traditional CPU-based computing environments offer a variety of binary instrumentation frameworks. Instrumentation and analysis tools for GPU environments to date have been more limited. <a title="source code" href="http://github.com/ckennelly/panoptes" target="_blank">Panoptes</a> is a binary instrumentation framework for CUDA that targets the GPU. By exploiting the GPU to run modified kernels, computationally-intensive programs can be run at the native parallelism of the device during analysis. To demonstrate its instrumentation capabilities, we currently implement a memory addressability and validity checker that targets CUDA programs.</p>
<p>Panoptes traces targeted programs by library interposition at runtime. Interactions with the GPU are intercepted, annotated as necessary, and are then sent to the actual CUDA library for execution on the device. This approach gives an analysis tool built on Panoptes a complete view of the state of the GPU without additional developer effort. In contrast, developer-added instrumentation may be incomplete due to errors of omission or cause maintenance difficulties, particularly for large code bases.</p>
<p>By directing annotated instructions to the GPU for execution rather than relying on the host for emulation, Panoptes is able to analyze programs at scale. The rift in parallel execution capabilities between modern GPUs and CPUs carries into testing and debugging as well. For computationally intensive tasks brought to the GPU explicitly for its parallelism, resorting to host-based emulation may necessitate reduced or simplified inputs for analysis. More details: <a title="source code" href="http://github.com/ckennelly/panoptes" target="_blank">http://github.com/ckennelly/panoptes</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/05/22/panoptes-a-binary-translation-framework-for-cuda/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NVIDIA Kepler GK110 Architecture White Paper</title>
		<link>http://gpgpu.org/2012/05/20/nvidia-kepler-gk110-paper</link>
		<comments>http://gpgpu.org/2012/05/20/nvidia-kepler-gk110-paper#comments</comments>
		<pubDate>Mon, 21 May 2012 00:35:21 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[GPUs]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4718</guid>
		<description><![CDATA[This white paper describes the new Kepler GK110 Architecture from NVIDIA. Comprising 7.1 billion transistors, Kepler GK110 is not only the fastest, but also the most architecturally complex microprocessor ever built. Adding many new innovative features focused on compute performance, GK110 was designed to be a parallel processing powerhouse for Tesla® and the HPC market. Kepler GK110 will provide over [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_4719" class="wp-caption alignright" style="width: 160px"><a href="http://gpgpu.org/wp/wp-content/uploads/2012/05/nvidia_kepler2_die_shot.jpg"><img class="size-thumbnail wp-image-4719" title="nvidia_kepler2_die_shot" src="http://gpgpu.org/wp/wp-content/uploads/2012/05/nvidia_kepler2_die_shot-150x150.jpg" alt="" width="150" height="150" /></a><p class="wp-caption-text">NVIDIA Kepler GK110 Die Shot</p></div>
<p>This <a title="NVIDIA Kepler GK110 White Paper" href="http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf" target="_blank">white paper</a> describes the new Kepler GK110 Architecture from NVIDIA.</p>
<blockquote><p>Comprising 7.1 billion transistors, Kepler GK110 is not only the fastest, but also the most architecturally complex microprocessor ever built. Adding many new innovative features focused on compute performance, GK110 was designed to be a parallel processing powerhouse for Tesla® and the HPC market.</p>
<p>Kepler GK110 will provide over 1 TFlop of double precision throughput with greater than 80% DGEMM efficiency versus 60‐65% on the prior Fermi architecture.</p>
<p>In addition to greatly improved performance, the Kepler architecture offers a huge leap forward in power efficiency, delivering up to 3x the performance per watt of Fermi.</p></blockquote>
<p>The paper describes features of the Kepler GK110 architecture, including</p>
<ul>
<li>Dynamic Parallelism;</li>
<li>Hyper-Q;</li>
<li>Grid Management Unit;</li>
<li>NVIDIA GPUDirect™;</li>
<li>New SHFL instruction and atomic instruction enhancements;</li>
<li>New read-only data cache previously only accessible to texture;</li>
<li>Bindless Textures;</li>
<li>and much more.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/05/20/nvidia-kepler-gk110-paper/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform</title>
		<link>http://gpgpu.org/2012/05/11/cushaw</link>
		<comments>http://gpgpu.org/2012/05/11/cushaw#comments</comments>
		<pubDate>Fri, 11 May 2012 07:25:34 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Sequence Alignment]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4689</guid>
		<description><![CDATA[Abstract: Motivation: New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed. Results: We present CUSHAW, a [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>Motivation: New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed.</p>
<p>Results: We present CUSHAW, a parallelized short read aligner based on the compute unified device architecture (CUDA) parallel programming model. We exploit CUDA-compatible graphics hardware as accelerators to achieve fast speed. Our algorithm employs a quality-aware bounded search approach based on the Burrows-Wheeler transform (BWT) and the Ferragina Manzini (FM)-index to reduce the search space and achieve high alignment quality. Performance evaluation, using simulated as well as real short read datasets, reveals that our algorithm running on one or two graphics processing units (GPUs) achieves significant speedups in terms of execution time, while yielding comparable or even better alignment quality for paired-end alignments compared to three popular BWT-based aligners: Bowtie, BWA and SOAP2. CUSHAW also delivers competitive performance in terms of SNP calling for an E.coli test dataset.</p>
<p>Availability: <a title="source code link" href="http://cushaw.sourceforge.net" target="_blank">http://cushaw.sourceforge.net</a>.</p></blockquote>
<p>(Y. Liu, B. Schmidt, D. Maskell: <em>&#8220;CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform&#8221;</em>, Bioinformatics, 2012. [<a title="link to publication" href="http://dx.doi.org/10.1093/bioinformatics/bts276" target="_blank">DOI</a>])</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/05/11/cushaw/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Acceleware CUDA™ Training &#8211; Life Science Focus</title>
		<link>http://gpgpu.org/2012/05/02/acceleware-cuda-training-life-science-focus</link>
		<comments>http://gpgpu.org/2012/05/02/acceleware-cuda-training-life-science-focus#comments</comments>
		<pubDate>Wed, 02 May 2012 06:50:37 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Life Sciences]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Training]]></category>
		<category><![CDATA[Tutorials & Courses]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4679</guid>
		<description><![CDATA[Partnering with NVIDIA and Microsoft, this four-day CUDA training course is designed for researchers and programmers in the life science industries who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU. It is held in Boston, MA, on June 4-7, 2012. This [...]]]></description>
			<content:encoded><![CDATA[<p>Partnering with NVIDIA and Microsoft, this four-day CUDA training course is designed for researchers and programmers in the life science industries who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU. It is held in Boston, MA, on June 4-7, 2012. This course will have a life science theme. Commonly used algorithms such as Monte Carlo methods, FFT and filtering will be used and profiled in examples. The case study on day 4 focuses on the efficient implementation of a molecular dynamics simulation. More information: <a title="Link to course information" href="http://www.acceleware.com/jun4boston" target="_blank">http://www.acceleware.com/jun4boston</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/05/02/acceleware-cuda-training-life-science-focus/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>2 Day CUDA Workshop, May 5-6 2012, Berlin, Germany</title>
		<link>http://gpgpu.org/2012/04/21/cuda-berlin-may-workshop</link>
		<comments>http://gpgpu.org/2012/04/21/cuda-berlin-may-workshop#comments</comments>
		<pubDate>Sat, 21 Apr 2012 08:59:29 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Courses]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4647</guid>
		<description><![CDATA[A 2-day CUDA workshop is taking place in Berlin, Germany, on May 5 and 6, 2012. Course details, outline and prices are available at http://cuda.eventbrite.com.]]></description>
			<content:encoded><![CDATA[<p>A 2-day CUDA workshop is taking place in Berlin, Germany, on May 5 and 6, 2012. Course details, outline and prices are available at <a title="course website" href="http://cuda.eventbrite.com/" target="_blank">http://cuda.eventbrite.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/04/21/cuda-berlin-may-workshop/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New rCUDA version beta testing</title>
		<link>http://gpgpu.org/2012/04/18/rcuda</link>
		<comments>http://gpgpu.org/2012/04/18/rcuda#comments</comments>
		<pubDate>Wed, 18 Apr 2012 06:05:22 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Multi-GPU]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Programming Environments]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4640</guid>
		<description><![CDATA[The rCUDA Team is proud to announce a new version of the rCUDA framework which will include many new functionalities as well as boosted performance. This new version, cooked for over a year, will incorporate pipelined transfers, full multi-thread and multi-node capabilities, CUDA 4.1 support, global scheduler integration, support for CUDA C extensions, and native [...]]]></description>
			<content:encoded><![CDATA[<p>The rCUDA Team is proud to announce a new version of the rCUDA framework which will include many new functionalities as well as boosted performance. This new version, cooked for over a year, will incorporate pipelined transfers, full multi-thread and multi-node capabilities, CUDA 4.1 support, global scheduler integration, support for CUDA C extensions, and native InfiniBand support. A closed beta testing program has been started. See the complete text at <a href="http://www.rcuda.net/index.php/news/19-new-revolutionary-version-of-rcuda-to-be-launched.html" target="_blank">http://www.rcuda.net/index.php/news/19-new-revolutionary-version-of-rcuda-to-be-launched.html</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/04/18/rcuda/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scalable GPU graph traversal</title>
		<link>http://gpgpu.org/2012/04/17/scalable-gpu-graph-traversal</link>
		<comments>http://gpgpu.org/2012/04/17/scalable-gpu-graph-traversal#comments</comments>
		<pubDate>Tue, 17 Apr 2012 16:19:41 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Graph Algorithms]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Parallel Algorithms]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4634</guid>
		<description><![CDATA[Abstract: Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter.</p>
<p>We present a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations on both CPU and GPU platforms.</p></blockquote>
<p>(Duane Merrill, Michael Garland and Andrew Grimshaw: <em>&#8220;Scalable GPU graph traversal&#8221;</em>, Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP&#8217;12), pp.117-128, February 2012. [<a title="DOI link to the paper" href="http://dx.doi.org/10.1145/2145816.2145832" target="_blank">DOI</a>])</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/04/17/scalable-gpu-graph-traversal/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Acceleware 4 Day CUDA™ Course, Calgary</title>
		<link>http://gpgpu.org/2012/04/17/acceleware-4-day-cuda-course-calgary</link>
		<comments>http://gpgpu.org/2012/04/17/acceleware-4-day-cuda-course-calgary#comments</comments>
		<pubDate>Tue, 17 Apr 2012 16:14:44 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Training]]></category>
		<category><![CDATA[Tutorials & Courses]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4624</guid>
		<description><![CDATA[Partnering with NVIDIA, this four-day course (May 8-11, 2012) is designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU. Delivered by Acceleware developers, who provide real-world experience and examples, the training consists of classroom lectures and hands-on [...]]]></description>
			<content:encoded><![CDATA[<p>Partnering with NVIDIA, this four-day course (May 8-11, 2012) is designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.</p>
<p>Delivered by Acceleware developers, who provide real-world experience and examples, the training consists of classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with NVIDIA GPUs for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience.</p>
<p>More information: <a title="acceleware announcement" href="http://www.acceleware.com/may8calgary" target="_blank">http://www.acceleware.com/may8calgary</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/04/17/acceleware-4-day-cuda-course-calgary/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wall Orientation and Shear Stress in the Lattice Boltzmann Model</title>
		<link>http://gpgpu.org/2012/03/16/wall-orientation-and-shear-stress-in-the-lattice-boltzmann-model</link>
		<comments>http://gpgpu.org/2012/03/16/wall-orientation-and-shear-stress-in-the-lattice-boltzmann-model#comments</comments>
		<pubDate>Fri, 16 Mar 2012 06:15:12 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Fluid Simulation]]></category>
		<category><![CDATA[Hemodynamics]]></category>
		<category><![CDATA[Lattice Boltzmann Method]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4579</guid>
		<description><![CDATA[Abstract: The wall shear stress is a quantity of profound importance for clinical diagnosis of artery diseases. The lattice Boltzmann is an easily parallelizable numerical method of solving the flow problems, but it suffers from errors of the velocity field near the boundaries which leads to errors in the wall shear stress and normal vectors [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>The wall shear stress is a quantity of profound importance for clinical diagnosis of artery diseases. The lattice Boltzmann is an easily parallelizable numerical method of solving the flow problems, but it suffers from errors of the velocity field near the boundaries which leads to errors in the wall shear stress and normal vectors computed from the velocity. In this work we present a simple formula to calculate the wall shear stress in the lattice Boltzmann model and propose to compute wall normals, which are necessary to compute the wall shear stress, by taking the weighted mean over boundary facets lying in a vicinity of a wall element. We carry out several tests and observe an increase of accuracy of computed normal vectors over other methods in two and three dimensions. Using the scheme we compute the wall shear stress in an inclined and bent channel fluid flow and show a minor influence of the normal on the numerical error, implying that the main error arises due to a corrupted velocity field near the staircase boundary. Finally, we calculate the wall shear stress in the human abdominal aorta in steady conditions using our method and compare the results with a standard finite volume solver and experimental data available in the literature. Applications of our ideas in a simplified protocol for data preprocessing in medical applications are discussed.</p></blockquote>
<p>(Maciej Matyka, Zbigniew Koza, Łukasz Mirosław: <em>&#8220;Wall Orientation and Shear Stress in the Lattice Boltzmann Model&#8221;</em>, Preprint, 2012. [<a title="Link to paper on arXiv.org" href="http://arxiv.org/abs/1203.3078v1" target="_blank">arXiv</a>])</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/03/16/wall-orientation-and-shear-stress-in-the-lattice-boltzmann-model/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Compressed Multiple-Row Storage Format</title>
		<link>http://gpgpu.org/2012/03/16/compressed-multiple-row-storage-format</link>
		<comments>http://gpgpu.org/2012/03/16/compressed-multiple-row-storage-format#comments</comments>
		<pubDate>Fri, 16 Mar 2012 06:11:39 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Sparse Linear Systems]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4578</guid>
		<description><![CDATA[Abstract: A new format for storing sparse matrices is proposed for efficient sparse matrix-vector (SpMV) product calculation on modern throughput-oriented computer architectures. This format extends the standard compressed row storage (CRS) format and is easily convertible to and from it without any memory overhead. Computational performance of an SpMV kernel for the new format is [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>A new format for storing sparse matrices is proposed for efficient sparse matrix-vector (SpMV) product calculation on modern throughput-oriented computer architectures. This format extends the standard compressed row storage (CRS) format and is easily convertible to and from it without any memory overhead. Computational performance of an SpMV kernel for the new format is determined for over 140 sparse matrices on two Fermi-class graphics processing units (GPUs) and the efficiency of the kernel, which peaks at 36 and 25 GFLOPS at single and double precision, respectively, is compared with that of five existing generic algorithms and industrial implementations. The efficiency of the new format is also measured as a function of the mean (mu) and of the standard deviation (sigma) of the number of matrix nonzero elements per row. The largest speedup is found for matrices with mu &gt; 20 and mu &gt; sigma &gt; 1.5 and can be as high as 43%.</p></blockquote>
<p>(Zbigniew Koza, Maciej Matyka, Sebastian Szkoda, Łukasz Mirosław: <em>&#8220;Compressed Multiple-Row Storage Format&#8221;</em>, Preprint, 2012. [<a title="Link to paper on arXiv.org" href="http://arxiv.org/abs/1203.2946" target="_blank">arXiv</a>])</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/03/16/compressed-multiple-row-storage-format/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

