Wen-Mei Hwu
Histogramming is a technique by which input datasets are mined to extract features and patterns. Histograms have a wide range of uses in computer vision, machine learning, database processing, and quality control for manufacturing, and many applications benefit from advance knowledge about the distribution of data. Computing a histogram is, essentially, the antithesis of parallel processing. Without […]
Ursula Reiterer
Clustering is a basic task in exploratory data analysis. It is used to partition elements of a set into disjoint groups, so-called clusters, such that elements within a group are similar to each other, but dissimilar to elements of other groups. Several clustering algorithms exist, which can be applied depending on the type of dataset […]
Sebastian Mayr
Ray tracing denotes a class of rendering algorithms that are well known for their flexibility and their capability of generating highly realistic images of three-dimensional models. However, due to the heavy computational requirements, it has traditionally been used for offline rendering. Improving the performance of ray tracing has been an active area of research and […]
Andreas Hormandinger
This thesis focuses on the use of automatic code generation to combine different classes of optimizations to find the best optimization for parallel reduction in OpenCL on various devices. It also introduces the optimizations used. Finally, the results of the combinations are evaluated and discussed.
Pedro Alonso, Murilo Boratto, J. Peinado, J. Ibanez, Jorge Sastre
Computing a matrix polynomial is the basic process in the calculation of functions of matrices by the Taylor method. One of the most efficient techniques for computing matrix polynomials is based on the Paterson-Stockmeyer method. Inspired by this method, we propose in this work a recursive algorithm and an efficient implementation that exploit the heterogeneous […]
Bin Ren
SIMD accelerators and many-core coprocessors with coarse-grained and fine-grained parallelism are becoming more and more popular. Streaming SIMD Extensions (SSE), Graphics Processing Units (GPUs), and Intel Xeon Phi (MIC) can provide orders of magnitude better performance and efficiency for parallel workloads as compared to single-core CPUs. However, parallelizing irregular applications involving dynamic data structures […]
Matthaus Wander, Lorenz Schwittmann, Christopher Boelmann, Torben Weis
When a client queries for a non-existent name in the Domain Name System (DNS), the server responds with a negative answer. With the DNS Security Extensions (DNSSEC), the server can either use NSEC or NSEC3 for authenticated negative answers. NSEC3 claims to protect DNSSEC servers against domain enumeration, but incurs significant CPU and bandwidth overhead. […]
Sudipta Chattopadhyay, Petru Eles, Zebo Peng
Embedded and real-time software is often constrained by several temporal requirements. Therefore, it is important to design embedded software that meets the required performance goal. The inception of embedded graphics processing units (GPUs) brings fresh hope in developing high-performance embedded software which was previously not suitable for embedded platforms. Whereas GPUs use massive parallelism to […]
Yuan Wen, Zheng Wang, Michael F.P. O'Boyle
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high-performance computing. Such platforms are usually programmed using OpenCL, which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application-dedicated devices to platforms […]
Mathias Bourgoin, Emmanuel Chailloux
We present WebSpoc, an OCaml GPGPU library targeting web applications that is built upon SPOC and js_of_ocaml. SPOC is an OCaml GPGPU library focusing on abstracting memory transfers, handling GPGPU computations and offering easy portability. Js_of_ocaml is the OCaml byte-code to JavaScript compiler. Thus, WebSpoc provides high-performance computations from the web browser while benefiting […]
Alastair F. Donaldson
I present a tutorial overview demonstrating the key technique used by GPUVerify, a static verification tool for graphics processing unit (GPU) kernels. The technique is a method for translating a massively parallel GPU kernel into a sequential program such that correctness of the sequential program implies data race-freedom of the parallel kernel.
Michael Gowanlock, Henri Casanova
The processing of moving object trajectories arises in many application domains. We focus on a trajectory similarity search, the distance threshold search, which finds all trajectories within a given distance of a query trajectory over a time interval. A multithreaded CPU implementation that makes use of an in-memory R-tree index can achieve high parallel efficiency. […]

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: the first with two AMD graphics processing units, the second with one AMD and one nVidia graphics processing unit. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
