Posts

Jul, 11

Accelerating Preconditioned Iterative Linear Solvers on GPU

Many scientific applications require the solution of linear systems, and solving these systems often dominates the total running time. In this paper, we introduce our work on developing parallel linear solvers and preconditioners for solving large sparse linear systems using NVIDIA GPUs. We develop a new sparse matrix-vector multiplication kernel and a […]
Jul, 11

A Hybrid Parallel Implementation of the Aho-Corasick and Wu-Manber Algorithms Using NVIDIA CUDA and MPI Evaluated on a Biological Sequence Database

Multiple matching algorithms are used to locate the occurrences of patterns from a finite pattern set in a large input string. Aho-Corasick and Wu-Manber, two of the best-known algorithms for multiple matching, require increased computing power, particularly in cases where large datasets must be processed, as is common in computational biology applications. […]
Jul, 11

Parallelization of BFS Graph Algorithm using CUDA

Graphs play a very important role in science and technology for finding the shortest distance between any two places. This paper demonstrates a recent technology, CUDA (Compute Unified Device Architecture), applied to the BFS graph algorithm. Graph algorithms are fundamental to many disciplines and application areas. Large graphs […]
Jul, 11

Algorithms and Data Structures for Interactive Ray Tracing on Commodity Hardware

Rendering methods based on ray tracing provide high image realism, but have been historically regarded as offline only. This has changed in the past decade, due to significant advances in the construction and traversal performance of acceleration structures and the efficient use of data-parallel processing. Today, all major graphics companies offer real-time ray tracing solutions. […]
Jul, 11

Hybrid Particle Lattice Boltzmann Shallow Water for interactive fluid simulations

We introduce a hybrid approach for the simulation of fluids based on the Lattice Boltzmann Method for Shallow Waters and particle systems. Our modified LBM Shallow Waters can handle arbitrary underlying terrain and arbitrary fluid depth. It also introduces a novel and simplified method of tracking dry-wet regions. Dynamic rigid bodies are also included in […]
Jul, 11

Visualization and Correction of Automated Segmentation, Tracking and Lineaging from 5-D Stem Cell Image Sequences

RESULTS: We present an application that enables the quantitative analysis of multichannel 5-D (x, y, z, t, channel) and large montage confocal fluorescence microscopy images. The image sequences show stem cells together with blood vessels, enabling quantification of the dynamic behaviors of stem cells in relation to their vascular niche, with applications in developmental and […]
Jul, 11

Visualization of Large Volumetric Multi-Channel Microscopy Data Streams on Standard PCs

BACKGROUND: Visualization of multi-channel microscopy data plays a vital role in biological research. With the ever-increasing resolution of modern microscopes the data set size of the scanned specimen grows steadily. On commodity hardware this size easily exceeds the available main memory and the even more limited GPU memory. Common volume rendering techniques require the entire […]
Jul, 10

Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra

The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate parallel executions with minimal developer intervention has been introduced in recent years to tackle the programmability issue while maintaining, or even improving, performance. In this paper, […]
Jul, 10

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific kernel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be […]
Jul, 10

Random Fields Generation on the GPU with the Spectral Turning Bands Method

Random Field (RF) generation algorithms are of paramount importance for many scientific domains, such as astrophysics, geostatistics, computer graphics and many others. Some examples are the generation of initial conditions for cosmological simulations or hydrodynamical turbulence driving. In the latter a new random field is needed every time-step. Current approaches commonly make use of 3D […]
Jul, 10

GPU Accelerated Interactive Hybrid Collision Detection in Virtual Disassembly

Previous collision detection methods for virtual disassembly mainly detect collisions at discrete time intervals, and use oriented bounding boxes to speed up the process. However, these discrete methods cannot guarantee that no penetration occurs as the components move. Meanwhile, because some of the components are embedded into each other, these components cannot be separated in the following […]
Jul, 10

Understanding the SIMD Efficiency of Graph Traversal on GPU

The graph is a widely used data structure, and graph algorithms, such as breadth-first search (BFS), are regarded as key components in a great number of applications. Recent studies have attempted to accelerate graph algorithms on highly parallel graphics processing units (GPUs). Although many graph algorithms based on large graphs exhibit abundant parallelism, their performance on […]

* * *


Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). Terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
