high performance computing on graphics processing units: hgpu.org

Posts

Jun, 13

GPU-accelerated Computation for Statistical Analysis of the Next-Generation Sequencing Data

Next-generation sequencing technologies are producing big data and pushing the frontier of life sciences toward territories that were never imagined before. However, such big data pose great computational challenges for statistical analysis. It is important to utilize the Graphics Processing Unit (GPU)’s large throughput and massive parallelism to process large data […]

CUDA

Jun, 13

Performance Improvement of Data Mining in Weka through GPU Acceleration

Data mining tools may be computationally demanding, so there is increasing interest in parallel computing strategies to improve their performance. The popularization of Graphics Processing Units (GPUs) increased the computing power of current desktop computers, but desktop-based data mining tools do not usually take full advantage of these architectures. This paper exploits an approach […]

CUDA

Jun, 13

Secure Distributed Computing on a Manycore Cloud

Computation outsourcing is an increasingly successful paradigm today. Private and public organizations, as well as common users, can access a large number of economically viable resources to perform the desired computations or access data. The cloud approach allows outsourcers to offer on-demand scalable services to third parties or to perform large computations without high server […]

CUDA

Jun, 12

2nd Int. Conf. on Information Networking and Automation ICINA-II 2014

Publication: All papers, both invited and contributed, will be reviewed by two or three experts from the PC. After a careful reviewing process, all accepted papers will be published in WIT Transactions on Information and Communication Technologies (ISSN: 1743-3517), which will be indexed by EI Compendex, Scopus and ISI. Topics (not limited to): ■ Modern and […]

Jun, 12

GPPE: a GPU-based Parallel Processing Environment for Large Scale Concurrent Data Streams

Extensive use of wireless sensor networks is drawing increasing attention to research on data-driven processing, but it is a challenge to construct a system for concurrent processing of large-scale data streams (LCDS), a typical model of data-driven processing. As Graphic Processing Unit (GPU) has good characteristics of SPMD (Single Program Multiple Data) while […]

CUDA

Jun, 12

Multi-threaded Kernel Offloading to GPGPU Using Hyper-Q on Kepler Architecture

Small-scale computations usually cannot fully utilize the compute capabilities of modern GPGPUs. With the Fermi GPU architecture Nvidia introduced the concurrent kernel execution feature allowing up to 16 GPU kernels to execute simultaneously on a shared GPU device for a better utilization of the respective resources. Insufficient scheduling capabilities in this respect, however, can significantly […]

CUDA

Jun, 12

A Fast Batched Cholesky Factorization on a GPU

Currently, state of the art libraries, like MAGMA, focus on very large linear algebra problems, while solving many small independent problems, which is usually referred to as batched problems, is not given adequate attention. In this paper, we proposed a batched Cholesky factorization on a GPU. Three algorithms – nonblocked, blocked, and recursive blocked – […]

CUDA
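To make the idea of a batched factorization in the entry above concrete, here is a minimal sketch in plain Python. It uses a textbook nonblocked Cholesky routine and applies it to each small matrix in turn; the function names `cholesky` and `batched_cholesky` are hypothetical, and this is not the paper's GPU implementation. On a GPU, the independent factorizations would typically be spread across thread blocks.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    `a` is assumed to be a symmetric positive definite matrix given
    as a list of rows. This is the plain nonblocked variant.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares already placed in row j.
        diag = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(diag)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def batched_cholesky(matrices):
    """Factor many small independent matrices (hypothetical helper).

    Each factorization is independent of the others, which is the
    property a batched GPU routine can exploit.
    """
    return [cholesky(m) for m in matrices]


batch = [
    [[4, 12, -16], [12, 37, -43], [-16, -43, 98]],
    [[25, 15, -5], [15, 18, 0], [-5, 0, 11]],
]
factors = batched_cholesky(batch)
```

The loop over the batch is sequential here only because this is a CPU sketch; nothing in one factorization depends on another.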

Jun, 12

Optimization Techniques on GPU: A Survey

In this paper, we present a comprehensive survey on parallelizing the computations involved in optimization problems on GPUs using CUDA. Many researchers have reported significant speedup using CUDA on GPU. Stochastic algorithms, Metaheuristic algorithms and Heuristic algorithms i.e., Mixed Integer Non-linear Programming (MINLP), Central Force Optimization (CFO), Genetic Algorithms (GA), Particle Swarm Optimization (PSO), etc. are […]

CUDA

Jun, 12

A GPU-accelerated immersive audio-visual framework for interaction with molecular dynamics using consumer depth sensors

With advances in computational power, the rapidly growing role of computational/simulation methodologies in the physical sciences, and the development of new human–computer interaction technologies, the field of interactive molecular dynamics seems destined to expand. In this paper, we describe and benchmark the software algorithms and hardware setup for carrying out interactive molecular dynamics utilizing an […]

OpenCL

Jun, 11

Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)

Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms including polynomial evaluation, sorting and building data structures. This paper introduces prefix scan and also describes a step-by-step procedure to implement prefix scan efficiently with Compute Unified Device Architecture (CUDA). This paper starts with a basic naive algorithm […]

CUDA
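As a rough illustration of the prefix scan described in the entry above, the sketch below simulates a naive step-doubling (Hillis-Steele style) inclusive scan in plain Python. This is an assumption about what a "basic naive algorithm" could look like, not the paper's code, and the function name `inclusive_scan` is hypothetical. Each pass of the `while` loop corresponds to one parallel step in which every element would be updated by its own GPU thread.

```python
def inclusive_scan(values):
    """Inclusive prefix sum using step doubling, simulated sequentially."""
    data = list(values)
    n = len(data)
    offset = 1
    while offset < n:
        # Read only from the previous buffer and write a new one,
        # mimicking the double buffering a parallel version needs.
        prev = data
        data = [
            prev[i] + prev[i - offset] if i >= offset else prev[i]
            for i in range(n)
        ]
        offset *= 2
    return data


result = inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])
```

This variant needs about log2(n) parallel steps but performs O(n log n) additions in total, which is why work-efficient scans are usually preferred for large inputs.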

Jun, 11

Intersecting two families of sets on the GPU

The problem of intersecting two families of sets F and F’ is to find the family I of all the sets which are the intersection of some set in F and some other set in F’. In this paper we present an efficient parallel GPU-based approach, designed under CUDA architecture, to solve the problem. The […]

CUDA
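The problem statement in the entry above can be written down directly as a brute-force reference in Python. This is only a sketch of the definition, not the paper's GPU approach; the function name `intersect_families` and the use of `frozenset` are assumptions made for illustration.

```python
def intersect_families(family_a, family_b):
    """Return the family of all pairwise intersections A & B.

    Both arguments are iterables of frozensets. Duplicate results
    collapse because the output is itself a set.
    """
    return {a & b for a in family_a for b in family_b}


F = {frozenset({1, 2, 3}), frozenset({2, 3, 4})}
F_prime = {frozenset({2, 3}), frozenset({3, 4, 5})}
I = intersect_families(F, F_prime)
```

Every pair (A, B) is processed independently of the others, so the pairwise work is a natural fit for one GPU thread per pair; removing duplicate results is the step that needs coordination.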

Jun, 11

GPU Implementation of Gaussian Processes

Gaussian process models (henceforth Gaussian Processes) provide a probabilistic, non-parametric framework for inferring posterior distributions over functions from general prior information and observed noisy function values. This, however, comes with a computational burden of O(N^3) for training and O(N^2) for prediction, where N is the size of the training set [1]. Therefore, this method does […]

CUDA
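The complexity figures quoted in the entry above can be read off the standard Gaussian process regression equations. The notation below is an assumption, since the abstract does not define symbols: K is the N×N kernel matrix on the training inputs, k_* is the vector of kernel values between a test input x_* and the training inputs, y is the vector of observed values, and σ_n² is the noise variance.

```latex
\begin{aligned}
\bar{f}_* &= k_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} y, \\
\operatorname{Var}[f_*] &= k(x_*, x_*) - k_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} k_* .
\end{aligned}
```

Factorizing the N×N matrix K + σ_n² I (for example by Cholesky decomposition) costs O(N^3) and is done once during training. With the factor available, the predictive variance for each test point requires a triangular solve, which costs O(N^2).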

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
