high performance computing on graphics processing units: hgpu.org

Posts

Feb, 17

Simulations of Large Membrane Regions using GPU-enabled Computations – Preliminary Results

In this short paper we present a GPU code for MD simulations of large membrane regions in the NVT and NVE ensembles with explicit solvent. We give an overview of the code and present preliminary performance results.

CUDA

Feb, 17

Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators

Although the hardware has dramatically changed in the last few years, nodes of multicore chips augmented by Graphics Processing Units (GPUs) seem to be a trend of major importance. Previous approaches for scheduling dense linear operations on such a complex node led to high performance but at the double cost of not using the potential […]

CUDA

Feb, 17

A Strategy for Automatically Generating High Performance CUDA Code for a GPU Accelerator from a Specialized Fortran Code Expression

Recent microprocessor designs concentrate upon adding cores rather than increasing clock speeds in order to achieve enhanced performance. As a result, in the last few years computational accelerators featuring many cores per chip have begun to appear in high performance scientific computing systems. The IBM Cell processor, with its 9 heterogeneous cores, was the first […]

CUDA

Feb, 17

Accelerating Algorithms on GPUs in SCIRun: the Conjugate Gradient Case Study

The goal of this research is to integrate graphics processing units (GPUs) into SCIRun, a biomedical problem solving environment, in a way that is transparent to the scientist. We have developed a portable mechanism that allows seamless coexistence of CPU and accelerated GPU computations to provide the best performance while also providing ease of use. […]

CUDA

Feb, 17

Takagi Factorization on GPU using CUDA

Takagi factorization or symmetric singular value decomposition is a special form of SVD applicable to symmetric complex matrices. The computation takes advantage of symmetry to reduce computation and storage requirements. The Jacobi method with chess tournament ordering was used to perform the computation in parallel on a GPU using the CUDA programming model. We were […]

CUDA
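
The "chess tournament ordering" named in the Takagi factorization abstract above is a round-robin schedule: in each round every index is paired with exactly one other, so the Jacobi rotations of a round touch disjoint rows and columns and can run in parallel. The abstract does not give the exact variant used, so the following is only a minimal sketch of one common construction (the circle method); the function name `round_robin_pairs` is hypothetical.

```python
def round_robin_pairs(n):
    """Return n-1 rounds, each a list of n/2 disjoint index pairs.

    Circle method: index 0 stays fixed while the others rotate one
    position per round. Assumption: this is one common round-robin
    variant, not necessarily the ordering used in the paper.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    players = list(range(n))
    rounds = []
    for _ in range(n - 1):
        # Pair position i with position n-1-i; pairs within a round are disjoint.
        pairs = [tuple(sorted((players[i], players[n - 1 - i])))
                 for i in range(n // 2)]
        rounds.append(pairs)
        # Keep players[0] fixed and rotate the remaining entries by one.
        players = [players[0], players[-1]] + players[1:-1]
    return rounds


if __name__ == "__main__":
    for r, pairs in enumerate(round_robin_pairs(4)):
        print(r, pairs)
```

Over the n-1 rounds every pair of indices meets exactly once, which corresponds to one full Jacobi sweep.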

Feb, 17

Automatically Tuned Dense Linear Algebra for Multicore+GPU

The Multicore+GPU architecture has been adopted in some of the fastest supercomputers listed on the TOP500. The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures such as Multicore+GPU. However, to provide portable performance, manual parameter tuning is required. This paper presents automatically tuned LU factorization. The […]

CUDA

Feb, 16

GpuC: Data parallel language extension to CUDA

In recent years, Graphics Processing Units (GPUs) have emerged as powerful accelerators for general-purpose computations. Current approaches to programming GPUs still rely on relatively low-level programming models such as Compute Unified Device Architecture (CUDA), a programming model from NVIDIA, and Open Computing Language (OpenCL), created by Apple in cooperation with others. These two programming models […]

CUDA

Feb, 16

Enhancing the simulation of P systems for the SAT problem on GPUs

GPUs are now a solid alternative for high performance computing, and the advent of CUDA/OpenCL gives programmers a friendly model for accelerating a broad range of applications. The way GPUs exploit parallelism differs from that of multi-core CPUs, which raises new challenges in taking advantage of their tremendous computing power. In this respect, P systems or Membrane […]

CUDA

Feb, 16

Accelerating the Stochastic Simulation Algorithm using Emerging Architectures

In order for scientists to learn more about molecular biology, it is imperative that they have the ability to construct and evaluate models. Model statistics consistent with the chemical master equation can be obtained using Gillespie’s stochastic simulation algorithm (SSA). Due to the stochastic nature of the Monte Carlo simulations, large numbers of simulations must […]

CUDA

Feb, 16

GPU Accelerated Stochastic Simulation

Through computational methods, biologists are able to learn more about molecular biology by building accurate models. These models represent and predict the reactions among species populations within a system. One popular method for developing predictive models is the stochastic simulation algorithm (SSA), a Monte Carlo method developed by Gillespie. Since this algorithm […]

CUDA
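
Both abstracts above refer to Gillespie's stochastic simulation algorithm (SSA). As a rough illustration only, here is a minimal CPU sketch of the direct method for a single trajectory; it is not the code from either paper, and the function name `gillespie_direct`, its parameters, and the one-reaction decay model in the usage lines are all assumptions made for the example.

```python
import random


def gillespie_direct(x0, stoich, propensity_fns, t_end, rng):
    """Simulate one SSA trajectory with Gillespie's direct method.

    x0             -- initial species counts
    stoich         -- stoich[j][k] is the change in species k when reaction j fires
    propensity_fns -- one function per reaction, mapping state -> propensity
    t_end          -- simulation stops before exceeding this time
    rng            -- a random.Random instance (one per trajectory)
    """
    t = 0.0
    x = list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = [f(x) for f in propensity_fns]
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a0.
        tau = rng.expovariate(a0)
        if t + tau > t_end:
            break
        t += tau
        # Pick reaction j with probability a[j] / a0.
        r = rng.random() * a0
        acc = 0.0
        j = len(a) - 1
        for idx, aj in enumerate(a):
            acc += aj
            if r < acc:
                j = idx
                break
        for k, change in enumerate(stoich[j]):
            x[k] += change
        trajectory.append((t, tuple(x)))
    return trajectory


# Hypothetical model: decay A -> (nothing) with rate constant 0.5.
traj = gillespie_direct([100], [[-1]], [lambda x: 0.5 * x[0]], 2.0,
                        random.Random(0))
print(len(traj), traj[-1])
```

Each trajectory depends only on its own random stream, so many trajectories can be run independently; that independence is what makes the repeated runs mentioned in the abstracts a natural fit for parallel hardware.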

Feb, 16

A GPU-based Flood Simulation Framework

We present a multi-core, GPU-based framework for simulation and visualization of two-dimensional floods, based on a full implementation of the Saint-Venant equations. A validated CPU-based flood model was converted to NVIDIA’s CUDA architecture. The model was run on two different NVIDIA graphics cards, a GeForce 8400 GS and a Tesla T10. The model was tested […]

CUDA

Feb, 16

Static Memory Access Pattern Analysis on a Massively Parallel GPU

The performance of data-parallel processing can be highly sensitive to any contention in memory. In contrast to multi-core CPUs, which employ a number of memory latency minimization techniques such as multi-level caching and prefetching, Graphics Processing Units (GPUs) require that the data-parallel computations reference memory in a deterministic pattern in order to reap the benefits […]

OpenCL

Recent source codes

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

LC Framework

pplx-garden: Perplexity open source garden for inference technology

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

OpScanner

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer
