high performance computing on graphics processing units: hgpu.org

Posts

Nov, 6

Parallelizing Alternating Direction Implicit Solver on GPUs

We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves on existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource constraints. As a result, our parallel ADI, which is based on PCR, no longer has the […]

CUDA

Nov, 6

Development of Generic Scheduling Concepts for OpenGL ES 2.0

The ability of a Graphics Processing Unit (GPU) to do efficient and massively parallel computations makes it the choice for 3D graphics applications. It has been extensively used as a hardware accelerator to boost the performance of a single application such as a 3D game. However, due to the increasing number of 3D rendering applications and the limiting […]

OpenGL

Nov, 6

Accelerating Dissipative Particle Dynamics Simulations on GPUs: Algorithms, Numerics and Applications

We present a scalable dissipative particle dynamics simulation code, fully implemented on Graphics Processing Units (GPUs) using a hybrid CUDA/MPI programming model, which achieves a 10-30 times speedup on a single GPU over 16 CPU cores and almost linear weak scaling across a thousand nodes. A unified framework is developed within which the efficient generation […]

CUDA

Nov, 6

Performance of Kepler GTX Titan GPUs and Xeon Phi System

NVIDIA’s new architecture, Kepler, significantly improves GPU performance with the new streaming multiprocessor, SMX. Along with the performance gains, NVIDIA has also introduced new technologies such as dynamic parallelism, Hyper-Q, and GPUDirect with RDMA. Apart from its usual GPUs, NVIDIA also released another Kepler ‘GeForce’ GPU named GTX Titan. GeForce GTX Titan is not […]

CUDA

Nov, 4

Batch Method for Efficient Resource Sharing in Real-time Multi-GPU Systems

The performance of many GPU-based systems depends heavily on the effective bandwidth for transferring data between the processors. For real-time systems, the importance of data transfer rates may be even higher due to non-deterministic transfer times that limit the ability to satisfy response time requirements. We present a new method that allows real-time applications to […]

CUDA • OpenCL

Nov, 4

DynaProg for Scala: A Scala DSL for Dynamic Programming on CPU and GPU

Dynamic programming is an algorithmic technique for solving problems that follow Bellman’s principle: optimal solutions depend on optimal sub-problem solutions. The core idea behind dynamic programming is to memoize intermediate results in matrices to avoid repeated computations. Solving a dynamic programming problem consists of two phases: filling one or more matrices with intermediate solutions […]
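To make the matrix-filling idea concrete, here is a minimal sketch in plain Python. It is not DynaProg's Scala DSL and is not taken from the paper; the problem (longest common subsequence) and the names `lcs_table` and `lcs_length` are assumptions chosen only for illustration. It shows only the first phase the abstract describes: each cell is computed once from previously filled cells.

```python
def lcs_table(a: str, b: str) -> list[list[int]]:
    """Fill a (len(a)+1) x (len(b)+1) matrix where cell [i][j] holds the
    length of the longest common subsequence of a[:i] and b[:j].
    (Hypothetical helper, for illustration only.)"""
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay 0: an empty prefix shares nothing.
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the optimal sub-solution.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise reuse the better of the two neighbouring cells.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs_length(a: str, b: str) -> int:
    """Read the optimal value from the bottom-right cell of the table."""
    return lcs_table(a, b)[len(a)][len(b)]
```

Because every cell depends only on its upper, left, and upper-left neighbours, the table can be filled row by row without recomputing any sub-problem.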

CUDA

Nov, 4

Use of Checkpoint-Restart for Complex HEP Software on Traditional Architectures and Intel MIC

Process checkpoint-restart is a technology with great potential for use in HEP workflows. Use cases include debugging, reducing the startup time of applications both in offline batch jobs and the High Level Trigger, permitting job preemption in environments where spare CPU cycles are being used opportunistically, and efficient scheduling of a mix of multicore and […]

Nov, 4

Initial Explorations of ARM Processors for Scientific Computing

Power efficiency is becoming an ever more important metric for both high performance and high throughput computing. Over the course of the next decade, it is expected that flops/watt will be a major driver for the evolution of computer architecture. Servers with large numbers of ARM processors, already ubiquitous in mobile computing, are a promising alternative […]

Nov, 4

OpenCUDA+MPI: A Framework for Heterogeneous GP-GPU Distributed Computing

The introduction and rise of General Purpose Graphics Computing has significantly impacted parallel and high-performance computing. It has introduced challenges when it comes to distributed computing with GPUs. Current solutions target specifics: specific hardware, specific network topology, a specific level of processing. Those restrictions on GPU computing limit scientists and researchers in various ways. The […]

CUDA

Nov, 3

PAKCK: Performance and Power Analysis of Key Computational Kernels on CPUs and GPUs

Recent projections suggest that applications and architectures will need to attain 75 GFLOPS/W in order to support future DoD missions. Meeting this goal requires deeper understanding of kernel and application performance as a function of power and architecture. As part of the PAKCK study, a set of DoD application areas, including signal and image processing […]

CUDA

Nov, 3

Parallel CPU and GPU computations to solve the job shop scheduling problem with blocking

In this paper, we study the parallelization of an exact method for solving the job shop scheduling problem with blocking (JSB). We use a model based on graph theory that exploits alternative graphs. We propose an original parallelization technique for performing a parallel computation in the various branches of the search tree. This technique […]

CUDA

Nov, 3

A Fast and Secure Way to Prevent SQL Injection Attacks using Bitslice Technique and GPU Support

Most web applications use a database as a back end, which exposes them to SQL injection attacks (SQLIA). Although SQLIA is among the top ten attacks according to the Open Web Application Security Project (OWASP), existing approaches still do not provide a proper solution to this problem. A number of measures are also […]

CUDA
