high performance computing on graphics processing units: hgpu.org

Posts

Mar, 13

Visualizing Trends on Twitter

With its popularity, Twitter has become an increasingly valuable source of real-time, user-generated information about interesting events in our world. This thesis presents TwitGeo, a system to explore and visualize trending topics on Twitter. It features an interactive map that summarizes trends across different geographical regions. Powered by a novel GPU-based datastore, this system performs […]

CUDA

Mar, 13

The Flocking Based and GPU Accelerated Internet Traffic Classification

The issue of Internet traffic classification has attracted mainstream attention because of its political, economic, and legal impacts on the appropriate use, pricing, and management of the Internet. Nowadays, both the research and operational communities prefer to classify network traffic with approaches based on the statistics of traffic flow features due to […]

CUDA

Mar, 12

Fast hydrodynamics on heterogenous many-core hardware

In this chapter, we present details of a heterogeneous and massively parallel GPU library implementation in CUDA C/C++ of a nonlinear free surface water wave model [15]. We describe how flexible-order finite difference approximations to the partial differential equations of the model can be prototyped using library components provided in an in-house library. In […]

CUDA

Mar, 12

Development of High-Performance Software Components for Emerging Architectures

Massively parallel processors, such as graphics processing units (GPUs), have in recent years proven to be effective for a vast number of scientific applications. Today, most desktop computers are equipped with one or more powerful GPUs, offering heterogeneous high-performance computing to a broad range of scientific researchers and software developers. Though GPUs are […]

CUDA

Mar, 12

2014 7th International Conference on Advanced Computer Theory and Engineering, ICACTE 2014

Submission deadline: 2014-06-05. Publication: All accepted papers of ICACTE 2014 will be published in the conference proceedings under an ISBN reference by ASME Press, which will be included in the ASME Digital Library, and the publisher will submit the proceedings for review by Ei Compendex, ISI Proceedings, and other major indexing services. Call […]

Mar, 12

Configuration and Benchmarks of Peer-to-Peer Communication over Gigabit Ethernet and InfiniBand in a Cluster with Intel Xeon Phi Coprocessors

Intel Xeon Phi coprocessors allow symmetric heterogeneous clustering models, in which MPI processes are run fully on coprocessors, as opposed to offload-based clustering. These symmetric models are attractive, because they allow effortless porting of CPU-based applications to clusters with manycore computing accelerators. However, with the default software configuration and without specialized networking hardware, peer-to-peer communication […]

Mar, 12

Locality optimization on a NUMA architecture for hybrid LU factorization

We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We […]

CUDA

Mar, 12

Reduced Vlasov-Maxwell simulations

In this paper we review two different numerical methods for Vlasov-Maxwell simulations. The first method is based on a coupling between a Discontinuous Galerkin (DG) Maxwell solver and a Particle-In-Cell (PIC) Vlasov solver. The second method only uses a DG approach for the Vlasov and Maxwell equations. The Vlasov equation is first reduced to a […]

OpenCL

Mar, 12

Genetically Improved CUDA kernels for StereoCamera

Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speedups of more than six times on recent NVIDIA GPU cards are reported compared to the original kernel on the […]

CUDA

Mar, 12

Efficient Preconditioned Conjugate Gradient Parallelization on GPU

We present a performance analysis of a parallel implementation of both conjugate gradient and preconditioned conjugate gradient solvers using graphics processing units with the CUDA parallel programming model. The solvers were optimized for the fast solution of sparse systems of equations arising from Finite Element Analysis (FEA) of electromagnetic phenomena. The preconditioners were Incomplete Cholesky factorization […]

CUDA

Mar, 12

MaxSSmap: A GPU program for short read mapping with the maximum scoring subsequence

Exact short read mapping to whole genomes with the Smith-Waterman algorithm is computationally expensive yet highly accurate when aligning reads with mismatches and gaps. We introduce a GPU program called MaxSSmap with the aim of achieving accuracy comparable to Smith-Waterman but with faster runtimes. Similar to mainstream approaches, MaxSSmap identifies a local region of the […]

CUDA

Mar, 10

OpenCL-Accelerated Simplified General Perturbations 4 Algorithm

The number of space objects such as satellites, spacecraft, and debris is increasing significantly, and so is the need to track them for security and collision avoidance purposes. In this context, as parallelism becomes a new paradigm, the need for high-performance propagators remains unmet. For this, we implemented Simplified General Perturbations No. […]

OpenCL
