high performance computing on graphics processing units: hgpu.org

Posts

May, 13

CUDA 7 Performance Overview webinar

The CUDA 7 Toolkit has many new features as well as many performance enhancements. Ujval Kapasi is Director, CUDA Product Management at NVIDIA. He received his Ph.D. in Electrical Engineering from Stanford University and his Bachelor of Science in Engineering from Brown University. Slides are available as a PDF download or for viewing via Google Docs.

May, 12

7th International Conference on Computer Technology and Development (ICCTD 2015), 2015

Submission Deadline: 2015-07-10

Topics:
A1: Algorithms; B1: Communication Networks
A2: Bioinformatics; B2: Wireless Communications
A3: Computer Simulation; B3: Mobile Communications
A4: Control Systems; B4: Infrastructure for Next Generation Networks
A5: Data Mining; B5: Information & Communication
A6: Expert Systems; B6: Coding Theory
A7: Image Processing; B7: Optical Communications
A8: Multimedia; B8: Internet Technologies
A9: Natural […]

May, 12

4th International Conference on Communication and Broadband Networking (ICCBN 2015), 2015

Submission Deadline: 2015-07-10

Topics:
• Wireless Communications and Networking
• Multimedia Networking
• Signal Processing for Communications
• Networking Algorithms and Performance Evaluation
• Wireless Sensor Networks
• Communication and Information Theory
• Network Security
• Cognitive Radio Networks
• Internet Applications
• Protocols and Algorithms
• Coding Theory
• 3G & 4G Mobile Communication […]

May, 12

International Conference on Systems, Control and Communications (ICSCC), 2015

Submission Deadline: 2015-07-10

Topics:
• Information-based control systems
• Distributed and cooperative control systems
• Networked control systems (NCS)
• Wired and wireless networks
• Network control (admission/flow/congestion control, etc.)
• Network scheduling and bandwidth allocation
• Informatics in control and communication
• Cyber-physical systems (CPSs)
• Sensor and actuator networks
• Multi-agent systems
• Case studies and applications

For more topics: http://www.icscc.org/cfp.html

Publication: All accepted […]

May, 12

Improving CUDA DNA Analysis Software with Genetic Programming

We genetically improve BarraCUDA using a BNF grammar incorporating C scoping rules with GP. BarraCUDA maps next-generation DNA sequences to the human genome using the Burrows-Wheeler algorithm (BWA) on NVIDIA Tesla parallel graphics hardware (GPUs). GI using phenotypic tabu search with manually grown code can graft new features giving more than 100 fold speed […]

CUDA
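The abstract above refers to the Burrows-Wheeler algorithm. As a rough illustration only, the underlying transform can be sketched in a few lines of Python. This is not BarraCUDA's code: the function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for this example, and the naive sort of all rotations is chosen for readability rather than speed.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel  # the sentinel marks the end of the original string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original string plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The transform groups characters with similar right-hand context together, which is what makes compact indexes over a reference sequence practical. Production aligners typically build such indexes with suffix arrays instead of sorting every rotation.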

May, 12

CVPI: A Computer Vision Library For Mobile and Embedded Platforms

CVPI is a library for implementing computer vision programs on computers supporting OpenVG. It adds additional image processing capabilities to OpenVG that are necessary for computer vision, as well as providing an interface to set up the rendering environment. OpenVG is a hardware accelerated C API for vector and raster 2D graphics. It is widely […]

OpenGL

May, 12

Development of Parallel Architectures for Radar/Video Signal Processing Applications

The applications of digital signal processing continue to expand into many different areas such as signal processing, radar tracking, image processing, medical imaging, video broadcasting, and control algorithms for sensor array processing. Most signal processing applications are computationally intensive and may not meet real-time requirements. However, the multi-core phenomenon has […]

CUDA

May, 12

Density Estimations for Approximate Query Processing on SIMD Architectures

Approximate query processing (AQP) is an interesting alternative to exact query processing. It is a tool for dealing with huge data volumes where response time is more important than perfect accuracy (this is typically the case during the initial phase of data exploration). There are many techniques for AQP; one of them is based on […]

CUDA

May, 12

FPGA-Based Design of Numerical Algorithms for Kernel Density Estimation Using High Level Synthesis Approach

FPGA technology can offer significantly higher performance at much lower power than is available from CPUs and GPUs in many computational problems. Unfortunately, programming for FPGA (using hardware description languages, HDL) is a difficult and nontrivial task and is not intuitive for C/C++/Java programmers. To bridge the gap between programming effectiveness and difficulty the High […]

May, 10

Age and Gender Classification using Convolutional Neural Networks

Automatic age and gender classification has become relevant to an increasing number of applications, particularly since the rise of social platforms and social media. Nevertheless, performance of existing methods on real-world images is still significantly lacking, especially when compared to the tremendous leaps in performance recently reported for the related task of face recognition. In […]

CUDA

May, 10

Numerical Simulation of Melting with Natural Convection Based on Lattice Boltzmann Method and Performed with CUDA Enabled GPU

A new solver is developed to numerically simulate the melting phase change with natural convection. This solver was implemented on a single NVIDIA GPU based on the CUDA technology in order to simulate the melting phase change in a 2D rectangular enclosure. The Rayleigh number is of the order of magnitude of 10^8 and Prandtl […]

CUDA

May, 10

Tracking Many Solution Paths of a Polynomial Homotopy on a Graphics Processing Unit

Polynomial systems occur in many areas of science and engineering. Unlike general nonlinear systems, the algebraic structure makes it possible to compute all solutions of a polynomial system. We describe our massively parallel predictor-corrector algorithms to track many solution paths of a polynomial homotopy. The data parallelism that provides the speedups stems from the evaluation and differentiation […]

CUDA
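To make the predictor-corrector idea in the abstract above concrete, here is a minimal serial sketch for a single univariate polynomial. It is not the authors' GPU algorithm. The start system x^d - 1, the constant `gamma`, the fixed step count, and the name `track_paths` are all assumptions made for this example. Each path takes an Euler predictor step along the homotopy and is then pulled back onto it with a few Newton corrector iterations.

```python
import cmath


def track_paths(f, df, degree, steps=200, newton_iters=5, gamma=0.6 + 0.8j):
    """Track the roots of x**degree - 1 to the roots of f as t goes from 0 to 1.

    Homotopy (assumed for this sketch): h(x, t) = (1 - t) * gamma * g(x) + t * f(x),
    with start system g(x) = x**degree - 1 and a fixed complex constant gamma.
    """
    def g(x):
        return x**degree - 1

    def dg(x):
        return degree * x**(degree - 1)

    def h(x, t):
        return (1 - t) * gamma * g(x) + t * f(x)

    def h_x(x, t):  # partial derivative of h with respect to x
        return (1 - t) * gamma * dg(x) + t * df(x)

    def h_t(x):  # partial derivative of h with respect to t
        return f(x) - gamma * g(x)

    dt = 1.0 / steps
    solutions = []
    for k in range(degree):
        x = cmath.exp(2j * cmath.pi * k / degree)  # k-th root of unity
        for i in range(steps):
            t = i / steps
            t_next = (i + 1) / steps
            # Predictor: one Euler step along dx/dt = -h_t / h_x.
            x = x - dt * h_t(x) / h_x(x, t)
            # Corrector: Newton iterations on h(., t_next) = 0.
            for _ in range(newton_iters):
                x = x - h(x, t_next) / h_x(x, t_next)
        solutions.append(x)
    return solutions
```

For example, calling `track_paths(lambda x: x**2 - 3*x + 2, lambda x: 2*x - 3, 2)` tracks the two start roots 1 and -1 toward the roots of x^2 - 3x + 2. A practical tracker would use adaptive step sizes and handle systems of several polynomials, which this sketch does not attempt.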


