Posts

Sep, 9

Doctor AI: Interpretable Deep Learning for Modeling Electronic Health Records

Deep learning has recently shown superior performance over traditional statistical methods in complex domains such as computer vision, audio processing, and natural language processing. Naturally, deep learning techniques, combined with the large volumes of electronic health record (EHR) data generated by healthcare organizations, have the potential to bring dramatic changes to the healthcare industry. However, typical deep […]
Sep, 9

Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms

Accurately predicting key combustion phenomena in reactive-flow simulations, e.g., lean blow-out, extinction/ignition limits and pollutant formation, necessitates the use of detailed chemical kinetics. The large size and high levels of numerical stiffness typically present in chemical kinetic models relevant to transportation/power-generation applications make the efficient evaluation/factorization of the chemical kinetic Jacobian and thermochemical source-terms critical […]
Sep, 9

Cracks in the Sky: Abelian-Higgs Cosmic String Evolution with CUDA

Topological defects form at cosmological phase transitions by the Kibble mechanism, with cosmic strings and superstrings having the most interesting phenomenology. A rigorous analysis of their astrophysical consequences is limited by the availability of accurate numerical simulations, and therefore by hardware resources and computation time. Improving the speed and efficiency of existing codes is therefore […]
Sep, 2

Optimizing Communication for Clusters of GPUs

GPUs are frequently used to accelerate data-parallel workloads across a wide variety of application domains. While GPUs offer a large amount of computational throughput within a single node, the largest problems require a cluster of such devices communicating with different compute nodes across a network. These clusters can range in size from a small handful […]
Sep, 2

Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Matrix multiplication is one of the fundamental building blocks of numerical linear algebra. It demands substantial computing capability and consumes more power as the problem size increases. In this research, an OpenCL-based matrix multiplier is presented. When the data are single-precision floating-point values, compared with the software simulations based on the Intel […]
Sep, 2

Implementing Strassen’s Algorithm with CUTLASS on NVIDIA Volta GPUs

Conventional GPU implementations of Strassen’s algorithm (Strassen) typically rely on the existing high-performance matrix multiplication (GEMM), trading space for time. As a result, such approaches can only achieve practical speedup for relatively large, "squarish" matrices due to the extra memory overhead, and their use is limited by the considerable workspace they require. We present novel Strassen […]
Sep, 2

Full Speed Ahead: 3D Spatial Database Acceleration with GPUs

Many industries rely on visual insights to support decision-making processes in their businesses. In mining, the analysis of drills and geological shapes, represented as 3D geometries, is an important tool to assist geologists in the search for new ore deposits. The aeronautics industry manipulates high-resolution geometries when designing a new aircraft aided by the numerical simulation […]
Sep, 2

A study of integer sorting on multicores

Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sampling and random sampling parallel sorting, and network-based algorithms such as Batcher’s bitonic sorting algorithm. In this work we present an experimental study of integer sorting […]
Aug, 26

Deep learning: A guide for practitioners in the physical sciences

Machine learning is finding increasingly broad applications in the physical sciences. This most often involves building a model relationship between a dependent, measurable output, and an associated set of controllable, but complicated, independent inputs. We present a tutorial on current techniques in machine learning – a jumping-off point for interested researchers to advance their work. […]
Aug, 26

Optimizing Web Virtual Reality

Performance has always been a key factor in any virtual and augmented reality experience. Since Virtual Reality was conceived, performance has been the factor that has often slowed, or at times even halted, the adoption of Virtual Reality-related technologies. More recently, hardware advancements have caught up with the development so that […]
Aug, 26

Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU

The trend in computer architectures has for several years been heterogeneous systems consisting of a regular CPU and at least one additional, specialized processing unit, such as a GPU. The different characteristics of the processing units and the requirement of multiple tools and programming languages make programming of such systems a challenging task. Although there exist […]
Aug, 26

Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations

OpenMP is a directive-based shared memory parallel programming model and has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP’s high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran languages, without exposing too many details of GPU […]

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors
