
Posts

Apr, 7

Multi-Lingual Speech Recognition with Low-Rank Multi-Task Deep Neural Networks

Multi-task learning (MTL) for deep neural network (DNN) multilingual acoustic models has been shown to be effective for learning parameters that are common or shared among multiple languages [1, 2]. In the MTL paradigm, the number of parameters in the output layer is large and scales with the number of languages used in training. This output […]
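To see why the output layer dominates the parameter count, note that a rank-r factorization W ≈ A·B replaces d_hidden × d_out weights with r·(d_hidden + d_out). A quick Python sketch with illustrative sizes (not the paper's actual configuration):

```python
def lowrank_param_count(d_hidden, d_out, rank):
    """Compare parameters of a full output layer W (d_hidden x d_out)
    against its rank-r factorization W ~= A @ B, where A is
    (d_hidden x rank) and B is (rank x d_out). Sizes are illustrative."""
    full = d_hidden * d_out
    factored = d_hidden * rank + rank * d_out
    return full, factored

# Example: a 1024-unit hidden layer feeding 10,000 multilingual output
# targets, factored at rank 64, shrinks the layer by roughly 14x.
full, factored = lowrank_param_count(1024, 10000, 64)
```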
Apr, 7

Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS

MOTIVATION: Obtaining large-scale sequence alignments in a fast and flexible way is an important step in the analysis of next-generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often not fast enough, limited to dedicated tasks, or not sufficiently accurate due to statistical issues. Current SW implementations that run on […]
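The Smith-Waterman recurrence at the heart of such tools fills a score matrix H in which H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1], floored at zero. A minimal scoring-only Python sketch (the match/mismatch/gap values here are illustrative, not PaSWAS defaults, and real implementations also do traceback):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b
    using the Smith-Waterman dynamic-programming recurrence."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```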
Apr, 7

On Password Guessing with GPUs and FPGAs

Passwords are still by far the most widely used form of user authentication, for applications ranging from online banking and corporate network access to storage encryption. Password guessing thus poses a serious threat to a multitude of applications. Modern password hashes are specifically designed to slow down guessing attacks. However, having exact measures for the […]
Apr, 4

OmpSs task offload

Exascale performance requires a level of energy efficiency achievable only with specialized hardware. Hence, to build a general-purpose HPC system with exascale performance, different types of processors, memory technologies and interconnection networks will be necessary. Heterogeneous hardware is already present in some top supercomputer systems that are composed of different compute nodes, which at […]
Apr, 4

Reduction of a Symmetrical Matrix to Tridiagonal Form on GPUs

Many eigenvalue and eigenvector algorithms begin by reducing the input matrix to tridiagonal form. A tridiagonal matrix has non-zero elements only on its main diagonal and the two diagonals directly adjacent to it. Reducing a matrix to tridiagonal form is an iterative process that uses Jacobi rotations to reduce […]
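The structural condition described above (non-zeros confined to the main diagonal and its two immediate neighbors) can be stated directly as a predicate on the indices. A small Python checker for dense matrices, as an illustration:

```python
def is_tridiagonal(m):
    """True if square matrix m (list of rows) has non-zero elements only
    on its main diagonal and the two diagonals directly adjacent to it,
    i.e. m[i][j] == 0 whenever |i - j| > 1."""
    n = len(m)
    return all(m[i][j] == 0
               for i in range(n)
               for j in range(n)
               if abs(i - j) > 1)
```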
Apr, 4

An Effective Model of CPU/GPU Collaborative Computing in GPU Clusters

Remote procedure call (RPC) is a simple, transparent and useful paradigm for communication between two processes across a network. The compute unified device architecture (CUDA) programming toolkit and runtime enhance the programmability of the graphics processing unit (GPU) and make it more versatile in high-performance computing. Current research mainly focuses on the […]
Apr, 4

The Design and Implementation of a Verification Technique for GPU Kernels

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed synchronous, delayed visibility (SDV) semantics, which captures the execution of a GPU kernel by multiple groups of threads. The […]
Apr, 4

Using OpenCL to Implement Median Filtering and RSA Algorithms: Two GPGPU Application Case Studies

Graphics processing units (GPUs) and their development tools have advanced recently, and industry has become more interested in using them. Among the several development frameworks for GPUs, OpenCL provides a programming environment for writing portable code that can run in parallel. This report describes two case studies of algorithm implementations in OpenCL. The first algorithm is […]
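The first case study, median filtering, has a compact reference implementation that a parallel kernel can be checked against. A minimal 1-D CPU version in Python (the report's kernels are OpenCL; the clamped-edge window handling here is an illustrative choice, not taken from the report):

```python
def median_filter_1d(signal, radius=1):
    """Replace each sample with the median of its (2*radius+1)-wide
    neighborhood; the window is clamped at the signal edges. Serves as
    a plain CPU reference for a parallel median-filter kernel."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = sorted(signal[lo:hi])
        out.append(window[len(window) // 2])  # upper median on even windows
    return out

# Impulse noise in the interior is removed:
# median_filter_1d([5, 5, 100, 5, 5]) -> [5, 5, 5, 5, 5]
```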
Apr, 1

Distributed wideband software-defined radio receiver for heterogeneous systems

Recent years have seen an increasing need for computationally efficient implementation of software-defined radio (SDR) systems. Given the limitations of a typical SDR application running on a single machine, we present a distributed SDR system using high-performance techniques. To split a digital signal into multiple channels, we use an efficient digital signal processing technique: a […]
Apr, 1

Generating Null Models for Large-Scale Networks on GPU

A network generated by randomly rewiring the edges of an original network under some constraint conditions is called a null model of the original network. It is a useful tool for revealing mechanisms that affect the topology of networks. As networks grow in scale, the time needed to generate null models increases. How […]
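A common constraint for such rewiring is preserving every node's degree, achieved by repeated double-edge swaps: (a,b),(c,d) → (a,d),(c,b). A hypothetical serial Python sketch of this idea (function name and rejection rules are illustrative; the paper's GPU generator and its exact constraints may differ):

```python
import random

def degree_preserving_null_model(edges, swaps, seed=0):
    """Randomize an undirected edge list with double-edge swaps,
    which keep every node's degree fixed. Swaps that would create a
    self-loop or a duplicate edge are rejected and retried."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    done, attempts = 0, 0
    while done < swaps and attempts < 100 * (swaps + 1):
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in set(edges) or new2 in set(edges):
            continue  # swap would create a multi-edge
        edges[i], edges[j] = new1, new2
        done += 1
    return edges
```

Because each accepted swap removes and adds one incident edge per endpoint, the degree sequence of the result always matches the original's.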
Apr, 1

Microbranching in mode-I fracture using large scale simulations of amorphous and perturbed lattice models

We study the high-velocity regime mode-I fracture instability using large scale simulations. At large driving displacements, the pattern of a single, steady-state crack that propagates in the midline of the sample breaks down, and small microbranches start to appear near the main crack. Some of the features of those microbranches have been reproduced qualitatively in […]
Apr, 1

Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100

We study the optimisation and porting of the "Modal" code on Intel(R) Xeon(R) processors and/or Intel(R) Xeon Phi(TM) coprocessors using methods which should be applicable to more general compute bound codes. "Modal" is used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum of the cosmic microwave […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
