Posts
Jul, 16
2nd International Conference on Signal Processing (ICOSP), 2016
Paper Publication: Accepted papers of ICOSP 2016 could be published in the International Journal of Signal Processing Systems (IJSPS).
Jul, 16
2nd International Conference on Mechanical Engineering and Electrical Systems (ICMES), 2016
Publication: All accepted papers will be published in a volume of MATEC Web of Conferences (ISSN: 2261-236X), which is indexed by Ei Compendex, Inspec, DOAJ, CPCI (Web of Science) and Scopus. Submission Methods: Full Paper (publication and oral presentation); Abstract (oral presentation only). Electronic Submission System (.pdf): https://www.easychair.org/conferences/?conf=icmes2016
Jul, 16
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU
Singular Value QR (SVQR) can orthonormalize a set of dense vectors with the minimum communication (one global reduction between the parallel processing units, and BLAS-3 to perform most of its local computation). As a result, compared to other orthogonalization schemes, SVQR obtains superior performance on many of the current computers, where the communication has become […]
Jul, 16
An investigation of GPU-based stiff chemical kinetics integration methods
A fifth-order implicit Runge-Kutta method and two fourth-order exponential integration methods equipped with Krylov subspace approximations were implemented for the GPU and paired with the analytical chemical kinetic Jacobian software pyJac. The performance of each algorithm was evaluated by integrating thermochemical state data sampled from stochastic partially stirred reactor simulations and compared with the commonly […]
Jul, 16
Finite Element Integration with Quadrature on the GPU
We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call thread transposition to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 […]
Jul, 16
GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing
Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of a large number of drug candidates to be evaluated, an accurate and fast docking […]
Jul, 16
Accelerating Eulerian Fluid Simulation With Convolutional Networks
Real-time simulation of fluid and smoke is a long-standing problem in computer graphics, where state-of-the-art approaches require large compute resources, making real-time applications often impractical. In this work, we propose a data-driven approach that leverages the approximation power of deep-learning methods with the precision of standard fluid solvers to obtain both fast and highly […]
Jul, 13
GPU Accelerated Discrete Element Method (DEM) Molecular Dynamics for Conservative, Faceted Particle Simulations
Faceted shapes, such as polyhedra, are commonly found in systems of nanoscale, colloidal, and granular particles. Many interesting physical phenomena, like crystal nucleation and growth, vacancy motion, and glassy dynamics are challenging to model in these systems because they require detailed dynamical information at the individual particle level. Within the granular materials community the Discrete […]
Jul, 13
Survey of Domain-Specific Languages for FPGA Computing
High-performance FPGA programming has typically been the exclusive domain of a small band of specialized hardware developers. They are capable of reasoning about implementation concerns at the register-transfer level (RTL) which is analogous to assembly-level programming in software. Sometimes these developers are required to push further down to manage even lower levels of abstraction closer […]
Jul, 13
OpenFace: A general-purpose face recognition library with mobile applications
Cameras are becoming ubiquitous in the Internet of Things (IoT) and can use face recognition technology to improve context. There is a large accuracy gap between today’s publicly available face recognition systems and the state-of-the-art private face recognition systems. This paper presents our OpenFace face recognition library that bridges this accuracy gap. We show that […]
Jul, 13
The Vectorization of the Tersoff Multi-Body Potential: An Exercise in Performance Portability
Molecular dynamics simulations, an indispensable research tool in computational chemistry and materials science, consume a significant portion of the supercomputing cycles around the world. We focus on multi-body potentials and aim at achieving performance portability. Compared with well-studied pair potentials, multibody potentials deliver increased simulation accuracy but are too complex for effective compiler optimization. Because […]
Jul, 13
LU, QR, and Cholesky factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi
A wide variety of heterogeneous compute resources, ranging from multicore CPUs to GPUs and coprocessors, are available to modern computers, making it challenging to design unified numerical libraries that efficiently and productively use all these varied resources. For example, in order to efficiently use Intel’s Knights Landing (KNL) processor, the next generation of Xeon Phi architectures, […]