Posts
Apr, 14
A smooth particle hydrodynamics code to model collisions between solid, self-gravitating objects
Modern graphics processing units (GPUs) lead to a major increase in the performance of the computation of astrophysical simulations. Owing to the different nature of GPU architecture compared to traditional central processing units (CPUs) such as x86 architecture, existing numerical codes cannot be easily migrated to run on GPU. Here, we present a new implementation […]
Apr, 12
CUED-RNNLM – An Open-Source Toolkit for Efficient Training and Evaluation of Recurrent Neural Network Language Models
In recent years, recurrent neural network language models (RNNLMs) have become increasingly popular for a range of applications including speech recognition. However, the training of RNNLMs is computationally expensive, which limits the quantity of data, and size of network, that can be used. In order to fully exploit the power of RNNLMs, efficient training implementations […]
Apr, 12
Efficient Parallel Implementation for Single Block Orthogonal Dictionary Learning
Dictionary training for sparse representations involves dealing with large chunks of data and complex algorithms that lead to time-consuming tasks. In this paper we propose an improved parallel version of the single block orthogonal dictionary learning algorithm that reduces the representation error and improves the execution time. Our solution targets OpenCL-capable graphical device units […]
Apr, 12
Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing
Accelerators, such as Graphic Processing Units (GPUs), are popular components of modern parallel systems. Their energy-efficient performance makes them attractive components for modern data center nodes. However, they lack control for fair resource sharing amongst multiple users. This paper presents a runtime and Just In Time compiler that enables resource sharing control and software managed […]
Apr, 12
Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments
Massively data-parallel processors, Graphics Processing Units (GPUs) in particular, have recently entered the mainstream of general-purpose computing as powerful hardware accelerators for a wide range of applications including databases, medical informatics, and big data analytics. However, despite their performance benefit and cost effectiveness, the utilization of GPUs in production systems still remains limited. A […]
Apr, 12
Real-Time Computation of Parameter Fitting and Image Reconstruction Using Graphical Processing Units
In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper […]
Apr, 9
GIFT: A Real-time and Scalable 3D Shape Search Engine
Projective analysis is an important solution for 3D shape retrieval, since human visual perceptions of 3D shapes rely on various 2D observations from different view points. Although multiple informative and discriminative views are utilized, most projection-based retrieval systems suffer from heavy computational cost, thus cannot satisfy the basic requirement of scalability for search engines. In […]
Apr, 9
Monte-Carlo Black-Scholes Implementation using OpenCL Standard
OpenCL is a standard parallel language based on the C language. It lets users take full advantage of the flexibility of a high-level language. In this paper, we explore the use of OpenCL to implement complex designs on FPGAs by describing the design in a high-level abstraction language. […]
Apr, 9
Optimizing Performance of Recurrent Neural Networks on GPUs
As recurrent neural networks become larger and deeper, training times for single networks are rising into weeks or even months. As such there is a significant incentive to improve the performance and scalability of these networks. While GPUs have become the hardware of choice for training and deploying recurrent models, the implementations employed often make […]
Apr, 9
dMath: A Scalable Linear Algebra and Math Library for Heterogeneous GP-GPU Architectures
A new scalable parallel math library, dMath, is presented in this paper that demonstrates leading scaling when using intranode, or internode, hybrid-parallelism for deep-learning. dMath provides easy-to-use distributed base primitives and a variety of domain-specific algorithms. These include matrix multiplication, convolutions, and others allowing for rapid development of highly scalable applications, including Deep Neural Networks […]
Apr, 9
The 8th International Conf. on Signal Processing Systems (ICSPS), 2016
Publication: Accepted papers will be published in the conference proceedings, which will be indexed by EI Compendex, SCOPUS, ULRICH’s Periodicals Directory, INSPEC, etc. Agenda: November 21, 2016 – Registration & Conference Materials Collection; November 22, 2016 – Workshop; November 23, 2016 – Keynote Speeches & Participants’ Oral Presentation; November 24, 2016 – Academic Visiting. The […]
Apr, 9
International Conf. on Biomedical Signal and Bioinformatics (ICBSB), 2016
Schedule: November 21, 2016 (Monday) – Participants’ Onsite Registration & Conference Materials Collection; November 22, 2016 (Tuesday) – Opening Ceremony and Keynote Speeches, Participants’ Oral Presentation, Excellent Paper Awards Ceremony & Dinner Banquet; November 23, 2016 (Wednesday) – Academic Visit; November 24, 2016 (Thursday) – Tutorial Registration, Tutorial. Conference Venue: The Sir Paul Reeves Building at AUT. Address: […]