Posts
Nov, 4
Exact diagonalization of quantum lattice models on coprocessors
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and Xeon Phi implementations are parallelized with OpenMP, and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a […]
Nov, 4
Heterogeneous CPU/(GP) GPU Memory Hierarchy Analysis and Optimization
Heterogeneous systems, more specifically CPU-GPGPU platforms, have gained a lot of attention due to the excellent speedups GPUs can achieve with so little energy consumption. However, the picture is not entirely positive: the complex programming models needed to fully exploit the devices and the data movement overheads are some […]
Nov, 4
3rd International Conference on Mechanical, Electronics and Computer Engineering (CMECE), 2016
Dear Scholars and Researchers, Warmest Greetings from CMECE 2016! This is the conference committee of the 2016 3rd International Conference on Mechanical, Electronics and Computer Engineering (CMECE 2016). We are very pleased to announce that CMECE 2016 will be held in New York, USA, on January 7-9, 2016. CMECE 2014 and 2015 were held in Sanya, China […]
Nov, 4
4th International Conference on Nano and Materials Science (ICNMS), 2016
Dear Scholars and Researchers, Warmest Greetings from ICNMS 2016! This is the conference committee of the 2016 4th International Conference on Nano and Materials Science (ICNMS 2016). We are very pleased to announce that ICNMS 2016 will be held in New York, USA, on January 7-9, 2016. Publication: All papers, both invited and contributed, will be reviewed […]
Nov, 3
A Framework for Transparent Execution of Massively-Parallel Applications on CUDA and OpenCL
We present a novel framework for simultaneous development for different massively parallel platforms. Currently, our framework supports CUDA and OpenCL, but it can be easily adapted to other programming languages. The main idea is to provide an easy-to-use abstraction layer that encapsulates calls to the user's own parallel device code as well as to library functions. […]
Nov, 3
Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices
Sparse matrix-vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. Efficient CSR-based SpMV obviates the need for other GPU-specific storage formats, thereby saving runtime and storage overheads. […]
Nov, 3
Software Defined Radio over CUDA
Software Defined Radio (SDR) is a wireless communication system in which the components of transmitters and receivers (filters, mixers, modulators) are mostly implemented in software. Thanks to this approach, it is possible to implement a single universal radio transceiver capable of multi-mode and multi-standard wireless communications. These capabilities are very useful for researchers and radio amateurs, who […]
Nov, 3
On the programmability of multi-GPU computing systems
Multi-GPU systems are widely used in High Performance Computing environments to accelerate scientific computations. This trend is expected to continue as integrated GPUs will be introduced to processors used in multi-socket servers and servers will pack a higher number of GPUs per node. GPUs are currently connected to the system through the PCI Express interconnect, […]
Nov, 3
Exploring Optimisations for the Local Assembly phase of Finite Element Methods on GPUs
Finite Element Methods (FEM) are ubiquitous in science and engineering where they are used in fields as diverse as structural analysis, ocean modeling and bioengineering. FEM allow us to find approximate solutions to a system of partial differential equations over an unstructured mesh. The first phase of solving a FEM problem, local assembly, involves computing […]
Oct, 31
Energy-Efficient Execution of Data-Parallel Applications on Heterogeneous Mobile Platforms
State-of-the-art mobile systems-on-chip (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs include programmable cores, such as the CPU and GPU, with very different functionality. They also integrate performance-heterogeneous cores, such as ARM big.LITTLE, that have different power-performance characteristics but the same instruction-set architecture. […]
Oct, 31
Estimation of numerical reproducibility on CPU and GPU
Differences in simulation results may be observed from one architecture to another or even inside the same architecture. Such reproducibility failures are often due to different rounding errors generated by different orders in the sequence of arithmetic operations. Reproducibility problems are particularly noticeable on new computing architectures such as multicore processors or GPUs (Graphics Processing […]
Oct, 31
Parallelization of Encryption and Hashing Algorithm Using GPU
With the development of GPGPU (general-purpose computing on graphics processing units), more and more computing problems are solved by exploiting the parallelism of the GPU (Graphics Processing Unit). CUDA (Compute Unified Device Architecture) is a framework that makes GPGPU more accessible and easier to learn for the general population of programmers. This is […]