Posts
Oct, 7
Investigation on the Use of GPGPU for Fast Sparse Matrix Factorization
Power system researchers frequently need to solve network equations. As system sizes grow, the time consumed by the network solution is becoming a dominant factor in the overall time cost. One distinct and important feature of the network admittance matrix is that it is highly sparse, which needs to be addressed by specialized computation […]
Oct, 7
GPGPU-assisted prediction of ion binding sites in proteins
Prediction of binding sites for different types of ions in the context of protein 3D structure is a complex challenge for biophysical computational methods. One possible approach involves using empirical, also called knowledge-based, potentials. In the current study, we present a new GPGPU program complex, PIONCA (Protein-ION CAlculator) for efficient generation of empirical potentials for protein-ion […]
Oct, 7
Heterogeneous NPACI-Rocks/MPI/CUDA distributed multi-GPGPU application for seeking counterexamples to Beal’s Conjecture: MPI/CUDA integration component
Beal’s Conjecture asserts that if A^x + B^y = C^z for integers A, B, C > 0 and integers x, y, z > 2, then A, B, and C share a common prime factor. While empirical computational studies by several researchers have established that Beal’s Conjecture holds for all A, B, C, x, y, z < 1000, the truth of the general conjecture remains […]
Oct, 7
Hybrid coherence for scalable multicore architectures
This work describes a cache architecture and memory model for 1000+ core microprocessors. Our approach exploits workload characteristics and programming model assumptions to build a hybrid memory model that incorporates features from both software-managed coherence schemes and hardware cache coherence. The goal is to achieve the scalability found in compute accelerators, which support relaxed ordering […]
Oct, 7
Intel’s Array Building Blocks: A retargetable, dynamic compiler and embedded language
Our ability to create systems with large amounts of hardware parallelism is exceeding the average software developer’s ability to effectively program them. This is a problem that plagues our industry. Since the vast majority of the world’s software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient […]
Oct, 7
A Framework for Automatic OpenMP Code Generation
It is always a tedious task to manually analyze and detect parallelism in programs. When we deal with autoparallelism, the task becomes more complex. Frameworks such as OpenMP are available, through which we can manually annotate the code to realize parallelism and take advantage of the underlying multi-core architecture. But the programmer’s life becomes simple […]
Oct, 7
Implementation of a High Throughput 3GPP Turbo Decoder on GPU
Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. A general-purpose graphics processing unit (GPGPU) is a programmable commodity processor that achieves high computational performance by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of […]
Oct, 7
Run-time Reconfigurable Multiprocessors
The main advantage of multiprocessors is the performance speedup obtained with parallelism at the processor level. Similarly, the advantage of reconfigurable architectures is the flexibility for application-specific adaptability. To benefit from both these architectures, we present a reconfigurable multiprocessor template, which combines the benefits of parallelism in multiprocessors and flexibility in reconfigurable architectures. A fast, single cycle, […]
Oct, 6
Divergence Analysis and Optimizations
The growing interest in GPU programming has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, the model also brings restrictions. In particular, processing elements (PEs) execute in lock-step, and may lose performance due to divergences caused by conditional branches. In face […]
Oct, 6
CUDA performance analyzer
GPGPU computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses through the ease of use of CUDA and the ubiquity of graphics cards supporting it. Although CUDA has a gentle learning curve for programmers familiar with standard programming languages like C, extracting optimum performance from it, through optimizations and […]
Oct, 6
Offloading Java to Graphics Processors
Massively-parallel graphics processors have the potential to offer high performance at low cost. However, at present such devices are largely inaccessible from higher-level languages such as Java. This work allows compilation from Java bytecode by making use of annotations to specify loops for parallel execution. Data copying to and from the GPU is handled automatically. […]
Oct, 6
Parallelisation of Java for Graphics Processors
The aim of the project was to allow extraction and compilation of Java virtual machine bytecode for parallel execution on graphics cards, specifically the NVIDIA CUDA framework, by both explicit and automatic means. The resulting compiler successfully extracts and compiles code from class files into CUDA C++ code, and outputs transformed classes that […]

