14248

Contributions to Music Semantic Analysis and Its Acceleration Techniques

Boyang Gao
Ecully, Ecole centrale de Lyon
Ecole centrale de Lyon, 2014

@phdthesis{gao2014contributions,

   title={Contributions to music semantic analysis and its acceleration techniques},

   author={Gao, Boyang},

   year={2014},

   school={Ecully, Ecole centrale de Lyon}

}

Download Download (PDF)   View View   Source Source   

1871

views

Digitalized music production exploded in the past decade. Huge amount of data drives the development of effective and efficient methods for automatic music analysis and retrieval. This thesis focuses on performing semantic analysis of music, in particular mood and genre classification, with low level and mid level features since the mood and genre are among the most natural semantic concepts expressed by music perceivable by audiences. In order to delve semantics from low level features, feature modeling techniques like K-means and GMM based BoW and Gaussian super vector have to be applied. In this big data era, the time and accuracy efficiency becomes a main issue in the low level feature modeling. Our first contribution thus focuses on accelerating k-means, GMM and UBM-MAP frameworks, involving the acceleration on single machine and on cluster of workstations. To achieve the maximum speed on single machine, we show that dictionary learning procedures can elegantly be rewritten in matrix format that can be accelerated efficiently by high performance parallel computational infrastructures like multi-core CPU, GPU. In particular with GPU support and careful tuning, we have achieved two magnitudes speed up compared with single thread implementation. Regarding data set which cannot fit into the memory of individual computer, we show that the k-means and GMM training procedures can be divided into map-reduce pattern which can be executed on Hadoop and Spark cluster. Our matrix format version executes 5 to 10 times faster on Hadoop and Spark clusters than the state-of-the-art libraries. Beside signal level features, mid-level features like harmony of music, the most natural semantic given by the composer, are also important since it contains higher level of abstraction of meaning beyond physical oscillation. Our second contribution thus focuses on recovering note information from music signal with musical knowledge. This contribution relies on two levels of musical knowledge: instrument note sound and note co-occurrence/transition statistics. In the instrument note sound level, a note dictionary is firstly built from Logic Pro 9. With the musical dictionary in hand, we propose a positive constraint matching pursuit (PCMP) algorithm to perform the decomposition. In the inter-note level, we propose a two stage sparse decomposition approach integrated with note statistical information. In frame level decomposition stage, note co-occurrence probabilities are embedded to guide atom selection and to build sparse multiple candidate graph providing backup choices for later selections. In the global optimal path searching stage, note transition probabilities are incorporated. Experiments on multiple data sets show that our proposed approaches outperform the state-of-the-art in terms of accuracy and recall for note recovery and music mood/genre classification.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: