Towards a Unified Sentiment Lexicon (USL) based on Graphics Processing Units (GPUs)
Universidad Politecnica de Madrid
Mathematical Problems in Engineering, 2013
@article{barbosa2013towards,
title={Towards a Unified Sentiment Lexicon (USL) based on Graphics Processing Units (GPUs)},
author={Barbosa-Santill{\'a}n, Liliana Ibeth},
journal={Mathematical Problems in Engineering},
year={2013}
}
This paper presents an approach to creating what we call a Unified Sentiment Lexicon (USL). The approach aligns, unifies, and expands the sentiment lexicons available on the web in order to make their coverage more robust. A sentiment lexicon is an essential resource for tagging subjective corpora, on the web or elsewhere. In many situations the lexicon must be multilingual, because a writer may alternate between two languages in the same text, message, or post. Monolingual sentiment lexicons such as SentiWordNet, the Bing Liu Sentiment Lexicon, the MPQA lexicon, and the NTU Sentiment Dictionary have been validated by researchers and their institutions; the first three are in English and the last is in Chinese. One problem in automatically unifying the scores of different sentiment lexicons is that, for many lexical entries, the classification as Positive, Negative, or Neutral {P,N,Z} depends on the unit of measurement used in the annotation methodology of the source lexicon. Our USL approach computes the unified polarity strength of each lexical entry based on the Pearson correlation coefficient and the UnifiedMetrics procedure, for CPU and GPU respectively. The Pearson coefficient measures how correlated lexical entries are on a scale from -1 to 1, where 1 indicates perfect correlation, 0 indicates no correlation, and -1 indicates perfect inverse correlation. Another problem is the long processing time required to compute all the lexical entries in the unification task. To reduce it, the USL approach assigns a subset of the lexical entries to each of the 1,344 GPU cores and processes them in parallel, unifying 155,802 lexical entries.
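The abstract does not say how the correlation is split across cores. The sketch below is one way to picture it, assuming (hypothetically) that each source lexicon is a mapping from entry to polarity score and that r is computed over the entries two lexicons share. Each worker, standing in for a GPU core, reduces its subset of entries to six partial sums; because those sums simply add, the per-chunk results combine into one coefficient. The names `pearson_parallel`, `partial_sums`, and `chunked` are illustrative and are not the authors' UnifiedMetrics code.

```python
"""Hypothetical sketch: Pearson r between two lexicons' polarity scores,
computed chunk by chunk so each worker handles a subset of shared entries."""
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def partial_sums(pairs):
    """Return (n, sum_x, sum_y, sum_xx, sum_yy, sum_xy) for one chunk of (x, y) pairs."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return n, sx, sy, sxx, syy, sxy


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous subsets of
    ceil(len(items) / n_chunks) items each (the last may be shorter)."""
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def pearson_parallel(lex_a, lex_b, n_workers=4):
    """Pearson r over entries present in both lexicons (dict: entry -> score)."""
    shared = sorted(lex_a.keys() & lex_b.keys())
    pairs = [(lex_a[w], lex_b[w]) for w in shared]
    if len(pairs) < 2:
        raise ValueError("need at least two shared entries")
    # Map step: each worker reduces its chunk to partial sums.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_sums, chunked(pairs, n_workers)))
    # Reduce step: partial sums from every chunk add up column by column.
    n, sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*parts))
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        raise ValueError("scores are constant; r is undefined")
    return cov / sqrt(var_x * var_y)
```

For example, `pearson_parallel({"good": 0.8, "bad": -0.7, "table": 0.0}, {"good": 0.6, "bad": -0.9, "table": 0.1})` correlates the three shared entries. With 155,802 entries and 1,344 workers, `chunked` would give each worker at most 116 entries. Python threads only illustrate the decomposition here: because of the GIL they give no real speedup for this CPU-bound loop, which is where a GPU kernel or a process pool would come in. The single-pass sum formula is also less numerically stable than a two-pass one on very large inputs.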
The USL approach extends the Unified Sentiment Lexicon with two further lexical resources: 1) the Spanish Travel Subjective Lexicon and 2) the Pan American Sentiment lexicon, both developed by our research group, Communication in Specialized Domains. The analysis conducted with the USL approach shows that the Unified Sentiment Lexicon has 95,430 lexical entries, of which 35,201 are classified as positive, 22,029 as negative, and 38,200 as neutral. The lexicon is also multilingual: it includes terms in several languages, grouped into clusters such as Chinese, English, Spanish, and Portuguese. Finally, the runtime was 10 minutes for the 95,430 lexical entries, a threefold reduction in the computing time of the UnifiedMetrics procedure.
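The reported figures can be checked with simple arithmetic. The three class counts add up to the total number of entries. Reading the threefold reduction as a speedup $S$ of 3 over a baseline run of the same computation (an assumption, since the abstract does not name the baseline), that baseline would have taken about 30 minutes.

```latex
\begin{align}
35{,}201 + 22{,}029 + 38{,}200 &= 95{,}430 \\
S = \frac{T_{\text{baseline}}}{T_{\text{GPU}}} = 3,\quad T_{\text{GPU}} = 10~\text{min}
  &\;\Rightarrow\; T_{\text{baseline}} \approx 30~\text{min}
\end{align}
```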
October 24, 2013 by hgpu