Scalable sentiment analytics

Bakirov, Aslan; Cogalmis, Kevser; Bulut, Ahmet

doi:10.3906/elk-1311-128

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.24, no.3, pp.1560-1570, 2016 (SCI-Expanded, Scopus, TRDizin)

Publication Type: Article / Full Article
Volume: 24 Issue: 3
Publication Date: 2016
DOI Number: 10.3906/elk-1311-128
Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, TR DİZİN (ULAKBİM)
Page Numbers: pp.1560-1570
Keywords: Sentiment analysis, MapReduce, Spark, Hadoop, Apache Mahout
Affiliated with Acıbadem Mehmet Ali Aydınlar University: No

Abstract

Spark has become a widely used analytics framework that provides an implementation of the equally popular MapReduce programming model. Hadoop is an Apache Software Foundation framework for processing large datasets on a cluster of computers using the MapReduce programming model. Mahout is an Apache Software Foundation project for building scalable machine learning libraries, and it includes built-in classifiers. In this paper, we show how to build a simple text classifier on Spark, Apache Hadoop, and Apache Mahout for extracting sentiments from a collection containing millions of text documents. Using a collection of 7 million movie reviews taken from IMDB, we trained a Bayesian classifier to predict the sentiments of test reviews. Separate classifiers were trained on Spark and on Hadoop, our two contenders for scalable sentiment analytics. Our empirical results show that the sentiment learning task ran almost 10 times faster on Spark than on Hadoop.
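
The abstract does not give the classifier's details, so the following is only a minimal sketch of how a Bayesian text classifier can be trained with map and reduce steps. It is plain Python, not the authors' Spark, Hadoop, or Mahout code. It assumes a multinomial naive Bayes model with add-one smoothing, whitespace tokenization, and two labels (`pos`, `neg`); the function names and the toy reviews are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict
from functools import reduce


def map_review(label, text):
    """Map step: count (label, word) pairs for one review, plus one document marker."""
    pairs = Counter()
    for word in text.lower().split():
        pairs[(label, word)] += 1
    pairs[(label, None)] += 1  # None marks "one document seen with this label"
    return pairs


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables by summing the counts per key."""
    merged = Counter(left)
    merged.update(right)
    return merged


def train(labeled_reviews):
    """Aggregate the mapped counts into per-label document and word counts."""
    mapped = (map_review(label, text) for label, text in labeled_reviews)
    counts = reduce(reduce_counts, mapped, Counter())
    doc_counts = {}
    word_counts = defaultdict(Counter)
    for (label, word), n in counts.items():
        if word is None:
            doc_counts[label] = n
        else:
            word_counts[label][word] = n
    vocab = {word for per_label in word_counts.values() for word in per_label}
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior under add-one smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration (not taken from the IMDB collection).
TOY_REVIEWS = [
    ("pos", "a great and moving film"),
    ("pos", "great acting and a wonderful story"),
    ("neg", "a dull and boring film"),
    ("neg", "boring plot and terrible acting"),
]

model = train(TOY_REVIEWS)
print(predict(model, "wonderful great story"))
print(predict(model, "terrible boring plot"))
```

Because the per-key counts are merged by summation, which is associative and commutative, the reduce step can be applied to partial results in any order. That property is what lets frameworks such as Hadoop and Spark distribute this kind of counting across a cluster; on Spark, for example, the same aggregation would plausibly be written with RDD operations such as `map` and `reduceByKey`.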