Abstract

Machine learning is a broad field that is now used to solve many typical business problems quickly. Business analytics is an important application area of machine learning, in which business problems are analysed to gain insight into them and to solve them with minimal risk.

Purpose of the project

The main idea behind the project is to analyse the reviews of products purchased by customers on the online retailer Amazon, and to perform sentiment analysis to understand how negative or positive the reviews are for different items.

Approach

The project uses the VADER sentiment analyzer to determine the opinion expressed in the product reviews that customers write after purchasing an item.

Findings

The main takeaway of the project is an analysis of the sentiment customers express after purchasing Apple products.

Business implications

The project can be applied in many business domains, including companies that release different products each year. The results of the analysis can help explain the problems customers face after purchasing a product, and they offer an opportunity to improve the product's performance.

Table of Contents

Abstract

Purpose of the project

Approach

Findings

Business implications

Introduction

Supervised learning

Unsupervised learning

Semi-supervised learning

Reinforcement learning

Graphical inference of the data

Discussion of Data source

Tools

Justification of Choice

Key Results

Visualization Results

N-gram analysis on Positive reviews

N-gram analysis on negative reviews

Business Story Telling

Conclusion

Project Limitations

Recommendations

Introduction

Sentiment analysis, sometimes referred to as opinion mining, is a widely used technique in natural language processing and machine learning. It helps businesses respond when products run into issues after launch, because it uncovers the real emotion behind the opinions customers express after purchasing an item. In recent years sentiment analysis has become a topic of strong interest in many business domains, such as companies that produce medicines, electronic products, and other machines. It also helps to evaluate the marketing strategies behind products by analysing customer reviews promptly. Social media analytics is an efficient source of such data, including customer reviews and the ratings customers give after purchasing products. These data help us understand customers' main concerns and tastes, and whether they are willing to buy a product.

This project focuses on the sentiments derived from the reviews provided by customers. Machine learning algorithms are used for this analysis, since it would not be feasible at this scale without them.

Machine learning can be broadly divided into four types:

Supervised learning

Unsupervised learning

Semi-supervised learning

Reinforcement learning

Supervised learning

Supervised learning is an approach in which a label, or output, is defined for each example. The output is the target variable, which is predicted or classified from the features or attributes. Two examples illustrate the idea.

Predicting house prices from attributes such as the number of bedrooms, the location of the building, and the area of the rooms is a supervised regression problem, because price is a known, continuous target variable.

Classifying whether a person is eligible for a loan based on income, number of children, liabilities, marital status, and so on is a supervised classification problem, because the target variable is known and discrete.

Here are some of the most important supervised learning algorithms:

k-Nearest Neighbors

Linear Regression

Logistic Regression

Support Vector Machines (SVMs)

Decision Trees and Random Forests

Artificial Neural Networks (ANNs)
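As one illustration of the supervised setting, the sketch below applies k-nearest neighbours to the loan eligibility example. The applicant data, the two features, and the function name `knn_predict` are made up for illustration and do not come from the project data.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the labelled examples by Euclidean distance to the query point.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    # Keep the labels of the k closest examples and return the most common one.
    top_labels = [label for _, label in neighbours[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical applicants: (monthly income in thousands, number of children).
train_X = [(2.0, 3), (2.5, 2), (3.0, 4), (6.0, 1), (7.5, 0), (8.0, 2)]
train_y = ["not eligible", "not eligible", "not eligible",
           "eligible", "eligible", "eligible"]

prediction = knn_predict(train_X, train_y, query=(7.0, 1), k=3)
print(prediction)
```

The known labels in `train_y` are what make this a supervised problem: the prediction for a new applicant is derived entirely from examples whose outcome is already recorded.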

Unsupervised learning

Unsupervised learning is the opposite: the labels are not defined. These problems often involve segmenting the data into groups or reducing its dimensionality.

Here are some of the most important unsupervised learning algorithms:

Clustering

Hierarchical Cluster Analysis (HCA)

Expectation Maximization

Principal Component Analysis (PCA)

Kernel PCA

Locally-Linear Embedding (LLE)

t-distributed Stochastic Neighbor Embedding (t-SNE)

Association rule learning

Semi-supervised learning

Semi-supervised learning combines both approaches: some of the data are labelled and the rest are not. Examples include deep belief networks (DBNs), which are built by stacking restricted Boltzmann machines (RBMs).

Reinforcement Learning

Reinforcement learning is neither supervised nor unsupervised. A learning agent observes its environment, chooses actions, and receives rewards or penalties in return. Over time it learns a strategy, called a policy, that maximises its reward. A policy defines what action the agent should choose when it is in a given situation.

This project is treated as a supervised learning problem in which the labels to be predicted are sentiments. The sentiments are generated with a sentiment analyzer, which separates the reviews into positive, neutral, and negative.

The data consist mostly of reviews written by customers on Amazon, together with attributes such as the ratings and the votes on each review. The text must be cleaned, which is a crucial step in natural language processing. We also need to apply other feature engineering steps, such as handling missing values, and to bring out the meaning that lies behind the text.

Some of the preprocessing techniques are given below.

Lowercasing the text

The reviews contain both upper-case and lower-case words, so the same word can appear in different forms. For cleaning, we convert all words to lower case.

Removal of punctuation

Punctuation marks are symbols placed between words, and they cause problems during vectorization. They are therefore removed before the text is vectorized.

Handling of missing values

Missing or null values cause problems while preprocessing text. For this reason, we dropped all rows with missing values.

Removal of stopwords

Stopwords are very common words that carry little meaning on their own, such as articles, pronouns, prepositions, and conjunctions. Such words are removed.

Lemmatization

Lemmatization is the process of converting each word to its dictionary base form (lemma). The data in our case undergo lemmatization, for which we used WordNetLemmatizer.
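The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the short `STOPWORDS` set and the function names are assumptions, and the `lemmatize` argument defaults to doing nothing so that the sketch runs with the standard library alone. With NLTK and its WordNet data installed, `WordNetLemmatizer().lemmatize` could be passed in its place.

```python
import string

# A tiny illustrative stopword list (an assumption); a real run would use
# a fuller list such as NLTK's English stopword corpus.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "i", "my",
             "to", "of", "was", "with"}

def clean_review(text, lemmatize=lambda word: word):
    """Lowercase, strip punctuation, drop stopwords, then lemmatize each token."""
    text = text.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return [lemmatize(tok) for tok in tokens]

def preprocess(reviews, lemmatize=lambda word: word):
    """Drop missing or empty reviews, then clean the remaining ones."""
    kept = [r for r in reviews if r is not None and r.strip()]
    return [clean_review(r, lemmatize) for r in kept]

# Made-up reviews, including one missing and one blank entry.
reviews = [
    "This phone is GREAT, and the camera is amazing!",
    None,
    "  ",
    "Battery died; SIM slot was broken.",
]
cleaned = preprocess(reviews)
print(cleaned)
```

Passing the lemmatizer as a parameter keeps the cleaning steps testable on their own and makes it easy to swap in a stemmer for comparison.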

Graphical inference of the data

Graphical analysis and visualization play a crucial role in data analysis and machine learning, because they reveal insights about the data that are hard to obtain from model building or preprocessing alone.

For text data, graphical analysis can be done mainly in two ways.

Word cloud

A word cloud is a picture that displays the most commonly occurring words in a text, with the size of each word indicating how often it occurs.

N-gram analysis

An n-gram is a contiguous sequence of n words from a text or speech. N-gram analysis counts which sequences occur most often in a document. Unigrams, bigrams, and trigrams (one, two, and three words) are the most commonly used.
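Both techniques rest on counting. The sketch below counts n-grams across already-cleaned reviews. With n = 1 it yields the word frequencies that would drive word sizes in a word cloud. The sample reviews and the function names are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(documents, n, k=3):
    """Count n-grams across tokenised documents and return the k most frequent."""
    counts = Counter()
    for tokens in documents:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# Made-up, already-cleaned reviews for illustration only.
docs = [
    ["great", "phone", "great", "price"],
    ["great", "phone", "fast", "delivery"],
    ["great", "price", "great", "phone"],
]
top_unigrams = top_ngrams(docs, 1, k=2)
top_bigrams = top_ngrams(docs, 2, k=1)
print(top_unigrams)
print(top_bigrams)
```

N-grams are counted within each review separately, so no sequence spans the boundary between two reviews.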

Discussion of Data source

The data were collected from Kaggle. The data set contains more than 58000 reviews of various smartphones, which is enough to draw useful insights from the reviews customers have written.

Here we have used the reviews of Apple phones purchased by customers from Amazon.

Tools

Python

Python is used throughout the model building. It was chosen because it handles text data flexibly and its readily available libraries make preprocessing straightforward.

Tableau

Tableau is one of the leading visualization frameworks in use today. It is a good choice because it is easy to use and makes it efficient to draw insights from the features. From business dashboards to business storytelling, it helps us understand the relationships among the data and their attributes. Its simple and efficient drag-and-drop features make it usable by almost anyone.

Justification of Choice

Among the many sentiment analysis techniques available, the VADER sentiment analyzer was chosen because it is a good fit for this project. It computes polarity scores for each review and tells us whether the review is positive, neutral, or negative.

Python was chosen because it provides many flexible libraries for data manipulation and feature engineering. Compared with R, Python seemed faster and more reliable for this work.

Tableau was chosen for the visualizations, which are harder to produce as efficiently or seamlessly in Python. It lets us build attractive dashboards and stories with little effort, makes the data easier to explain to management, and presents business problems more smoothly than other frameworks.
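A minimal sketch of how a polarity score becomes a label follows. VADER returns a compound score between -1 and 1. The cut-off of 0.05 used here is the conventional one from the VADER documentation and is an assumption, since the thresholds used in this project are not stated. The function names are also illustrative, and `label_review` needs the third-party `vaderSentiment` package.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def label_review(text):
    """Score one review with VADER and label it (needs `pip install vaderSentiment`)."""
    # Imported lazily so the thresholding logic above runs without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return label_from_compound(scores["compound"])

# Thresholding on hand-picked compound scores (no VADER install required).
labels = [label_from_compound(c) for c in (0.72, 0.0, -0.31)]
print(labels)
```

Keeping the threshold as a parameter makes it easy to check how sensitive the positive and negative counts are to the chosen cut-off.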

Key Results

Of the roughly 56000 reviews in the data, more than 34000 are five-star reviews. It is interesting that reviews rated below three stars, including one-star reviews, number more than 10000. This imbalance means that the sentiment analysis can be heavily biased, which makes it harder to build a good analyzer. After preprocessing the data, including handling missing values and removing all punctuation from the text, we observed that stop words heavily influence the data when it comes to sentiment analysis. We then used the VADER sentiment analyzer and kept only two labels, positive and negative. With these two labels, 48527 reviews were labelled positive and 7527 were labelled negative.
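The degree of imbalance can be made explicit with a short calculation. The sketch assumes that the 48527 figure refers to positive reviews and the 7527 figure to negative ones. The variable names are illustrative.

```python
# Label counts as reported in the text (assumed: 48527 positive, 7527 negative).
counts = {"positive": 48527, "negative": 7527}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")

# A rule that always answers with the majority label would already reach this
# accuracy, so it is the baseline any trained classifier has to beat.
majority_baseline = max(shares.values())
```

Under this reading, roughly 87% of the labelled reviews are positive, so accuracy alone would be a misleading measure of how well negative reviews are detected.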

Visualization Results

The n-gram analysis of both positive and negative reviews is given below

N-gram analysis on Positive reviews

The positive reviews show that customers love the phone and compliment its price and the service Apple provides. The iPhone 7 Plus is mentioned frequently, and its count suggests that it was widely purchased and favourably reviewed. Customers also describe other features, such as the camera, the SIM card, and the condition and design of the phones.

N-gram analysis on negative reviews

As the graph shows, customers who bought the phones are going through non-trivial issues, such as problems with network bands, which are commonly mentioned in the negative reviews.

Now let us look into the word clouds of the phones.

Let us see the word cloud of the phones that are rated more than 3 stars.

In the word cloud, words such as ‘good’, ‘fast’, ‘amazing’, and ‘love’ occur frequently, which suggests that customers are satisfied after purchasing these phones. Words such as ‘scratch’ and ‘crack’ also appear, which is unexpected and may indicate that some phones were delivered with defects or in poor physical condition.

We test this by looking at the phones that were rated 1 star.

In this word cloud, words such as ‘long’, ‘missing’, ‘locked’, ‘quality’, and ‘battery’ suggest that phones rated 1 star had poor battery life or were slow in use. Words like ‘missing’ and ‘connect’ suggest that these phones arrived with parts missing or had connectivity issues.

Business Story Telling

Let us compare the positive and negative trigrams side by side to draw out some of the analysis.

The most prominent phrase in the positive reviews is ‘great phone great price’.

The most prominent phrase in the negative reviews is ‘open sim card slot’.

From both graphical analyses, we can identify the following steps to improve the performance of the products.

Upgrading the network bands

Customers complain about poor network reception in most of the negative reviews. Neglecting these reviews would be a mistake, and supporting more network bands might increase the sales of such phones.

Build quality

Most of the negative reviews describe defective physical condition, which suggests that customers were dissatisfied with the design and durability of the phones they received. To improve its market position, Apple should focus on design and build-quality materials that increase durability and performance.

SIM issues

In the trigrams, the most frequent words relate to a SIM warning, which suggests that many customers are suffering from SIM issues.

Upgrading software and regularly improving connectivity for different network providers gives customers a smoother experience, which is one of the prime factors for successful marketing campaigns.

Conclusion

VADER sentiment analysis appears efficient and productive on our data, where it segmented the reviews into positive and negative. It is surprising, however, that a number of 5-star reviews fall into the negative category, while a number of reviews with fewer than 3 stars, or even 1 star, fall into the positive category. We attribute this to the highly imbalanced data in our data set, and collecting more data may help overcome the problem. Even among phones with mostly positive sentiment, many are rated very low because of various issues. Some of these issues are discussed in the business storytelling section, where we proposed measures to improve the marketing strategy and sales, the design quality, and other problems customers face after purchasing Apple phones.

We also observed that some phones have SIM card and connectivity issues. To address these problems, Apple should provide better SIM support and better network band coverage, so that the phones work with most network providers in specific regions.

Our n-gram analysis also gave us better insight into the design quality of the phones and the compliments customers made after purchasing phones such as the iPhone 7 Plus. The data were preprocessed to make the sentiment analyzer as effective as possible. Other preprocessing steps, such as stemming and POS tagging, could also be applied. Named entity recognition, another preprocessing step, is also done in order to identify the names of products and the organisations that produced them.

Using the output of the sentiment analyzer as a positive or negative target variable, we can also set up text classification problems, for example to predict the rating of future reviews. Our objective was to extract the sentiment from the reviews, which we have done, and much more analysis is possible once the sentiment is available.

We also produced word clouds from both the positive and the negative reviews. The word clouds showed us which words are repeated in the reviews and gave us a sense of what customers experienced after purchasing the items. Most of the word clouds suggested that the design quality of the phone was not good, while the positive word cloud also suggested that the phone was smooth and fast. Further insights could be drawn from the word clouds to make our business storytelling clearer and more efficient.

Project Limitations

The project has some limitations.

The analysis is limited to the products covered by the data set. This can be addressed by updating the data with recent reviews, which would also support more attractive storytelling from the insights.

The sentiment analyzer predicted positive for some reviews that are negative. To address this, low-rated reviews should be separated or the classes balanced in order to obtain unbiased estimates.
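One way to separate out the affected reviews is to cross-tabulate the star rating against the predicted sentiment and list the rows where the two disagree. The sketch below uses made-up rows. The cut-offs (2 stars or fewer counts as low, 4 or more as high) and the function names are assumptions.

```python
from collections import Counter

def rating_vs_sentiment(rows):
    """Count reviews for each (star rating, predicted sentiment) pair."""
    return Counter((rating, sentiment) for rating, sentiment in rows)

def disagreements(rows, low=2, high=4):
    """Return rows where rating and predicted sentiment point in opposite directions."""
    return [
        (rating, sentiment) for rating, sentiment in rows
        if (rating <= low and sentiment == "positive")
        or (rating >= high and sentiment == "negative")
    ]

# Made-up (rating, predicted label) pairs for illustration only.
rows = [(5, "positive"), (1, "positive"), (5, "negative"),
        (1, "negative"), (4, "positive")]
table = rating_vs_sentiment(rows)
mismatched = disagreements(rows)
print(table[(1, "positive")], len(mismatched))
```

Reading a sample of the mismatched reviews by hand is a quick check on whether the analyzer or the star rating better reflects what the customer wrote.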

Recommendations

Updating the data is a necessary and crucial step in machine learning, because outdated or insufficient data can lead to problems such as overfitting and underfitting. The misclassification observed here, where some low-rated reviews were labelled positive and some high-rated reviews negative, may be reduced by collecting more data. Reviews play a very important role in today's market: they can change the fate of a product, give good hints for improving marketing campaigns, and prompt measures to resolve the problems customers face after purchasing an item. Sentiment analysis can also be applied to other text, such as the headlines customers give their reviews. There are also other kinds of sentiment analyzers, whose accuracy and efficiency can be compared with those of the VADER sentiment analyzer.

Collecting live data from an online source is advisable, as it can give an overview of reviews of products that have been released or are about to be released.

Attributes such as the votes on a review, the time of the review, and the rating of the product support a better analysis, as they help to determine customers' actual opinions.

