Sentiment analysis, or opinion mining, is the automated process of identifying the opinion expressed about a given subject in written or spoken language. It is a part of Natural Language Processing that analyses the subjectivity of text, including a person's general attitude toward a given topic. This is a very exciting and novel area of development because it has many practical applications and the data is all around us: review sites, forums, blogs, and social media contain numerous public and personal opinions about products, brands, politics, or any other topic worth expressing an opinion about. This unstructured information can be very useful in commercial applications such as marketing analysis, product feedback, and customer service.
What is an opinion? Textual information can be roughly classified into two main categories: facts and opinions. A fact is an objective expression about something, an independent description of the real state of things without any alteration by the person stating it. An opinion, by contrast, is usually a subjective expression of a personal attitude, impression, or feeling toward some topic. A fact can be proven, while an opinion carries more weight when a fact supports it. Because text can be sorted into such categories, opinion mining can be handled as a text classification problem. Each sentence can be classified by subjectivity (objective or subjective) and by polarity (positive, negative, or neutral). If you want to address it as a regression problem instead, you can assign each sentence a polarity value ranging from -1 (very negative) to 1 (very positive).
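To make the regression view concrete, here is a minimal Python sketch that assigns a sentence a polarity score in [-1, 1] by averaging the scores of the words it recognizes, then maps that score back to the three polarity classes. The lexicon, the averaging rule, and the neutral `threshold` of 0.1 are hypothetical choices for illustration only; practical systems usually learn such scores from labeled data.

```python
import re

# Hypothetical toy lexicon (assumption): word -> polarity in [-1, 1].
LEXICON = {"good": 0.6, "great": 0.9, "bad": -0.6, "terrible": -0.9, "okay": 0.1}


def polarity(sentence):
    """Regression view: mean polarity of known words, or 0.0 if none are known.

    Every lexicon value lies in [-1, 1], so the mean stays in that range too.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def polarity_label(score, threshold=0.1):
    """Classification view: map a continuous score to one of three classes."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for text in ["The food was great", "The service was terrible", "The room was okay"]:
    score = polarity(text)
    print(f"{text!r}: {score:+.2f} -> {polarity_label(score)}")
```

The same score serves both framings: a regression model predicts the number directly, while a classifier only needs to decide which side of the thresholds it falls on.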
What’s so hard about sentiment analysis?
The accuracy of sentiment analysis nowadays is typically around 85%, but we have to take into account that the degree of agreement between humans is only around 80%, because we have different perspectives and can experience the same sentence differently. Here, we’ll go through a list of problems that you’ll encounter while building a sentiment analyzer. Some of them can be bypassed through model selection, but others remain unsolved.
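Both figures above are the same simple ratio: the fraction of items on which two sets of labels match. The Python sketch below computes it for two hypothetical annotators labeling five sentences; the labels are made up for illustration. Passing a model's predictions and a set of reference labels to the same function gives the model's accuracy.

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two equally long label sequences agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels from two annotators for the same five sentences.
annotator_1 = ["pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "neg", "pos", "pos", "neg"]

print(percent_agreement(annotator_1, annotator_2))  # 0.8 (they disagree on one sentence)
```

Plain percent agreement does not correct for agreement that would happen by chance; chance-corrected measures such as Cohen's kappa are often reported alongside it.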
Negations
When you use a classical “bag of words” model, it doesn’t handle negations well. Because the model ignores word order, it doesn’t recognize the opposite meaning of sentences like “I really like this chocolate” and “I really do not like this chocolate”. It would probably give them the same score, marking words like “like” and “chocolate” as positive.
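A minimal Python sketch of the problem, assuming a hypothetical toy lexicon (`LEXICON`) and a scorer that simply sums word weights over a bag of words: both sentences receive exactly the same score. The `mark_negation` helper shows one common workaround, prefixing every word that follows a negation with `NOT_` until the next punctuation mark so that negated words become distinct features. All names and weights here are illustrative, not any library's API.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (assumption): word -> polarity weight.
LEXICON = {"like": 1.0, "love": 1.0, "chocolate": 0.5, "hate": -1.0}
NEGATIONS = {"not", "no", "never", "don't"}
PUNCTUATION = set(".,!?;")


def tokenize(text):
    """Lowercase words (apostrophes kept) plus single punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def bag_of_words(tokens):
    """Word counts only: all information about word order is discarded."""
    return Counter(tokens)


def score(bag):
    """Sum of lexicon weights; unknown words (including 'not') count as 0."""
    return sum(LEXICON.get(word, 0.0) * count for word, count in bag.items())


def mark_negation(tokens):
    """Prefix tokens after a negation word with NOT_ until the next punctuation."""
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            marked.append(tok)
        elif tok in NEGATIONS:
            negated = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negated else tok)
    return marked


def score_with_negation(bag):
    """Like score(), but a NOT_ prefix flips the sign of the word's weight."""
    total = 0.0
    for word, count in bag.items():
        if word.startswith("NOT_"):
            total -= LEXICON.get(word[len("NOT_"):], 0.0) * count
        else:
            total += LEXICON.get(word, 0.0) * count
    return total


positive = "I really like this chocolate"
negative = "I really do not like this chocolate"
print(score(bag_of_words(tokenize(positive))), score(bag_of_words(tokenize(negative))))  # 1.5 1.5
print(score_with_negation(bag_of_words(mark_negation(tokenize(negative)))))  # -1.5
```

Negation marking is a crude heuristic: it guesses the scope of a negation from punctuation alone, and here it flips “chocolate” along with “like”. This is one reason models that take word order into account can handle negation better.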
Irony, Sarcasm, Metaphors, Jokes
Unsurprisingly, computers struggle to understand figurative language. Much of the time, even people aren’t sure what the speaker intended to say. Sarcasm usually flips the literal meaning of a word: something literally positive is meant negatively, or vice versa. Detecting irony or sarcasm therefore requires careful investigation of the context.
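To show why context matters, here is a toy Python heuristic, not a real sarcasm detector: it flags a sentence when literally positive words appear together with a phrase describing an unpleasant situation, a contrast that often signals sarcasm. Both word lists are hypothetical and hand-picked for illustration; a practical system would have to learn such cues from data and would still miss most sarcasm.

```python
import re

# Hypothetical, hand-picked cue lists (assumption), for illustration only.
POSITIVE_WORDS = {"love", "great", "awesome", "like"}
NEGATIVE_SITUATIONS = ["waiting in line", "stuck in traffic", "being ignored", "working on weekends"]


def literal_positive_count(text):
    """Number of literally positive words, ignoring any context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for w in words if w in POSITIVE_WORDS)


def possibly_sarcastic(text):
    """Flag positive wording that contrasts with a described negative situation."""
    lowered = text.lower()
    has_positive = literal_positive_count(text) > 0
    has_negative_situation = any(phrase in lowered for phrase in NEGATIVE_SITUATIONS)
    return has_positive and has_negative_situation


print(possibly_sarcastic("I just love waiting in line for two hours!"))  # True
print(possibly_sarcastic("I love this chocolate"))  # False
```

A word-level scorer would rate the first sentence as positive because of “love”; only the surrounding situation hints that the opposite is meant.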