Mining of Queries for Detecting User Behaviour and Intent Analysis Using Machine Learning

ABSTRACT

Web search engines are at the peak of their usage today, as almost everyone uses them to find information on the internet. Users may or may not be goal oriented when they search. The intent behind a user's queries (query intent) matters to businesses and marketers because it indicates whether they have pitched their product in the right place. In this work, a data set of possible queries is collected and examined. One target of the project is to determine how goal specificity reflects user behaviour, which is measured using Pearson's correlation coefficient. User intent is identified using POS (part-of-speech) tagging to label the keywords in each query, and a Naive Bayes classifier then assigns the query to one of a set of intent categories. The queries in the data set are split into words that are matched with a particular intent, and the classifier is trained and tested on them. Text mining and text classification are thus applied to the queries to extract and classify intent.

Intent analysis goes one step beyond opinion mining: the goals and intentions behind users' queries are studied and classified to help organizations promote their products online. The intent of the user can be a query, feedback or a complaint. Many deep learning and machine learning algorithms can be applied to mine the intent. For the query analysis, data is collected from Google Trends, a site that lists the top trending searches around the globe according to demography. The data set is preprocessed, and natural language processing tools are then applied to the queries to assign each one to an intent category. Once the intent is classified, conclusions can be drawn about what the user actually wanted and which of the compared search terms is most searched for a particular demography.
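The classification step described above can be sketched roughly as follows. This is a minimal multinomial Naive Bayes with add-one smoothing, written with the standard library only. The three labels follow the intent types named in the text (query, feedback, complaint), but the training queries and the function names `train_naive_bayes` and `classify` are invented for illustration and are not taken from the project's data set or code.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count words per intent label. `examples` is a list of (query, label)."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training queries
    vocabulary = set()
    for query, label in examples:
        words = query.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(query, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior of the label
        score = math.log(label_counts[label] / total)
        # smoothed denominator: words seen under this label + vocabulary size
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in query.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training queries; the labels are illustrative only.
training = [
    ("how to reset my password", "query"),
    ("where can i buy this phone", "query"),
    ("great service thank you", "feedback"),
    ("love the new design", "feedback"),
    ("my order arrived broken", "complaint"),
    ("refund not received yet", "complaint"),
]
model = train_naive_bayes(training)
predicted = classify("my phone arrived broken", model)
```

With these toy examples the query "my phone arrived broken" is assigned to the complaint label, because "my", "arrived" and "broken" all occur in the complaint training queries. A real pipeline would add the preprocessing and POS tagging steps before this stage.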

LITERATURE SURVEY

[1]

Opinion-Aspect Relations in Cognizing Customer Feelings via Reviews

Anh-Dung Vo et al. proposed a method that evaluates the aspects of products discussed in customer reviews. Opinion mining becomes important when reviews are assessed for commercial benefit. Sentiment analysis is carried out at the sentence level and the document level, and reports the polarity of each review.

Natural language processing is applied to the data for knowledge mining, followed by sentiment analysis. Precision and recall are calculated for laptop and camera reviews using NLP, and the authors attempt to improve the efficiency of the approach.
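As a reminder of how these two measures are defined, here is a small sketch. Precision is the share of predicted positives that are truly positive, and recall is the share of true positives that were found. The label lists are made-up review polarities, not results from the paper, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(predicted, actual, positive="positive"):
    """Compute precision and recall for one class from two label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    # Guard against division by zero when nothing was predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical polarity labels for five reviews.
actual = ["positive", "positive", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
precision, recall = precision_recall(predicted, actual)
```

Here two of the three predicted positives are correct and two of the three actual positives are found, so both values come out at two thirds.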

[2]

Query intent mining with multiple dimensions of web search data

Di Jiang et al. worked on the latent intent of the user, which determines the relations that exist between texts in documents; query logs are collected and analyzed for this purpose using QTLM (Query Time Log Model). Conventional query mining methods have not shown significant results, so relations between queries are found using ROF (Result Oriented Framework) and TOF (Topic Oriented Framework), which employ QTLM.

These frameworks are applied to URLs, sessions and terms. ROF uses intermediate intent mining, in which each dimension is analyzed separately and the results are then combined.

[3]

Classifying and Characterizing Query Intent

Azin Ashkan et al. developed a methodology to find the commercial intent of a user by using ad click-through logs, with the aim of enriching the user's experience. User intent is derived from the query information on the web page. Each query is classified as navigational, transactional or informational, and this classification is applied to search engine result pages together with the click-throughs of the web page and the query.

Ad features such as length and information content are used to understand the click-through rate of new ads from the current rate, which uses dimensions such as anchor links. The URL of the ad is targeted, and the ad viewing experience is improved through the decision rules that are generated.

[4]

Intent Detection through Text Mining and Analysis

Samantha Akulick and El Sayed Mahmoud investigated text posts to classify the customer intents that should be considered when marketing products. Intents can be classified into many types, such as query and feedback. To examine customer intention in depth, n-gram models and an SVM classifier are implemented.

The n-gram step uses a dictionary to classify words as negative or positive, with unigram and bigram models. Expressions in the text are found using pattern-based POS tagging, and the SVM algorithm is then used to train on and label the data set.
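The dictionary lookup over unigrams and bigrams can be illustrated with a short sketch. The two word lists below are hypothetical placeholders, since the dictionary used in the paper is not given here, and the POS tagging and SVM stages are not shown.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical polarity dictionaries holding unigram and bigram entries.
POSITIVE = {("good",), ("great",), ("works", "well")}
NEGATIVE = {("bad",), ("not", "good"), ("stopped", "working")}


def polarity_counts(text):
    """Count dictionary hits among the unigrams and bigrams of a text."""
    tokens = text.lower().split()
    grams = ngrams(tokens, 1) + ngrams(tokens, 2)
    positive_hits = sum(1 for g in grams if g in POSITIVE)
    negative_hits = sum(1 for g in grams if g in NEGATIVE)
    return positive_hits, negative_hits


counts = polarity_counts("the battery is not good")
```

The example shows why the bigram model matters: the unigram "good" alone counts as positive, while the bigram "not good" registers the negation. Such counts could then serve as features for a downstream classifier.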

[5]

Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network

Xiao Ding et al. investigated how social media plays a pivotal role in bringing out users' desires and demands. User consumption intent is important for businesses. To investigate it, the authors use the CIMM model, which takes in each word and converts it into an input representation; a CNN is then applied to this representation to extract local features.

Word extraction is done by an intent extraction algorithm: a score is calculated for each word, and the words with the highest scores are taken as the intent words.

[6]

Predicting Intent Using Activity Logs: How Goal Specificity and Temporal Range Affect User Behavior

Justin Cheng et al. developed a framework for intent analysis based on goal explicitness and the time frame of the user's task. A survey was conducted on users' activity on Pinterest, and users were divided into goal-specific and non-goal-specific groups according to their usage. The time interval of usage was also surveyed, based on the demography and the current and past activities of the user.

The searches were split into categories and the most searched pins were identified. Metrics such as click-throughs were used to study user patterns. The temporal range had short-, medium- and long-term categories. The correlation between the two dimensions, goal specificity and temporal range, was then computed.
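A correlation between two such dimensions can be computed with Pearson's coefficient, the measure named in the abstract. The sketch below implements the textbook formula directly; the two score lists are invented survey values, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-user scores on a 1-5 scale.
goal_specificity = [1, 2, 3, 4, 5]
temporal_range = [2, 4, 5, 4, 5]
r = pearson(goal_specificity, temporal_range)
```

For these made-up scores r is about 0.77, a fairly strong positive relationship. Values near 0 would indicate that the two dimensions vary independently.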

[7]

Identifying Web Queries with Question Intent

Gilad Tsur et al. presented an approach to extract the verticals, or subcategories, of web search that the user intends to search. The focus here is on finding question intent through community question answering sites. The first method applied is a random forest classifier which can modify the query form.

The second approach is word clustering, which looks at the position of words in the query and serves as input to the first approach for discriminating between words in the query. Error analysis is done using false positive and false negative scores.

[8]

Query Recommendation using Query Logs in Search Engines

Ricardo Baeza-Yates et al. proposed a method that suggests related queries for a user's input. K-means clustering is adopted: clusters of semantically similar queries are identified and ordered according to their relevance. Relevance is judged by what the user has searched for (the query) and by how much the answers to the queries have attracted the user's attention.

Keywords, strings and URLs are the notions used to represent a query. Considering a set of 15 queries, the terms with the most occurrences are weighted accordingly. After the terms are weighted, clustering takes place, and the size of each cluster and the similar keywords that represent the query are found.
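The weight-then-cluster idea can be sketched in a simplified form. The code below represents each query by raw term counts and runs a plain k-means with squared Euclidean distance; this is an assumption made for illustration, since the paper's own term weighting and similarity measure are not reproduced here. The queries and the choice of initial centroids are likewise invented.

```python
def vectorize(queries):
    """Turn each query into a term-count vector over a shared vocabulary."""
    vocab = sorted({w for q in queries for w in q.lower().split()})
    return [[q.lower().split().count(w) for w in vocab] for q in queries]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain k-means; returns the cluster index assigned to each vector."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: the nearest centroid wins.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Hypothetical queries forming two obvious topics.
queries = [
    "cheap flights to paris",
    "paris flights deals",
    "python list sort",
    "sort list in python",
]
vectors = vectorize(queries)
labels = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
```

The two travel queries end up in one cluster and the two programming queries in the other. Fixing the initial centroids keeps the example deterministic; in practice they are usually chosen at random and the run is repeated.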

[9]

A Simple Model for Classifying Web Queries by User Intent

D. Irazú Hernández et al. extracted only the terms related to the query in order to classify queries and find the user intent. Machine learning algorithms are implemented along with some classification techniques to evaluate the results. POS tagging is applied to the queries. Features such as entity names and query length are combined into a single vector.

Query classification is based only on the text of the query. Naive Bayes and SVM algorithms are applied to a test data set of a million queries, and the two algorithms are compared for their efficiency.

[10]

Query Subtopic Mining by Combining Multiple Semantics

Lizhen Liu et al. worked on query distribution and phrase embedding. Clustering of queries is an important part of query mining for finding the user's intent. Subtopic mining is done using the keywords, titles and snippets for the user's query, and the subquery is subsequently extracted from these keywords using clustering algorithms.

Aspects are extracted from the keywords, and k-means clustering is then applied to the aspects, with TSR (term space representation) and DSR (document space representation) as the metrics used.

REFERENCES

[1] Anh-Dung Vo, Quang-Phouc Nguyen and Cheol-Young Ock, Department of IT Convergence, University of Ulsan, South Korea. 2018 IEEE Transactions, volume 6, 2018.

[2] Di Jiang, Kenneth Wai-Ting Leung, Wilfred Ng. Springer Science+Business Media New York 2015. World Wide Web (2016) 19:475–497.

[3] Azin Ashkan, Charles L.A. Clarke, Eugene Agichtein and Qui Guo. ECIR 2009, LNCS 5478, pp. 578-576, 2009. Springer-Verlag Berlin Heidelberg 2009.

[4] Samantha Akulick and El Sayed Mahmoud, Faculty of Applied Science & Technology, Sheridan College, Oakville, Canada. Future Technologies Conference (FTC) 2017, 29-30 November 2017, Vancouver, Canada.

[5] Xiao Ding, Ting Liu, Junwen Duan, Jian-Yun Nie. Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org).

[6] Justin Cheng, Caroline Lo, Jure Leskovec, Stanford University. 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW'17 Companion, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4914-7/17/04.

[7] Gilad Tsur, Yuval Pinter, Idan Szpektor, David Carmel, Yahoo Labs, Haifa 31905, Israel. International World Wide Web Conference Committee (IW3C2). WWW 2016, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4143-1/16/04.

[8] Ricardo Baeza-Yates, Carlos Hurtado and Marcelo Mendoza. Center for Web Research, Department of Computer Science, Universidad de Chile.

[9] D. Irazú Hernández, Parth Gupta, Paolo Rosso and Martha Rocha. Instituto Tecnológico de León, México; Universitat Politècnica de València, Spain; NLE Lab - ELiRF, Universitat Politècnica de València, Spain; PRHLT, Universitat Politècnica de València, Spain.

[10] Lizhen Liu, Wenbin Xu, Wei Song, Hanshi Wang and Chao Du, Information and Engineering College, Capital Normal University, Beijing 100048, China. International Journal of Multimedia and Ubiquitous Engineering, Vol. 10, No. 12 (2015), pp. 341-354.
