
A.D.A.M: A Multi-class Intent Classification Case Study

 

Photo by Daniel Seßler on Unsplash

                                                                       

Objective

Select a suitable algorithm for performing multi-class intent classification within A.D.A.M.


Simplicity over Complexity

While designing A.D.A.M (Adaptive Datacenter and Migration), a voice-controlled AI virtual assistant, a significant amount of thought went into whether a simple model would suffice. "Suffice" here means providing great results with minimal data, computation, and so on. Could a simple model work? If so, how well? To find out, two algorithms were chosen as a starting point to analyze and compare against each other: OvR (One-vs-Rest) Logistic Regression and Multinomial Naive Bayes.

 

Intuition behind the classification algorithms

The two main techniques for approaching multi-class classification scenarios are "one-versus-rest" and "one-versus-one". 

In a "one-versus-rest" scenario, C separate binary classification models are trained, one per class. Each classifier is trained to determine whether or not an example belongs to its class c.

To predict a class for a new sample x, all C classifiers are run on x and the class with the highest score is selected.

In comparison, in a "one-versus-one" scenario, a separate binary classification model is trained for each possible pair of classes, which gives C(C-1)/2 models in total. Hence, to make an inference on a new sample x, we would run every classifier on x and select the class with the highest number of votes.
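To make the two decision rules concrete, here is a minimal sketch in plain Python. The intent names, scores, and pairwise winners are made up for illustration and are not A.D.A.M.'s actual intents or outputs; the binary classifiers themselves are assumed to be already trained.

```python
from collections import Counter
from itertools import combinations

# Hypothetical intent labels, for illustration only.
CLASSES = ["status", "migrate", "reboot"]


def predict_ovr(scores):
    """One-versus-rest: `scores` maps each class to the score of its
    binary classifier; the class with the highest score is selected."""
    return max(scores, key=scores.get)


def predict_ovo(pairwise_winner):
    """One-versus-one: `pairwise_winner` maps each pair of classes to the
    class chosen by that pair's classifier; the most voted class wins."""
    votes = Counter(pairwise_winner[pair] for pair in combinations(CLASSES, 2))
    return votes.most_common(1)[0][0]


# Made-up classifier outputs for a single sample x.
ovr_scores = {"status": 0.2, "migrate": 0.7, "reboot": 0.4}
ovo_winners = {
    ("status", "migrate"): "migrate",
    ("status", "reboot"): "reboot",
    ("migrate", "reboot"): "migrate",
}

ovr_prediction = predict_ovr(ovr_scores)
ovo_prediction = predict_ovo(ovo_winners)

# One-versus-one needs C(C-1)/2 classifiers, one-versus-rest needs C.
n_ovo_models = len(list(combinations(CLASSES, 2)))
n_ovr_models = len(CLASSES)
```

With only three classes both schemes need three models, but the one-versus-one count grows quadratically with the number of classes while one-versus-rest grows linearly.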

(One-vs-Rest) Logistic Regression  

As suggested earlier, there are several methods for classifying an instance into one of k >= 2 classes. One-vs-Rest Logistic Regression fits a separate binary logistic regression for each class, with the assumption that each classification model is independent. In A.D.A.M.'s intent classification case study, there are 6 independent intents, or classes (as of this writing). Furthermore, the One-vs-Rest Logistic Regression model defines the log odds for each outcome through a linear model (Whitaker et al., n.d.).
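As a rough illustration of the log odds idea, the sketch below assumes the usual form log(p / (1 - p)) = bias + w . x for each class. The two intents, the weights, and the feature vector are hand-picked, hypothetical values rather than parameters learned from A.D.A.M.'s data.

```python
import math


def sigmoid(z):
    """Map log odds z back to a probability p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(weights, bias, x):
    """Linear model for one class: log(p / (1 - p)) = bias + w . x"""
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical parameters: one independent (weights, bias) pair per intent.
models = {
    "status": ([1.5, -0.5], -0.2),
    "migrate": ([-1.0, 2.0], 0.1),
}

x = [0.0, 1.0]  # hypothetical feature vector for one utterance

# Each binary model scores x on its own; the highest probability wins.
probabilities = {c: sigmoid(log_odds(w, b, x)) for c, (w, b) in models.items()}
predicted = max(probabilities, key=probabilities.get)
```

Because each binary model is fitted independently, the per-class probabilities are not required to sum to 1; only their ranking matters for the final prediction.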

 

(MNB) Multinomial Naive-Bayes 

MNB is an instance of Naive Bayes that is suitable for classification scenarios with discrete features, such as word counts. In addition, it is based on Bayes' theorem, which calculates the probability P(c|x), where c is one of the possible classes and x is the instance to be classified, represented by its features (Puurula, 2012):

P(c|x) = P(x|c) * P(c) / P(x)
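The formula can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: the training utterances and intent names are invented, words are split on whitespace, and Laplace smoothing (alpha = 1) is an added choice that the text above does not specify. P(x) is left out because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training utterances, for illustration only.
TRAIN = [
    ("check server status", "status"),
    ("show server health status", "status"),
    ("migrate the database", "migrate"),
    ("start database migration", "migrate"),
]


def train_mnb(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_mnb(text, class_counts, word_counts, vocab, alpha=1.0):
    """Return the class maximizing log P(c) + sum of log P(word | c)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        score = math.log(class_counts[c] / total_docs)  # log P(c)
        total_words = sum(word_counts[c].values())
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Smoothed estimate of P(word | c).
            numerator = word_counts[c][word] + alpha
            denominator = total_words + alpha * len(vocab)
            score += math.log(numerator / denominator)
        scores[c] = score
    return max(scores, key=scores.get), scores


model = train_mnb(TRAIN)
prediction, scores = predict_mnb("server status", *model)
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer utterances.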

 


Proposed Model for Evaluating these Algorithms








 

 

 

 

 

 

 

 

So, which algorithm wins for A.D.A.M?

To give it away, OvR Logistic Regression ends up beating Multinomial Naive Bayes on 3 different metrics. For further information and insights into this case study, please see the paper "Algorithm comparison for multi-class intent classification: A case study in A.D.A.M."


Next Steps

In my opinion, further research is always a given, since data changes and grows, new algorithms are born and current methods can always be improved. Stay connected to see what comes next with A.D.A.M.


 

 

References

De Loera, J. A., & Hogan, T. (2020). Stochastic Tverberg Theorems With Applications in Multiclass Logistic Regression, Separability, and Centerpoints of Data. SIAM Journal on Mathematics of Data Science, 2(4), 1151–1166. https://doi.org/10.1137/19m1277102

Puurula, A. (2012). Combining modifications to multinomial naive bayes for text classification. In Asia Information Retrieval Symposium (pp. 114-125). Springer, Berlin, Heidelberg.

Whitaker, T., Beranger, B., & Sisson, S. (n.d.). Logistic regression models for aggregated data. https://arxiv.org/pdf/1912.03805.pdf











