Effective Enterprise Grade Conversational AI


Transcript:

Thank you, it's good to be here. I'm Mady Mantha, and I'm the artificial intelligence platform leader at Sirius, where I own the vision and strategy for AI and machine learning. I'm also an applied machine learning architect, so I design and build machine learning applications. We're really passionate about building artificial intelligence solutions that work for everyone and augmenting processes with scalable machine learning, so that human systems can focus on innovation and delegate the mundane. So today I'm really excited to be here and to talk to you all about something that I love doing, and that's conversational AI and NLP.

So I want to start by doing a quick overview of NLP, which I'm sure we're all very familiar with here, and then we'll dive right into the problem space, which is what we were implementing last year. We'll talk about the tech stack that we employed, some challenges that we saw before going into production with deployments, and also issues with NLP with respect to entity disambiguation. Then we'll take a look at some results and open it up to questions.

NLP and conversational AI are definitely not something new. It's not a new concept by any means, but conversational AI and machine learning have been experiencing a renewed focus in recent years, and I think that's because we're at a seminal moment in computing. If you take a step back and think about it, the history of computing has always had major shifts every 10 to 15 years or so. It all started in the 1980s with the personal computer, which really changed the way that we interact with technology. Then, ten or so years later, the internet arrived.
So the internet is arguably one of the biggest changes in our lifetime, right? It democratized technology and information, it changed the way that we do business, it revolutionized industries, and it just totally changed a lot of things. Ten or so years later, the smartphone arrived. The smartphone led, and continues to lead, a radical shift in the way that we interact with technology and in the way that we interact with one another. We're at another such moment now, where we're moving from a mobile-first to an AI-first world, where we're going to be seeing people interact with technology in ways never seen before.

Now, what does it mean to be AI first? I think it means that we're talking about technology that you can naturally converse with. I think it means that we're talking about technology that is thoughtfully contextual: it learns and remembers things about you, it doesn't have to be reminded each time, and it's able to offer a personalized experience for you. And I think it means we're talking about technology that learns and evolves and gets better each time; it gets more intelligent with each iteration.

At the heart of all of this is NLP. NLP is something that you need whether you're building a performant search engine, auto-classifying documents and invoices, or doing spellcheck or autocomplete. Basically any text-intelligent application needs NLP. So it's a really exciting and really amazing area to focus your research on, and I think that's why it naturally fascinates a lot of data scientists and machine learning engineers.

So here's our problem space. We wanted to implement a contextual assistant that augmented our IT and help desk teams' capabilities.
So, essentially, an assistant to drive up employee self-sufficiency, one that answered basic questions about routine technical issues, the things that the help desk team typically deals with. But we wanted to go a little above that and go beyond simply answering questions, because any rule-based bot can do that. It's cool if you have a bot that can answer questions, but when you can actually get a contextual assistant to perform tasks on your behalf, to actually do things for you, that's when you start to add true value to any engagement.

So we focused on purposeful automation. We looked at things that users and employees typically find useful, repeatable processes that we could automate: things like opening a ticket. We integrated our bot with certain external third-party services, like ServiceNow, Outlook, and things like that, so the bot could actually do things like opening tickets, notifying stakeholders, escalating up the escalation chain, and so forth. And then we wanted to go a level above that and provide intelligent, smart recommendations to enable real-time and better decision-making. We're still working on this, but we got the bot to actually say things like, "Hey, it looks like 80% of your colleagues have upgraded Microsoft Office this weekend, and they've reduced a lot of latency issues. Do you want me to upgrade for you?" Delegate the mundane so you can focus on innovation.

So when we were focusing on trying to surface this type of right information at the right time and in the right channel, a channel that people actually use, we started seeing some issues with entity disambiguation.
So what that means is, let's say you're building a travel assistant and you want to say, "I want to book a flight to San Francisco," or "I want to book a flight to Chicago," and you don't necessarily specify the source entity; you just specify the destination entity. Or say you were to say something like, "Hey, I'm on my cell, I can't talk right now." "Cell" here could mean lots of things, right? It could be a biological cell or a cell phone. Traditional neural networks typically don't have a whole lot of context to go off of, because they're stateless; they don't have persistence. But human beings, you and I, when I say "I'm on my cell," you probably know what I'm talking about, because your thoughts have persistence. LSTMs and other recurrent neural networks have loops in them, allowing information to persist, and attention models and Transformers look across the whole sequence, so they're able to use context. That's why we started looking at CRFs, LSTMs, encoder-decoder models, language models, Transformers, and so forth.

And we zeroed in on BERT. Now, BERT is a pre-training NLP technique that was released by Google, and this time last year, when we were looking to implement this, it was really state-of-the-art. A lot of people saw promising results with it, so we thought, cool, why not, we'll go ahead and use BERT. One of the cool things about BERT is that it pre-trains bidirectional contextual language representations. It's trained on a large text corpus, and as a result this pre-trained BERT model can be fine-tuned with just one additional output layer; it doesn't need many. With that you can create state-of-the-art models for language inference, question answering, autosuggest, named entity recognition, and other downstream NLP tasks. So we leveraged BERT specifically for NER, or named entity recognition, and autosuggest.
So I want to show you how we integrated our pipeline with BERT. I should say that we use the Microsoft Bot Framework as our enterprise chat framework, and we integrated that with Rasa. We use Rasa for NLU, and Rasa really gives you a way to cherry-pick all of the entity components that you want to use. You can use the Duckling entity extractor, you can use CRF, and you can also use an entity mapper, which can map known synonyms. So we took advantage of Rasa's flexibility here, and then we customized it to include BERT within our pipeline.

Now I want to show you how BERT's results compare to other language models and other extractors, like Stanford NER. Before we do that, though, I want to be mindful: this isn't a complete comparison by any means. Our use cases ranged from HR to IT to a travel bot. We tried it out with a lot of data and a lot of really intuitive conversation, and it seemed to work in our use case, but obviously spaCy and Stanford NER are great models, and you should take advantage of all of them.

So I want to show you this. Oh, and I also want to mention that this custom-built component that was added to the NLU pipeline was a wrapper to the BERT service, because, as I think some of you are familiar with, BERT does have some performance issues, right? When you're trying to load BERT, the response time is somewhere between five and six seconds, and that sort of latency can be unacceptable when you're talking about conversational AI, or really most operations. So what we did was we added this as a wrapper to point to the service, and then the service was loaded into memory. That was a workaround for us to get around the request processing time, so that the first request took somewhere between five and six seconds and every consecutive request was almost instantaneous.

So this is BERT, and we're looking at a travel bot. If you wanted to say, "Book a flight to Munich," you can see it was able to recognize the entity. Now let's take a look at NER CRF, and as you can see here, it wasn't really able to recognize the entity. And now we're going to take a look at spaCy, which is also a really cool model, and it wasn't able to recognize the entity when you used the same example and said, "Book a ticket to Munich." So those are the results that we saw with BERT. We were really happy with the way that it solved for entity disambiguation compared to the other things that we looked at. It's a performant model, and with the workaround that we had, the request processing time was also pretty low.
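The load-once workaround described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `SlowEntityModel`, `get_model`, and `extract_entities` are hypothetical names, the model is a keyword-matching stand-in rather than BERT, and the short sleep simply simulates the one-time loading cost mentioned in the talk.

```python
import time
from functools import lru_cache
from typing import List


class SlowEntityModel:
    """Hypothetical stand-in for a model that is expensive to load."""

    def __init__(self, load_seconds: float = 0.05) -> None:
        time.sleep(load_seconds)  # simulate the one-time loading cost
        self.known_cities = {"munich", "chicago", "san francisco"}

    def extract(self, text: str) -> List[str]:
        lowered = text.lower()
        return sorted(city for city in self.known_cities if city in lowered)


@lru_cache(maxsize=1)
def get_model() -> SlowEntityModel:
    """Load the model on first use; later calls reuse the in-memory instance."""
    return SlowEntityModel()


def extract_entities(text: str) -> List[str]:
    """Wrapper that callers use; only the first call pays the loading cost."""
    return get_model().extract(text)
```

Only the first call to `extract_entities` pays the loading cost; every later call reuses the instance that is already in memory, which mirrors the slow first request and fast consecutive requests described in the talk.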

And now I want to show you the end result, which is a custom contextual travel assistant that can really deal with messy human behavior. Language is messy. There are 40,000 different ways you can ask about the weather. One of my favorite ways is when people say, "Is it going to rain cats and dogs today?" So how does your model account for such complexity? Well, you do that by making sure that your model can handle context switching, both narrow and broad, and by ensuring that your model can handle chitchat and things like that.

So this is a travel assistant, and it's going to ask you a few questions in order to get some information so we can book a travel experience for you. It wants to know where you're going, when you're going, and who's going with you. Users will say, "Yeah, it's just me," but then they'll change their mind. Think of it in the context of an IT or help desk bot: people will say, "Yeah, I want to open a ticket," and then they may change their mind. Or in the financial sector people say, "Here's my account number. Actually, no, that was the wrong account number that I gave you. Here's the correct one."
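One simple way to picture a user changing their mind is a form whose slots can be overwritten. The sketch below is a minimal illustration, not the assistant shown in the talk; `TravelForm`, `REQUIRED_SLOTS`, and the slot names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slots for the travel example: where, when, and who.
REQUIRED_SLOTS = ("destination", "date", "travelers")


@dataclass
class TravelForm:
    slots: Dict[str, str] = field(default_factory=dict)

    def update(self, slot: str, value: str) -> None:
        # A later value for the same slot overwrites the earlier one,
        # which is how a correction is absorbed.
        self.slots[slot] = value

    def next_missing(self) -> Optional[str]:
        # The assistant asks for the first slot that is still empty.
        for name in REQUIRED_SLOTS:
            if name not in self.slots:
                return name
        return None


form = TravelForm()
form.update("destination", "Munich")
form.update("travelers", "1")  # "Yeah, it's just me"
form.update("travelers", "2")  # the user changes their mind
```

After the correction the form holds the latest value for `travelers` and still knows that `date` has not been asked yet.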

So you want to be able to handle corrections. You also want to be able to handle narrow context: the assistant asked, "Where do you want to go?" and then the user said, "Why do you want to know?" People will say, "Why do you need my account information? Why do you need to know this?" So you want to be able to handle that, especially if you're thinking about using conversational AI to augment certain teams' capabilities. You want to be able to handle this sort of messy, not always happy-path, unhappy-path sort of scenario. So the assistant was able to answer why it needs to know that information, and then it nudged the user back onto the happy path. That's really what you want to do.

Okay, and then after we got the bot to really be able to handle chitchat, corrections, both narrow and broad context, and messy human behavior, we played around with natural language generation. Now, NLG is a really new and cool space, and a lot of people are doing a lot of good research with it. If you've been to the Chatbot Conference in New York, you may have heard Sander Wubben give a really informative talk on natural language generation. A lot of really intelligent and talented people are focusing their efforts on Bayesian program synthesis, or this concept of ideal learning, and transfer learning and reinforcement learning and stuff like that. So we played around with NLG. If you've looked at OpenAI's GPT-2 language model, you may be familiar with some of this. I tweaked that language model to generate natural language. We're still trying to solve this problem of generality. In NLP, I think one of the biggest areas of focus today is not necessarily accuracy. I mean, accuracy is very important, but a lot of models are accurate now; we're trying to solve the problem of generality and scalability. So NLG is not at a place where it's generalized.
You can't necessarily use it for, say, a legal bot; you definitely don't want to have your bot give legal advice. Or, if you were here yesterday at the conference, you may have heard Tiago from [inaudible], and he's working on a really cool emotional context assistant. You probably can't use NLG for that, right? But I used it just for chitchat. It was able to respond back on its own, and it was somewhat relevant. Although it didn't say the type of music that it was into, it was able to recognize that and then engage in small talk. So NLG is really cool. We just played around with it for fun, and this bot is something that we're working on for research purposes.

So once you have a really solid architecture and performant NLP that works, you're able to user test it. With conversational AI you want to user test as much as possible and as often as possible. Don't wait until your bot's perfect to test it out, because you're never going to get to a perfect bot; it will always have flaws. But you want to be able to user test your contextual assistant. So once we did that, and the contextual assistant was able to answer your questions, do things on your behalf, and offer personalized recommendations, we thought we were ready to go to production. We wanted to go to prod, and obviously when you're deploying any kind of software application and going to production in an enterprise setting, there are lots of challenges involved. We faced a lot of those same challenges, and with conversational AI it's a little more complex than that. So we obviously followed best standards and practices. We wanted to use CI/CD to automate our deployment, because we were going to production almost every day, and we coded with security in mind.
So we made sure that we were handling keys, certificates, and personally identifiable information carefully and in a way that adhered to best practices and standards. But as we stood up this one contextual assistant, we wanted to stand up some more, because we found some more use cases. So we really wanted to automate that entire process of going to production, and we built our own service called Scone, the Sirius Conversation Experience. Scone starts with initializing a Git repository, and then it gives you a way to automate your CI/CD pipeline. It creates those configurations for you, and it hooks into any chat framework, whether it's Microsoft stuff or Rasa stuff. And if you're using Kubernetes to package your bot (what we did was use Kubernetes and AKS to package it), it auto-generates all of the Kubernetes deployment files. This is something that saved us all a lot of time when we were working on this, and it made things easier when we stood up additional conversational AI.

Actually, I want to show you a quick video of Scone and how easy it can be to bootstrap a contextual assistant. Now, this is specific to the Bot Framework and LUIS, but like I said before, because we're using Rasa for some of our NLU stuff, we plug this into Rasa as well. You'll see that it's pretty easy. It's going to ask you a few questions and then bootstrap a bot, create the project for you, initialize the Git repository, and stuff like that. There you go.

So again, if you were here yesterday, you would have heard the importance of interactive learning and really testing your bot and making improvements as you go along.
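To make the idea of auto-generating Kubernetes deployment files concrete, here is a minimal sketch that builds a Deployment manifest as a Python dictionary and serializes it to JSON, which Kubernetes accepts alongside YAML. It does not reflect how Scone works internally; the function name, bot name, image URL, replica count, and port are all assumptions made for illustration.

```python
import json
from typing import Any, Dict


def deployment_manifest(
    bot_name: str, image: str, replicas: int = 2, port: int = 3978
) -> Dict[str, Any]:
    """Build a Kubernetes apps/v1 Deployment for one bot container.

    The default replica count and port are illustrative assumptions.
    """
    labels = {"app": bot_name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": bot_name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": bot_name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical bot name and image registry.
manifest = deployment_manifest("travel-bot", "registry.example.com/travel-bot:1.0")
manifest_json = json.dumps(manifest, indent=2)
```

Generating the manifest from a handful of answers (name, image, replicas) is what lets a bootstrap tool stand up each additional assistant without hand-editing deployment files.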
So one of the key things is that when you deploy, you have access to all of this really valuable real user data. Of course you have access to some of it when you're user testing, but we started by storing and tracking all of these conversations, and we used all of that data to make our contextual assistant a lot better by learning from its mistakes. Then we retrained our model, and we added some more happy and unhappy paths, whatever was applicable. And we noticed a pattern with the feedback that we were getting, because we asked users for feedback when we were testing our contextual AI. Typically, when users did give feedback, they would say, "Hey, I asked you how to download Zoom, but you told me what Zoom is, so it's not really relevant." So we wanted to fix some of those irrelevant answers. Then there were cases where the bot simply wasn't able to answer, which I'm sure you're all familiar with, and if that fit within the scope of our use case, then we added those answers. And then we would get requests for new features: "Hey, thank you for the ServiceNow link, but it would be more useful if you actually opened a ticket on my behalf." If we got enough of those requests, we took that into consideration, added that particular integration, and got the bot to perform that particular skill. And of course we trained and retrained our model and put it back into production.

As we were doing this, we really wanted a way to continually test both linear and nonlinear conversation. We wanted a way to track all of the user conversations and visualize them, and we also wanted a way to plug that into our CI/CD pipeline so it could run integration tests before and after any kind of deployment.
So we built our own training and operations manager, Tom, and as you can see, it allows the business to really visualize all of the metrics and track the things that are important in conversational AI, like total number of users, user retention, abandonment rate, and things of that nature. Tom is also an end-to-end holistic tester. That's what we really wanted to focus on: a way to automatically generate test scripts and test. So we use this to make the process of continually monitoring and improving your conversational AI faster.

So at this point we had performant NLP. We had a contextual assistant that could go beyond answering questions to actually doing things and offering proactive advice at the right time. We made sure that we continually tested it and improved it. We were deploying to production each day. And we know that a lot of things go into conversational AI, right? There are a lot of things that make it successful, like performant NLP, seamless integration with APIs, purposeful automation that actually solves what users need solved, and building a custom domain knowledge base that's relevant to your industry and your vernacular. So with all of these pieces, we got a little bit better at deploying, maintaining, and improving conversational AI. It was a really exciting project to work on; we learned a lot, and we had a lot of fun doing it. So I look forward to hearing more from you and learning a lot today. Thank you.

Thank you. All right, we're going to do five, fast fire. First five hands I see up. One, two, three, four. Can I get a lady? Any ladies? All right, five.

All right. So I see you're using GPT-2 and other language models for small talk. How do you see that being implemented for a specific context in the future, or what barriers are there to being able to do that?

Specific context, you said? Yeah, that's a really good question.
I mean, I think something that a lot of us are focusing our efforts on right now is how you use a language model that's modeled after an entire Wikipedia text corpus and make sure it sticks to context. GPT-2 with a PyTorch implementation will still be able to identify intents and respond with what it knows. I wouldn't necessarily recommend using it for just any use case. We thought it would be okay for small talk, because we can't really validate a lot of the information, but I would start with maybe deploying it as a pilot project in one particular use case, make sure you validate the conversation that flows through, and make those improvements. But yeah, that's something that we're all trying to solve. Great question. Thank you.

Hey, I have two questions. My first question is, does your framework work with Dialogflow?

We haven't tested our implementation with Dialogflow right now, but it should be able to, because we were able to plug it into the Microsoft Bot Framework pretty easily.

Okay, great, thank you. My second question is, does the framework include any performance testing capabilities against the training data?

You're talking about Tom doing performance testing, right? We're only testing the conversation aspects, and we have certain unit testing frameworks and BDD suites that we use outside of it. But in terms of load testing, Tom does not include that; that would be a feature that we're looking at.

Thank you.

Yeah, thank you. Number three.

Hey, my question's going to come in on a little bit of a tangent here. I work at a very large automotive OEM, so I have executives and sort of internal customers we need to convince. A lot of the work that you're doing is analogous to the work we do as researchers, but it's always difficult to communicate the progress of the work to the executives.
So is there anything within your improvement pipeline that could actually communicate at the executive level, to tell them, essentially, about a specific voice- or chatbot-related project?

Yeah. I mean, we've worked on voice implementations, and we've worked on use cases where you just keep it very high level and speak to executives in terms of reporting and stuff like that. But as of right now, if you're talking about Scone or Tom, beyond the high-level metrics that you saw there, Tom itself does not include that type of reporting capability at the moment.

Who is number four?

I have a quick question. Can you elaborate more on your testing? Is it more unit testing, end-to-end testing, or model testing? And generally, what do you suggest is the best way to test conversations, to make sure the flow is not disrupted by your latest changes and things like that?

Yeah, good question. Right now we're doing end-to-end testing of the entire workflow, and we're also testing linear conversation, which is just question answering, and nonlinear conversation, which is dialogue. Tom will generate lots of combinations and permutations of where dialogue could go. It will loop through some of those happy and unhappy paths, test all of those, and validate them. If the answer matches what it expected, then that's a pass, and if it doesn't, it's a fail, and then we look into why it failed and make those improvements right there. So those are some ways to test. User testing is very important, because we're still talking about conversational AI, which has a lot of human intervention. We want human systems to still retain agency, and we're only automating and delegating things that are repeatable, right? So I think user testing is very important.
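The idea of generating permutations of dialogue paths and marking each one pass or fail can be sketched roughly as follows. This is not Tom's implementation: `run_dialogue` is a hypothetical stand-in for the bot under test, and the slots, answers, and digression text are assumptions chosen to match the travel example.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Turn = Tuple[str, str]

# Hypothetical slot answers for a travel dialogue.
ANSWERS: Dict[str, str] = {"destination": "Munich", "date": "Friday", "travelers": "1"}
DIGRESSION: Turn = ("digression", "Why do you want to know?")


def run_dialogue(turns: List[Turn]) -> Dict[str, str]:
    """Stand-in for the bot under test: fills slots and ignores digressions."""
    state: Dict[str, str] = {}
    for kind, value in turns:
        if kind == "digression":
            continue  # the bot would explain itself, then resume the form
        state[kind] = value
    return state


def generate_paths() -> List[List[Turn]]:
    """Every slot ordering, plus one digression inserted at each position."""
    paths: List[List[Turn]] = []
    for order in permutations(ANSWERS):
        happy = [(slot, ANSWERS[slot]) for slot in order]
        paths.append(happy)  # happy path
        for i in range(len(happy) + 1):
            paths.append(happy[:i] + [DIGRESSION] + happy[i:])  # unhappy path
    return paths


def run_suite() -> Dict[str, int]:
    """A path passes when the final state matches the expected answers."""
    results = {"pass": 0, "fail": 0}
    for path in generate_paths():
        outcome = "pass" if run_dialogue(path) == ANSWERS else "fail"
        results[outcome] += 1
    return results


summary = run_suite()
```

With three slots this produces 30 paths (6 orderings, each with one happy and four digression variants); a real suite would call the deployed assistant instead of the stand-in and compare its replies against expected ones.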
I would say test it as often and as much with real users as possible. And of course, if you're talking about testing software applications, in some of the applications that we've worked on we've used unit tests and integration tests: Mocha, Chai, Jasmine, stuff like that.

You touched for a moment on the feedback that you get from users, and specifically you kind of hinted at a mismatch of intents, right? So I was wondering what you do to try to mitigate negative experiences for people who have that mismatch. You're getting data when the mismatch happens and they say, "Hey, that's not what I meant." But what sorts of conversational elements do you use to try to make the user feel a little bit okay about what just happened?

Yeah, great question, and I'm sure Dustin Dye is going to be up here a little bit later to really inspire us all about the importance of writing good copy and failing gracefully and dealing with all of those fallbacks. So when you get an irrelevant answer and the user says, "Hey, that's not what I meant, that's not what I asked you," we have a fallback mechanism added in there to say, "Oh, I'm sorry, can you rephrase your question?" or something like that. Then they try again, and hopefully your contextual assistant is able to answer it. If not, another round of fallback. We typically do two or three levels of fallback.
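A tiered fallback like the one described above might look like the following sketch. It is an assumption-laden illustration rather than the speaker's code: `FallbackPolicy`, the limit of two rephrase prompts, and the handoff wording are all hypothetical.

```python
from typing import Optional

REPHRASE = "I'm sorry, can you rephrase your question?"
HANDOFF = "I'm sorry for the inconvenience. Let me connect you with someone who can help."


class FallbackPolicy:
    """Ask the user to rephrase a limited number of times, then hand off."""

    def __init__(self, max_fallbacks: int = 2) -> None:
        # Assumption: two rephrase prompts before escalating to a person.
        self.max_fallbacks = max_fallbacks
        self.misses = 0

    def respond(self, answer: Optional[str]) -> str:
        if answer is not None:
            self.misses = 0  # a successful answer resets the counter
            return answer
        self.misses += 1
        if self.misses <= self.max_fallbacks:
            return REPHRASE
        self.misses = 0
        return HANDOFF


policy = FallbackPolicy()
replies = [policy.respond(None) for _ in range(3)]
```

Here `answer` is `None` whenever the assistant has no relevant reply; the third consecutive miss triggers the handoff message instead of another rephrase prompt.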

And then at that point, if your contextual assistant has to route to a human, it will say, "This person can take care of you. I'm sorry for any inconvenience," or something like that. That would be one way to do it. And that's why I think monitoring your conversational AI, all of the user conversations that happen every day, is very important, because you want to be making continual improvements. Large unsupervised models obviously keep learning after you stop teaching them, but you still want to validate them, and if you're talking about supervised models, they're going to stop learning when you stop teaching. So it's important to continually look at them and test and improve them. But yeah, writing good copy is vital. Like you saw with NLG, we're not really going anywhere with it right now; we're not close to solving the problem of generality. So I think writing really good conversation is very important to ensuring that users have a good experience, because if they don't like what they see, chances are they're not going to come back and talk to your bot again.

Round of applause for Mady.

© 2017 by Chatbots Conference TM.