
"How TurboTax plans to make e-filing your taxes easier with Chatbots" Casey Phillips


Hi everyone. I'm happy to be here and happy to share a successful chatbot case study from Intuit TurboTax. A little bit about myself: I've been working as a product manager in the chatbot space for almost two and a half years now. At Intuit, I'm responsible for leading the implementation of AI and chatbots for their consumer group product portfolio, which is TurboTax, Turbo, and Mint, and I'm just really active in the chatbot space: an avid blogger and a speaker on all things chatbots and AI, which is obviously why I'm here today.

So meet the TurboTax Digital Assistant. This is a chatbot that we developed to help users e-file their taxes more easily in TurboTax by automating customer support. It replaced the previous experience of a search panel, basically your standard run-of-the-mill knowledge base search. Users ask this chatbot anything from product support questions, such as "How do I clear and start over?" or "How do I save and exit?", to actual expert-level tax questions such as "What is box D in form 8394?"

I don't actually know what that means, but users will ask it. Some of the business goals we're targeting: obviously, we want to reduce the number of support phone calls that go to our call center. We want to increase self-help content engagement, so we want more users to engage in self-help rather than just saying that they want to contact a human on the phone. And we want to increase product conversion, so we want users that interact with this chatbot to be more likely to actually complete their taxes with TurboTax and, obviously, pay, because every company likes to get paid.

So, the old versus the new experience. This right here is the old experience. Like I said, it's just the run-of-the-mill knowledge base search. And then you have the new experience. Obviously the slide's a little bit cut off, but that is the experience that we're going for now with the chatbot for self-help.

So, how are we doing? We did a six-month A/B test with the chatbot compared to the previous experience, which was the knowledge base search panel. The results were great. We saw a significant reduction in support phone calls and in the rate at which users were creating support calls.

We saw an increase in self-help content engagement, so more users actually interacted with and used the chatbot when it opened up on their screen than when they saw the search panel that you saw on the previous screen. During the test it also showed potential to actually increase product conversion, which is huge. It's very rare that a help product or self-help product actually drives product conversion for a product as big as TurboTax, so that was really exciting. And probably one of the most exciting things for me is that users seem to enjoy the experience so much that they're actually sticking with TurboTax as their e-file tax preparation software of choice. We started this A/B test around mid-October, which is when the small business tax deadline begins, and we wrapped up in mid-April after the standard tax deadline.

So how did we get to those results? We focused on creating what I think is a really exceptional multi-tiered fallback experience, which I will elaborate on later. With our fallback experience we opened things up, giving users access to the entire community of Intuit self-help content, which is millions of pieces of content.

We targeted areas of high confusion and created disambiguation intents and flows to help guide users to what they need in those areas of high confusion.

Through our design, we also guided users to use short, specific responses that are ideal for Dialogflow to process and intent-match on. We also incorporate personalization of suggested topics, kind of conversation starters, in our welcome message, and that's based on relevant user data that we feel gives us insights into what the user is going to be looking for the most. Examples are what type of tax filer they are, or other relevant data such as whether the user's tax return was rejected by the IRS. And lastly, we provide a clear path for users to contact a human if they get frustrated by the chatbot, or maybe they just hate chatbots, which I know some users seem to.

So this is a look at the wild west, as I call it, of TurboTax self-help intents. In this test the top 10 user intents appeared in just over 21 percent of the total chat sessions. So obviously this required us to create an exceptional fallback experience. Outside of the top ten, I think it's cut off on the screen, but intent number 10 showed up in something like 0.39% of the conversations, so it drops off very quickly. Honestly, there's a seemingly endless number of total unique user intents. I'm still amazed by some of the tax questions that users will ask this chatbot.

So we really have our work cut out for us in trying to handle this, because it's just so ambiguous in terms of what users will ask.

So that brings me to our fallback experience. Like I said, if only about 20 percent of the intents are common recurring ones, we need to find a way to respond when users ask something out of the blue, because that's happening more often than users asking something that we expect or that our models are trained on. So when users don't match an intent we've created, we have a multi-tiered fallback approach where we actually look at several services in our Intuit content library, return a result from those services, and determine what is the best piece of content to serve to the user. The very last level of this multi-tiered fallback approach is just kind of a standard FAQ search. We will actually present to the user what we think are the top four FAQ articles. Obviously, that's not the most ideal, because I have seen a lot of conversations with instances where a user will ask a question and the first search result will be perfect. It'll be word for word what the user is looking for, but they won't engage with it and they won't click on it.
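The multi-tiered fallback described here can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tier functions, their names, and the canned answers are hypothetical stand-ins, and the talk does not say whether the real services are queried one after another (as here) or in parallel with the best result picked afterwards. The only details taken from the talk are that unmatched utterances fall through several content services and that the last tier returns the top four FAQ articles.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Reply:
    source: str       # which tier produced the reply
    items: List[str]  # content shown to the user


# A tier takes the utterance and returns a Reply, or None to defer to the next tier.
Tier = Callable[[str], Optional[Reply]]


def intent_tier(utterance: str) -> Optional[Reply]:
    # Hypothetical stand-in for the trained intent model.
    known = {"how do i clear and start over": "Steps to clear and start over"}
    answer = known.get(utterance.lower().strip("?! ."))
    return Reply("intent", [answer]) if answer else None


def content_service_tier(utterance: str) -> Optional[Reply]:
    # Hypothetical stand-in for one of the content-library services.
    if "rejected" in utterance.lower():
        return Reply("content_service", ["What to do if your return was rejected"])
    return None


def faq_search_tier(utterance: str) -> Optional[Reply]:
    # Last tier: a plain FAQ search that shows at most the top four articles.
    articles = [f"FAQ result {i} for '{utterance}'" for i in range(1, 6)]
    return Reply("faq_search", articles[:4])


def respond(utterance: str, tiers: List[Tier]) -> Reply:
    # Try each tier in order and return the first one that has something to say.
    for tier in tiers:
        reply = tier(utterance)
        if reply is not None:
            return reply
    return Reply("none", ["I'm sorry, I don't have an answer for that."])


TIERS = [intent_tier, content_service_tier, faq_search_tier]
```

With this ordering, `respond("How do I clear and start over?", TIERS)` is answered by the intent tier, while an unrecognized question falls all the way through to the four-article FAQ search instead of an apology.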

So it's not a perfect solution. It's something that we're trying to get better at, but it certainly beats the chatbot coming back and saying, "I'm sorry, I don't have an answer for that."

Handling ambiguity and confusion. We noticed a lot of troubling trends where users would ask questions and we just didn't have enough information to know what intent they really had. You can see it's just cut off on the screen, but in this case, if a user just said "help with a refund", well, that's hard, because what kind of help with a refund do you want? Do you need help with an IRS refund? Do you need help with a state refund? Are you looking for a refund on a TurboTax product? Do you want to learn about the TurboTax Refund Advance program?

That's not enough information for us to actually be able to answer the user. So when it comes to these types of really short utterances where it's not exactly clear what the intent is, we're creating disambiguation flows and disambiguation intents. The goal that I have with this is to cast a wide net, reel in all these really short utterances where we really don't know what the user is looking for, help eliminate confusion through disambiguation, and get the users to what we feel are the top or most likely FAQs that they're going to have when we really don't have enough information to know exactly what answer they need.

One thing I mentioned that we also do is we really try to give examples to the user, through the UX and conversation design, about how to actually interact with the chatbot and how to ask questions. So when we want the user to ask us a question, we will give them examples of how to ask a question, examples of questions that work, and let them know that we want them to be short but also to give enough detail and be specific enough that we know what answer we need to give them.
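As a rough illustration of the "wide net" idea for short, ambiguous utterances, here is a minimal Python sketch. The keyword table, the five-word cutoff, and the function name are all assumptions made for this example; the four refund choices are the ones listed in the talk, and the real system uses Dialogflow intents rather than keyword lookup.

```python
from typing import Dict, List, Optional

# Hypothetical table: a topic keyword maps to the most likely FAQ choices for it.
DISAMBIGUATION: Dict[str, List[str]] = {
    "refund": [
        "IRS (federal) refund",
        "State refund",
        "Refund for a TurboTax product",
        "Refund advance program",
    ],
}

MAX_WORDS = 5  # assumed cutoff for what counts as a "really short" utterance


def disambiguate(utterance: str) -> Optional[List[str]]:
    """Return likely FAQ choices for a short, ambiguous utterance, else None."""
    words = utterance.lower().replace("?", " ").split()
    if len(words) > MAX_WORDS:
        return None  # longer utterances go through normal intent matching
    for topic, choices in DISAMBIGUATION.items():
        if topic in words:
            return choices
    return None
```

Calling `disambiguate("help with a refund")` returns the four refund choices, so the user picks one in a single step rather than being asked an open-ended follow-up question.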

Personalization is really king for us and something that we're trying to get better and better at. Where we've really incorporated a lot of personalization is in our welcome message. We use personalization in the welcome message to really try to get users to engage and go down the path of self-help, and to be able to get their answer without having to speak to a person over the phone. One of the ways we do this is by customizing some of the response options, or as I like to call them, conversation starters, that we present to users in the welcome message. As you can see there on the screen, those are two different users, and they see different conversation starters for the second and third choices. That's based on a lot of information about the user: what type of tax filer they are, etc. One of the things that we're currently working on building towards is personalization based on the user's browser metadata, so customizing those conversation starters based on what screen in TurboTax the user is on, and also analysis of their clickstream.
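A minimal Python sketch of personalized conversation starters follows. The field names, filer-type label, and starter texts are hypothetical; the only details taken from the talk are that the starters depend on user data such as filer type and a rejected return, and that the second and third choices are the ones that vary between users.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserContext:
    filer_type: str                 # hypothetical label, e.g. "self_employed"
    return_rejected: bool = False   # whether the IRS rejected the user's return


# Hypothetical default starters shown when nothing is known about the user.
DEFAULT_STARTERS = ["Ask a question", "Where is my refund?", "How do I save and exit?"]


def conversation_starters(user: UserContext) -> List[str]:
    starters = list(DEFAULT_STARTERS)
    personalized: List[str] = []
    if user.return_rejected:
        personalized.append("Fix my rejected return")
    if user.filer_type == "self_employed":
        personalized.append("Self-employment deductions")
    # The first starter stays fixed; the second and third slots are
    # overwritten with personalized options when any are available.
    for slot, text in zip((1, 2), personalized):
        starters[slot] = text
    return starters
```

Two users with different contexts then see the same first option but different second and third options, which matches the two welcome messages described on the slide.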

So if we do more clickstream analysis, we could know what problem they're most likely having and then actually surface that right there for the user, so it's right there in front of their face.

As I mentioned, we always make contacting a human an option for the user, and we make it clear and easy for them to find. So we don't "hide the ball", as we call it internally, and we make it clear that they can get in touch with a human.

We also ask the user what their question is first, before we actually get them to a human, and run it through our call group and support routing to determine who is the best agent or the best call group, and route directly to them. So we're actually helping the user; we're not just sending them to a standard IVR number. We're actually getting them in touch with the best support agent for their needs.

So what's next for us? Some of the things we're trying to work on and build towards: displaying an FAQ answer like a normal conversation flow when the search confidence is high enough. That, again, is to mitigate the issue that I mentioned before, where we present those FAQ search results to users in the fallback, and sometimes the perfect result will be there but they won't click on it. So we want to identify what those perfect results are, identify when the search confidence is high enough, take the FAQ answer, and parse it into the chatbot conversation, so it looks like a conversation flow that was created in Dialogflow but actually came from our online content community.
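The planned behavior of showing an FAQ answer inline when search confidence is high enough can be sketched in a few lines of Python. The confidence scale, the 0.9 threshold, and all names below are assumptions for illustration; the talk gives no numbers and describes this as future work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:
    title: str
    answer: str
    confidence: float  # assumed to be a score between 0 and 1


INLINE_THRESHOLD = 0.9  # assumed cutoff for "high enough" confidence
MAX_LINKS = 4           # the current fallback shows the top four FAQ articles


def render_fallback(hits: List[SearchHit]) -> dict:
    ranked = sorted(hits, key=lambda h: h.confidence, reverse=True)
    if ranked and ranked[0].confidence >= INLINE_THRESHOLD:
        # Confident match: show the answer text as if it were a normal bot reply.
        return {"mode": "inline_answer", "text": ranked[0].answer}
    # Otherwise keep today's behavior: a list of article links to click.
    return {"mode": "link_list", "titles": [h.title for h in ranked[:MAX_LINKS]]}
```

The design choice is that the user no longer has to click the perfect result for it to help; below the threshold nothing changes from the existing link list.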

We want to start to answer calculation-related questions. We get a lot of those, so that will mean integrating with another Intuit service.

Comprehensive flow-versus-flow A/B testing: that's one I'm really excited about. Right now we only really have high-level chatbot A versus chatbot B testing, but we really want to get to the point where we can start to make edits to content and see, if we change the content on one level of a conversation flow, what is that doing? How is that impacting whether the user contacts us or not?

How is it impacting how the user responds? It's really getting that granular level of A/B testing that we don't have yet.

Potentially in the future, possibly proactive support. Right now the user has to click the help button, and then the widget opens up and the chatbot conversation starts. Instead, we'd have the widget pop open automatically and start the conversation with the user when we're confident that the user is struggling and we know how we can help them. So again, that's going to be leveraging the browser metadata and a lot more clickstream analysis.

And lastly, probably the hardest thing, and maybe too big a wish list item for me right now: domain-specific intent matching. One of the biggest challenges that we have, and I think a lot of people have in the NLP space, is how do you handle long-tail utterances? Ideally we want to get to a point where we're only focusing on the portion of the utterance that's relevant to our domain of tax filing.

So this utterance is a very common kind that we see, where a customer will say, "I've been a loyal TurboTax customer for 10 years, and today while I was eating breakfast, yada yada." All that stuff is great. It's great to hear that they're a loyal TurboTax customer, but that's not going to help Dialogflow match to the right intent. In this utterance, all we really should be caring about is this last bit, where the user says they got an error trying to import a second W-2. So we're trying to get to domain-specific intent matching, to be able to overcome the issue of long-tail utterances and the issues they pose for intent matching. And that's all I had.

Questions? All right. Here you go. Hi.
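To make the long-tail problem from the end of the talk concrete, here is a deliberately naive Python sketch that keeps only the clauses of an utterance containing tax-domain vocabulary before intent matching. The vocabulary set, the clause-splitting rule, and the sample utterance are all hypothetical; the talk describes this as an unsolved wish-list item, so this is not the team's method, only one simple way to picture the goal.

```python
import re
from typing import List

# Hypothetical tax-domain vocabulary; a real system would need far more than this.
DOMAIN_TERMS = {"w-2", "w2", "1099", "refund", "import", "deduction", "e-file", "return"}


def domain_clauses(utterance: str) -> List[str]:
    """Keep only the clauses that mention at least one domain term."""
    # Split on punctuation and on the conjunctions "and" / "but".
    clauses = re.split(r"[.,;!?]|\band\b|\bbut\b", utterance.lower())
    kept = []
    for clause in clauses:
        tokens = set(re.findall(r"[a-z0-9-]+", clause))
        if tokens & DOMAIN_TERMS:
            kept.append(clause.strip())
    return kept


utterance = (
    "I've been a loyal TurboTax customer for 10 years and today I was eating "
    "breakfast, but I got an error trying to import a second W-2."
)
relevant = domain_clauses(utterance)
```

Here `relevant` contains only the clause about importing a second W-2, which is the part an intent matcher or FAQ search would actually need.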

Okay. I have kind of a two-part question. In regards to the really long-tail questions that you get, is there ever a time where you don't provide them with a fallback experience, like when there's no top match at all? And in cases where there's a really great match and other matches that aren't as close, how do you decide when to show each of those?

Yeah, so to answer your first question about long-tail utterances: they still go through the fallback experience, which is why we need to get more domain-specific with our intent matching. Ideally, I'd prefer that when we do get those long-tail utterances, even when we're in the fallback experience, we only send what's actually domain-specific to tax filing to the search. We use Bing search to search our entire FAQ library.

So I'd prefer to do that, because you will see instances where someone will say something like that example I showed, "I've been a loyal...", and the first FAQ result is one that has someone else saying "I've been a loyal TurboTax customer for...". So it's not perfect. They do go through the fallback experience, and there have been a lot of cases where users have gotten good fallback experiences with long-tail utterances, but it definitely is a problem. So getting more domain-specific for the fallback experience would be helpful as well.

For your other question: right now we use Bing search, but we also have several other layers that we use in our search algorithm to determine what FAQ article to serve.

Number one, a lot of it is searching through the content of the FAQ article and seeing what we think is most relevant, and then also looking at whether or not the article was approved by a superuser. Then there's also the helpfulness rating, so users are able to indicate whether a piece of FAQ content was helpful or not. So something that has maybe 8,000 upvotes is probably going to appear higher in the order than a similar article that maybe only has a hundred upvotes.

Casey, hey, it's Rob Lugo. Nice to meet you in person.

Nice to meet you.

Great job. So back to that thing about disambiguating something like "help with my refund": a human would disambiguate that by asking "What kind of refund?", for the reasons you explained. They wouldn't say, "Oh, I don't know which kind of refund, so just go look at the FAQs, these might apply." That's putting the work on me. So my question is, what about a follow-up intent that says simply "Which kind of refund?" That's an easy thing to trigger with the word "refund", and then the suggestion chips could say "tax refund", "refund on an Intuit product", etcetera.

Yes, that is something that we have explored. We still found some issues with it as well. Even when we ask for that clarification, sometimes users still don't respond in a way that's meaningful. I think a lot of times users just get frustrated. I've seen cases where a user will just keep repeating one word over and over and over again.

So what we're really trying to do when we create those disambiguation intents and flows is to just cover the vast majority. In that case, those five questions related to refunds probably make up something like 80%. So I would rather serve something like that and cover 80% of those questions than give the user another opportunity to provide an answer that's not helpful. We have tried that approach, where we ask for more detail, and the users will either ask something in a manner that Dialogflow wasn't expecting, or else, again, they just won't provide us with the details we need. Or it could be an issue where they give one of those long-tail utterances, where they complain about their day or how much they hate taxes, and it ends up falling through the loop. So in that case we're really trying to minimize any second opportunity for getting feedback from the user that isn't helpful.

But that is a great point. We have experimented with it. We just haven't really perfected it yet to where it's been quite as helpful as we would hope.

Hi Casey. Today there's been a lot of conversation around team structures for building a successful chatbot. I was just wondering what kind of team structure you were working with when you built and deployed this bot, and maybe some insights on what worked and some lessons learned with that team structure.

So, team structure? Like who we have on our team, how big the team is?

Yeah, more in terms of, I guess, the different roles as well.

Like, there was some talk today about having conversational designers versus copywriters, developers, analysts, etcetera.

Okay, yeah. It's definitely a cross-functional team. We have a team that's more focused on the data science and the natural language understanding. We have a few people that focus solely on writing the chatbot content and owning the chatbot content. We have a few people that are focused more on the rules and logic of the chatbot, the conversation designers. We also have some product managers, obviously, and then we have a lot of engineers who are focused on the front end, the visual aspect that you see, and on the back end, creating those integrations and APIs with the other Intuit knowledge services that I mentioned. So it's a pretty big team and it's cross-functional. I'd say there's probably around 50 people that have a fairly sizable stake in this.

All right. We got one more question.

It just so happens to be the closest person that I can reach. Here you go, sir. This is the big deal. This is the last question. Are you sure it's good? Okay, no pressure. No pressure. Here you go.

I just wanted to get your thoughts on doing a full filing using a chatbot.

Yeah, Casey's thoughts on using the chatbot for the entire filing, right?

I mean, I think that's something that we may have talked about, but that's kind of a much bigger business decision than I'm involved in. I think right now we're just mostly focused on mastering the self-help portion of it.

You know, I think there might be opportunities to get more conversational with the entire tax filing process, but I think that's a much longer journey for us, like several years down the road. Right now we're really just trying to nail the self-help process, learn from it, and see if we can apply it in other areas of the product and have it work great as well. So there's potential for it, but I'm not the person that could really give you the right answer for that.

All right, that's a level 5 pipe dream, right? We're not there, but that would be great.

I am the first person to sign up to make sure that a bot can help me through that process, because it's tough. Thank you, everybody. Put it together for this man right here.


