Preventing Data Leaks and Hallucinations: How to Make GPT Enterprise-Ready

In the rapidly evolving landscape of analytics and insights, emerging technologies have sparked both excitement and apprehension among enterprise leaders. The allure of Generative AI models, such as OpenAI’s ChatGPT, lies in their ability to generate impressive responses and provide valuable business insights. However, with this potential comes the pressing concern of data security and the risks associated with “hallucinations,” where the model fills in the gaps when under-specified queries are posed. As Analytics and Insights leaders seek to harness the power of these technologies, they must find a balance between innovation and safeguarding sensitive information. In this enlightening interview, Co-founders Mike Finley and Pete Reilly shed light on how they are making emerging technologies enterprise-ready.

Watch the video below or read the transcript of the interview to learn more.


How is AnswerRocket making these emerging technologies enterprise-ready?

Mike: I would start simply by saying that the idea of keeping data secure and providing answers that are of high integrity is table stakes for an enterprise provider. Making sure that users who should not have access to data don’t have access to it and that the data is never leaked out, right? That’s table stakes for any software at the enterprise, and it doesn’t change with the advent of AI technology. So AnswerRocket is very focused on ensuring that data flowing from the database to the models, whether it’s the OpenAI models or other models of our own, does not result in anything being trained or saved such that it could be used by some third party, leaked out, or taken advantage of in any way other than its intended purpose. That’s a core part of what we offer. 

The flip side of that is, as you mentioned, many of these models are sort of famous at this point for producing hallucinations, where, when you under-specify what you ask the model and don’t give it enough information, it fills in the blanks. It’s what it does, it’s generative, right? The G in generative is what makes it want to fill in these blanks. AnswerRocket takes two steps to ensure that doesn’t happen. First, when we pose a question to the language model, we ensure that the facts supporting that question are all present. It doesn’t need to hallucinate any facts because we’re only giving it questions that we have the factual-level answers for, so that it can make a conversational reply. The second thing we do is, when we get that conversational reply, like a good teacher, we’re grading it. We’re going through checking every number: what is the source of that number? Is that one of the numbers that was provided? 

Is it used in the correct way? If so, we allow it to flow through, and if not, we never show it to the user, so they never see it. A demonstration is not value creation, right? A lot of companies that just kind of learned about this tech are out there demonstrating some cool stuff. Well, it’s really easy to make amazing demonstrations out of these language models. What’s really hard is to make enterprise solutions that are of high integrity, that meet all of the regulatory compliance requirements, and that provide value by building on what your knowledge workers are doing and making them do a better job still. And so that’s very much in the DNA of AnswerRocket, and it’s 100% throughout all the work that we do with language models. 
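
To make that grading step concrete, here is a minimal sketch (not AnswerRocket’s actual implementation) of checking that every number in a model’s reply traces back to one of the facts supplied with the question; the `grade_reply` helper and the fact structure are hypothetical, used only for illustration.

```python
import math
import re

def extract_numbers(text: str) -> list[float]:
    """Pull every numeric value out of a block of text."""
    return [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))]

def grade_reply(reply: str, supplied_facts: dict[str, float]) -> bool:
    """Return True only if every number in the reply matches a supplied fact."""
    allowed_values = list(supplied_facts.values())
    for value in extract_numbers(reply):
        # Tolerate small rounding differences introduced by the model.
        if not any(math.isclose(value, allowed, rel_tol=0.01) for allowed in allowed_values):
            return False  # A number with no source fact: never show this reply.
    return True

# Hypothetical facts that would have been passed to the model with the question.
facts = {"quarterly revenue ($M)": 412.0, "growth vs. prior quarter (%)": 8.3}

good_reply = "Revenue reached $412M, up 8.3% versus the prior quarter."
bad_reply = "Revenue reached $450M, up 12% versus the prior quarter."

print(grade_reply(good_reply, facts))  # True: every figure traces to a fact.
print(grade_reply(bad_reply, facts))   # False: 450 and 12 were never supplied.
```

If any number in the reply has no matching source fact, the reply is withheld rather than shown to the user, which mirrors the behavior Mike describes.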

How can enterprises avoid data leakage and hallucinations when leveraging GPT?

Pete: A lot of the fear you hear, people saying, oh, I’m going to have data leaking and so on, a lot of that’s coming from ChatGPT itself. If you go and read the terms and conditions of ChatGPT, it says, hey, we’re going to use your information, we’re going to use it to train the model, and it’s out there. That’s why you’re seeing a lot of companies really lock down ChatGPT based on those terms and conditions, and that makes sense. But when you look at the terms and conditions of, say, the OpenAI API, it is not using your data to train the model. It’s not even widely available to people inside the company, it’s removed after 30 days, and so on. Those terms are much more restrictive and much more along the lines of what I think a large enterprise is going to expect.

You can go to another level. What we’re seeing is that a lot of our customers do a lot of business with, say, Microsoft. Microsoft can also host that model inside the same environment where you’re hosting all your other corporate data. So it really has that same level of security: if you trust, say, Microsoft to host your corporate enterprise data, then trusting them to host the OpenAI model is on that same level. And what we’re seeing is that large enterprises are getting comfortable with that. In terms of hallucinations, as Mike said, it’s really just important how we use it. We analyze the data, we produce the facts, and there are settings in these large language models that tell them how creative to get or not. And you say, don’t get creative, I just want the facts, but give me a good business story about what is happening. 
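
As an aside on the “creativity” setting Pete mentions, here is a minimal sketch using the OpenAI Python client with `temperature=0`, so the model stays close to the supplied facts instead of inventing detail; the model name, prompt, and facts shown are hypothetical.

```python
from openai import OpenAI

client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable.

# Hypothetical facts produced by the analytics layer ahead of the model call.
facts = "Quarterly revenue: $412M; growth vs. prior quarter: 8.3%; top region: Southeast."

response = client.chat.completions.create(
    model="gpt-4",  # Illustrative model choice for this sketch.
    temperature=0,  # "Don't get creative": minimize randomness in the output.
    messages=[
        {
            "role": "system",
            "content": "Answer using only the facts provided. Do not add numbers that are not listed.",
        },
        {
            "role": "user",
            "content": f"Facts: {facts}\n\nWrite a short business summary of this quarter's performance.",
        },
    ],
)

print(response.choices[0].message.content)
```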

And then we also provide information to the user that tells them exactly where that information came from, traceable all the way down to the database, all the way down to the SQL query, so that it’s completely auditable in terms of where the data came from and can be trusted. 
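
To illustrate that kind of traceability, here is a hedged sketch in which each fact handed to the model carries the SQL that produced it, so every number in the final answer can be audited back to its query; the `Fact` structure and the query shown are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    """A single metric supplied to the language model, with its provenance."""
    name: str
    value: float
    source_sql: str  # The query that produced the value, kept for auditing.

facts = [
    Fact(
        name="quarterly revenue ($M)",
        value=412.0,
        source_sql="SELECT SUM(revenue) / 1e6 FROM sales WHERE quarter = '2023-Q2';",
    ),
]

# When the answer is shown, the provenance travels with it, so a user or reviewer
# can trace every number back to the exact SQL query that produced it.
for fact in facts:
    print(f"{fact.name} = {fact.value}  -- source: {fact.source_sql}")
```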

In Conclusion

Analytics and Insights leaders who wish to utilize ChatGPT technology in their organizations have to balance the possible rewards with the risks. We’re committed to providing a truly enterprise-ready solution that leverages the power of ChatGPT with our augmented analytics platform to securely get accurate insights from your data. By providing AI models with complete and correct supporting facts, we can eliminate the possibility of hallucinations and maintain full control over the generated responses in the platform. Furthermore, we use a stringent grading process to validate the AI-generated insights before presenting them to users. 
