ChatGPT and Privacy: 3 Essential Tips for Secure and Confidential Usage

Yoriyasu Yano
Published in IllumiTacit Blog
7 min read · May 23, 2023


ChatGPT has taken the world by storm, changing how many creators work with its powerful text and code generation capabilities. While there is no doubt that its writing, chatting, and coding abilities are game changers, many potential users and companies are concerned about the confidentiality of their data when using the technology.

Notably, Samsung banned the use of ChatGPT after it came to light that employees had accidentally shared trade secrets with the tool while using it. Other companies are starting to follow suit, including, most recently, Apple.

These concerns have some merit, especially given the nature of these new tools. The Terms of Use from OpenAI (the creator of ChatGPT) state that they “may use Content from Services other than our API … to help develop and improve our Services.” This means that anything you type into ChatGPT on the OpenAI website (this distinction is important, as you will see later) may be included in their training data.

This is a concern because there is a risk that users outside your company could extract that data from the model with carefully crafted prompts. As an example, GitHub Copilot, which is based on OpenAI’s Codex model (a close relative of the model powering ChatGPT), has been shown to regurgitate code from its training data verbatim under certain inputs.

However, there are ways to use this new technology safely and keep your data confidential without resorting to an outright ban. ChatGPT and related technologies are game changers in many industries, and not using them could carry a huge opportunity cost, potentially missing out on major productivity gains for yourself or your company.

The important part is knowing how and where your data is used, and accessing the models in ways that ensure your data won’t be added to the training set. This is similar to data privacy on the internet in general: the surest way to keep your data private is to never connect at all, which is technically an option, but there are better ways to protect yourself while still reaping the benefits.

Here are 3 tips about ChatGPT data privacy that can help you protect your content while leveraging the full power of these AI tools:

  1. Access ChatGPT through the API
  2. Understand the privacy policies of OpenAI and 3rd party apps
  3. Evaluate why and how long your data is preserved

1. Access ChatGPT through the API

OpenAI offers access to ChatGPT in two different ways:

  1. Through their end-user interfaces, such as the ChatGPT web app.
  2. Programmatically, through their API.

In their Terms of Use, OpenAI explicitly distinguishes between the two access models. As mentioned previously, OpenAI will use content uploaded to ChatGPT through their app interfaces as part of their training process. However, data communicated through the API will not be used for training or testing (“We do not use Content that you provide to or receive from our API (“API Content”) to develop or improve our Services.”).

Note that the API is not strictly passthrough. In their API Data Usage Policy addendum, they mention that data is retained in OpenAI’s systems for up to 30 days for abuse and monitoring purposes. During this time, the data is accessible to OpenAI employees and contractors (who are “subject to confidentiality and security obligations”), and may be viewed by said personnel if the content is suspected of abusing the system. However, from a data privacy perspective, this poses far less risk to companies than using the non-API services.

OpenAI can also sign a BAA (Business Associate Agreement) for HIPAA compliance, allowing you to use their services for use cases that involve protected health information. Note that this is strictly for API usage.

If you are a developer, it should be a no-brainer to interact with the OpenAI models directly through the API using their programming SDKs. Doing so not only ensures better privacy for your data, but also gives you full control over the interaction and makes it easy to embed the model into your own apps.
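As a concrete illustration, here is a minimal sketch of what direct API access looks like with the openai Python package (the 0.x-era interface available around the time of writing); the model name, prompt, and environment variable are illustrative assumptions, not a prescribed setup.

```python
# Minimal sketch: calling the ChatGPT model over the OpenAI API with the
# openai Python package (0.x-era interface; newer SDK versions differ).
import os

import openai

# Read the API key from an environment variable instead of hard-coding it.
openai.api_key = os.environ["OPENAI_API_KEY"]

# Content sent this way falls under the API data usage terms, so it is not
# used for training, unlike content typed into the ChatGPT web app.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize these internal meeting notes: ..."},
    ],
)

# The assistant's reply is in the first choice of the response object.
print(response["choices"][0]["message"]["content"])
```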

For other users though, this may be a tall order. Having to know programming to interact with ChatGPT makes for a poor user experience.

Fortunately, there are apps that embed ChatGPT and talk to it through the API on your behalf, such as IllumiTacit. Because these apps interact with OpenAI over the API, OpenAI won’t retain your data for training purposes.

The bottom line is that if you are concerned about the confidentiality of your data, you should be cautious about any non-programming interface that OpenAI itself provides.

2. Understand the privacy policies of OpenAI and 3rd party apps

There is a high chance that any app embedding ChatGPT uses the OpenAI API, and thus your content is protected from being stored by OpenAI for training purposes. However, this protection only covers OpenAI; it does not automatically mean that the app you are using isn’t storing your data.

Be aware that you are passing your data through at least two entities: the app that you are interacting with, and OpenAI (or another provider) that powers the actual model. The API provisions only protect the data once it reaches OpenAI’s servers; they do not protect you from whatever the app does with your data. Since your data goes through the app, you are subject to the app’s terms of use and privacy policies, whatever those may be.
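To make this two-hop flow concrete, here is a hypothetical sketch of how a third-party app’s backend might forward your input to the OpenAI API; the endpoint name and structure are illustrative assumptions and do not describe any specific app.

```python
# Hypothetical sketch of a third-party app backend that proxies user input
# to the OpenAI API. Route names and behavior are illustrative only.
import os

import openai
from flask import Flask, jsonify, request

app = Flask(__name__)
openai.api_key = os.environ["OPENAI_API_KEY"]


@app.route("/generate", methods=["POST"])
def generate():
    user_text = request.json["text"]

    # Hop 1: your data has already reached the app's own server here, so the
    # app's privacy policy governs what happens to it (it could be logged or
    # stored before it ever leaves this function).

    # Hop 2: the app forwards the data to OpenAI over the API, where the API
    # data usage terms (no training, limited retention) apply.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_text}],
    )

    return jsonify({"result": response["choices"][0]["message"]["content"]})
```

The point is that the API-side guarantees only kick in at the second hop; everything before that is governed by the app’s own terms of use and privacy policy.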

It is therefore important to be aware of the privacy policies of the apps through which you use ChatGPT. These apps may have policies that allow them to store your data for other purposes, which may be detrimental to your company.

You should always lean towards using apps that are explicit and transparent about their data policies.

For example, IllumiTacit’s privacy policy states that we retain data provided during Macro creation in the prompt templates (otherwise, we can’t create the prompts to send to OpenAI), but we do not retain or use data provided during Macro execution (the data provided by users at runtime in the various apps like Microsoft Word and Google Chrome). Therefore, it is safe to use your confidential data at runtime, provided you are comfortable with the OpenAI API usage terms.

3. Evaluate why and how long your data is preserved

The final tip is to consider the reason your data is being preserved when evaluating whether there is a risk to its confidentiality.

The biggest risk to companies with these AI models is the inadvertent leakage of data to external users of the model, through regeneration of training data. This makes any purpose that allows the providers to retain data for training purposes a significant risk to your business.

However, certain forms of data retention should be tolerable, because the direct risk and impact they pose to your business are limited.

As mentioned previously, OpenAI retains data provided over the API for up to 30 days for abuse and monitoring purposes. Unlike the risk of secrets leaking into the training data, the risk from this form of retention is relatively low. Data leaks in this scenario are limited to irresponsible storage of data (a data breach caused by hackers gaining access to OpenAI servers) and malicious insiders (OpenAI employees or contractors stealing the content). Both are tolerable risks for many businesses, as they already accept similar risks by using third-party SaaS software like Microsoft Office 365. The risk is even lower when you consider that the data is deleted after 30 days.

Knowing why data is being preserved allows you to make better decisions about real business risks and empowers you to have confidence in the privacy of your data, while leveraging the power of ChatGPT.

Summary

While concerns about data privacy with ChatGPT are valid, there are ways to leverage the power of this tool while keeping your data confidential. Just remember these three key tips to protect your data:

  1. Access ChatGPT through the API: OpenAI offers two access models — through the ChatGPT app or via the API. Using the API ensures that your content won’t be used for training purposes, offering better privacy and control. If you’re a developer, interacting with the models directly through the API is recommended. However, for non-programmers, using apps that embed ChatGPT and interact with OpenAI over the API is a good alternative.
  2. Understand the privacy policies of OpenAI and 3rd party apps: When using apps that integrate ChatGPT, be mindful of both the hosting app and OpenAI’s privacy policies. The API provisions protect your data on OpenAI servers, but you should also be aware of what the app does with your data. Choose apps that have clear and transparent data policies to ensure your content is handled appropriately.
  3. Evaluate why and how long your data is preserved: Assess why data is being retained to gauge the level of risk it poses to your business. OpenAI retains API data for up to 30 days for abuse and monitoring purposes, which presents a relatively low risk compared to data leaking into training sets. Understanding the reasons for data preservation allows you to make informed decisions about the risks involved.

By following these tips, you can confidently utilize ChatGPT while safeguarding your data and maximizing the productivity gains it offers. Remember, being informed and proactive is key to maintaining data privacy in the era of AI-powered tools. Don’t settle for an outright ban and lock yourself out from the powers of AI!

If you want a no-code way to leverage ChatGPT and similar models in your team while preserving data privacy, try out IllumiTacit. IllumiTacit allows you to share your prompts as action buttons that are accessible in a wide range of apps like Office 365 and Google Chrome, all without writing a single line of code. Supercharge your team with AI effortlessly. Join now!
