Mark Brierley

How to Opt Out of AI Training on Multiple Platforms

For some people, the AI revolution brings hope of an easier world where we can live longer, healthier lives. For others, it threatens to let tech corporations take over our work and livelihoods. A third group believes that it is a fashionable talking point that will come to nothing. AI has probably been overhyped since ChatGPT went from zero to a million users in five days in 2022. At the same time, it is also under-hyped, and many individuals are both over-anxious and underprepared for it. As with all new technologies, legal and social frameworks are not yet in place to guide development in ways that will protect us and bring the biggest benefits to the most people. It is therefore understandable that some may choose to opt out of having their data used to train AI, and there are plenty of resources that outline how to do this on various platforms.


What is AI?


AI refers to artificial intelligence, but this is an ambiguous term. The actual meaning of “artificial” is “man-made”, but in English we often use the term to mean “not real”, and intelligence is difficult to define because it is interwoven with how we solve problems and perceive ourselves as humans. When we say AI we are often talking about machine learning, a technique in which computers learn to complete tasks by trial and error on large amounts of data and are then tested on further data. We may worry about where the data is coming from, and may be reluctant for it to be sourced from our digital files and online interactions.


Sources of AI Training Data


Since the beginning, the internet has presented a conflict between the benefit of sharing our data and the desire to keep data private. OpenAI has built the very successful large language models (LLMs) behind ChatGPT using publicly available data, licensed data and data created by human trainers. Ironically, given its name, OpenAI does not disclose exactly which data it has used. Google has developed BERT, Bard and now Gemini using the search queries we have been writing and the emails and documents that it seems to let us send for free. Facebook’s AI models are trained on the data in posts and interactions we have been eager to share on its platforms. Amazon’s Alexa has been collecting voice data, not just of commands and questions but also of conversations going on in the background.


Looking at this positively, all this data will help machine learning models to better understand how we communicate and work as humans, and the technology will improve. The point of technology is to complete tasks more quickly, more cheaply and with less energy, and so the goal of AI is not just to be as good as humans. If bicycles moved at the same speed and with the same effort as walking, nobody would use them: we use bicycles because they are easier and faster. The goal of AI is to be better than us. 


Given the multitude of sources for datasets used to train AI, it is unsurprising that legal cases have followed. Before most people had heard of machine learning, Cambridge Analytica harvested data from millions of Facebook users, then used it for “weapons-grade” data manipulation techniques in an election and a referendum in 2016; it became notorious when this came to light in 2018. In 2023, Amazon agreed to pay $25 million to settle Federal Trade Commission charges that Alexa violated the Children’s Online Privacy Protection Act. Google faced court cases in 2021 and 2022 in the UK, Australia and Texas. In 2023, OpenAI faced a lawsuit for unauthorised use of personal data to train its models.


How to Opt Out


Companies have often used unclear methods to collect user data for training purposes, but it is sometimes possible to opt out. It’s important to note that many of them have two tiers of pricing: a cheaper tier in which your data is used for training, and a more expensive one in which your data is excluded. Generally, if you are sharing your data on a platform and not paying, then you may be paying for the service with your data.


It is worth knowing how to avoid unknowingly providing proprietary data for training. The options on some common platforms are listed below.


Grammarly: 


Grammarly has an opt-out option for people who purchase 500 licences, but for most users it promises only to take snippets and to dissociate them from any user data.  


Adobe: 

Adobe has promised not to use creators’ works for training models. 


Figma:  

As of August 15, 2024, a new setting lets admins control whether customer content is shared.


Google Cloud Services:  

Google offers many AI models as a service.

 

A screenshot of a data logging notice for Google's Speech-to-Text service

  

Microsoft:

According to the Azure OpenAI FAQ, Microsoft does not use your data for training its public models: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/faq


The data privacy page also describes a way to opt out of abuse monitoring and content filtering: https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy?context=%2Fazure%2Fcognitive-services%2Fopenai%2Fcontext%2Fcontext



Looking to the Future 

No matter the technology, it’s important to be aware of how it works and take the appropriate actions in response. In light of the ever-changing nature of AI, we will continue to update this list with the latest resources. 


References 

Seadle, Michael (2020). The Great Hack (documentary film). Produced and directed by Karim Amer and Jehane Noujaim. Netflix, 2019.

 

About Mark Brierley 

Mark Brierley has been teaching English at a university in Japan while developing online systems to support students’ reading.
