Life has become easier. Instead of your former internet buddy, the good old search engine, giving you reams of answers to your search, the new hype on the market is “ChatGPT”, which produces for you the one and only “best” answer out there, amalgamated from its vast set of training data. Inspiration for your job application to CERN? There you go. Quickly dashing off a travel request in Swahili? Karibu*. A love letter in poetic French? Voilà, mon cœur. Producing a code snippet for some software you need? int main() { return 0; }. Even creating your “own” photographic artworks has become as easy as pie, not to mention films and music in the near future. Deepfakes, anyone?
So, life becomes easier. And more confusing. The truth is becoming blurred, as ChatGPT’s answers are only as good as the information provided by its data set. So, beware: your application form, love letter or program code might not produce the quality and result you expected. Common sense, gut feelings, human intelligence and thinking for yourself are your best friends when it comes to assessing ChatGPT’s “truth” (see “Hallucination”).
But, apart from these societal problems, there are also certain security and privacy aspects to consider. With ChatGPT, there is no secrecy!
- Data exposure: Depending on who runs your ChatGPT platform, everything you type in could be recycled into other people’s answers, eventually disclosing confidential information you don’t want to see in the public domain (we’re aware of some CERN developers pasting their code snippets into ChatGPT and asking it to find the bug – these snippets might have included passwords or other secrets; a simple pre-flight check is sketched after this list).
- Data disclosure during training: Any AI needs training, and this training is based on lots and lots of training data which may or may not be considered sensitive or restricted. If adequate protection measures are omitted when the training process combines different training sets, including those of third parties, and if the tenants are not well separated, your data might make it into the public domain – to third-party tenants or to creative users. It wouldn’t be the first time that a company leaked data through inadequate data protection.
- Data leakage: Even if you’ve secured the confidentiality of your training data, once the model is exposed to third parties for usage or “questioning”, determined users might be able to extract confidential information through clever questioning.
- Copyright: The training set, and hence your subsequent result, might be based on copyrighted material. It is currently a legal grey area whether your new artwork, sound bite or video is subject to those copyrights and whether you should pay compensation to the owners of the pieces in question.
- Poisoning: This is where an attacker (or an inexperienced AI trainer) manipulates the training sets in such a way that the results are flawed or biased.
- Cheating: Finally, to the chagrin of teachers and examiners, ChatGPT is a perfect tool for producing results that are not your own. Not your own painting. Not your own homework. Not your own paper. While it might be difficult to spot the real origin today, time may yet reveal that some authors plagiarised their work.
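To make the “data exposure” point above more concrete, here is a minimal sketch in Python of what such a pre-flight check could look like before pasting anything into an external AI service. The patterns and the looks_sensitive helper are purely illustrative assumptions, not an official CERN tool, and certainly not exhaustive:

import re

# Illustrative patterns only: hard-coded passwords, API tokens and
# private-key headers are typical secrets that should never be pasted
# into an external service such as ChatGPT.
SECRET_PATTERNS = [
    re.compile(r'(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+'),
    re.compile(r'(?i)\b(api[_-]?key|token|secret)\s*[:=]\s*\S+'),
    re.compile(r'-----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----'),
]

def looks_sensitive(snippet: str) -> bool:
    # Return True if the snippet matches any of the patterns above.
    return any(pattern.search(snippet) for pattern in SECRET_PATTERNS)

# Hypothetical example: the hostname and credentials are made up.
snippet = 'db_connect(host="db.example.org", password="hunter2")'
if looks_sensitive(snippet):
    print("Redact the secrets before asking ChatGPT to find the bug!")

Of course, such a check only catches the obvious cases; common sense about what is confidential remains your best filter.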
And, of course, like any other (cloud) software, there are the same computer security and privacy risks that require the same protective means: access control, active system maintenance and patching, encryption and data protection, back-up and disaster recovery, monitoring and logging, etc.
So, as with any new technology, and while ChatGPT definitely has its merits and might well be the next game-changer in IT, it also comes with certain risks linked to copyright, privacy and SeCReCY. Make sure the benefits outweigh the potential harm!
* “Karibu”: a Swahili word meaning “welcome”, used here in the sense of “please, go ahead”.
_____
Do you want to learn more about computer security incidents and issues at CERN? Follow our Monthly Report. For further information, questions or help, check our website or contact us at Computer.Security@cern.ch.