OpenAI confirms that AI writing detectors don’t work

fne8w2ah@lemmy.world · 2 years ago

Boddhisatva@lemmy.world · 2 years ago

OpenAI discontinued its AI Classifier, which was an experimental tool designed to detect AI-written text. It had an abysmal 26 percent accuracy rate.

If you ask this thing whether or not some given text is AI generated, and it is only right 26% of the time, then I can think of a real quick way to make it 74% accurate.

Leate_Wonceslace@lemmy.dbzer0.com · 2 years ago

I feel like this must stem from a misunderstanding of what 26% accuracy means, but for the life of me, I can’t figure out what it would be.

dartos@reddthat.com · edit-2 2 years ago

Looks like they got that number from this quote from another Ars Technica article: “…OpenAI admitted that its AI Classifier was not ‘fully reliable,’ correctly identifying only 26 percent of AI-written text as ‘likely AI-written’ and incorrectly labeling human-written works 9 percent of the time.”

Seems like it mostly wasn’t confident enough to make a judgement, but it correctly flagged 26% of AI text and incorrectly flagged 9% of human text as AI text. It doesn’t tell us how often it labeled AI text as human text or how often it was just unsure.

EDIT: this article https://arstechnica.com/information-technology/2023/07/openai-discontinues-its-ai-writing-detector-due-to-low-rate-of-accuracy/
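A rough sketch of why “26%” is not an overall accuracy figure, and why flipping the answers would not give 74%. It reads the two numbers from the quote as a true positive rate and a false positive rate. Everything else is an assumption for illustration: the function names, the 50/50 mix of AI and human text, and treating every “unsure” answer as “not flagged” (the quote does not say how often that happened).

```python
def accuracy(tpr: float, fpr: float, ai_share: float) -> float:
    """Share of all texts labeled correctly by a binary 'flag as AI' decision."""
    true_negative_rate = 1.0 - fpr
    return ai_share * tpr + (1.0 - ai_share) * true_negative_rate


def precision(tpr: float, fpr: float, ai_share: float) -> float:
    """Share of flagged texts that really are AI-written."""
    flagged_ai = ai_share * tpr
    flagged_human = (1.0 - ai_share) * fpr
    return flagged_ai / (flagged_ai + flagged_human)


TPR = 0.26       # from the quote: AI text correctly flagged
FPR = 0.09       # from the quote: human text wrongly flagged
AI_SHARE = 0.5   # ASSUMPTION: half of the tested texts are AI-written

as_reported = accuracy(TPR, FPR, AI_SHARE)
# Flipping every answer swaps "flagged" and "not flagged" for both classes.
inverted = accuracy(1.0 - TPR, 1.0 - FPR, AI_SHARE)

print(f"accuracy as reported: {as_reported:.3f}")
print(f"accuracy if inverted: {inverted:.3f}")
print(f"precision of a flag:  {precision(TPR, FPR, AI_SHARE):.3f}")
```

Under these assumptions the classifier is right a bit more than half the time, and inverting it makes things worse, because it would then flag 91% of human text. A different AI/human mix changes the numbers but not the point.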

schzztl@lemmy.nz · 2 years ago

Specificity vs sensitivity, no?

cmfhsu@lemmy.world · edit-2 2 years ago

In statistics, everything is based off probability / likelihood - even binary yes or no decisions. For example, you might say “this predictive algorithm must be at least 95% statistically confident of an answer, else you default to unknown or another safe answer”.

What this likely means is only 26% of the answers were confident enough to say “yes” (because falsely accusing somebody of cheating is much worse than giving the benefit of the doubt) and were correct.

There is likely a large portion of answers which could have been predicted correctly if the company was willing to chance more false positives (potentially getting students mistakenly expelled).
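A minimal sketch of that trade-off, assuming the detector outputs a score between 0 and 1 and only says “yes” above a chosen threshold. The scores below are made up for illustration and are not OpenAI data; `SAMPLES` and `rates` are hypothetical names.

```python
from typing import List, Tuple

# HYPOTHETICAL data: (detector score, text is actually AI-written)
SAMPLES: List[Tuple[float, bool]] = [
    (0.98, True), (0.91, True), (0.75, True),
    (0.62, True), (0.55, True), (0.40, True),
    (0.93, False), (0.70, False), (0.45, False),
    (0.30, False), (0.20, False), (0.10, False),
]


def rates(samples: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) at a given threshold."""
    ai_scores = [score for score, is_ai in samples if is_ai]
    human_scores = [score for score, is_ai in samples if not is_ai]
    tpr = sum(score >= threshold for score in ai_scores) / len(ai_scores)
    fpr = sum(score >= threshold for score in human_scores) / len(human_scores)
    return tpr, fpr


for threshold in (0.95, 0.80, 0.50):
    tpr, fpr = rates(SAMPLES, threshold)
    print(f"threshold {threshold:.2f}: catches {tpr:.0%} of AI text, "
          f"falsely flags {fpr:.0%} of human text")
```

With this toy data, a strict threshold flags very little AI text but accuses no humans, while a looser one catches most AI text at the cost of false accusations.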

notatoad@lemmy.world · 2 years ago

it seemed like a really weird decision for OpenAI to have an AI classifier in the first place. their whole business is to generate output that’s good enough that it can’t be distinguished from what a human might produce, and then they went and made a tool to try and point out where they failed.

Boddhisatva@lemmy.world · 2 years ago

That may have been the goal. Look how good our AI is, even we can’t tell if its output is human generated or not.

Blackmist@feddit.uk · 2 years ago

The only thing AI writing seems to be useful for is wasting real people’s time.

Max Demon@lemm.ee · 2 years ago

True -

Write points/summary
Have AI expand in many words
Post
Reader uses AI to summarize the post, preferably in points
Profit??

driving_crooner@lemmy.eco.br · 2 years ago

Terence Tao just did a thread on Mathstodon talking about how ChatGPT helped him program an algorithm for looking for numbers.

Matriks404@lemmy.world · 2 years ago

Did human-generated content really become so low quality that it is distinguishable from AI-generated content?

tech@lemmy.world · 2 years ago

Should I be able to detect whether or not this is an AI generated comment?

nodsocket@lemmy.world · 2 years ago

As an AI language model, I am unable to confirm whether or not the above post was written by an AI.

funktion@lemm.ee · 2 years ago

People kind of just suck at writing in general. It’s not a skill that’s valued so much, otherwise writers, editors, and proofreaders would be paid more.

DogMuffins@discuss.tchncs.de · 2 years ago

Not necessarily. It’s just that AI’s can’t tell the difference.

Although I don’t know whether humans can.

Arsenal4ever@lemmy.world · 2 years ago

have you seen exTwitter?

mind@lemmy.world · 2 years ago

deleted by creator

possibly a cat@lemmy.ml · edit-2 2 years ago

deleted by creator

Nioxic@lemmy.dbzer0.com · edit-2 2 years ago

I have to hand in a short report

I wrote parts of it and asked chatgpt for a conclusion.

So I read that, adjusted a few points. Added another couple points…

Then rewrote it all in my own wording. (Chatgpt gave me 10 lines out of 10 pages)

We are allowed to use chatgpt though. Because we would always have internet access for our job anyway. (Computer science)

ReallyKinda@kbin.social · 2 years ago

I know a couple teachers (college level) that have caught several gpt papers over the summer. It’s a great cheating tool but as with all cheating in the past you still have to basically learn the material (at least for narrative papers) to proof gpt properly. It doesn’t get jargon right, it makes things up, it makes no attempt to adhere to reason when it’s making an argument.

Using translation tools is extra obvious—have a native speaker proof your paper if you attempt to use an AI translator on a paper for credit!!

SpikesOtherDog@ani.social · 2 years ago

it makes things up, it makes no attempt to adhere to reason when it’s making an argument.

It hardly understands logic. I’m using it to generate content and it will continually assert information in ways that don’t make sense, relate things that aren’t connected, and forget facts that don’t flow into the response.

mayonaise_met@feddit.nl · edit-2 2 years ago

As I understand it as a layman who uses GPT4 quite a lot to generate code and formulas, it doesn’t understand logic at all. Afaik, there is currently no rational process which considers whether what it’s about to say makes sense and is correct.

It just sort of bullshits its way to an answer based on whether words seem likely according to its model.

That’s why you can point it in the right direction and it will sometimes appear to apply reasoning and correct itself. But you can just as easily point it in the wrong direction and it will do that just as confidently too.

Aceticon@lemmy.world · 2 years ago

It has no notion of logic at all.

It roughly works by piecing together sentences based on the probability of the various elements (mainly words but also more complex) being there in various relations to each other, the “probability curves” (not quite probability curves but that’s a good enough analog) having been derived from the very large language training sets used to train them (hence LLM - Large Language Model).

This is why you might get things like pieces of argumentation which are internally consistent (or merely familiar segments from actual human posts where people are making an argument) but they’re not consistent with each other - the thing is not building an argument following a logic thread, it’s just putting together language tokens in common ways which in its training set were found associated with each other and with language token structures similar to those in your question.
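A toy illustration of “picking the next token by probability”, using a bigram model over a nine-word corpus. This is a drastic simplification and not how ChatGPT is actually built (real LLMs use neural networks conditioned on long contexts); the corpus and all function names are made up for the example.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def build_bigrams(tokens: List[str]) -> Dict[str, Counter]:
    """Count which token follows which in the training text."""
    followers: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current][nxt] += 1
    return followers


def next_token_probs(followers: Dict[str, Counter], token: str) -> Dict[str, float]:
    """Turn follower counts into probabilities for the next token."""
    counts = followers.get(token, Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def generate(followers: Dict[str, Counter], start: str,
             max_tokens: int, seed: int = 0) -> List[str]:
    """Sample a sequence one token at a time; no plan, no logic check."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_tokens:
        probs = next_token_probs(followers, out[-1])
        if not probs:  # token never had a follower in the corpus
            break
        words = list(probs)
        out.append(rng.choices(words, weights=[probs[w] for w in words])[0])
    return out


CORPUS = "the cat sat on the mat the cat ate".split()
FOLLOWERS = build_bigrams(CORPUS)

print(next_token_probs(FOLLOWERS, "the"))
print(" ".join(generate(FOLLOWERS, "the", max_tokens=8)))
```

Every step only asks “what usually comes next?”, so the output can sound locally plausible without any part of the program checking whether the whole makes sense.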

Cosmic Cleric@lemmy.world · 2 years ago

That’s a great summary of how it works. Well done.

pc_admin@aussie.zone · edit-2 2 years ago

Any teacher still issuing out of class homework or assignments is doing a disservice IMO.

Of course people will just GPT it… you need to get them off the computer and into an exam room.

ReallyKinda@kbin.social · 2 years ago

Even in college? I never had a college course that allowed you to work on assignments in class

Muffi@programming.dev · 2 years ago

I studied engineering. Most classes were split into 2 hours of theory, followed by 2 hours of practical assignments. Both within the official class hours, so teachers could assist with the assignments. The best college-class structure by far imo.

shameless@lemmy.world · 2 years ago

deleted by creator

Turun@feddit.de · 2 years ago

Or, because you can’t rely on computers to tell you the truth. Which is exactly the issue with LLMs as well.

sfgifz@lemmy.world · 2 years ago

You can’t rely on books or people to tell you the truth either.

Turun@feddit.de · 2 years ago

I was mostly referring to the top comment. If you need to write an essay on Hamlet, the book can in fact not lie, because the entire exercise is to read the book and write about the contents of it.

But in general, you are right. (Which is why it is proper journalistic procedure to talk to multiple experts about a topic you write about. Also a good article does not present a foregone conclusion, but instead lets readers form their own opinion on a topic by providing the necessary context and facts without the author’s judgement. LLMs as a one-stop-shop do not provide this and are less reliable than listening to a single expert would be)

atrielienz@lemmy.world · 2 years ago

Which is why bibliographies exist.

Absolutemehperson@lemmy.world · 2 years ago

mfw just asking ChatGPT to write an undetectable essay.

Later, losers!

cheese_greater@lemmy.world · edit-2 2 years ago

I would be in trouble if this was a thing. My writing naturally resembles the output of a ChatGPT prompt when I’m not joke answering or shitposting.

Steeve@lemmy.ca · 2 years ago

We found the source

BananaOnionJuice@lemmy.dbzer0.com · 2 years ago

Do you also need help from a friend to prove you are not a robot?

cheese_greater@lemmy.world · 2 years ago

I need a lotta help, just not from a friend and about anything robot-related 😮‍💨

BananaOnionJuice@lemmy.dbzer0.com · 2 years ago

Hope you have some good friends and family that can help.

Jargus@lemmy.world · edit-2 2 years ago

deleted by creator

robbotlove@lemmy.world · 2 years ago

this comment could have been written in 2005 and still have been true.

SpaceCowboy@lemmy.ca · 2 years ago

AI might democratize grifting. You no longer will have to have the resources that Russia and China have devoted to this kind of thing. Anyone will be able to generate vast amounts of fake inflammatory rhetoric.

Then once there’s a 99.9% chance that the person you’re talking to on social media is an AI, people might realize how stupid it is to believe anything they read on the internet.

HelloThere@sh.itjust.works · 2 years ago

Regardless of if they do or don’t, surely it’s in the interests of the people making the “AI” to claim that their tool is so good it’s indistinguishable from humans?

efrique@lemm.ee · 2 years ago

Just need to get AI on that.

m0darn@lemmy.ca · 2 years ago

Aren’t there very few student priced ai writers? And isn’t the writing done on their servers? And aren’t they saving all the outputs?

Can’t the ai companies sell to schools the ability to check paper submissions against recent outputs?

dyc3@lemmy.world · 2 years ago

Chatgpt 3.5 is free. Can’t get more student priced than that.

Regarding the second part about outputs: that’s not practical. Suppose you ignore students running their own LLMs offline on their gaming gpus, where these corps wouldn’t have access to the info. It’s still wildly impractical because students can paraphrase LLM output into something that doesn’t look like the original output.
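A small sketch of why matching submissions against stored outputs breaks down. It assumes a purely hypothetical service that keeps a fingerprint (a SHA-256 hash of the normalized text) of every output; no AI company is known to offer this, and the function names and example sentences are invented for illustration.

```python
import hashlib
from typing import Iterable, Set


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(outputs: Iterable[str]) -> Set[str]:
    """HYPOTHETICAL store of fingerprints for everything the model wrote."""
    return {fingerprint(output) for output in outputs}


def was_generated(submission: str, index: Set[str]) -> bool:
    """True only if the submission matches a stored output exactly."""
    return fingerprint(submission) in index


index = build_index(["Hamlet delays his revenge because he doubts the ghost."])

verbatim = "Hamlet  delays his revenge because he doubts the Ghost."
paraphrase = "Hamlet puts off his revenge since he is unsure about the ghost."

print(was_generated(verbatim, index))
print(was_generated(paraphrase, index))
```

Exact matching catches a copy-paste even with changed spacing or capitalization, but a light paraphrase produces a completely different hash. Fuzzy matching would help somewhat, at the cost of more false matches.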

m0darn@lemmy.ca · 2 years ago

Chatgpt 3.5 is free. Can’t get more student priced than that.

Yeah, my point was I don’t think there are many offering the service for free. And they are probably looking for revenue streams.

Suppose you ignore students running their own LLMs offline on their gaming gpus

I actually feel like this is the one that shouldn’t be ignored. But I don’t have a good sense of the computational power vs quality output.

It’s still wildly impractical because students can paraphrase LLM output into something that doesn’t look like the original output.

At least doing that is likely to result in the student internalizing the information to some degree. It’s also not so different (not at all different?) from the most benign academic dishonesty that existed when I was a student.

One issue with the approach I suggested is the copyright issue of profs submitting students’ original work for AI processing without understanding/caring about copyright implications.

dyc3@lemmy.world · 2 years ago

And they are probably looking for revenue streams.

Yeah of course. As it stands right now gpt 3.5 is free, but gpt 4.0, which has been demonstrated to produce better output and do more, costs a monthly subscription.

At least doing that is likely to result in the student internalizing the information to some degree.

This is a good point, and I agree.

nucleative@lemmy.world · edit-2 2 years ago

We need to embrace AI written content fully. Language is just a protocol for communication. If AI can flesh out the “packets” for us nicely in a way that fits what the receiving humans need to understand the communication then that’s a major win. Now I can ask AI to write me a nice letter and prompt it with a short bulleted list of what I want to say. Boom! Done, and time is saved.

The professional writers who used to slave over a blank Word document are now obsolete, just like the slide rule “computers” of old (the people who could solve complicated mathematics and engineering problems on paper).

Teachers who thought a hand written report could be used to prove that “education” has happened are now realizing that the idea was a crutch (it was 25 years ago too when we could copy/paste Microsoft Encarta articles and use as our research papers).

The technology really just shows us that our language capabilities really are just a means to an end. If a better means arises we should figure out how to maximize it.

ram@lemmy.ca · 2 years ago

Huh?

GlendatheGayWitch@lib.lgbt · 2 years ago

Couldn’t you just ask ChatGPT whether it wrote something specific?

vale@sh.itjust.works · 2 years ago

Then you have that time that a professor tried to fail his whole class because he asked chatGPT if it wrote the essays.

https://wgntv.com/news/professor-attempts-to-fail-students-after-falsely-accusing-them-of-using-chatgpt-to-cheat/

Echo Dot@feddit.uk · 2 years ago

That doesn’t really work because it just says whatever half the time. It’s very good at making stuff up. It doesn’t really get that it needs to tell the truth because all it’s doing is optimising for a good narrative.

That’s why it says slavery is good, because the only people asking that question clearly have an answer in mind, and it’s optimising for that answer.

Also it doesn’t have access to other people’s sessions (because that would be hella dodgy) so it can’t tell you definitively if it did or did not say something in another session, even if it were inclined to tell the truth.

Kazumara@feddit.de · 2 years ago

Obviously not. It’s a language generator with a bit of chat modeling and reinforcement learning, not an Artificial General Intelligence.

It doesn’t know anything, it doesn’t retain memory long term, it doesn’t have any self identity. There is no way it could ever truthfully respond “I know that I wrote that”.

mwguy@infosec.pub · 2 years ago

No. The model doesn’t have a record of everything it wrote.