So taking data without permission is bad, now?

I’m not here to say whether the R1 model is the product of distillation. What I can say is that it’s a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.

The company is currently involved in several high-profile copyright infringement lawsuits, including one filed by The New York Times alleging that OpenAI and its partner Microsoft infringed its copyrights and that the companies provide the Times’ content to ChatGPT users “without The Times’s permission or authorization.” Other authors and artists have suits working their way through the legal system as well.

Collectively, the contributions from copyrighted sources are significant enough that OpenAI has said it would be “impossible” to build its large language models without them. The implication is that copyrighted material had already been used to build these models long before these publisher deals were ever struck.

The filing argues, among other things, that AI model training isn’t copyright infringement because it “is in service of a non-exploitive purpose: to extract information from the works and put that information to use, thereby ‘expand[ing] [the works’] utility.’”

This kind of hypocrisy makes it difficult for me to muster much sympathy for an AI industry that has treated the swiping of other humans’ work as a completely legal and necessary sacrifice, a victimless crime that provides benefits so significant and self-evident that it wasn’t even worth having a conversation about it beforehand.

A last bit of irony in the Andreessen Horowitz comment: There’s some handwringing about the impact of a copyright infringement ruling on competition. Having to license copyrighted works at scale “would inure to the benefit of the largest tech companies—those with the deepest pockets and the greatest incentive to keep AI models closed off to competition.”

“A multi-billion-dollar company might be able to afford to license copyrighted training data, but smaller, more agile startups will be shut out of the development race entirely,” the comment continues. “The result will be far less competition, far less innovation, and very likely the loss of the United States’ position as the leader in global AI development.”

Some of the industry’s agita about DeepSeek is probably wrapped up in the last bit of that statement—that a Chinese company has apparently beaten an American company to the punch on something. Andreessen himself referred to DeepSeek’s model as a “Sputnik moment” for the AI business, implying that US companies need to catch up or risk being left behind. But regardless of geography, it feels an awful lot like OpenAI wants to benefit from unlimited access to others’ work while also restricting similar access to its own work.

  • Merlin@lemm.ee
    English · 3 upvotes · 9 hours ago

    Altman and his ilk are so full of shit. Even the name of their company is incredibly ironic and insulting.

  • webghost0101@sopuli.xyz
    English · 11 upvotes · edited · 14 hours ago

    OpenAI: “Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.”

    Also OpenAI: “We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the Trump regime (US government) to protect the most capable models being built here”

    In the meantime, ChatGPT on Sam:

  • pebbles@sh.itjust.works
    English · 50 upvotes, 1 downvote · 1 day ago

    We aren’t there, but imagine a world where ideas aren’t owned. Where you aren’t hurt by someone else using your work. Where we all benefit from innovation and reuse.

    • themurphy@lemmy.ml
      English · 37 upvotes, 1 downvote · edited · 1 day ago

      I remember that there was a science group that each year got millions in funding, unconditionally. Except everything you discovered would be open for anyone to use.

      Because it was unconditional, they could research ANYTHING. And it was very successful, because they could invent things without being controlled by profits or shareholders.

      It basically worked well.

      EDIT: Found some of them. Look up The Invisible College or the Institute for Advanced Study. Also found 4 similar groups in Denmark being funded by private firms (like Carlsberg, the beer maker), where they can study anything and make it public.

    • maplebar@lemmy.world
      English · 7 upvotes, 6 downvotes · 24 hours ago

      We aren’t talking about “ideas” being stolen here, we’re talking about work being stolen and exploited for corporate profit.

      Personally I don’t think it’s crazy to suggest that the person who writes a book should own it, the people who compose a song should own it, the artist who paints a painting should own it, etc.

      As much as techbros love to pretend that AI is ushering us into a post-capitalist, post-copyright Star Trek future, it is actually in fact doing the exact opposite–it’s empowering the biggest and richest tech companies to exploit human creativity in the largest industrial plagiarism scheme in history, all so some bullshit VC investors can gain their way up the pyramid scheme known as the stock market.

      • proceduralnightshade@lemmy.ml
        English · 5 upvotes · 18 hours ago

        The problem with copyright/data ownership is that it’s useless if you’re unable to enforce it. Data is replicable, doesn’t matter if you call it “work” or “ideas”. Do you think you own the text you just wrote? Let me show you something.

        Spoiler

        We aren’t talking about “ideas” being stolen here, we’re talking about work being stolen and exploited for corporate profit.

        Personally I don’t think it’s crazy to suggest that the person who writes a book should own it, the people who compose a song should own it, the artist who paints a painting should own it, etc.

        As much as techbros love to pretend that AI is ushering us into a post-capitalist, post-copyright Star Trek future, it is actually in fact doing the exact opposite–it’s empowering the biggest and richest tech companies to exploit human creativity in the largest industrial plagiarism scheme in history, all so some bullshit VC investors can gain their way up the pyramid scheme known as the stock market.

        There. I just stole your text. I stole it. I own it now. It’s mine now. What are you gonna do about it?

        Instead there is no stealing when it comes to information, there’s only replication, there’s only copying.

        I agree with you, corpos shouldn’t have this amount of power. But you won’t get there by trying to protect the work of artists, writers, etc. with the exact same scheme corpos pulled to protect their power and interests. Like, it didn’t work, did it? No copyright for me, thanks.

        • maplebar@lemmy.world
          English · 9 upvotes, 1 downvote · 17 hours ago

          Data is replicable, doesn’t matter if you call it “work” or “ideas”.

          Your mistake is thinking that “data” and “copyright” or “ownership” are the same thing. They aren’t.

          You can download a song, and thus be in possession of the data of that song, and you can even copy the file within the parameters of copyright law.

          However, simply having the data is not the same thing as owning or holding a license to the song itself, and so you are in violation of the law (where I live, at least) if you try to distribute that song or use it in a non-fair-use context.

          IF you were to copy my work and exploit it in a for-profit context for millions of dollars (and you happened to be operating in a region in which applicable copyright laws happen to apply) you’re damn right I would come after you for a slice of the pie, and I would almost certainly win. Just copying what I say and pasting it in a quote isn’t something that I can prove damages on, because it isn’t something you’re profiting on in any way, so the idea of “enforcing” it is irrelevant and obviously not worth it.

          I agree with you, corpos shouldn’t have this amount of power. But you won’t get there by trying to protect the work of artists writers etc with the exact same scheme corpos pulled to protect their power and interests. Like, it didn’t work, did it?

          This is where we are going to have to disagree. I am absolutely willing to fight fire with fire by using the copyright system against big tech. I don’t make the rules, but IF rules are to exist in terms of what is or is not fair use of copyrighted material, then I DO expect those rules to apply equitably. (Whether they will or not remains to be seen, but let’s see what precedent gets set and I’ll adapt from there.)

          No copyright for me, thanks

          Can I ask you a personal question: what do you create, and do you submit it to the public domain?

          As for me, I write music, create art, make games and write computer code and do a number of other things that I absolutely claim ownership over. So, when I write a song or paint a picture, who the fuck is anyone else to try to take that away from me or claim it as something that they own and control? I’ve written thousands of lines of GPL code and contributed to many hippy-dippy open source free software projects over my lifetime, and even in that kind of copyleft context we still maintain a copyright over the code we write (as seen at the top of every source and header file).

          I only ask because I find that the people who are most pro-AI and most anti-copyright are generally people who have never created anything of their own–they’ve written no songs, they’ve drawn no pictures, they’ve written no stories–and now they incorrectly see generative AI as something that “evens the playing field” by compensating for their lack of skills and drive.

          But I’ll repeat myself: AI isn’t ushering us into a post-copyright world where the little guy is empowered in any way. It’s just a bunch of useful idiots downloading completely proprietary binary blobs from the biggest, richest corporations, fooling themselves into thinking that they’re being empowered to create things when in reality they’re just beta testing a plagiarism machine on an industrial scale that’s designed to enrich the richest.

          • dreadbeef@lemmy.dbzer0.com
            English · 3 upvotes · 10 hours ago

            I’m a software engineer and I have been playing guitar nearly every day since I was 8 years old. I release everything I own and can under GPL/AGPL or CC-BY-SA. Heck, I am racking my brain every day trying to figure out ideas that can hopefully make me a living while also giving everything I have away. I don’t want to own my shit, man, I just want to share what I have and hope it’s useful, and I don’t want people being assholes, so I opt for copyleft instead of liberal licenses.

          • proceduralnightshade@lemmy.ml
            English · 3 upvotes · 16 hours ago

            I just don’t like the premise of a market where one has to sell their artistic labor in order to survive, or thrive. I’m on board with noncommercial licenses and everything because the reality looks different, but that was not my point. And neither was it the point of the original comment you replied to.

  • Rentlar@lemmy.ca
    English · 17 upvotes · 1 day ago

    Imagine if this consternation from OpenAI against DeepSeek is used against them in the lawsuits… would be quite delicious.

  • givesomefucks@lemmy.world
    English · 17 upvotes · 1 day ago

    Nothing new about it…

    Movies are made in Cali because the first studios didn’t want to pay Edison for his stolen patents on equipment…

    So they moved to the opposite end of the country where it was mostly unenforceable, and it didn’t take long till intellectual property was a big concern for them.