• B0rax@feddit.org
    1 day ago

    I don’t get why they didn’t just buy ebooks. Why go through the trouble of scanning physical books?

    • Michal@programming.dev
      1 day ago

      The answer lies within the article:

      Publishers legally control content that AI companies desperately want, but AI companies don’t always want to negotiate a license. The first-sale doctrine offered a workaround: Once you buy a physical book, you can do what you want with that copy—including destroy it. That meant buying physical books offered a legal workaround.

      And yet buying things is expensive, even if it is legal. So like many AI companies before it, Anthropic initially chose the quick and easy path. In the quest for high-quality training data, the court filing states, Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called “legal/practice/business slog”—the complex licensing negotiations with publishers. But by 2024, Anthropic had become “not so gung ho about” using pirated ebooks “for legal reasons” and needed a safer source.