Father, Hacker (Information Security Professional), Open Source Software Developer, Inventor, and 3D printing enthusiast

  • 9 Posts
  • 375 Comments
Joined 2 years ago
cake
Cake day: June 23rd, 2023

help-circle






  • If you hired someone to copy Ghibli’s style, then fed that into an AI as training data, it would completely negate your entire argument.

    It is not illegal for an artist to copy someone else’s style. They can’t copy another artist’s work—that’s a derivative—but copying their style is perfectly legal. You can’t copyright a style.

    All of that is irrelevant, however. The argument is that—somehow—training an AI with anything is somehow a violation of copyright. It is not. It is absolutely 100% not a violation of copyright to do that!

    Copyright is all about distribution rights. Anyone can download whatever TF they want and they’re not violating anyone’s copyright. It’s the entity that sent the person the copyright that violated the law. Therefore, Meta, OpenAI, et al can host enormous libraries of copyrighted data in their data centers and use that to train their AI. It’s not illegal at all.

    When some AI model produces a work that’s so similar to an original work that anyone would recognize it, “yeah, that’s from Spirited Away” then yes: They violated Ghibli’s copyright.

    If the model produces an image of some random person in the style of Studio Ghibli that is not violating anyone’s copyright. It is not illegal nor is it immoral. No one is deprived of anything in such a transaction.


  • I think your understanding of generative AI is incorrect. It’s not just “logic and RNG”…

    If it runs on a computer, it’s literally “just logic and RNG”. It’s all transistors, memory, and an RNG.

    The data used to train an AI model is copyrighted. It’s impossible for something to exist without copyright (in the past 100 years). Even public domain works had copyright at some point.

    if any of the training data is copyrighted, then attribution must be given, or at the very least permission to use this data must be given by the current copyright holder.

    This is not correct. Every artist ever has been trained with copyrighted works, yet they don’t have to recite every single picture they’ve seen or book they’ve ever read whenever they produce something.










  • If you studied loads of classic art then started making your own would that be a derivative work? Because that’s how AI works.

    The presence of watermarks in output images is just a side effect of the prompt and its similarity to training data. If you ask for a picture of an Olympic swimmer wearing a purple bathing suit and it turns out that only a hundred or so images in the training match that sort of image–and most of them included a watermark–you can end up with a kinda-sorta similar watermark in the output.

    It is absolutely 100% evidence that they used watermarked images in their training. Is that a problem, though? I wouldn’t think so since they’re not distributing those exact images. Just images that are “kinda sorta” similar.

    If you try to get an AI to output an image that matches someone else’s image nearly exactly… is that the fault of the AI or the end user, specifically asking for something that would violate another’s copyright (with a derivative work)?



  • I wasn’t being pedantic. It’s a very fucking important distinction.

    If you want to say “unethical” you say that. Law is an orthogonal concept to ethics. As anyone who’s studied the history of racism and sexism would understand.

    Furthermore, it’s not clear that what Meta did actually was unethical. Ethics is all about how human behavior impacts other humans (or other animals). If a behavior has a direct negative impact that’s considered unethical. If it has no impact or positive impact that’s an ethical behavior.

    What impact did OpenAI, Meta, et al have when they downloaded these copyrighted works? They were not read by humans–they were read by machines.

    From an ethics standpoint that behavior is moot. It’s the ethical equivalent of trying to measure the environmental impact of a bit traveling across a wire. You can go deep down the rabbit hole and calculate the damage caused by mining copper and laying cables but that’s largely a waste of time because it completely loses the narrative that copying a billion books/images/whatever into a machine somehow negatively impacts humans.

    It is not the copying of this information that matters. It’s the impact of the technologies they’re creating with it!

    That’s why I think it’s very important to point out that copyright violation isn’t the problem in these threads. It’s a path that leads nowhere.