Because most people do not understand what this technology is and attribute far too much control over the generated text to its creators. If Copilot generates the text “Trans people don’t exist”, and Microsoft doesn’t immediately address it, a huge portion of people will take that to mean “Microsoft doesn’t think trans people exist”.
Insert whatever other politically incorrect or harmful statement you prefer.
Those sorts of problems aren’t easily fixable without manual blocks. You can train the models with a “value” system so that they censor themselves, but that will still be imperfect, and they can still generate politically incorrect text.
IIRC some providers support two separate endpoints: one with raw access to the model and no filtering, and one with filtering and censoring. Copilot, as a heavily branded end-user product, obviously needs to be filtered.
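For anyone wondering what a raw endpoint versus a filtered one looks like in practice, here is a rough sketch. It is not any provider's actual API: `fake_model`, `BLOCKED_PATTERNS`, `raw_endpoint` and `filtered_endpoint` are all made-up names, and the block list is a single placeholder pattern. The point is just that both endpoints can call the same model, with the filtered one applying a manual check to the output afterwards.

```python
import re
from typing import Callable

# Hypothetical stand-in for a text-generation model (assumption for illustration);
# a real deployment would call the provider's model here.
def fake_model(prompt: str) -> str:
    return f"echo: {prompt}"

# Hypothetical manual block list; a real product would maintain a much larger one.
BLOCKED_PATTERNS = [re.compile(r"\bforbidden phrase\b", re.IGNORECASE)]

REFUSAL = "[response withheld by content filter]"

def raw_endpoint(prompt: str, model: Callable[[str], str] = fake_model) -> str:
    """Return the model output untouched."""
    return model(prompt)

def filtered_endpoint(prompt: str, model: Callable[[str], str] = fake_model) -> str:
    """Call the same model, then check the output against the manual block list."""
    text = model(prompt)
    if any(pattern.search(text) for pattern in BLOCKED_PATTERNS):
        return REFUSAL
    return text
```

A block list like this only catches what someone thought to put on it, which is why the result is still imperfect.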
I don’t think they’re wrong in saying that if they aren’t allowed to train on copyrighted works then they will fall behind. Maybe I missed it in the article, but Japan, for example, has exactly that kind of law (using copyrighted works to train generative AI is allowed).
Personally I think we need to give them somewhat of an out by letting them do it but then taxing the fuck out of the resulting product. “You can use copyrighted works for training but then 50% of your profits are taxed”. Basically a recognition that the sum of all copyrighted works is a societal good and not just an individual copyright holder’s.