#22ED048A #FB-45 #EmergingTechnologyGroup

AI & Copyright

With a Look at Recent AI Litigation in the U.S.

Background · Court · Joint Forces · Pushback

🔸

楊琇惠 Jackie Yang

2024.06.20

Background

Without covering technical details

🔹

Glossary
Iteration of ML
Generative AI
Tension

Background
Introduction to AI, ML, DL, GenAI, LLMs, and NLP, and how they relate to one another (with a focus on GenAI)
  • Artificial Intelligence (AI)
  • Machine Learning (ML)
  • Deep Learning (DL)
  • Generative AI (GenAI)
  • Large Language Model (LLM)
  • Natural Language Processing (NLP)

Background

Iteration of Machine Learning (ML)

Text / Image / Video

Stage: Stakeholders
1️⃣ Train: AI service provider, content creator
2️⃣ Evaluate: AI service provider
3️⃣ Deploy: AI service provider, end user
  • AI service provider: e.g., Stability AI / GitHub / OpenAI
  • End user: includes content creators

Background

Generative AI

Background: Generative AI

Stable Diffusion

Background: Generative AI

DALL·E

Background

Tension

(Why) is Generative AI the breaking change?

  • Killer application: potential profitability
  • Permission for humans to view vs. for bots to crawl
  • Fair use vs. tokenized training data
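In practice, the gap between "humans may view" and "bots may crawl" is usually expressed in a site's robots.txt file, a voluntary convention that crawlers may or may not honor. The sketch below assumes a hypothetical news site and a hypothetical robots.txt, and uses Python's standard `urllib.robotparser` to show how one file can admit ordinary visitors and crawlers while refusing an AI-training crawler. The blocked agent is assumed to be "GPTBot", the user-agent name OpenAI publishes for its web crawler; the URL and rules are illustrative only.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the AI-training crawler "GPTBot" is refused
# everywhere, while all other agents may read everything except /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawl_permissions(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text; no network access
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    article = "https://news.example.com/2023/12/article.html"  # hypothetical URL
    print(crawl_permissions(ROBOTS_TXT, article, ["GPTBot", "Mozilla/5.0"]))
    # prints {'GPTBot': False, 'Mozilla/5.0': True}
```

robots.txt only states the publisher's wishes; it does not technically stop a crawler that chooses to ignore it.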
Background: Tension

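Fair Use? ➡️ Copyright Act (Taiwan), Article 65

  • I. Fair use of a work does not constitute infringement of the economic rights in the work.
  • II. In determining whether the exploitation of a work falls within the reasonable scope set out in Articles 44 through 63, or constitutes other fair use, all circumstances shall be taken into account, with particular attention to the following factors as the basis for the determination:
    1. The purpose and character of the exploitation, including whether it is for commercial purposes or for nonprofit educational purposes.
    2. The nature of the work.
    3. The amount and substantiality of the portion exploited in relation to the work as a whole.
    4. The effect of the exploitation on the work's potential market and current value.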

Court

Content Creator v. AI Service Provider

🔹

GitHub Copilot
OpenAI ChatGPT
Stable Diffusion / Midjourney

Court

GitHub Copilot Text

Court: GitHub Copilot Text

GitHub Copilot Text 2022.11

  1. Digital Millennium Copyright Act: ingesting and distributing licensed open-source code without attribution, copyright notices, or license terms
  2. Lanham Act: passing off the licensed materials as its own creation, and unjust enrichment because users pay fees to use Copilot
  3. California Consumer Privacy Act: unfair competition and privacy
  4. Contract: violation of licenses governing the use of open-source code for training, interference with contractual relations, and fraud related to GitHub violating its own terms of service
Court: GitHub Copilot Text
  • Developers allege that (1) GitHub violated its terms of service by monetizing user code, and (2) all of the defendants violated the open-source licenses attached to the code emitted by Copilot and Codex

So the plaintiff developers are moving ahead with those claims. They also now have the opportunity to seek monetary damages, and may yet look for a way to revive the unfair competition claim if copyright rules permit.

Court

OpenAI ChatGPT Text 2023.12

Does ChatGPT violate New York Times' copyrights? 2024.03
  • The Times: ChatGPT spits out portions of its articles verbatim or shares key parts of its content, violating copyright law and undercutting its business model.
  • Legal arguments: (1) training = infringement of copyright, (2) LLM = a copy or a derivative work of the Times’ works, and (3) memorization = unauthorized use of copyrighted material.
Court: OpenAI ChatGPT Text
OpenAI and journalism 2024.01
  1. OpenAI collaborates with news organizations and is creating new opportunities
  2. Training is fair use, but we provide an opt-out because it’s the right thing to do
  3. “Regurgitation” is a rare bug that we are working to drive to zero
  4. The New York Times is not telling the full story
Court: OpenAI ChatGPT Text
Authors file a lawsuit against OpenAI for unlawfully ‘ingesting’ their books 2023.07
  • Two authors: the AI tool breached copyright law by training its model on novels without permission.
  • OpenAI accused of using "shadow libraries" such as Library Genesis and Z-Library for training.
Court

Stable Diffusion / Midjourney Image

  • Getty Images: copyright infringement, accusing the AI tool of copying and processing millions of its images without proper licensing.
Court: Stable Diffusion / Midjourney Image
  • The artists: the AI companies infringed the rights of millions of artists by training their AI tools on millions of images scraped from the web without the artists' consent.

Joint Forces

AI Service Provider + Content Creator

🔹

(Microsoft + GitHub)
OpenAI + Shutterstock
OpenAI + Financial Times / Reddit / News Corp
Google + Reddit / Stack Overflow

Joint Forces: AI Service Provider + Content Creator

Team OpenAI

Joint Forces: AI Service Provider + Content Creator

Team Google

Joint Forces

AI Service Provider + End User

🔹

Microsoft / GitHub
OpenAI
Google
Adobe

Joint Forces: AI Service Provider + End User

There are important conditions to this program, recognizing that there are potential ways that our technology could intentionally be misused to generate harmful content. To protect against this, customers must use the content filters and other safety systems built into the product and must not attempt to generate infringing materials, including not providing input to a Copilot service that the customer does not have appropriate rights to use.

Joint Forces: AI Service Provider + End User

OpenAI ChatGPT: Copyright Shield

New models and developer products announced at DevDay 2023.11

OpenAI: Business Terms 2023.11

1️⃣ We agree to defend and indemnify you for any damages finally awarded by a court of competent jurisdiction and any settlement amounts payable to a third party arising out of a third party claim alleging that the Services (including training data we use to train a model that powers the Services) infringe any third party intellectual property right.

Joint Forces: AI Service Provider + End User
OpenAI: Business Terms 2023.11

2️⃣ This excludes claims to the extent arising from: (a) combination of any Services with products, services, or software not provided by us or on our behalf, (b) fine-tuning, customization, or modification of the Services by any party other than us, (c) the Input or any training data you provide to us, (d) your failure to comply with this Agreement or laws, regulations, or industry standards applicable to you, or (e) Customer Applications (if the claim would not have arisen but for your Customer Application).

Joint Forces: AI Service Provider + End User
OpenAI: Service terms 2024.01

1️⃣ OpenAI’s indemnification obligations to API customers under the Agreement include any third party claim that Customer’s use or distribution of Output infringes a third party’s intellectual property right.

Joint Forces: AI Service Provider + End User
OpenAI: Service terms 2024.01

2️⃣ This indemnity does not apply where: (i) Customer or Customer’s End Users knew or should have known the Output was infringing or likely to infringe, (ii) Customer or Customer’s End Users disabled, ignored, or did not use any relevant citation, filtering or safety features or restrictions provided by OpenAI, (iii) Output was modified, transformed, or used in combination with products or services not provided by or on behalf of OpenAI, (iv) Customer or its End Users did not have the right to use the Input or fine-tuning files to generate the allegedly infringing Output, (v) the claim alleges violation of trademark or related rights based on Customer’s or its End Users’ use of Output in trade or commerce, and (vi) the allegedly infringing Output is from content from a Third Party Offering.

Joint Forces: AI Service Provider + End User

Google Cloud

Shared fate: Protecting customers with generative AI indemnification 2023.10

An important note here: you as a customer also have a part to play. For example, this indemnity only applies if you didn’t try to intentionally create or use generated output to infringe the rights of others, and similarly, are using existing and emerging tools, for example to cite sources to help use generated output responsibly.

Joint Forces: AI Service Provider + End User

Adobe Firefly

Adobe is so confident its Firefly generative AI won’t breach copyright that it’ll cover your legal bills 2023.06

Adobe Creative Cloud: Adobe Firefly Q&A

Where does Firefly get its data from?
The current Firefly generative AI model is trained on a dataset of licensed content, such as Adobe Stock and public domain content where copyright has expired.

Joint Forces: AI Service Provider + End User
A clarification on Adobe Terms of Use 2024.06

1️⃣ To be clear, Adobe requires a limited license to access content solely for the purpose of operating or improving the services and software and to enforce our terms and comply with law, such as to protect against abusive content.
2️⃣ Our commitments to our customers have not changed. Adobe does not train Firefly Gen AI models on customer content. Adobe will never assume ownership of a customer's work.

Pushback

Authority
Content Creator

Pushback

Authority: Regulation

EU AI Act: first regulation on artificial intelligence 2023.12

Generative AI will have to comply with transparency requirements and EU copyright law:

  • Disclosing that the content was generated by AI
  • Designing the model to prevent it from generating illegal content
  • Publishing summaries of copyrighted data used for training
Pushback

Content Creator: Data Poisoning

Further Reading

AI vs. Copyright
Safe Use of (Generative) AI
Criticism

Further Reading

Safe Use of (Generative) AI

  1. Be careful with confidential information
  2. Read the contract; pay for privacy
Further Reading

Criticism

OVER