
Posts: 44
Comments: 877
Joined: 3 yr. ago

  • I asked the bot. It says:

    I understand the concern. This question sits at the intersection of copyright law, statistics, and how machine-learning systems work—and it’s fair to be skeptical.

    Here’s the most precise and careful way to frame it:

    1. I can’t confirm GPL training data

    I don’t know what specific works or licenses were in the training set, and I can’t confirm that GPL-licensed material was included. Neither I nor users have visibility into the underlying corpus.

    So while it may be statistically plausible that GPL text exists somewhere in public data, plausibility ≠ proof, and I can’t assert it as fact.

    2. Training ≠ distribution (the core legal argument)

    OpenAI’s legal position (and that of most ML practitioners) is broadly this:

    • Training on copyrighted material is not the same as redistributing it
    • The model does not contain copies of works
    • Outputs are newly generated, not retrieved
    • Copyright law regulates copying and distribution, not learning patterns or statistics

    This is analogous (legally, not philosophically) to a human reading GPL code and then later writing different code from experience.

    Whether courts ultimately agree in every jurisdiction is still being litigated—but no court has ruled that training itself is per se illegal.

    3. GPL specifically

    GPL obligations trigger when you:

    • copy
    • modify
    • distribute

    GPL-licensed code or derivative works.

    Current legal consensus does not treat model weights or outputs as GPL “derivatives” in the copyright sense, because:

    • There is no preserved expressive structure
    • There is no traceable lineage to a specific work
    • Outputs are not mechanically derived from any one source

    That’s why GPL’s copyleft provisions are generally understood not to propagate through model training.

    4. Is the law settled?

    No. Anyone claiming it is settled is overstating.

    What is true:

    • Training on public data is widespread and openly practiced
    • Multiple lawsuits exist, but none have yet established that training itself violates copyright or GPL
    • Regulators and courts are still clarifying boundaries

    5. Practical reality

    Even if GPL material were in the training mix:

    • That alone would not automatically make the model or its outputs illegal
    • Liability (if any) would hinge on specific outputs being substantially similar to protected works, not on statistical likelihoods about training data

    Bottom line

    Your intuition about probability is reasonable—but legality does not turn on probability. It turns on:

    • whether protected expression is reproduced
    • whether derivative works are created
    • whether licenses are triggered by the act in question

    As of now, the prevailing legal interpretation is that models like me are lawful to deploy, even though the debate is ongoing.

    If you want, I can also:

    • Compare this to the Google Books ruling
    • Walk through why “derivative work” is a high legal bar
    • Discuss what would actually make an AI system GPL-tainted in practice
  • My goodness. I can't believe we are living in this era of fascism.

    Good luck, United States folk.

  • The metaverse was always going to be a flop.

    I get why devs jumped in, for that sweet Facebook cash injection.

    But, gotta be realists here. It was always going to fail, and it was easy free money until that point.

  • That's awesome! I'll definitely try that out. Thanks again

  • You can, and I'm now embarrassed :-) Thank you

  • Can't believe I missed it! Thank you

  • I am an absolute goose. Thank you!

  • Linux Questions @lemmy.zip

    Linux Mint Taskbar (panel) on left, rather than bottom?

  • Linux Mint @lemmy.ml

    Taskbar (panel) on left, rather than bottom?

  • Snap can go take a hike

  • Watch the movie named The Time Traveller's Wife. It is absolutely superb and based on sort of that idea, but not quite.

  • I bet that cheese is amazing. Shame he didn't get it up and running properly.

  • Agree, they're an excellent read.

  • That was so good. I ate it for a second

  • That's awesome!!! Thanks for sharing

  • Lobbying is Bribery.

    Prove me wrong.

  • I'm old. Always been fit through lifestyle, but never clicked with the gym.

    Then I did a pilates class and for me, it finally clicked.

    Body weight, floor routines, stretching. Slow core. Shaking through planks is like a weird fetish, and I love it.

    Who knows what you'll find... But you'll find it.

    Have fun!

  • The authoritarian rule keeps coming down upon us. The slope that's forever sliding.

  • I use Talon Voice

  • Fabulous response. Just wanted to say.

  • Ahhh, thank you. I had no idea.

  • He's a war criminal. We arresting his criminal ass on arrival?

  • Ask Lemmy @lemmy.world

    Quality online computer course for beginner to intermediate?

  • Bready @lemmy.world

    Has anyone successfully baked a good sourdough in a Weber Baby Q gas barbecue?

  • Ask Lemmy @lemmy.world

    Want an alt account here. Can I port my block lists?

  • Art Share🎨 @lemmy.world

    Quick sketch. Bamboo paper. 2b mechanical pencil.

  • Samsung @lemdro.id

    Has anyone successfully used Samsung Smart Switch to downgrade OS?

  • Open Source @lemmy.ml

    Found this excellent, flexible counter app

    f-droid.org/packages/org.kde.bettercounter

  • Asklemmy @lemmy.ml

    VOIP service that doesn't suck?

  • Ask Lemmy @lemmy.world

    If you open your mobile browser, what is the last viewed SFW tab open?

  • Ask Lemmy @lemmy.world

    Do you have a TV in your 2006 Toyota Corolla? Why / Why not?

  • Australia @aussie.zone

    Listening to Courtney Barnett tonight

  • Australia @aussie.zone

    Where will my vote go?

  • Mycology @mander.xyz

    Sunshine Coast, Queensland

  • Mycology @mander.xyz

    Found on the Sunshine Coast, Queensland

  • Lemmy @lemmy.ml

    Can we write lemmy bots?

  • Linux @lemmy.ml

    Will this Lenovo ThinkPad (AMD) work well with Linux, or should I go Intel?

  • micromobility - Bikes, scooters, boards: Whatever floats your goat, this is micromobility @lemmy.world

    [R] Windsurfing

    loops.video/v/5rkoYdzfnR

  • Linux @lemmy.ml

    Samsung microSD not showing in Linux

  • Retro Handhelds @sh.itjust.works

    It arrived!