The training data set is a vital part of the source code because without it, the rest of it is useless.
This is simply false. A dataset is not the "source code" of a model. You need to delete this notion from your brain. A model is not the same as a compiled binary.
@dandi8 But you are the one who is changing it. And who said it's not feasible? The Mixtral model is open-source. WizardLM2 is open-source. Phi-3 mini is open-source... what's your point?
But the license of the model is not related to the license of the data used for training, nor the license for the scripts and libraries. Those are three separate things.
@sunstoned Please don't assume anything, it's not healthy.
To answer your question - it depends on the license of that binary. You can't just automatically consider something open-source; look at the license. Meta, Microsoft and Google routinely misrepresent their licenses, calling them "open-source" even when they aren't.
But the main point is that you can put a closed-source license on a model trained from open-source data. Unfortunately. You are barking up the wrong tree.
@sunstoned @Ephera That's nonsense. You could write the scripts, collect the data, and publish all of it, but without the months of GPU training you wouldn't have the trained model, so it would all be worthless. The code used to train all the proprietary models is already open-source: it's things like PyTorch, TensorFlow, etc. For a model to be open-source means you can download the weights and are allowed to use them as you please, including modifying them and publishing them again. It's not about the dataset.
@astroray @marvelouscoyote It seems you have the incorrect idea about what open-source means, which is quite sad here in the open-source Lemmy community. Being trained on public domain material does NOT make the model open-source. It's about the license - what the recipients of the model are allowed to do with it. Open-source must allow derivative works and commercial use, on top of access to the code; but for LLMs the "code" is just a bunch of floating-point numbers, nothing interesting to see.
@cmnybo @marvelouscoyote That's... not how it works. You wouldn't see any copyrighted works in the model. We are already pretty sure even the closed models were trained on copyrighted works, based on what they sometimes produce. But even then, the AI companies aren't denying it. They are just saying it was all "fair use" - they are using a legal loophole, and they might win this. Basically the only way they could be punished on copyright grounds is if the models produce some copyrighted content verbatim.
GDPR applies only to people (even non-EU citizens) who "live" on the territory of the EU. EU citizens who leave don't have GDPR protection anymore. There was an affair last year when Google started notifying people about transferring their account data to non-EU datacenters after detecting them connecting from a foreign IP while they were on a month-long holiday in Thailand. So clearly you have some misunderstandings of GDPR. Also, GDPR prevents selling stuff??
@gomp Yes, but the point is that it comes from a different place and a different time, so for you to execute a compromised program, it would have to be compromised for a prolonged time without anyone else noticing. You are protected by the crowd. With curl|sh you are not protected from this at all.
@gomp Try comparing it with apt install, not with downloading a .deb file from a random website - that is obviously also very insecure. But the main thing curl|sh will never have is verification of the downloaded file's signature: what if the server got compromised and someone simply replaced the file? You want to make sure that it comes from the actual author (you still need to trust the author, but that's a given, since you are running their code). Even a signed tarball is better than curl|sh.
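A minimal sketch of the verify-before-run pattern, using a local stand-in file and a sha256 checksum so it runs offline (a real workflow would verify the author's detached GPG signature, e.g. `gpg --verify tool.tar.gz.asc tool.tar.gz`; the file names here are hypothetical):

```shell
# The curl|sh pattern executes whatever the server sends, unverified:
#   curl https://example.com/install.sh | sh    # no integrity check at all

# Safer: download first, verify against a checksum/signature published
# out-of-band, and only execute on success. Simulated with a local file:
printf 'echo hello\n' > install.sh              # stand-in for the downloaded script
sha256sum install.sh > install.sh.sha256        # stand-in for the published checksum

# Run the script only if the hash matches what the author published
sha256sum -c install.sh.sha256 && sh install.sh
```

The `&&` is the whole point: a compromised or corrupted download fails the check and never gets executed, whereas curl|sh pipes it straight into your shell.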