Scibot!

fossilesque@mander.xyz · 3 days ago

Scibot!

Avicenna@programming.dev · 21 hours ago

academic publishers that charge thousands of euros for publishing articles are scum of the earth.

webghost0101@sopuli.xyz · 2 days ago

No no, you see they trained an ai on it. Therefore this “pirating” is a 100% legitimate practice.

stormeuh@lemmy.world · 2 days ago

The way the law is being enforced now, this should be an entirely legitimate argument. A snowball’s chance in hell though that it holds up without a legal team like OpenAI has.

MithranArkanere@lemmy.world · 2 days ago

If research was funded with public money, be it government money or from people buying their products, then that research belongs to the people.

FundMECFS@piefed.zip · 2 days ago

I tried it on a couple things that are controversial or problematic in the literature and its about what I expected. It parrots the literature, for better or worse. Which means it’s great at getting an overview of the literature and finding citations and stuff. But it’s not gonna magically figure out which papers are quality and which ones are rubbish. It’ll just parrot all of them, even if they contradict each other. Very interesting, and possibly quite a useful tool. But I really wouldn’t use it as an arbiter of truth.

JcbAzPx@lemmy.world · 2 days ago

That’s all it should do. We’re nowhere near an AI that could be an arbiter of truth. Hell, most AI couldn’t even be trusted to parrot the literature accurately.

chiliedogg@lemmy.world · 2 days ago

I would find this extremely useful as a tool to help me find sources that I then review myself - similar to how I use Wikipedia. But the danger is in people trying to use it for more.

PalmTreeIsBestTree@lemmy.world · 2 days ago

This is all it’s good for.

fossilesque@mander.xyz · 2 days ago

Chat bots are a starting place. I find them useful for rubber ducking.

NιƙƙιDιɱҽʂ@lemmy.world · 21 hours ago

It’s nice to be able to blab to the machine about shit I know no one actually wants to listen to. My partner has been saved countless hours of me going in circles about broken code lol.

fossilesque@mander.xyz · 21 hours ago

Unironically it’s helped me a lot too. Most people, I think, don’t know how it works fundamentally. I found that once I understood the basics, I could ask better questions. It’s a starting place rubber duck and can help organize things, it’s not magic. Though there’s something marvelous about teaching bits of rock and metal language.

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

Knock_Knock_Lemmy_In@lemmy.world · 2 days ago

It seems like a good way to kick off a literature review.

Deebster@programming.dev · 3 days ago

Have they taken out the AI generated papers? We know that training LLMs on LLM-generated text leads to an absolute collapse in quality, and we also know that AI has been showing up in papers so if they haven’t, then this will be quite unreliable.

dustycups@aussie.zone · 2 days ago

brucethemoose@lemmy.world · 2 days ago

We know that training LLMs on LLM-generated text leads to an absolute collapse in quality.

This is often repeated, and true. But needs to be qualified.

Modern LLMs use tons and tons of “augmented” data, which is code for LLM generated or massaged data. Some is even generated during training, and judged; papers on that are what made Deepseek famous.

Training on LLM trash will, of course, yield greater trash, and obviously good text has to come from something real. But that’s because slop is slop. And there are issues with “deep frying” LLMs, yes, but simply training on LLM on LLM output does not necessarily reduce quality. It often helps, significantly.

And we also know that AI has been showing up in papers so if they haven’t, then this will be quite unreliable.

Now this is a problem.

TBH LLMs would be pretty good at flagging papers for humans to check, similar to what Wikipedia is already doing. But yeah, if you just feed a prompt bad papers, LLMs just assume the context is true, generally, and that’s a tremendous problem.

T156@lemmy.world · 2 days ago

I would be surprised if it was something that they trained themselves, and not an off the shelf model hooked up to a search.

brucethemoose@lemmy.world · 2 days ago

It’s probably their own search/RAG backend, or at least their configuration of some open source project.

And that’s the important part. Get the article retrieval right, and the LLM performance isn’t that important; they could self-host Qwen 27B or something and it’d work fine.

melsaskca@lemmy.ca · 2 days ago

Those chilling FBI warnings on old videotapes mean absolutely nothing to me now.

Chakravanti@monero.town · 1 day ago

I can’t tell the difference between those and OP’s.

Not_mikey@lemmy.dbzer0.com · edit-2 2 days ago

Asked it the following to test it:

What caused the cooling at the end of the cenezoic that lead to the glacial quarternary period?

Took a while, actively showed the source articles it was looking into while it was processing which were clickable. Here’s a pdf of the response which is long, and well referenced, pretty interesting IMO, but here’s the initial overview:

The cooling at the end of the Cenozoic Era — which culminated in the glacial-interglacial cycles of the Quaternary Period — is one of Earth’s most profound climate transitions. This was not a single event but a stepwise process driven by interconnected mechanisms operating over tens of millions of years. The primary cause was a long-term decline in atmospheric CO₂ (pCO₂), driven fundamentally by plate tectonic processes that altered the global carbon cycle. Oceanic gateway openings and orbital variations played important modulating roles.

Which my partner, whose taken some climate classes in college, said sounds right. If anyone thinks this is wrong please feel free to call it out.

WorldsDumbestMan@lemmy.today · 2 days ago

You have to go into each article and check the key points, trust me.

It is a god-tier liar.

bonenode@piefed.social · 2 days ago

To be fair though, even if you read the abstracts of papers you need to go in and check the actual data itself to confirm what the authors describe is actually there.

Likewise if a paper cites another study in support and it seems weird what they say, you need to go and check that paper too.

Scientists have been inflating their claims as long as the impact factor exists (and probably longer). This now just makes it even easier to receive lies.

Evotech@lemmy.world · 2 days ago

Yeah, it’s just a model with a semantic database it can query (RAG)

belated_frog_pants@beehaw.org · 2 days ago

AI doesnt understand truth, it averages on data points. It cannot tell the “truth”. it can be right sometimes based on frequency of mentioned words and related ones.

Oriion@jlai.lu · 3 days ago

And without hallucinations ??? That sounds freaking awesome

a_non_monotonic_function@lemmy.world · 3 days ago

Of course not.

OfCourseNot@fedia.io · 3 days ago

Aye?

WhyIHateTheInternet@lemmy.world · 2 days ago

You’re them! You’re the person! Holy shit!!

msage@programming.dev · 2 days ago

That’s why you hate the internet???

Klear@quokk.au · 2 days ago

Clearly.

0ops@piefed.zip · 2 days ago

Sorry 'bout that

Madrigal@lemmy.world · 3 days ago

Yeah they added “Don’t hallucinate” to the prompt.

fartographer@lemmy.world · 2 days ago

Seems like the kind of prompt a hallucination would say

FiskFisk33@startrek.website · 2 days ago

Have they solved the huge unsolved problem no one else has solved

yeah, no.

morto@piefed.social · 3 days ago

And without hallucinations ???

Likely not

iceberg314@slrpnk.net · 3 days ago

It probably uses Retrieval Augmented Generation, which can still hallucinate, but usually does a better job for niche questions and it can even provide a source sometimes depending on how you set it up

Bongles@lemmy.zip · 3 days ago

deleted by creator

Leon@pawb.social · 3 days ago

I hate it when people use unnecessary terms to describe something.

It’s a script that runs a search and then the LLM takes the output of that and reformats it into an answer. It’s the same as feeding it a document and having it rephrase something.

TheTechnician27@lemmy.world · 3 days ago

It’s a script that runs a search and then the LLM takes the output of that and reformats it into an answer.

“I hate when people use concise, reasonably common, and understandable terminology. Why can’t we just expand everything into full sentences that are also oversimplified?”

Leon@pawb.social · 3 days ago

that aren’t even entirely accurate?

Point it out then.

RAG is literally just polling for information and rewriting it. It’s the same garbage that gave us Gemini telling us to put glue on pizza to prevent the cheese from slipping off.

You can, and should be more critical of where you source the information but it’s not going to magically make language models actually intelligent. It’s not going to make them reason, or be able to properly select what is relevant or not. Just because you give it a bunch of scientific papers doesn’t mean that the stuff they output will be accurate or not misleading.

They’re still just token prediction engines.

TheTechnician27@lemmy.world · edit-2 3 days ago

Point it out then.

Literally here. And sorry, before you posted this, I did quickly edit my comment to “oversimplified”. Because technically yes, it’s searching and using what it’s retrieved mixed with a (modified) user prompt to generate an output. But it’s searching based on a prompt (rewriting it to aid retrieval), often reranking results, stripping the query-specific context from the results into chunks, attempting to resolve contradictions between sources (which is objectively more than just rephrasing), and then synthesizing between whatever its pretraining is and what its retrieval results are (thus “retrieval-augmented generation”). That’s why I amended it to “oversimplified”: you’re, for no explicable reason, taking well-established terminology that you think people shouldn’t use (for being “unnecessary”), expanding it out to sentence-length, and even then oversimplifying the process.

Leon@pawb.social · 3 days ago

LLMs do not possess the ability to reason over the information that it is fed. It converts it to numbers and performs arithmetics on it. Augmenting it with scripts won’t change the fundamental nature of how it works.

It takes information and regurgitates it. There is no analytical capability present that makes it able to distinguish the importance between a small segue and the main points. They can just as easily combine several separate facts into a single point, and phrase things in a way that a footnote has as much weight as the main subjects.

Hiding the actual workings behind silly marketing buzzwords serves to sensationalise what these things actually do. It feeds the AI hysteria and further muddles the discussion around them. It’s why laymen think these models are basically magic and buy into the idea that they’re somehow going to solve all our problems.

I love machine learning. It is, and has historically been a fantastic tool for plenty of tasks, but it isn’t magic.

If I implement a script to automate database migrations during application deployment I could definitely market that as Deployment Ready Database Optimisations or some other BS term, but that doesn’t make it more than a simple automation.

TheTechnician27@lemmy.world · 3 days ago

LLMs do not possess the ability to reason over the information that it is fed.

Ah, yes, I forgot that if an LLM has no conscious ability to reason, then we shouldn’t have any terminology to describe the general process it’s using to create an output. Case closed. I’m glad you’ve enlightened us about how useful jargon isn’t actually useful. Data goes in, data goes out; you can’t explain that.

BigDiction@lemmy.world · 3 days ago

Need the deets asap with all that hot tea low key context? Get on the RAG!

Pre-order access for $5.99/USD month for your first 12 months. You know the next one comin’ soon!

psycotica0@lemmy.ca · 2 days ago

Sure, but RAG has a Wikipedia article about the specifics of the process, history of its use, links to papers and articles about it and its advantages and drawbacks. It’s also useful as a feature on a matrix for comparing one tool or model’s capabilities to another. None of that is true of the sentence.

Virtually all of computing could be reduced to voltages across terminals changing over time, but it can still be useful to give specific terms to specific applications of this process, so we have something to talk about.

Not_mikey@lemmy.dbzer0.com · 2 days ago

Retrieval augmented generation

is way easier to search then:

a script that runs a search and then the LLM takes the output of that and reformats it into an answer.

So if people want to look into it further and research what it is, instead of taking some persons 1 sentence explanation, they can.

Ironically trying to search for that phrase would work better in a RAG then a standard key word search.

Fmstrat@lemmy.world · 2 days ago

So… Search… Assisted… Generation?

RAG is a name from a research paper that very accurately describes what happens, but your argument seems to say you just don’t like acronyms.

VeryFrugal@sh.itjust.works · 2 days ago

Looking at the whole thing as a workflow, you’d be correct.

But RAG can be a bit more than just running a search, which implies keyword based regex style search.

brbposting@sh.itjust.works · 2 days ago

“RAG”’s tough in acronym form though concept is quite popular right now - decent summary btw I’d say (fully non-expert)

expr@piefed.social · 3 days ago

Obviously not, because that’s not possible.

TrackinDaKraken@lemmy.world · 3 days ago

What fun would that be?

Atelopus-zeteki@fedia.io · 3 days ago

I’ll keep the hallucinations for myself, tyvm.

Per sci-hub.ru this has been available since March 6th.

"Hear the good news: recent advances in artificial intelligence enabled Sci-Hub to launch a robot that gives scientifically-grounded responses to questions. The robot starts with searching for relevant literature in Sci-Hub database, then turns to selecting and reading most recent studies, and composes the answer based on this information. The answer includes all the references, and each referenced article can be read on Sci-Hub with one click.

Unlike question-answering robots that were based upon the early generation of neural networks, Sci-Hub bot does not hallucinate and is not making up scientific facts and does not cite sources that do not exist. To support its statements, Sci-Bot uses articles from Sci-Hub database. Questions can be asked in any language, and answers can be saved on server and shared.

The alpha version only supports answerig one question, and a more advanced variation that supports conversation mode is coming soon. Right column displays example questions that has been answered by robot - push the question to see the generated answer."

Oriion@jlai.lu · 3 days ago

Thanks for doing what I should have done, I actually red that and thought it sounded great. The claim of “no hallucination” should of course be taken with a grain of salt, as other comments have pointed out.

Atelopus-zeteki@fedia.io · 2 days ago

Sci-hub has been an invaluable resource. I posted a question yesterday at work. There was a queue, and it was time to leave, so I’ll see what the result was when I get over there. I’ve avoided using AI, but this was too tempting. My question was in a area where I have some knowledge, so I’m hoping I’ll be able to spot any problems in the reply.

Oriion@jlai.lu · 2 days ago

I’d be interested in having your feedback !!

takeda@lemmy.dbzer0.com · 2 days ago

LOL, of course not.

Speaking of hallucinations, I think the best way to see them is to go to Google Gemini (Reddit is selling them Reddit posts) and start a conversation about Reddit account you have and act as you don’t know anything. It usually starts good but as it progresses you can see how it is making shit up. The more you ask the more insane it gets.

And this is supposedly having all the comments at its disposal.

I also tried Lemmy as I’m sure they are also indexing it. It is telling me that I’m actually admin who created Lemmy.dbzer0.com

IrateAnteater@sh.itjust.works · 3 days ago

From what I understand from the sales brochure, these types of “AI” that are modeled on highly curated data are far less prone to hallucinations.

sobchak@programming.dev · 2 days ago

I doubt it’s fine-tuned, it’s likely just one of the open-weight LLMs with RAG. I’ve done similar things, and they don’t really work as well as I’d like (the most relevant chunks of text aren’t always ranked the highest/have the least embedding distance, and the models still hallucinate sometimes).

VeryFrugal@sh.itjust.works · 2 days ago

Hallucination is Inevitable.

Tollana1234567@lemmy.today · 2 days ago

nothing more evil than have prestigious journals gatekeep, and paywall research articles without even the scientists knowledge, which only universities and research teams are privy to. looking at nature, phytotaxa.

Psychodelic@lemmy.world · 2 days ago

Uh… it gave me ~45 min wait time and then gave up. lol

Sounds neat tho

SnarkoPolo@lemmy.world · 2 days ago

Too right! Why, if regular people can get science for free, Capitalism might not profit!

Chakravanti@monero.town · 1 day ago

Heads are rolling one way or the other no matter how many quarters you ch…err…uh…flip off.

gh0stb4tz@lemmy.world · 3 days ago

Why does the URL have a Russian government domain (.ru)? Consider me highly skeptical.

FaceDeer@fedia.io · 3 days ago

It’s where a lot of the pirate sites have found refuge from the Western copyright cartels. It’s not necessarily a government-affiliated site just because it’s got an .ru domain.

GorGor@startrek.website · 3 days ago

I want to say Russia doesn’t consider it a crime to hack as long as the system/IP you are accessing is outside Russia. No source on that cause Im lazy, so take it with a boulder of salt.

Nurse_Robot@lemmy.world · 3 days ago

I second this reply, with an additional boulder

thejml@sh.itjust.works · 3 days ago

Great, now I need a over the shoulder double boulder holder.

wylinka@szmer.info · 3 days ago

.ru is not government domain, it’s just the normal russian domain… Literally every country except for america uses the country code top level domain for everyday use.

hobovision@mander.xyz · 2 days ago

The US country code .us is also general use here. Government organizations in the US use .gov because we invented the internet and fuck you to every other government.

D1re_W0lf@piefed.social · 2 days ago

… in collaboration with France and the UK. Being “the web” later on invented at the European Organization for Nuclear Research.

hobovision@mander.xyz · 13 hours ago

Hey maybe you didn’t pick up my attempt at humor. It’s kinda messed up actually.

exixx@lemmy.world · 3 days ago

Because Alexandra Elbakyan lives in Russia. One of the official sci hub homes is .ru also

fossilesque@mander.xyz · 3 days ago

⬆️⬆️⬆️⬆️⬆️⬆️⬆️⬆️⬆️

spinnetrouble@sh.itjust.works · 3 days ago

You can check the validity of your skepticism by asking it a question and verifying its sources.

potatoguy@mbin.potato-guy.space · 3 days ago

deleted by creator

foiledAgain@lemmy.world · 3 days ago

Getting hugged to death

FiniteBanjo@feddit.online · 2 days ago

AI Sloppers lacking awareness is so sickening.

TrackinDaKraken@lemmy.world · 3 days ago

I stared at it, and didn’t know what to ask, so I closed it.