'People will buy intelligence from us on a meter': ChatGPT's Sam Altman's AI vision worries critics

lechekaflan@lemmy.world · 4 days ago

'People will buy intelligence from us on a meter': ChatGPT's Sam Altman's AI vision worries critics

spectrums_coherence@piefed.social · edit-2 3 days ago

Many academics around me have a paid plan of LLM of some sort, most are on $100 plan some are on $200, all of them are getting reimbursed for their plan.

Most of them uses it to optimize code, generate visualization, or formalize pen and paper proof.

I hated it, and don’t use much of it myself. But it seems too useful for these people and it is hard to stop them. As an example, formalizing a pen and paper proof can take an expert weeks, if not month of work, whereas it only takes codex a week.

But I do feel this success is tied to the nature and value of academia, and might not transfer to other fields or industrial projects:

we usually have tiny codebases: it is not uncommon to have a 10-line algorithm with a 70-page paper explaining its correctness
90%, if not more, of the codes are proof of concepts, without the expectation for long term maintainance.
the work is highly specialized, everyone is running out of time, and there is high expectation of the outcome of the work: in our recent work, we do have an expert in formalization, but he doesn’t have enough time, so the grad student formalized the project using codex. The overall architecture is probably much much worse than what would have been done by the expert. One interesting outcome is that codex is able to prove a more general result than the expert intended: not because it found a better proof, but because it is much better at bruteforcing a solution than human.

ZDL@lazysoci.al · 3 days ago

As an example, formalizing a pen and paper proof can take an expert weeks, if not month of work, whereas it only takes codex a week.

And how badly does it hallucinate while doing so?

spectrums_coherence@piefed.social · edit-2 3 days ago

That is the thing about formal proof: if the definition is correct, which usually is relatively short and should be written by human, there is almost no chance of the prove being wrong. The only exception are when the LLM exploits a bug in the proof assistant kernel, and these kernel are usually designed to be exceptionally small, thus making bugs unlikely.

That being said, opus 4.6 found a bug that eventually lead to the proof of false (opus is unable to produce the proof of false, hence unlikely to exploit it): https://github.com/rocq-prover/rocq/issues/21682

However, like I said, the code quality of the llm is usually not on par with an expert, and they have a tendency to produce unnecessary lemmas and complications that will need to be cleaned up by human.

Also, we have a very detailed pen and paper proof, which are designed to be easily translatable to proof assistants. We have also setup all the lemma and theorems to reach the end goal. All of these are done by humans, without these, I don’t believe any LLM can make much progress on this project.

redhorsejacket@lemmy.world · 3 days ago

I hate to hit you with such a tangentially related query to your main point, but what is “formalizing a proof”, and why is there such a discrepancy between the time it might take an expert (weeks if not months) vs an LLM?

(It probably goes without saying, but my college career was spent in the Humanities, where there was not much emphasis on proofs, formal or informal, so I’m curious how the other half lives.)

spectrums_coherence@piefed.social · edit-2 3 days ago

There are proof assistants https://en.wikipedia.org/wiki/Proof_assistant that would encode a mathematical proof as code, and verify its correctness for you.

Writing completely formal proof is very painstaking, because it means we will need to flash out a lot and a lot of details (which are mostly trivial for experts) for computers to accept it, and we also need to know how to work with proof assistants.

Human proofs often ignore these details to make it readable, yet also make it more prone to mistakes. Whereas formalized prove in proof assistant can very rarely be wrong (unless there is an unlikely bug in the assistant kernel), but mostly unreadable (unless the proof is incredibly elegant).

So in general, translating good human proof to computer proof requires more expert labor than huge conceptual innovation, yet it usually require the steep learning curve of understanding the ins and outs of a proof assistant, which can take years of experience.

LLM used to be pretty bad at this because even filling in trivial details can quickly derail them. Recently a few flagship coding model are finally able to do this, albeit with a large amount of token consumption in thinking.

redhorsejacket@lemmy.world · 3 days ago

Fascinating. My godfather is a mathematician at a local university, I’ll ask him whether this is something he runs into. I’ve heard him grumble about formatting his work in the past, but usually in the context of using Latex, which I gather is something else entirely.

spectrums_coherence@piefed.social · 3 days ago

Yeah LaTeX is a bit different, think about describing a process in MS Word, v.s. writing a program that performs such process on a computer: LaTeX is more like description aimed for human consumption, where as formal proof is more like a program that computers can rigorously execute.

Proof assistant have only attracted the attension of mathematicians very recently, thanks to the organization surrounding MathLib in Lean, and the promption of Terance Tao.

It also rides the AI train quite a bit, as AI have a tendency to confidently be wrong, having a computer to check its proof can be very useful.