
  • Yes, the matrix and its several layers are the "decompression". At the end you get one probability distribution, deterministically. And the state is the whole context, not just the previous token. Yes, if we were to build the table manually with only the available data, lots of cells would just be 0; that's why the compression is lossy. There's actually nothing stopping anyone from filling those 0 cells in, it's just infeasible: you could still put states you never actually saw, but that are theoretically possible, into the table, and nothing stops someone from putting thought into it and filling them out.

    Also, you seem obsessed with the word "table". A table is just one type of function mapping a fixed input to a fixed output. If you replaced it with a function that gives the same outputs for all inputs, it would be functionally equivalent. Whether it's a table or some code in a function is just an implementation detail.

    As a thought exercise, imagine setting the temperature to 0, passing in every possible combination of input tokens, and recording the output for every single one of them. Put them all in a "table" (assuming you have practically infinite space) and you have a Markov chain that is 100% functionally equivalent to the neural network with all its layers and complexity. But it does it without the neural network, and gives 100% identical results every single time, in O(1). Because we don't have infinite time and space, we had to come up with a mapping function to replace the table. And because we have no idea how to make a good approximation of such a huge function by hand, we use machine learning to come up with a suitable function for us, given tons of data. Introduce some randomness in the sampling of that, and you have nonzero temperature again.
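
    A minimal sketch of that equivalence, with a tiny vocabulary and a toy deterministic function standing in for the full model at temperature 0 (both hypothetical):

    ```python
    from itertools import product

    VOCAB = ["a", "b", "c"]
    CONTEXT_LEN = 2  # toy window; a real LLM's context makes this table astronomically large

    def next_token(context):
        """Stand-in for a full forward pass at temperature 0: any deterministic function works."""
        return VOCAB[sum(map(ord, context)) % len(VOCAB)]

    # Enumerate every possible context once and record the output for each.
    table = {ctx: next_token(ctx) for ctx in product(VOCAB, repeat=CONTEXT_LEN)}

    # The table is now a state transition function: O(1) lookup,
    # 100% identical output to running the "model", no network involved.
    assert all(table[ctx] == next_token(ctx) for ctx in table)
    ```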

    Ex. A table containing the digits of pi, in order, could be transparently replaced with a spigot algorithm that calculates the nth digit on demand. The output would be exactly the same.
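
    Gibbons' unbounded spigot algorithm is one well-known way to do exactly that (a sketch):

    ```python
    def pi_digits():
        """Yield the decimal digits of pi one at a time (Gibbons' unbounded spigot)."""
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n  # this digit is settled; emit it and keep going
                q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
            else:
                q, r, t, n, l, k = (q * k, (2 * q + r) * l, t * l,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2, k + 1)

    gen = pi_digits()
    print([next(gen) for _ in range(10)])  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    ```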

  • The probabilities are also fixed after training. You seem to be conflating running the LLM with different input with the model somehow adapting. The new context goes into the same fixed model. And yes, it can be reduced to fixed transition logic; you just need to have all possible token combinations in the table. This is obviously intractable due to space, so we came up with a lossy compression scheme for it. The table itself is learned once, then it's fixed. The training goes into generating a huge Markov chain. Just because the table is learned from data doesn't change what it actually is.
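
    For comparison, here is what that "learn the table once, then freeze it" step looks like in a classic word-level Markov chain (a minimal sketch with a made-up corpus):

    ```python
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ran".split()

    # "Training": count transitions once. After this loop the table never changes.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    # Normalize the counts into fixed transition probabilities.
    table = {}
    for state, nxts in counts.items():
        total = sum(nxts.values())
        table[state] = {tok: c / total for tok, c in nxts.items()}

    print(table["the"])  # {'cat': 0.666..., 'mat': 0.333...}
    ```

    An LLM replaces the dict with a learned function, but like the dict, that function is computed once and then fixed.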

  • An LLM works the same way! Once it's trained, none of what you said applies anymore. The same model can respond differently to the same inputs specifically because after the LLM does its job, we sometimes intentionally don't pick the most likely token, but choose a different one instead. RANDOMLY. Set the temperature to 0 and it will always reply with the same answer (see the sketch below). And LLMs also have a fixed-order state transition. Just because you only typed one word doesn't mean that token isn't preceded by n-1 null tokens. The LLM always receives the same number of tokens; it cannot work with an arbitrary number of tokens.

    All relevant information "remains in the prompt" only until it slides out of the context window, just like in any Markov chain.
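
    A minimal sketch of that sampling step, with made-up scores standing in for the model's output:

    ```python
    import math, random

    def sample(logits, temperature):
        """Pick a token index from raw scores; temperature 0 means plain argmax."""
        if temperature == 0:
            return max(range(len(logits)), key=lambda i: logits[i])  # fully deterministic
        # Softmax with temperature, then one random draw: the only source of randomness.
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        weights = [math.exp(x - m) for x in scaled]
        return random.choices(range(len(logits)), weights=weights)[0]

    logits = [2.0, 1.0, 0.1]    # hypothetical scores for three tokens
    print(sample(logits, 0))    # always 0
    print(sample(logits, 1.0))  # usually 0, sometimes 1 or 2
    ```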

  • Snaaaaaake!!!!!

  • Apart from sharing a CPU architecture, that's pretty much where the similarity ends. Just because the CPU is x64 doesn't mean the software will work with the rest of the architecturally different hardware (ex. data buses that simply don't exist on a PC).

  • An LLM also works on fixed transition probabilities. All the training is done during the generation of the weights, which are the compressed state transition table. After that, it's just a regular old Markov chain. I don't know why you seem so fixated on getting different output when you provide different input (as I said, each token generated is a separate, independent invocation of the LLM with a different input). That is true of most computer programs.

    It's just an implementation detail. The Markov chains we are used to have a very short context, due to the combinatorial explosion when generating the state transition table. With LLMs, we can use a much, much longer context. Put that context in, it runs through the completely immutable model, and out comes a probability distribution. Any calculations done while computing this probability distribution are then discarded, the chosen token is added to the context, and the program is run again with zero prior knowledge of any reasoning about the token it just generated. It's a separate execution with absolutely nothing shared between them, so there can't be any "adapting" going on, as the sketch below illustrates.
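
    The loop, sketched with a placeholder in place of the real model; note that the only thing carried between iterations is the context itself:

    ```python
    def model(context):
        """Placeholder for the frozen model: context in, scores over tokens out.
        Deterministic given the context, like a real model at temperature 0."""
        return {tok: sum(map(ord, "".join(context) + tok)) % 7
                for tok in ["a", "b", "</s>"]}

    context = ["hello"]
    while context[-1] != "</s>" and len(context) < 10:
        dist = model(context)            # fresh run: no memory of earlier iterations
        token = max(dist, key=dist.get)  # greedy pick (temperature 0)
        context.append(token)            # the ONLY state carried forward

    print(context)
    ```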

  • A lambda which can only contain one expression, and not even a statement, is pretty much useless. For anything nontrivial you have to write a separate function and have the lambda be just a function-call expression, which completely defeats the point.
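
    For example, in Python (the helper name here is just for illustration):

    ```python
    # Fine: a lambda holds exactly one expression.
    double = lambda x: x * 2

    # Not fine: statements are a SyntaxError inside a lambda.
    # log_and_double = lambda x: (print(x); return x * 2)   # SyntaxError

    # So for anything nontrivial you end up writing a def anyway:
    def log_and_double(x):
        print(x)
        return x * 2

    callback = lambda x: log_and_double(x)  # the lambda is now just a function call
    ```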

  • I don't read spoken languages, but I do read written ones. The problem with Python's ternary is that it puts the condition in the middle, which means you have to visually parse the whole true-branch expression just to find where the condition starts. That makes it hard to read for anything but the most trivial examples.

    The same goes for comprehensions and generator expressions.
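
    A contrived sketch of what that reading order costs:

    ```python
    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        is_admin: bool
        is_active: bool

    user = User("alice", True, True)

    # You have to scan past the whole true-branch expression
    # before you even reach the condition that selects it.
    label = ("administrator with full access rights"
             if user.is_admin and user.is_active
             else "regular user")

    # Same shape in comprehensions: the loop and the filter trail the expression.
    users = [user, User("bob", False, True)]
    admins = [u.name.upper() for u in users if u.is_admin and u.is_active]

    print(label, admins)  # administrator with full access rights ['ALICE']
    ```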

  • Their input is the context window. Markov chains also use their whole context window. LLMs are a novel implementation that can work with much longer contexts, but as soon as something slides out of the window, it's forgotten, just like with any other Markov chain. They don't adapt: you add the new token to the context, slide the oldest one out, and then you have a different context, on which you run the same thing again. A normal Markov chain will also give you a different output if you give it a different context. Their biggest weakness is that they don't and can't adapt.

    You are confusing the encoding of the context with the model itself. Just to see how static the model is, try setting the temperature to 0 and giving it the same context, i.e. only try to predict one token with the exact same context each time. As soon as you try to predict a 2nd token, you've changed the input and run the thing again. It's not adapting; you asked it something different, so it came up with a different answer.

  • The previous input goes in. A completely static, prebuilt model processes it and comes up with a probability distribution.

    There is no "unlike Markov chains". They are Markov chains, just ones with a long context (a Markov chain also makes use of all the context provided to it, so I don't know what you're on about there). LLMs are just a (very) lossy compression scheme for the state transition table: computed once, applied blindly to any context fed in.

  • from the unending death that he himself (Holy Trinity and all that) would have inflicted upon you.

  • He could have just, you know, forgiven them. Like he preached. If I kill myself over a grudge I hold towards you, that just makes me an idiot. And if I also go around preaching forgiveness to everyone else, a hypocrite.

  • Every chat app needs over a gig of RAM to itself for "developer productivity".

  • sudo -i

  • Where did I say it's less secure? I said it will be coded around, as in forked and the changes patched out/worked around. The point is that it's pointless to even try, because it won't work for those who do choose to use it, due to all the ones bypassing it.

  • If it's Linux, it has to be open source. If it's open source, people will code around it immediately. How about not trying to shoehorn in this useless crap in the first place?

  • If only they had some sort of ID card they kept on their person at all times... then the government would know who they are. But they are explicitly against this. So, "leopards ate my face".

  • That's one of the reasons I specifically picked a bright lime green for my car.

  • It has been a long time since I used Teams on my phone. It used to log me out constantly there too... but it kept sending notifications for all messages to the phone until I opened the actual app, only to be told I was logged out.