
  • Yes, the matrix and its several layers are the "decompression". At the end you get one probability distribution, deterministically. And the state is the whole context, not just the previous token. Yes, if we were to build the table manually with only the available data, lots of cells would just be 0; that's why the compression is lossy. There's actually nothing stopping anyone from filling those 0 cells in, it's just infeasible: you could still put states you never actually saw, but that are theoretically possible, into the table, and nothing stops someone from putting thought into it and filling them out.

    Also, you seem obsessed with the word "table". A table is just one type of function mapping a fixed input to a fixed output. If you replaced it with a function that gives the same outputs for all inputs, it would be functionally equivalent. Whether it's a table or some code in a function is just an implementation detail.

    As a thought exercise, imagine setting the temperature to 0, passing in every possible combination of input tokens, and recording the output for every single one of them. Put them all in a "table" (assuming you have practically infinite space) and you have a Markov chain that is 100% functionally equivalent to the neural network with all its layers and complexity. But it does it without the neural network, and gives 100% identical results every single time, in O(1). Because we don't have infinite time and space, we had to come up with a mapping function to replace the table. And because we have no idea how to make a good approximation of such a huge function by hand, we use machine learning to come up with a suitable function for us, given tons of data. Introduce some randomness in the sampling of that, and you have nonzero temperature again.
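
    A minimal sketch of that equivalence, with a tiny vocabulary and a toy deterministic function standing in for the full model at temperature 0 (both hypothetical):

    ```python
    from itertools import product

    VOCAB = ["a", "b", "c"]
    CONTEXT_LEN = 2  # toy window; a real LLM's context makes this table astronomically large

    def next_token(context):
        """Stand-in for a full forward pass at temperature 0: any deterministic function works."""
        return VOCAB[sum(map(ord, context)) % len(VOCAB)]

    # Enumerate every possible context once and record the output for each.
    table = {ctx: next_token(ctx) for ctx in product(VOCAB, repeat=CONTEXT_LEN)}

    # The table is now a state transition function: O(1) lookup,
    # 100% identical output to running the "model", no network involved.
    assert all(table[ctx] == next_token(ctx) for ctx in table)
    ```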

    Ex. A table containing the digits of pi, in order, could be transparently replaced with a spigot algorithm that calculates the nth digit on demand. The output would be exactly the same.
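
    Gibbons' unbounded spigot algorithm is one well-known way to do exactly that (a sketch):

    ```python
    def pi_digits():
        """Yield the decimal digits of pi one at a time (Gibbons' unbounded spigot)."""
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n  # this digit is settled; emit it and keep going
                q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
            else:
                q, r, t, n, l, k = (q * k, (2 * q + r) * l, t * l,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2, k + 1)

    gen = pi_digits()
    print([next(gen) for _ in range(10)])  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    ```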

  • The probabilities are also fixed after training. You seem to be conflating running the LLM with different input with the model somehow adapting. The new context goes into the same fixed model. And yes, it can be reduced to fixed transition logic; you just need to have all possible token combinations in the table. This is obviously intractable due to space, so we came up with a lossy compression scheme for it. The table itself is learned once, then it's fixed. The training goes into generating a huge Markov chain. Just because the table is learned from data doesn't change what it actually is.
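
    For comparison, here is what that "learn the table once, then freeze it" step looks like in a classic word-level Markov chain (a minimal sketch with a made-up corpus):

    ```python
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ran".split()

    # "Training": count transitions once. After this loop the table never changes.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    # Normalize the counts into fixed transition probabilities.
    table = {}
    for state, nxts in counts.items():
        total = sum(nxts.values())
        table[state] = {tok: c / total for tok, c in nxts.items()}

    print(table["the"])  # {'cat': 0.666..., 'mat': 0.333...}
    ```

    An LLM replaces the dict with a learned function, but like the dict, that function is computed once and then fixed.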

  • An LLM works the same way! Once it's trained, none of what you said applies anymore. The same model can respond differently to the same inputs specifically because after the LLM does its job, we sometimes intentionally don't pick the most likely token, but choose a different one instead. RANDOMLY. Set the temperature to 0 and it will always reply with the same answer (see the sketch below). And LLMs also have a fixed-order state transition. Just because you only typed one word doesn't mean that token isn't preceded by n-1 null tokens. The LLM always receives the same number of tokens; it cannot work with an arbitrary number of tokens.

    All relevant information "remains in the prompt" only until it slides out of the context window, just like in any Markov chain.
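
    A minimal sketch of that sampling step, with made-up scores standing in for the model's output:

    ```python
    import math, random

    def sample(logits, temperature):
        """Pick a token index from raw scores; temperature 0 means plain argmax."""
        if temperature == 0:
            return max(range(len(logits)), key=lambda i: logits[i])  # fully deterministic
        # Softmax with temperature, then one random draw: the only source of randomness.
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        weights = [math.exp(x - m) for x in scaled]
        return random.choices(range(len(logits)), weights=weights)[0]

    logits = [2.0, 1.0, 0.1]    # hypothetical scores for three tokens
    print(sample(logits, 0))    # always 0
    print(sample(logits, 1.0))  # usually 0, sometimes 1 or 2
    ```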

  • Snaaaaaake!!!!!

  • Apart from sharing a CPU architecture, that's pretty much where the similarity ends. Just because the CPU is x64 doesn't mean the software will work with the rest of the architecturally different hardware (ex. data buses that simply don't exist on a PC).

  • An LLM also works on fixed transition probabilities. All the training is done during the generation of the weights, which are the compressed state transition table. After that, it's just a regular old Markov chain. I don't know why you seem so fixated on getting different output when you provide different input (as I said, each token generated is a separate, independent invocation of the LLM with a different input). That is true of most computer programs.

    It's just an implementation detail. The Markov chains we are used to have a very short context, due to the combinatorial explosion when generating the state transition table. With LLMs, we can use a much, much longer context. Put that context in, it runs through the completely immutable model, and out comes a probability distribution. Any calculations done while computing this probability distribution are then discarded, the chosen token is added to the context, and the program is run again with zero prior knowledge of any reasoning about the token it just generated. It's a separate execution with absolutely nothing shared between them, so there can't be any "adapting" going on, as the sketch below illustrates.
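
    The loop, sketched with a placeholder in place of the real model; note that the only thing carried between iterations is the context itself:

    ```python
    def model(context):
        """Placeholder for the frozen model: context in, scores over tokens out.
        Deterministic given the context, like a real model at temperature 0."""
        return {tok: sum(map(ord, "".join(context) + tok)) % 7
                for tok in ["a", "b", "</s>"]}

    context = ["hello"]
    while context[-1] != "</s>" and len(context) < 10:
        dist = model(context)            # fresh run: no memory of earlier iterations
        token = max(dist, key=dist.get)  # greedy pick (temperature 0)
        context.append(token)            # the ONLY state carried forward

    print(context)
    ```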

  • A lambda which can only contain one expression, and not even a statement, is pretty much useless. For anything nontrivial you have to write a separate function and have the lambda be just a function-call expression, which completely defeats the point.
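
    For example, in Python (the helper name here is just for illustration):

    ```python
    # Fine: a lambda holds exactly one expression.
    double = lambda x: x * 2

    # Not fine: statements are a SyntaxError inside a lambda.
    # log_and_double = lambda x: (print(x); return x * 2)   # SyntaxError

    # So for anything nontrivial you end up writing a def anyway:
    def log_and_double(x):
        print(x)
        return x * 2

    callback = lambda x: log_and_double(x)  # the lambda is now just a function call
    ```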

  • I don't read spoken languages, but I do read written ones. The problem with Python's ternary is that it puts the condition in the middle, which means you have to visually parse the whole true-branch expression just to find where the condition starts. That makes it hard to read for anything but the most trivial examples.

    The same goes for comprehensions and generator expressions.
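
    A contrived sketch of what that reading order costs:

    ```python
    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        is_admin: bool
        is_active: bool

    user = User("alice", True, True)

    # You have to scan past the whole true-branch expression
    # before you even reach the condition that selects it.
    label = ("administrator with full access rights"
             if user.is_admin and user.is_active
             else "regular user")

    # Same shape in comprehensions: the loop and the filter trail the expression.
    users = [user, User("bob", False, True)]
    admins = [u.name.upper() for u in users if u.is_admin and u.is_active]

    print(label, admins)  # administrator with full access rights ['ALICE']
    ```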

  • Their input is the context window. Markov chains also use their whole context window. LLMs are a novel implementation that can work with much longer contexts, but as soon as something slides out of the window, it's forgotten, just like with any other Markov chain. They don't adapt: you add the new token to the context, slide the oldest one out, and then you have a different context, on which you run the same thing again. A normal Markov chain will also give you a different output if you give it a different context. Their biggest weakness is that they don't and can't adapt.

    You are confusing the encoding of the context with the model itself. Just to see how static the model is, try setting the temperature to 0 and giving it the same context, i.e. only try to predict one token with the exact same context each time. As soon as you try to predict a 2nd token, you've changed the input and run the thing again. It's not adapting; you asked it something different, so it came up with a different answer.

  • The previous input goes in. A completely static, prebuilt model processes it and comes up with a probability distribution.

    There is no "unlike Markov chains". They are Markov chains, just ones with a long context (a Markov chain also makes use of all the context provided to it, so I don't know what you're on about there). LLMs are just a (very) lossy compression scheme for the state transition table: computed once, applied blindly to any context fed in.

  • from the unending death that he himself (Holy Trinity and all that) would have inflicted upon you.

  • He could have just, you know, forgiven them. Like he preached. If I kill myself over a grudge I hold towards you, that just makes me an idiot. And if I also go around preaching forgiveness to everyone else, a hypocrite.

  • Every chat app needs over a gig of RAM to itself for "developer productivity".

  • sudo -i

  • Where did I say it's less secure? I said it will be coded around, as in forked and the changes patched out/worked around. The point is that it's pointless to even try, because it won't work for those who do choose to use it, due to all the ones bypassing it.

  • If it's Linux, it has to be open source. If it's open source, people will code around it immediately. How about not trying to shoehorn in this useless crap in the first place?

  • If only they had some sort of ID card they kept on their person at all times... then the government would know who they are. But they are explicitly against this. So, "leopards ate my face".

  • That's one of the reasons I specifically picked a bright lime green for my car.

  • It has been a long time since I used Teams on my phone. It used to log me out constantly there too... but it kept sending notifications for all messages to the phone until I opened the actual app, only to be told I was logged out.