
  • god, the reactions to eliza are such a harbinger of doom. real cassandra moment. it's an extra weird touchstone for me because we had it on our school computers in the late 90s. the program was called DOCTOR and basically behaved identically to the original, eg find a noun and use it in a sentence. as a 9-year-old i found it to be ass, but i've only recently learned that some people anthropomorphise everything and can lose themselves totally in "tell me about boats" even if they rationally know what the program is actually doing.

    as a 30-something with some understanding of natural language processing, i find eliza quite nifty.

  • so for the case of inference, eg talking to chatgpt, the model is completely static. training can take weeks to months, so the model does not change when it's in use.

    the novel connections appear in training. it's just a matter of concepts being unexpectedly close together on the map.

    the mapping is absurdly complex. when i said n-dimensional, n is on the order of hundreds of billions of dimensions. i don't know the exact size of chatgpt's models but i know they're at least an order of magnitude or two larger than what can currently be run on consumer hardware. my computer can handle models with around 20 billion parameters before they no longer fit in RAM, and it's pretty beefy.
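
    to make the hardware limit concrete, here's a rough back-of-the-envelope sketch in python (the bytes-per-parameter figures are the usual ones for fp16 / 8-bit / 4-bit weights; the parameter counts are just examples, not claims about any particular model):

        # rough memory needed just to hold a model's weights in RAM
        # (ignores activations, context caches, and other runtime overhead)
        def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
            return num_params * bytes_per_param / 1e9

        for params in (7e9, 20e9, 175e9):
            for label, bpp in (("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
                print(f"{params / 1e9:>4.0f}B params @ {label}: ~{weight_memory_gb(params, bpp):.0f} GB")

    by that arithmetic a 20 billion parameter model needs roughly 40 GB at fp16 and roughly 10 GB when squashed to 4 bits per weight, which is about the range where consumer RAM becomes the limit.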

    as for the conversational ability, the inference step basically works like this (there's a toy sketch of the loop right after the list):

    1. convert the input into a vector
    2. run the weighted algorithm on the input
    3. the top three or so most probable next words appear
    4. select one of the words semi-randomly
    5. append the word to the input
    6. goto 1.
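
    here's a toy sketch of that loop in python. fake_model and its three candidate words are made-up stand-ins for step 2; a real model scores every word in its vocabulary, not just three:

        import random

        def fake_model(text: str) -> list[tuple[str, float]]:
            # stand-in for "run the weighted algorithm on the input":
            # pretend the model returned its top next-word guesses with probabilities
            return [("boats", 0.5), ("water", 0.3), ("the", 0.2)]

        text = "tell me about"
        for _ in range(5):
            candidates = fake_model(text)                        # steps 1-3
            words, probs = zip(*candidates)
            next_word = random.choices(words, weights=probs)[0]  # step 4: semi-random pick
            text = text + " " + next_word                        # step 5: append and repeat
        print(text)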

    models are just giant piles of weights that get applied over and over to the input until it morphs into the output. we don't know exactly how the vectors correspond to the output, mostly because there are just too many parameters to analyse. but what comes out looks like intelligent conversation because that's what went in during training. the model predicts the next word, or location on the map as it were, and most text it has access to is grammatically correct and intelligent, so it's reasonable to assume that statistically speaking it will sound intelligent. assuming that it's somehow self-aware is a lot harder when you actually see it do the loop-de-loop thing of farting out a few words with varying confidence levels and then selecting one at random.

    my experience with this is more focused on images, which i think makes it easier to understand because images are more directly multidimensional than text.

    when training an image generation model, you take an input image and an accompanying text description. you then basically blur the image repeatedly until it's just noise (specifically, you "diffuse" it). at every step you record what the blur operation actually did to the image into the weights. you then apply those weights to the text description. the result is two of those maps: one with words, one with images, both with identical "topography".

    when you generate an image, you give some text as coordinates in the "word map", an image consisting of only noise as coordinates in the "image map", then ask the model to walk towards the word map coordinates. you then update your image to match the new coordinates and go again. basically, you're asking "in the direction of this text, what came before this image in the training data" over and over again.
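
    here's a toy version of that forward/backward walk in numpy. everything about the learned network is stripped out: this sketch "cheats" by remembering the exact noise it added instead of training anything to predict it, and there's no text conditioning at all, so it only shows the shape of the process:

        import numpy as np

        rng = np.random.default_rng(0)
        image = rng.random((8, 8))   # stand-in for a training image
        steps = 50

        # forward process: repeatedly mix noise into the image until it's basically all noise
        noisy = image.copy()
        added = []                   # what a real model would learn to predict and undo
        for _ in range(steps):
            noise = rng.normal(0.0, 0.1, size=noisy.shape)
            added.append(noise)
            noisy = noisy + noise

        # "generation": start from the noisy end state and walk back one step at a time;
        # a real model predicts each step from the current image plus the text coordinates
        restored = noisy.copy()
        for noise in reversed(added):
            restored = restored - noise

        print("max difference from the original:", np.abs(restored - image).max())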

  • it's a whole branch of mathematics. looking at it from a pure language perspective isn't really useful because language models don't really work in language. they work in text. "llms are just language" is misleading because language implies a certain structure while language models use a completely different structure.

    i don't have any proper sources but here's a quick overview off of the top of my head:

    a large language model is a big pile of vectors (a vector here is basically a list of numbers). the "number of parameters" in a machine learning model refers to the number of dimensions of one of those vectors (or, in programming speak, the length of the list). these vectors represent coordinates on an n-dimensional "map of words". words that are related are "closer together" on this map. when you build this map, you can then use vector math to find word associations. this is important because vector math is all hardware accelerated (because of 3D graphics).
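
    a tiny illustration of the "closer together on the map" idea, using made-up 4-dimensional vectors (a real model learns vectors with far more dimensions from data, but the math is the same):

        import numpy as np

        # made-up coordinates; in a real model these are learned during training
        vectors = {
            "boat":  np.array([0.9, 0.1, 0.0, 0.2]),
            "ship":  np.array([0.8, 0.2, 0.1, 0.3]),
            "piano": np.array([0.0, 0.9, 0.8, 0.1]),
        }

        def closeness(a, b):
            # cosine similarity: 1.0 means "pointing the same way on the map", 0.0 means unrelated
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        print("boat vs ship: ", closeness(vectors["boat"], vectors["ship"]))
        print("boat vs piano:", closeness(vectors["boat"], vectors["piano"]))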

    the training process builds the map, by looking at how words and concepts appear in the input data and adjusting the numbers in the vectors until they fit. the more data, the more general the resulting map. the inference process then uses the input text as its starting point and "walks" the map.

    the emergent behaviour that some people call intelligence stems from the fact that the training process makes "novel" connections. words that are related are close together, but so are words that sound the same, for example. the more parameters a model has, the more connections it can make, and vice versa. this can lead to the "overfitting" problem, where the amount of input data is so small that the only associations are from the actual input document. using the map analogy, there may exist particular starting points where there is only one possible path. the data is not actually "in" the model, but it can be recreated exactly. the opposite can also happen, where there are so many connections for a given word that the actual topic can't be inferred from the input and the model just goes off on a tangent.

    why this is classed as intelligence i could not tell you.

    Edit: replaced some jargon that muddied the point.


    something related: you know how compressed jpegs always have visible little squares in them? jpeg compression works by slicing the image into little squares, describing each square as a mathematical pattern called the discrete cosine transform, and then throwing away the finer details of that pattern. the more you compress, the more of the pattern gets thrown away, so neighbouring squares stop lining up and the grid becomes visible.
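
    a rough sketch of that idea on a single 8x8 block using scipy's discrete cosine transform (real jpeg uses a tuned quantisation table per frequency rather than the hard cutoff used here):

        import numpy as np
        from scipy.fft import dctn, idctn

        rng = np.random.default_rng(0)
        block = rng.random((8, 8)) * 255    # stand-in for one 8x8 patch of an image

        coeffs = dctn(block, norm="ortho")  # describe the block as a frequency pattern

        # "compress": keep only the lowest-frequency part of the pattern (top-left corner)
        keep = 3
        mask = np.zeros_like(coeffs)
        mask[:keep, :keep] = 1.0
        reconstructed = idctn(coeffs * mask, norm="ortho")

        print("average error per pixel:", np.abs(reconstructed - block).mean())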

    you can do this with text models as well. increasing jpeg compression is like lowering the number of parameters. the fewer parameters the worse the model. if you compress too much, the model starts to blend concepts together or mistake words for one another.

    what the ai bros are saying now is that if you go the other way, the model may become self-aware. in my mind that's like saying that if you make a jpeg large enough, it will become real.

  • last i looked at what talon requires, their unresolved asks spanned 17 years of wayland development.

  • all of the issues listed are closed so any recent version is fine.

    also, you probably don't need to deploy this unless you have a problem with bots.

  • i think my problem is i can't think in fractions :(

  • right, and 23/7?

  • tried em all. i hate onetab because it removes the tabs from my vision. i also used panorama when that was a thing and the same thing happened. tree style tab and a low setting for discard works best.

  • i had to recreate my profile a few years ago because some settings and tabs were coming up on 20 years old and started affecting the performance of the browser.

  • weeeeeeeeell

  • yeah but i can remember that they exist if i can see them

  • sure, when i'm done with them

  • oh ew

  • we have separate sorting bins for plastic, paper, metal, cardboard, compostables, newspapers, clear glass, dark glass, batteries, lamps, and "household waste". bottles and cans go back to the store for credit. other stuff like oils, chemicals, electronics, wood, and appliances goes to a bigger recycling facility outside of town.

    it works fairly well. we don't really have landfills anymore. the biggest problem is people not giving a shit if the receptacles are full and either cramming their shit in or just leaving it outside.

  • okay i've never heard of that. inbox zero just means having nothing in your inbox.

  • you get it.

    i tried using bookmark tags for a while but it's just a lot of extra work.

    that's one thing firefox could actually improve with their insistence on pushing ai into everything: tag my bookmarks for me and allow searching through them by topic rather than title.

  • i use hundreds of tabs, have disabled desktop icons, and run inbox zero. i refuse to fit in your boxes!

  • "i wonder what that article said about the thing i was thinking of. what was the article about again? ...what site was it?"

    anyway, ask me about my 400 open tabs

  • how should it have been written?

  • corn starch is what you put on silicone toys to prevent them from being sticky. so yes!