
  • It’s good at writing it, ideally 50-250 lines at a time

    I find Claude Sonnet 4.5 to be good up to about 800 lines per chunk. If you structure your project into 800-ish line chunks with well-defined interfaces, you can get 8 to 10 chunks working cooperatively pretty easily. Beyond about 2000 lines in a chunk, if it's not well defined, yeah - the hallucinations start to become seriously problematic.

    The new Opus 4.5 may have a higher complexity limit; I haven't worked with it enough yet to characterize it... I do find Opus 4.5 to be much slower than Sonnet 4.5 on similar problems.
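    To make "well-defined interfaces" concrete, here's one way a chunk boundary could look: pin each chunk behind a small typed contract, so a model (or a human) can rewrite the implementation wholesale without touching its neighbors. All names here are invented for illustration, not from any real project:

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Record:
    key: str
    value: float


class Store(Protocol):
    """The contract one ~800-line chunk implements; neighbors see only this."""

    def put(self, rec: Record) -> None: ...
    def get(self, key: str) -> Optional[Record]: ...


class MemoryStore:
    """A trivial implementation - swappable wholesale without touching callers."""

    def __init__(self) -> None:
        self._data: dict[str, Record] = {}

    def put(self, rec: Record) -> None:
        self._data[rec.key] = rec

    def get(self, key: str) -> Optional[Record]:
        return self._data.get(key)
```

    The point is that each chunk only has to honor its contract, so the prompt for regenerating one chunk never needs the other chunks' internals in its context window.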

  • I frequently feel that urge to rebuild from the ground (specifications) up, to remove the "old bad code" from the context window and get back to the "pure" specification as the source of truth. That only works up to a certain level of complexity. When it works it can be a very fast way to "fix" a batch of issues, but when the problem/solution is big enough, the new implementation will have new issues that may take longer to identify than just grinding through the existing issues would. It's a devil-you-know kind of choice.

  • I find this kind of performance to vary from one model to the next. I definitely have experienced the bad image getting worse phenomenon - especially with MS Copilot - but different models will perform differently.

  • There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.

    Reminds me of the Sonny Bono high speed downhill skiing problem: don't fixate on that tree, if you fixate on that tree you're going to hit the tree, fixate on the open space to the side of the tree.

    LLMs do "understand" words like "not" and "don't", but they also seem to work better with positive examples than negative ones.

  • constantly fail to even compile because, for example, they mix usages of different SDK versions

    Try an agentic tool like Claude Code - it closes the loop by testing the compilation for you and fixing its mistakes (like human programmers do) before bothering you for another prompt. I was where you are now 6 months ago; the tools have improved dramatically since then.

    From TFS: "I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it."

    That sounds like a "fractional CTO problem" to me (IMO a fractional CTO is a guy who convinces several small companies that he's a brilliant tech genius who will help them make their important tech decisions without actually paying full-time attention to any of them. Actual tech experience: optional.)

    If you have lost confidence in your ability to modify your own creation, that's not a tools problem - you are the tool; that's a you problem. It doesn't matter whether you're using an LLM coding tool, a team of human developers, or a pack of monkeys to code your applications: if you don't document, test, and formally develop an "understanding" of your product that not only you but all stakeholders can grasp to the extent they need to, you're just letting development run wild - lacking any formal software development process maturity. LLMs can do that faster than a pack of monkeys, or a bunch of kids you hired off Craigslist, but it's the exact same problem no matter how you slice it.

  • where the massive decline in code quality catches up with big projects.

    That's going to depend, as always, on how the projects are managed.

    LLMs don't "get it right" on the first pass - ever, in my experience, at least for anything of non-trivial complexity. But their power is that they're right more than half the time, AND they can be told when they're wrong (whether by a compiler, a syntax nanny tool, or a human tester), AND they can then try again, and again, as long as necessary, to reach a final state of "right," as defined by their operators.
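    That told-it's-wrong/try-again loop can be sketched as a plain retry harness. Here, generate() and check() are placeholder callables standing in for the model and for whatever judge (compiler, linter, human tester) closes the loop - not any real tool's API:

```python
def refine(generate, check, max_attempts=5):
    """Keep regenerating until the check passes or attempts run out.

    generate(feedback) -> candidate: produces a new attempt, given the
        feedback from the previous failed check (None on the first try).
    check(candidate) -> (ok, feedback): the external judge.
    """
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        ok, feedback = check(candidate)
        if ok:
            return candidate  # "right," as defined by the check
    raise RuntimeError("no acceptable candidate after %d attempts" % max_attempts)
```

    The whole value of agentic tooling is that the loop body runs without a human prompt in between; the human only defines check() and reviews the final candidate.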

    The trick, as always, is getting the managers to allow the developers to keep polishing the AI's (or the human developer's) output until it's actually good enough to ship.

    The question is: which will take longer, which will require more developer "head count" during that time to get it right - or at least good enough for business?

    I feel like the answers all depend on the particular scenario - in some places, for some applications, current state-of-the-art AI can deliver that "good enough" product we have always had, with lower developer head count and/or shorter delivery cycles. For other organizations with other product types, it will certainly take longer / cost more budget.

    However, the needle is off zero - there are some places where it really does help, a lot. The other thing I have seen over the past 12 months: it's improving rapidly.

    Will that needle ever pass 90% of all software development benefitting from LLM agent application? I doubt it. In my outlook, I see that needle passing 50% in the near future - but it's not quite there yet.

  • Yeah, it's not "nowhere" - but it's really far from "everywhere" considering we've been rolling it out for 25 years now. I think you're right: glass is cheaper than copper these days, and if they've got to repair/replace the copper it's probably cheaper to just run the glass. They put a line down the main road 1/4 mile from our home last year (suburban area in a 1M pop city), and lots of people who live on that main road have gotten fiber to the home service, but they're not interested in running the extra 1500 feet to reach us yet. I'd guess in our city of 1M, maybe 200,000 have potential fiber to the home service if they want it, the rest of us are stuck with re-heated cable TV co-ax for our broadband.

  • They are starting to roll it out in fits and starts in the major metro areas at least, but yeah, 20 years late and nowhere near as universally as promised when our service providers took all those government grants and then didn't deliver, IMO.

  • After .com popped, all the money ran to install fiber data infrastructure - a lot of installs put in more capacity than they projected needing for 100 years (glass fibers are cheap; digging trenches for them is expensive). The promise of "fiber to the home" is still mostly unrealized, but those trunk lines are out there with oodles of "dark fiber" ready to carry data... someday.

  • It's not even about money or financials that add up on balance sheets. It's about market share, political power. When you're Too Big To Fail, balance sheets cease to matter.

  • I may "be used to" Autotune, but it's not in 90%+ of the music I listen to.

  • My guest wifi network automatically segregates clients (the default from Netgear). So, if three guests are on it, none can "see" the others - only the internet.

  • My guests get WiFi access to the internet when they ask. What they don't get is WiFi access to our home systems network. When they don't ask, I assume they're just fine paying for their own cellular data.

  • In the work I have done with Claude over the past months, I have not learned to trust it for big things - if anything, the opposite. It's a great tool, but - to anthropomorphize - its "hallucination rate" is down there with my less trustworthy colleagues. Ask it to find all instances of X in a code base of 100 files of 1000 lines each... yeah, it seems to get bored or off-track quite a bit: it misses obvious instances, finds a lot but misses too much to call it a thorough review. If you can get it to develop a "deterministic process" for you (a shell script or program) and test that program, then that you can trust more. But when the LLM is in the loop it just isn't all there all the time, and worse: it'll do some really cool and powerful things 19 times out of 20, then, just when you think you can trust it, it will screw up an identical-sounding task horribly.

    I was just messing around with it: I had it doing a file organization and commit process for me, and it was working pretty well for a couple of weeks. Then one day it just screwed up and irretrievably deleted a bunch of new work. Luckily it was just 5 minutes of its own work, but still... that's not a great result.
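    The "deterministic process" idea, sketched: instead of trusting the LLM to eyeball 100 files, have it write (and you test) a boring, reproducible script. This is an illustrative sketch, not anything Claude actually produced for me; the pattern and the *.py glob are placeholders:

```python
import re
from pathlib import Path


def find_instances(root, pattern):
    """Return (path, line_number, line) for every line matching pattern.

    Unlike an LLM review pass, this never gets "bored": same input,
    same output, every time.
    """
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*.py")):  # placeholder file glob
        text = path.read_text(errors="ignore")
        for n, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((str(path), n, line.strip()))
    return hits
```

    Once a script like this passes a spot check on a few files you know, its output over the whole code base is worth far more than the model's own "I reviewed everything" claim.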

  • Agree. I've been using Claude extensively for about a month, and before that for little stuff for about 3 months. It is great at little stuff. It can whip out a program to do X in 5 minutes flat, as long as X doesn't amount to more than about 1000 lines of code. Need a parser to sift through some crazy combination of logic in thousands of log files? Claude is your man for that job. Want to scan audio files to identify silence gaps and report how many are found? Again, Claude can write the program and generate the report for you in 5 minutes flat (plus whatever time the program takes to decode the audio...)

    Need something more complex, nuanced, multi-faceted? Yeah, it is still easier to do most of the upper level design stuff yourself, but if you can build a system out of a bunch of little modules, AI is getting pretty good at writing the little modules.
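    For flavor, the silence-gap job above boils down to something like this once the audio is decoded to PCM samples - a minimal sketch, with made-up threshold and gap-length parameters:

```python
def count_silence_gaps(samples, threshold=500, min_gap_samples=4410):
    """Count runs of consecutive low-amplitude PCM samples.

    samples: signed integer sample values
    threshold: absolute amplitude below which a sample counts as silent
    min_gap_samples: minimum run length to count as a gap
                     (4410 samples is 0.1 s at 44.1 kHz)
    """
    gaps = 0
    run = 0
    for s in samples:
        if abs(s) < threshold:
            run += 1
        else:
            if run >= min_gap_samples:
                gaps += 1
            run = 0
    if run >= min_gap_samples:  # count trailing silence too
        gaps += 1
    return gaps
```

    A real version would decode each file first (e.g. with the stdlib wave module, or ffmpeg for compressed formats) and report per-file counts - exactly the kind of small, self-contained module these tools are good at.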

  • If you install Microsoft Windows 11 AI edition on your PC and let these AI features run, you get what you deserve.

    regardless of whether or not customers want it. They don’t have a say in the matter except the more tech savvy of them who will find ways to edge around the restrictions

    The tech savvy will run Linux. They all (tech savvy or not) have a say in the matter. Even my non-tech-savvy wife has been using an Ubuntu laptop for the past 3 years, purchased direct from Dell, pre-configured by the factory with Ubuntu 22.04. I recently talked her off her Samsung fetish and onto the slightly less evil Pixel line of phones. It's a purchase-and-use decision. Walking away from Windows isn't all that hard for most people, if they would just do it. Most are so bloody apathetic, they get what they deserve.

    (Some) corporations are going to go hard for the AI in Windows on corporate IT managed machines because "magic free productivity fairy dust..." no, they don't know how it works, or if it will work, or if it will be a bigger waste of time than the Solitaire app, but it's new and a lot of corporations embrace the new simply based on Fear Of Missing Out.

    The lock performs a singular function adequately enough for the risk involved for most people. And it does it passively.

    The AI is not the same no matter how often or how hard you try to shoehorn it into your silly analogy.

    Technology marches on, the world does get more complicated. Before we had metal keys that had to be made by keysmiths, there were more simple latches that people could open but most animals couldn't. Metal keys introduced all kinds of complexity and inter-dependencies and failure modes, but generally we have adopted them as the preferred solution over a peg through two holes.

    stop drinking the flavorade for five minutes and just think about the fact that people don’t want this

    A lot of people do want it, I'm not saying that people who don't want it should be forced to use it, far from that. But people have to start standing up for themselves when it comes to what tech they do and don't allow into their lives. Nobody is making people wear smartwatches, or have smart-speaker(microphones) in their homes, and you're not actually forced to use any particular desktop operating system either. Maybe your job forces you to use one for work, that's why you get the paycheck - for doing what they want.

    Microsoft is saying that they know it’s problematic but they are forcing it on people anyway.

    Only the people who let themselves be forced. Our local dominant grocery chain started inflating their prices radically about 7 years ago; we have plenty of other stores around town, but over half are this dominant chain. I shopped in that chain my whole life - my grandmother pushed me around in the cart, I stocked shelves in one during college - and it was our 95%+ source of food up until about 7 years ago. I finally had enough of the price abuse when they were about 30% higher than the competition, and we stopped going there. They're over 100% higher than the competition now on most prices, and people STILL shop there in droves. Nobody is forcing them to; they're volunteering to pay double to keep using their familiar grocery store.

    I hope the world of desktop operating systems is different, but it's probably not. People who put up with intrusive agents on their PCs doing things they don't understand: get what they deserve.

  • the door lock is not doing anything of its own volition

    Neither does an AI agent. You give it power (electricity), you give it access to your computer / phone, any cloud storage accounts you may have, local NAS, network connectivity. You do all these things just like you install a lock on a door, or don't. Once the lock is installed and you leave the premises, you are trusting the lock to do what it does.

    If you hand an AI your CC#, you get what you deserve.

    If you hand an AI access to your hard drive and you store your CC# on your hard drive, you get what you deserve.

    If you leave your door unlocked and the school bus lets a bunch of 14 year olds off by your house while you're away, you get what you deserve.

    If you install Microsoft Windows 11 AI edition on your PC and let these AI features run, you get what you deserve.

    I have many "smart home" appliances and features. They do not: control things that make fire, control the lights on our staircase, control the house door locks. I give them such access as I trust them with. I do "overtrust" one with alarm clock features, and the morning our power went out at 4AM we overslept, just like would have happened if we used an old 1960s style electric alarm clock. You can go back to wind-up with bells, if you like, or you can accept that the modern world isn't always more reliable than the older ways.

    The AI LLM is doing stuff both of its own volition

    The AI stuff I have been working with has an explicit switch: Agent mode vs Plan mode. In Agent mode it can (and frequently does) do all sorts of surprising things, some good, some bad. In Plan mode all it does is throw responses up on the screen for me to read, no modification of files on my system. I effectively ran in "Plan mode" for a few months, copy-pasting stuff by hand back and forth - it was still more useful than web-search, imperfect, annoyingly incorrect at times, but I was in "total control" over what got written to (and read from) files on my system. I've had Agent mode access for about 6 weeks now. All in all, Agent mode is 10x more productive. And I have never, ever, even slightly considered the thought of handing it my CC#, though I'm sure many people will, and eventually we'll get a story about how one of these wonky agents ordered three lifetime supplies of Tide Pods on Amazon when it was asked to get some detergent.

  • A door lock can’t buy up Amazon’s entire stock of tide pods on my credit card.

    But it can let in a burglar who can find your credit card inside and do the same. And why are you giving AI access to your CC#? You'd better post it here in a reply so I can keep it safe for you.

    A door lock can’t turn on someone’s iot oven while they’re out of town.

    But it can let in neighborhood children who will turn on your gas stove without lighting it while you're out of town.

    A door lock can’t publish every email some journalist has ever received to xitter.

    True, the journalist, or his soon-to-be-ex-spouse, can "accidentally" do that themselves - and I suppose the ex-spouse who still has a copy of the key can "fool" the lock with that undisclosed copy of the key while the journalist is out having sushi with his mistress.

    A mechanical door lock doesn’t hallucinate extra fingers, and draw them into all the family photos saved on a person’s hard drive.

    I've worked with AI for a while now, it's not going to up and hallucinate to do that - unless you ask it to do something related.