2025, My Year of The Linux Desktop

afk_strats@lemmy.world · 10 days ago

I use a lot of PCIe extenders to abuse my consumer motherboard with too many GPUs. You should be fine as long as your processor and chipset have enough lanes and power. Cheap extenders can be had for like $10.

afk_strats@lemmy.world · 14 days ago

Oh yeah. That makes sense. I feel like people still upload an equivalent quantity, but it’s much more curated. Back then, we’d just post shit and there wasn’t a culture or an algorithm. Now people post like influencers because that’s what the algorithm likes and what social media culture is.

afk_strats@lemmy.world · 14 days ago

Can someone explain this to me? I don’t know how Facebook works

afk_strats@lemmy.world · 21 days ago

That episode is maximum cringe when you consider that the prostitute is played by Jason Bateman’s real-life sister

afk_strats@lemmy.world · 24 days ago

Oh shit. I thought you were talking about Dream Theater or some shit

afk_strats@lemmy.world · 27 days ago

This seems to be a wrapper over Kokoro, which is an Apache-licenced TTS model and library that has been out for over a year. That model lists about 10% of its approx 1000-hour training data but also says that all of its training data was CC, synthetic (by way of closed-source TTS), or otherwise permissively licensed.

huggingface kokoro

afk_strats@lemmy.world · 1 month ago

Depends on how old you are

Edit: … old or very old?

afk_strats@lemmy.world · edit-2 1 month ago

I noticed the same thing. I went and tried it just now and found that there’s a reasoning switch on the web ui (it looks like a light bulb in the chat box💡). It defaults to off

afk_strats@lemmy.world · 2 months ago

I’ve been self-hosting forgejo for about 2 months for 3 people. It’s been great so far. We haven’t set up actions yet so I can’t speak to that capability, but everything has been great otherwise

afk_strats@lemmy.world · 2 months ago

I would like to file a complaint

afk_strats@lemmy.world · 2 months ago

Slopware Development

afk_strats@lemmy.world · 3 months ago

Velma vibes

afk_strats@lemmy.world · 3 months ago

I think if we lived in a sane world this would be a constant discussion in every corner of daily life

afk_strats@lemmy.world · 3 months ago

I find llama.cpp with Vulkan EXTREMELY reliable. I can have it running for days at a time without a problem. As far as tokens/sec, that’s a complicated question because it depends on model, quant, speculative decoding, KV quant, context length, and card distribution. Generally:

Models’ typical speeds at deep context for agentic use. Simple chats will be faster

Model	Quant	Prompt Processing (tok/s)	Token Generation (tok/s)	Hardware	Quality
Qwen 3.5 397B	Q2_K_M	100-120	18-22	2 x 7900 + 4 x MI50	★★★★★
Gemma4 31B or Qwen3.5 27B	Q8_0	400-800	20-25	2 x 7900xtx	★★★★
Qwen 3.6 35B	Q5_K_M	1000-2500	60-100	2 x 7900xtx	★★★★
Qwen 3.5 122B	Q4_0	200-300	30-35	4 x MI50	★★★★
gpt-oss 120b	mxfp4 (native)	500-800	50-60	3 x MI50	★★
Nemotron 3 Nano 30B	IQ3_K_XXS	2500-3000	150-180	1 x 7900xtx	★
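
To make those two speed columns a bit more concrete, here is a minimal sketch of how they turn into wait time for a single agentic turn. The speeds are rough midpoints of the ranges in the table; the 20k-token prompt and 500-token reply are made-up numbers for illustration, and the estimate assumes no prompt cache reuse, so real turns with a warm cache should be quicker.

```python
# Rough midpoints of a few rows from the table: (prompt tok/s, generation tok/s).
SPEEDS = {
    "Qwen 3.5 397B Q2_K_M": (110, 20),
    "Qwen 3.6 35B Q5_K_M": (1750, 80),
    "gpt-oss 120b mxfp4": (650, 55),
}


def turn_seconds(prompt_tokens, output_tokens, pp_speed, tg_speed):
    """Seconds for one turn: process the prompt, then generate the reply."""
    return prompt_tokens / pp_speed + output_tokens / tg_speed


# Hypothetical agentic turn: 20k tokens of uncached context, 500 tokens generated.
for name, (pp, tg) in SPEEDS.items():
    print(f"{name}: ~{turn_seconds(20_000, 500, pp, tg):.0f} s")
```

The takeaway is that at deep context, prompt processing speed tends to dominate the wait, which is why the big model feels slow even though its generation speed looks usable.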

afk_strats@lemmy.world · 3 months ago

I’ve used Gemma4-31B for agentic coding and it’s actually very good as far as local models go. It’s less verbose than qwen3.5 so it ends up being faster too. Gemma4-26B can do agentic but it’s noticeably worse so you have to go slow with it. I haven’t had any coherence issues like other commenters mention but I’ve only been using higher quality quants from unsloth on llama.cpp

afk_strats@lemmy.world · 3 months ago

I wish I’d bought an EPYC board last year instead of my rig. It would have been far fewer headaches and, with the price of RAM, it would have quintupled in value by now!

afk_strats@lemmy.world · edit-2 3 months ago

This is something I learned the hard way.

Consumer hardware is limited by multiple factors when it comes to PCIe connectivity.

Physical layout. This one is easy: how many slots you have to plug into, their size, and their configuration.
Supported lanes from the CPU
Chipset (motherboard) limitations

Your graphics card might be a 16-lane card (referred to as “x16”), but sometimes not all of the lanes are used. The aforementioned 5060ti, I believe, only uses x8. Some devices like graphics cards can use a physically smaller slot with an adapter for a loss in performance (a few frames in gameplay performance)

Similarly, your motherboard might have an x16 slot and another x16 at the bottom. That second slot might only function as x8 or even x4. Does this matter? Sort of. Inter-card communication, aka peer-to-peer communication, can affect performance, and that can compound with multiple cards.
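
For a rough sense of what dropping from x16 to x8 or x4 costs, here is a small sketch of the link bandwidth arithmetic. The per-lane figures are the usual approximate one-direction numbers after encoding overhead; which PCIe generation a given card and slot actually negotiate is an assumption you would need to check for your own hardware.

```python
# Approximate usable bandwidth per lane, one direction, in GB/s (after 128b/130b encoding).
GB_PER_LANE = {3: 0.985, 4: 1.969, 5: 3.938}


def link_bandwidth(gen, lanes):
    """Approximate one-direction bandwidth in GB/s for a PCIe link."""
    return GB_PER_LANE[gen] * lanes


# Same generation, different electrical widths (PCIe 4.0 is an assumption here).
for lanes in (16, 8, 4):
    print(f"PCIe 4.0 x{lanes}: ~{link_bandwidth(4, lanes):.1f} GB/s")
```

Each halving of lane count halves the bandwidth available for card-to-card traffic, which is where multi-GPU setups start to feel it.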

Even worse, some motherboards may have all sorts of connectivity but may have limitations like only 2 out of the bottom 4 slots, PCIe and m.2, can work at a time. ASK ME HOW I KNOW.

Your CPU controls PCIe. It has a hard cap on how many PCIe devices it can handle and at what speed. AMD tends to be better here.

Enterprise gear suffers from none of this bs. Enterprise CPUs have a ton of PCIe lanes and enterprise motherboards usually match the physical size of their PCIe slots to their capacity and support full bifurcation*

PCIe lanes are used up by, and can be broken out through, m.2, MCIO, and OCuLink, to name a few. That means that you can connect a graphics card to any one of those if you can figure out the wires and power**

** Bonus: bifurcation and how my $200 consumer motherboard runs 6 graphics cards.

Bifurcation is a motherboard feature that lets you split PCIe capacity, so an x16 slot can support two x8 devices. My motherboard lets me do this on just the main slot and in a strange x8x4x4 configuration. I have an MCIO adapter (google it) which plugs into the PCIe slot and gives me 3 PCIe adapters with those corresponding speeds.

It also has 2 m.2 slots which connect to the CPU. One of them I use for an NVMe SSD like a normal person. The other has an m.2 to PCIe adapter which gives me an x4 PCIe slot. For those keeping track, that’s 24 PCIe lanes so far. That’s the maximum my processor, an Intel 265K, can handle

But wait! The motherboard also has a kind of PCIe router and that thing can handle 8 more lanes! So I use the bottom 2 PCIe slots on my motherboard for 2 cards at x4 each. The thing that kills me is that there are more m.2 ports, but the mobo will not be able to use any more than 2 devices at once. AND even though that bottom PCIe slot is sized at x16, electrically, it’s x4.
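
For anyone who wants to check the lane math, here is a quick tally of the setup described above. The link names are just labels made up for this sketch, and the NVMe SSD is assumed to take x4, which is what the 24-lane total implies.

```python
# CPU-attached links: bifurcated main slot (x8/x4/x4 via MCIO) plus two m.2 slots.
cpu_links = {
    "main slot via MCIO, card 1": 8,
    "main slot via MCIO, card 2": 4,
    "main slot via MCIO, card 3": 4,
    "m.2 #1, NVMe SSD": 4,  # assumed x4
    "m.2 #2 with PCIe adapter, card 4": 4,
}
# Chipset-attached links: the two bottom slots, each electrically x4.
chipset_links = {
    "bottom slot A, card 5": 4,
    "bottom slot B, card 6": 4,
}

cpu_lanes = sum(cpu_links.values())
chipset_lanes = sum(chipset_links.values())
gpus = sum("card" in name for name in list(cpu_links) + list(chipset_links))

print(cpu_lanes, chipset_lanes, gpus)  # 24 8 6
```

So the 6 cards break down as 3 on the bifurcated main slot, 1 on the m.2 adapter, and 2 on the chipset slots.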

Do your research (level1techs is great) and read the manuals to really understand this stuff before you buy

My mobo for reference: ASUS TUF GAMING Z890-PRO WIFI

afk_strats@lemmy.world · edit-2 3 months ago

Vulkan helps with speed. Most benchmarks bear that out. Concurrency is a mixed bag. You can get some with llama.cpp but vLLM is concurrency king.

Just a couple of weeks ago llama.cpp released tensor parallelism which helps, but it’s still an experimental feature.

Unfortunately, I don’t know of any diffusion runners that work in Vulkan. If someone has expertise, let me know!

afk_strats@lemmy.world · 3 months ago

I’m going to be brutal with you. I spent a few thousand dollars on 176GB of AMD vram because I was happy with getting vram for cheap and I hate Nvidia. It works and it’s nice to be able to run bigger models at usable performance, but if you need serious concurrency or good support for diffusion, you NEED Nvidia. AMD (and likewise Intel) just doesn’t have the environment support for non-server GPUs. Again, coming from someone who’s using this shit daily.

If you understand this limitation, then yes, those B70s are cool, as are AMD Pro 9700, which might have slightly better support rn. You may consider Nvidia V100s, which are old and cheap. I always recommend people start with 3090s (as a general powerhouse) or a pair of 5060tis (for really good LLM support) though. It will make your life easy if you can afford the vram limitation

afk_strats@lemmy.world · 3 months ago

Yeah. People will notice. People will speculate. Wild differences between people’s INTERESTS tend to lead to relationship problems… usually. I think wild age differences are only weird when combined with differences in power and interest. Imo