They don't have the same quirks in some cases, but do in others.
Some of the shared quirks are due to architectural similarities.
Like the "oh look they can't tell how many 'r's in strawberry" thing is due to how tokenizers work. Even when the tokenizer is slightly different, with one breaking it up into 'straw'+'berry' and another breaking it into 'str'+'aw'+'berry', the model still ends up counting two tokens containing 'r's without being able to see the individual letters.
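To make that concrete, here's a rough toy sketch. The two vocabularies and the greedy longest-match splitter are made up for illustration and aren't any real model's tokenizer (real ones use learned BPE-style merges and map each piece to an integer ID), but the effect is the same: the model gets handed chunks, not letters.

```python
def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocab entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        match = next(
            (text[i:j] for j in range(len(text), i, -1) if text[i:j] in vocab),
            text[i],  # fall back to a single character if nothing matches
        )
        tokens.append(match)
        i += len(match)
    return tokens


def tokens_containing(tokens, letter):
    """Count how many tokens contain the letter at least once."""
    return sum(letter in tok for tok in tokens)


# Hypothetical vocabularies standing in for two different tokenizers.
vocab_a = {"straw", "berry"}
vocab_b = {"str", "aw", "berry"}

word = "strawberry"
tokens_a = greedy_tokenize(word, vocab_a)
tokens_b = greedy_tokenize(word, vocab_b)

print(tokens_a, tokens_containing(tokens_a, "r"))
print(tokens_b, tokens_containing(tokens_b, "r"))
print(word.count("r"))  # the character-level count the model never directly sees
```

Both splits give two chunks with an 'r' in them, while the actual letter count is three, and that gap is where the wrong answer comes from.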
In other cases, it's because models that have been released influence other models through their presence in updated training sets. Notice how a lot of comments these days were written by ChatGPT ("it's not X — it's Y")? Well, the volume of those comments has an impact on transformers being trained with data that includes them.
So the state of LLMs is this kind of flux: each model develops its own idiosyncrasies, which end up in a training melting pot and sometimes pass on to new models and other times don't. Usually what gets passed on is related to what's adaptive to the training filters, but not always. Often what gets picked up is something piggybacking on what was adaptive (like if o3 was better at passing tests than 4o, maybe gpt-5 picks up other o3 tendencies unrelated to passing tests).
Though to me the differences are even more interesting than the similarities.
It's more like they are a sophisticated world modeling program that builds a world model (or approximate "bag of heuristics") of the state of the context provided and the kind of environment that produced it, and then synthesizes that world model into extending the context one token at a time.
But the models have been found to be predicting further than one token at a time and have all sorts of wild internal mechanisms for how they are modeling text context, like building full board states for predicting board game moves in Othello-GPT or the number comparison helixes in Haiku 3.5.
The popular reductive "next token" rhetoric is pretty outdated at this point, and is kind of like saying that all a calculator does is take numbers from button presses and display different numbers on a screen. While yes, that's technically correct, it glosses over a lot of important complexity in between the two steps, and that absence leads to an overall misleading explanation.