The idea is that it isn't just operating the vending machine itself; it's operating the entire vending machine business. It decides what to stock and what price to charge based on market trends and/or user feedback.
It's a stress test for LLM autonomy. Obviously a vending machine doesn't need this level of autonomy; you usually just stock it with the same thing every time. But a vending machine works as a very simple "business" that can be simulated without much at stake. It shows how LLM agents behave when left to operate on their own like this, and it can be used to test guardrails in the field.
It's only "running" the business up to a point. The physical stocking and purchasing are done by humans, who would presumably not buy anything that would bankrupt the company, because then it's on them.
Here's Anthropic's article about the previous stage of this project, which explains it pretty well. Part two is a good read too.
https://www.anthropic.com/research/project-vend-1