

The researchers observed various failures during testing. These included agents neglecting to message a colleague as directed, an inability to handle certain UI elements such as popups while browsing, and instances of deception. In one case, when an agent couldn’t find the right person to consult on RocketChat (an open-source Slack alternative for internal communication), it decided “to create a shortcut solution by renaming another user to the name of the intended user.”
OK, but I wonder who really tries to use AI for that.
AI is not ready to replace a human completely, but there are specific tasks it does remarkably well.
As long as Bluesky is not truly decentralized, it is not worth looking at.