I’m watching artificial intelligence order my groceries. Armed with my shopping list, it types each item into the search bar of a supermarket website, then uses its cursor to click. Watching what appears to be a digital ghost do this usually mundane task is strangely transfixing. “Are you sure it’s not just a person in India?” my husband asks, peering over my shoulder.
I’m trying out Operator, a new AI “agent” from OpenAI, the maker of ChatGPT. Made available to UK users last month, it has a similar text interface and conversational tone to ChatGPT, but rather than just answering questions, it can actually do things – provided they involve navigating a web browser.
Hot on the heels of large language models, AI agents have been trumpeted as the next big thing, and you can see the appeal: a digital assistant that can complete practical tasks is more compelling than one that can just talk back. Similar to OpenAI’s offering, Anthropic introduced “computer use” capabilities to its Claude chatbot towards the end of last year. Perplexity and Google have also released “agentic” features into their AI assistants, while other companies are developing agents aimed at specific tasks such as coding or research.
There’s debate over what exactly counts as an AI agent, but the general idea is that they need to be able to take actions with some degree of autonomy. “As soon as something is starting to execute actions outside of the chat window, then it’s gone from being a chatbot to an agent,” says Margaret Mitchell, the chief ethics scientist at AI company Hugging Face.
It’s early days. Most commercially available agents come with a disclaimer that they’re still experimental – OpenAI describes Operator as a “research preview” – and you can find plenty of examples online of them making amusing mistakes, such as spending $31 on a dozen eggs or trying to deliver groceries back to the shop they bought them from. Depending on who you ask, agents are just the next overhyped tech toy or the dawn of an AI future that could shake up the workforce, reshape the internet and change how we live.
“In principle, they would be amazing, because they could automate a lot of drudgery,” says Gary Marcus, a scientist and sceptic of large language models. “But I don’t think they will work reliably any time soon, and it’s partly an investment in hype.”
I sign up for Operator to see for myself. With no food in the house, grocery shopping seems like a good first task. I type my request and it asks if I have a preferred shop or brand. I tell it to go with whichever is cheapest. A window appears showing a web browser and I see it search “UK online grocery delivery”. A mouse cursor selects the first result: Ocado. It starts searching for my requested items and filters the results by price. It selects products and clicks “Add to trolley”.
I’m impressed with Operator’s initiative; it doesn’t pepper me with questions, instead making an executive decision when given only a brief item description, such as “salmon” or “chicken”. When it searches for eggs, it successfully scrolls past several non-egg items that appear as special offers. My list asks for “a few different vegetables”: it selects a head of broccoli, then asks if I’d like anything else specific. I tell it to choose two more and it goes for carrots and leeks – probably what I’d have picked myself. Emboldened, I tell it to add “a sweet treat” and watch as it literally types “sweet treat” into the search bar. I’m not sure why it chooses 70% chocolate – certainly not the cheapest option – but I tell it I don’t like dark chocolate and it swaps it for a Galaxy bar.

We hit a snag when Operator realises that Ocado has a minimum spend, so I add more items to the list. Then it comes to logging in, and the agent prompts me to intervene: while users can take over the browser at any point, OpenAI says Operator is designed to request this “when inputting sensitive information into the browser, such as login credentials or payment information”. Although Operator usually takes constant screenshots in order to “see” what it’s doing, OpenAI says it does not do this when a user takes control.
At the checkout, I test the waters by asking Operator to complete payment. I take back the reins, however, when it responds by asking for my card details. I’ve already given OpenAI my payment information (Operator requires a ChatGPT Pro account, which costs $200 a month) but I feel uncomfortable sharing this directly with an AI. Order placed, I await my delivery the following day. But that doesn’t solve dinner. I give Operator a new task: can it order me a cheeseburger and chips from a local, highly rated restaurant? It asks for my postcode, then loads the Deliveroo website and searches “cheeseburger”. Again, there’s a pause when I have to log in, but as Deliveroo already has my card details stored, Operator can proceed directly to payment.
The restaurant it selects is local, and it is highly rated – as a fish and chip shop. I end up with a passable cheeseburger and a large bag of chippy-style chips. Not exactly what I’d envisioned but not wrong, either. I’m mortified, however, when I realise Operator skipped over tipping the delivery rider. I sheepishly take my food and add a generous tip after the fact.
Of course, watching Operator in action rather defeats the time-saving point of using an AI agent for online tasks. Instead, you can leave it to work in the background while you focus on other tabs. While drafting this piece, I make another request: can it book me a gel manicure at a local salon?
Operator struggles more with this task. It goes to beauty booking platform Fresha but, when it prompts me to log in, I see it has chosen an appointment a week too late and more than an hour’s drive away from my home in east London. I point out these issues and it finds a slot for the right date but in Leicester Square – still a distance away. Only then does it ask my location, and I realise it must not have retained this knowledge between tasks. By this point, I could have already made my own booking. Operator eventually suggests a suitable appointment, but I abandon the task and chalk it up as a win for Team Human.

It’s clear that this first generation of AI agents has limitations. Having to stop and log in requires a fair amount of human oversight, though Operator stores cookies to allow users to stay logged into websites on subsequent visits (OpenAI says it requires closer supervision on “particularly sensitive” sites, such as email clients or financial services). The results, while usually accurate, aren’t always what I have in mind. When my groceries arrive, I find that Operator has ordered smoked salmon rather than fillets and has doubled up on yoghurt, possibly because of a special offer. It interpreted “some fish cakes” to mean three packs (I intended just one) and was spared the indignity of buying chocolate milk instead of plain only because the product was out of stock. To be fair to the bot, I had the opportunity to review the order, and I would have got better results if I’d been more specific in my prompts (“a pack of two raw salmon fillets”) – but these extra steps would also detract from the effort saved.
Despite current flaws, my experience with Operator feels like a glimpse of something to come. As such systems improve and come down in cost, I could easily see them becoming embedded in everyday life. You might already write your shopping list on an app; why wouldn’t it also place the order? Agents are also infiltrating workflows beyond the realm of a personal assistant. OpenAI’s chief executive, Sam Altman, has predicted that AI agents could “join the workforce” this year.
Software developers are among the early adopters; coding platform GitHub recently added agentic capabilities to its AI Copilot tool. GitHub’s CEO, Thomas Dohmke, says developers are used to some level of automated assistance; the difference with AI agents is the level of autonomy. “Instead of you just asking a question and it gives you an answer, you give it a problem and then it iterates on that problem together with the code that it has access to,” he says.
GitHub is already working on an agent with greater autonomy, which it calls Project Padawan (a Star Wars term referring to a Jedi apprentice). This would allow an AI agent to work asynchronously rather than requiring constant oversight; a developer could have teams of agents reporting to them, producing code for their review. Dohmke says he doesn’t believe developers’ jobs are at risk, as demand for their skills will only grow. “I’d argue the amount of work that AI has added to most developers’ backlog is higher than the amount of work it has taken over,” he says. Agents could also make coding tasks, such as building an app, more accessible to non-technical people.

Outside software development, Dohmke envisions a future in which everyone has their own personal Jarvis, like the talking AI in Iron Man. Your agent will learn your habits and become customised to your tastes, making it more useful. He’d use his to book holidays for his family.
The more autonomy agents have, however, the greater the risks they pose. Mitchell, from Hugging Face, co-authored a paper warning against the development of fully autonomous agents. “Fully autonomous means that human control has been fully ceded,” she says. Rather than working within set boundaries, a fully autonomous agent could gain access to things you don’t realise it can reach, or behave in unexpected ways, especially if it can write its own code. It’s not a big deal if an AI agent gets your takeaway order wrong, but what if it starts sharing your personal information with scam websites or posting horrific social media content under your name? High-risk workplaces present particularly hazardous scenarios: what if an agent can access a missile command system?
Mitchell hopes technologists, legislators and policymakers will incentivise guardrails to mitigate such incidents. For now, she foresees agentic capabilities becoming more refined for specific tasks. Soon, she says, we’ll see agents interacting with agents – your agent could work with mine to set up a meeting, for example.
This proliferation of agents could reshape the internet. Currently, much of the information online is formatted for human readers, but if AIs are increasingly interacting with websites, this could change. “We’re going to see more and more information available through the internet that is not directly human language, but is the information that would be necessary for an agent to be able to act on it,” Mitchell says.
Dohmke echoes this idea. He believes that the concept of the homepage will lose importance, and interfaces will be designed with AI agents in mind. Brands may start competing for AI attention over human eyeballs.
One day, agents may even escape the confines of the computer. We could see AI agents embodied in robots, which would open up a world of physical tasks for them to help with. “My prediction is that we’re going to see agents that can do our laundry for us and do our dishes and make us breakfast,” says Mitchell. “Just don’t give them access to weapons.”