Anthropic has introduced a significant upgrade to its AI lineup with a new version of Claude 3.5 Sonnet, which boasts an unprecedented capability: controlling a computer much as a human would. The feature, aptly named “computer use,” is available in public beta and lets developers direct Claude to operate a desktop. Claude observes screenshots and then replicates human actions, such as clicking buttons and typing text.
Other tech giants, such as Microsoft and OpenAI, have showcased similar functionality but limited their tools to viewing the screen without full operational control. Anthropic has taken a bolder step: Claude 3.5 Sonnet can operate applications directly and automate workflows, potentially transforming tasks ranging from research to routine administration.
The idea of an AI working directly on a computer isn’t entirely new. Robotic Process Automation (RPA) vendors have offered similar tools for years, but Anthropic’s approach brings a generality and flexibility that RPA has traditionally lacked. Instead of relying on pre-set automation scripts, developers can instruct Claude in natural language to handle repetitive tasks, conduct open-ended research, or carry out more complex operations.
Anthropic exposes the feature through its API, so developers can ask Claude to, for example, gather data from several sources and fill out a form, or compile information from multiple apps. The model “sees” the screen through a series of screenshots that it pieces together into a coherent view of the desktop. Based on the instructions it receives, it then simulates actions such as moving the cursor, clicking buttons, or typing.
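The observe-act cycle described above can be pictured as a loop: take a screenshot, let the model choose one action, execute it, and repeat until the task is done. The sketch below is a toy, self-contained illustration under stated assumptions. `FakeDesktop`, `toy_policy`, and `run_agent` are hypothetical names, the “screenshot” is a small dictionary of visible state rather than pixels, and a hard-coded rule stands in for Claude. It does not use or reproduce Anthropic’s actual API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen: one text field and one submit button."""
    field_text: str = ""
    focused: bool = False
    submitted: bool = False
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive an image; here the "screenshot" summarizes visible state.
        return {"field_text": self.field_text, "focused": self.focused, "submitted": self.submitted}

    def apply(self, action: dict) -> None:
        # Execute one simulated action (click or type) and record it.
        if action["type"] == "click" and action["target"] == "field":
            self.focused = True
        elif action["type"] == "type" and self.focused:
            self.field_text += action["text"]
        elif action["type"] == "click" and action["target"] == "submit":
            self.submitted = True
        self.log.append(action)


def toy_policy(goal: str, shot: dict):
    """Hypothetical rule-based stand-in for the model: picks the next action from the latest screenshot."""
    if shot["submitted"]:
        return None  # nothing left to do
    if not shot["focused"]:
        return {"type": "click", "target": "field"}
    if shot["field_text"] != goal:
        # Type whatever part of the goal is not yet on screen.
        return {"type": "type", "text": goal[len(shot["field_text"]):]}
    return {"type": "click", "target": "submit"}


def run_agent(goal: str, desktop: FakeDesktop, max_steps: int = 10) -> int:
    """Observe-act loop: screenshot, choose an action, execute it, repeat. Returns the number of actions taken."""
    for step in range(max_steps):
        action = toy_policy(goal, desktop.screenshot())
        if action is None:
            return step
        desktop.apply(action)
    raise RuntimeError("step budget exhausted before the task finished")


desk = FakeDesktop()
steps = run_agent("hello", desk)
print(steps, desk.field_text, desk.submitted)  # 3 hello True
```

Capping the loop with `max_steps` is a deliberate choice in this sketch: because each decision rests on a single still snapshot, an agent that misreads the screen could otherwise keep acting indefinitely.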
Though promising, the feature remains experimental. Because Claude works from a series of still screenshots rather than a real-time video stream, it can miss short-lived events such as notifications. Anthropic warns that some actions, such as dragging and zooming, are still difficult, and it plans to improve the feature based on feedback from early adopters.
Claude 3.5 Sonnet also posts stronger results on industry benchmarks, particularly for coding and tool use. On SWE-bench Verified, a coding benchmark, its score rose to 49%, higher than leading publicly available AI models. On TAU-bench, which evaluates how well an AI agent handles real-world tasks in domains such as retail and airlines, its accuracy also rose significantly.
Security and ethical considerations have been a top priority for Anthropic in releasing this technology. To address concerns about misuse, such as spreading misinformation or interfering in elections, Anthropic has designed Claude to avoid engaging with social media, government websites, or domains associated with sensitive data. Prompts that could lead to risky behavior are flagged, and Claude is designed to avoid high-risk actions unless a human operator explicitly directs them.
The model also comes with classifiers that monitor its activity, detecting attempts to post on social media or register web domains. For further accountability, Anthropic retains screenshots from Claude’s sessions for at least 30 days, leaving a trail of its actions that can be reviewed if needed.
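To make the idea of screening actions concrete, here is a minimal rule-based sketch that flags actions touching social media or domain registration. It is a simplified stand-in, not Anthropic’s classifier: the host list, the keyword list, and the `flag_action` function are all hypothetical, and Anthropic has not published how its classifiers work (they are not necessarily rule-based at all).

```python
from urllib.parse import urlparse

# Hypothetical watch-lists for illustration only; not Anthropic's actual rules.
FLAGGED_HOSTS = {"twitter.com", "x.com", "facebook.com"}
FLAGGED_KEYWORDS = ("register domain", "post to", "publish post")


def flag_action(url: str, description: str) -> list[str]:
    """Return the reasons an action should be held for review; an empty list means no flag."""
    reasons = []
    # Normalize the host and drop a leading "www." so "www.x.com" matches "x.com".
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in FLAGGED_HOSTS:
        reasons.append(f"social media host: {host}")
    # Case-insensitive keyword match against the action's description.
    text = description.lower()
    for kw in FLAGGED_KEYWORDS:
        if kw in text:
            reasons.append(f"keyword: {kw}")
    return reasons


print(flag_action("https://www.x.com/compose", "Post to timeline"))
# ['social media host: x.com', 'keyword: post to']
```

A production system would more likely combine learned classifiers with human review, but the shape is similar: inspect each proposed action before it runs and hold anything that matches a sensitive category.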
Anthropic acknowledges that this is only the beginning. The current version of Claude 3.5 Sonnet serves as a testing ground, and user feedback will guide improvements to both its performance and its safety protocols. The ability to interact with a desktop the way a human does opens up exciting possibilities but also presents new challenges, and Anthropic is closely monitoring adoption to balance innovation with responsible AI use.
For more price-sensitive customers, Anthropic is also preparing to release Claude 3.5 Haiku, a smaller and more affordable model that Anthropic says matches the performance of its previous largest model, Claude 3 Opus, on many evaluations at a cost and speed similar to the previous-generation Haiku. Claude 3.5 Haiku will initially be available as a text-only model, with support for image input to follow.