Claude Launches Opus 4.7 and Crushes Every Coding Agent (Except Mythos)

The race for AI-powered coding reaches a new milestone. Anthropic has released Claude Opus 4.7, a model that promises to revolutionize how we delegate software development work to AI. But there's an interesting detail: while representing a significant qualitative leap over its predecessor, this model still bows to Claude Mythos Preview, Anthropic's true crown jewel which, for security reasons, remains confined to limited access.

An Evolution Targeted at Advanced Software Engineering

Claude Opus 4.7 isn't simply an incremental update. Anthropic presents it as a substantial improvement over Opus 4.6, particularly in advanced software engineering tasks. The most significant difference emerges precisely in the most complex tasks—those that traditionally required constant supervision from human developers.

Users report being able to delegate the most demanding coding work to Opus 4.7 with unprecedented confidence. The model handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and—particularly interesting—develops methods to autonomously verify its own outputs before returning results.

This self-verification capability represents a crucial evolutionary step. We're no longer talking about a simple tool that generates code, but about a system capable of meta-reasoning about its own productions, increasingly approaching the cognitive process of a senior developer.

Enhanced Visual Capabilities and Professional Creativity

Beyond coding advances, Opus 4.7 introduces substantial improvements in computer vision. The model can now analyze images with significantly higher resolution, opening new possibilities in UI development, presentation generation, and documentation creation.

Anthropic emphasizes how the model demonstrates greater aesthetic taste and creativity in completing professional tasks, producing higher-quality interfaces, more effective slides, and more polished documentation. This aspect is particularly relevant in an enterprise context, where presentation quality often carries weight equal to the technical solidity of the code.

Benchmarks and Comparative Performance

While Opus 4.7 doesn't reach the general capabilities of Claude Mythos Preview, the model shows superior results to Opus 4.6 across a wide range of standardized benchmarks. This measured progression suggests a methodical approach from Anthropic, which prefers tested incremental releases rather than sudden and potentially problematic leaps.

Project Glasswing and Intentional Limitations

Here the narrative becomes particularly interesting from an AI safety perspective. The week before Opus 4.7's launch, Anthropic announced Project Glasswing, highlighting both the risks and benefits of AI models in the cybersecurity field.

The company explicitly stated it would keep Claude Mythos Preview's release limited and test new cybersecurity measures on less capable models first. Opus 4.7 represents precisely this: the first testing ground for advanced safeguards.

Differential Reduction of Cyber Capabilities

A particularly interesting technical aspect is that Opus 4.7's cybersecurity capabilities are deliberately inferior to those of Mythos Preview. During the training phase, Anthropic experimented with techniques to differentially reduce these capabilities, an approach we might call selective "detuning."

This represents uncharted territory in training large language models. Traditionally, the goal has always been to maximize capabilities across all fronts. Here instead, we're witnessing surgical intervention on the model's competencies, maintaining high general performance while limiting specific abilities considered high-risk.

Automatic Safeguards and the Cyber Verification Program

Opus 4.7 is released with safeguards that automatically detect and block requests indicative of prohibited or high-risk uses in the cybersecurity field. This protection system represents a real-scale laboratory: what Anthropic learns from deploying these measures will inform the development of more sophisticated protections for the future public release of Mythos-class models.

For security professionals who need to use Opus 4.7 for legitimate purposes, such as vulnerability research, penetration testing, and red-teaming, Anthropic has established the Cyber Verification Program. An accreditation system that balances the need for powerful tools for white hats with the responsibility to prevent abuse.

Availability, Platforms, and Pricing

From an accessibility standpoint, Opus 4.7 is available through a broad ecosystem:

All Claude products
Claude API (identifier: claude-opus-4-7)
Amazon Bedrock
Google Cloud Vertex AI
Microsoft Foundry

Pricing remains unchanged from Opus 4.6:

$5 per million input tokens
$25 per million output tokens

This price stability in a moment of rapid technological evolution suggests that Anthropic is absorbing the additional computational costs, likely to consolidate its market position against competitors like OpenAI and Google.

Mythos Remains Unbeaten: Strategic Implications

This article's provocative title highlights a fundamental truth: despite Opus 4.7 representing a notable step forward from the previous generation of coding agents, Claude Mythos Preview maintains its dominant position.

This intentional hierarchy reveals Anthropic's strategy: create differentiated capability levels, test security measures on mid-tier models, and keep the most advanced capabilities under tight control until adequate protections are developed.

The Future of Mythos-Class Models

Anthropic's stated goal is eventual broad release of Mythos-class models. The path goes through incremental deployment of safeguards on progressively more capable models. Opus 4.7 is the first step on this ladder, a controlled experiment that will provide valuable data for future releases.

Technical Reflections on Capability Differential

As an AI expert, I find the approach of differential capability reduction particularly fascinating. It raises complex technical and philosophical questions:

How is selective "detuning" technically implemented? Probably through targeted fine-tuning techniques with datasets that exclude or penalize specific behavior patterns, perhaps combined with reinforcement learning from human feedback (RLHF) strongly oriented toward safety.

What hidden trade-offs does this operation entail? Limiting capabilities in a specific domain can have unforeseen side effects on seemingly unrelated tasks, given the interconnected nature of representations in transformers.

Is it a scalable solution? As models become more capable, maintaining clean compartmentalizations between "safe" and "dangerous" capabilities could prove increasingly difficult.

Conclusions: A Delicate Balance Between Power and Responsibility

Claude Opus 4.7 represents a carefully calibrated equilibrium point. It offers advanced coding capabilities that respond to real developer needs, while maintaining a manageable risk profile through intentional limitations and automatic safeguards.

But the true protagonist of this story remains Claude Mythos Preview, the model Anthropic considers too powerful for general release without further precautions. This caution, which might seem excessive in an industry obsessed with release speed, could prove to be prophetic wisdom when history looks back at this crucial period of AI development.

The question we must ask ourselves isn't so much when we'll see even more capable models, but whether we'll be able to develop security frameworks robust enough to permit their widespread use without unintended consequences. Opus 4.7 is an important piece in this still-incomplete puzzle.