Think faster models have to be heavier and pricier?
Cohere’s August 2024 command-r-plus-08-2024 update says otherwise.
Cohere reports about 50% higher throughput, 20% lower latency, and half the hardware footprint, while the smaller command-r-08-2024 now matches the prior flagship’s performance at far lower cost.
Developers and enterprise teams should care: faster, cheaper inference lowers operating costs, and more reliable JSON, multilingual RAG, and Safety Modes reduce integration work.
This post breaks down the feature changes, performance gains, pricing trade-offs, and quick steps you should take to test or migrate.

Latest Command R Plus Updates (Version 08-2024, August 2024)

Cohere released command-r-plus-08-2024 in August 2024, a substantial upgrade to its flagship enterprise model. The update brought 50% higher throughput, 20% lower latency, and half the hardware footprint of the previous version. Cohere released command-r-08-2024 alongside it. This smaller model now matches what the old Command R Plus could do while using far less infrastructure. Both models were fine-tuned for coding, math, logical reasoning, structured data work, and JSON generation.

The 08-2024 release expanded multilingual retrieval-augmented generation across 23 languages. Better in-line citations mean less hallucination risk. Models can plan, query tools, and answer in whatever language you’re using without needing extra prompting. Cohere also rolled out Safety Modes in beta. You get Strict Mode for tight content guardrails and Contextual Mode for creative and academic work that still keeps core protections in place. Structured Outputs got better at following preamble instructions, and the model is less sensitive to non-semantic prompt changes, so you’ll get more consistent outputs when the same query is worded differently.

Top changes in the August 2024 release:

  • 50% throughput increase and 20% latency reduction
  • 50% smaller hardware footprint for equivalent performance
  • Support for 23 languages with enhanced multilingual RAG and citation
  • Safety Modes (beta): Strict and Contextual toggles for enterprise governance
  • Fine-tuned coding, math, reasoning, and structured data capabilities
  • Improved JSON generation and reduced prompt brittleness
  • Pricing set at $2.50 per million input tokens and $10.00 per million output tokens

For developers, this means faster inference at lower infrastructure cost. JSON outputs are more reliable, so integration gets easier. The safety controls are aimed at enterprise governance and can be tuned to specific use cases, though they are still in beta. Throughput and hardware gains translate directly into cost savings for high-volume applications. The latency drop improves real-time interaction quality in chat and support workflows.

Performance and Benchmark Improvements

Benchmarking the 08-2024 release focuses on speed, efficiency, and task accuracy compared to earlier Command R Plus versions and competing large language models.

The headline numbers show real operational gains. Throughput rose by 50%, so the model processes about 1.5 times as many tokens per second on the same hardware. Latency dropped by 20%, cutting response times for interactive applications like customer support and real-time code assistance. The 50% hardware footprint reduction lets teams run the model on smaller clusters or get higher concurrency on existing infrastructure. These improvements combine to lower total cost of ownership for high-throughput workloads, especially in enterprise RAG systems handling thousands of queries per hour.

Cohere tuned the model for better performance in coding, mathematics, and logical reasoning tasks. Structured data analysis improved. The model handles tabular data ingestion and generates insights more reliably now. Multilingual RAG accuracy increased across all 23 supported languages, and in-line citation quality improved to reduce hallucinations when synthesizing information from multiple documents. The updated Structured Outputs feature produces more accurate JSON, making downstream automation and API chaining more reliable. The model also became less sensitive to non-semantic prompt variations (like extra whitespace or synonym swaps), delivering more consistent outputs across equivalent queries.

Performance metric comparisons (previous version vs 08-2024):

  • Throughput: 50% increase in tokens processed per second
  • Latency: 20% reduction in average response time
  • Hardware efficiency: 50% smaller compute footprint for equivalent performance
  • Multilingual RAG accuracy: Improved citation behavior and reduced hallucination rates across 23 languages
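
Figures like these are worth checking against your own workload. The sketch below is one minimal way to time a model call and express the change between two runs as a percentage. The `fake_model` function is a hypothetical stand-in for a real API call, and counting whitespace-separated words is only a rough proxy for tokens.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(generate: Callable[[str], str], prompts: List[str]) -> Dict[str, float]:
    """Time each call and report mean latency plus a rough tokens-per-second figure."""
    latencies = []
    total_tokens = 0
    for prompt in prompts:
        start = time.perf_counter()
        reply = generate(prompt)
        latencies.append(time.perf_counter() - start)
        # Whitespace splitting is only a crude stand-in for real token counts.
        total_tokens += len(reply.split())
    elapsed = sum(latencies)
    return {
        "mean_latency_s": statistics.mean(latencies),
        "tokens_per_s": total_tokens / elapsed if elapsed > 0 else 0.0,
    }


def relative_change(old: float, new: float) -> float:
    """Return the fractional change from old to new (0.5 means +50%)."""
    return (new - old) / old


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in; replace with a real call to the model under test.
    time.sleep(0.001)
    return "stub reply to " + prompt


if __name__ == "__main__":
    print(benchmark(fake_model, ["hello", "bonjour"]))
    # Throughput going from 100 to 150 tokens/s is a 50% increase.
    print(relative_change(100.0, 150.0))
```

Running the same prompt set against the old and new model identifiers and feeding the two results into `relative_change` gives a like-for-like comparison with the published numbers.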

API and Integration Updates

The 08-2024 release introduced new version identifiers and improved behavior in retrieval-augmented and tool-calling workflows. Developers now specify command-r-plus-08-2024 or command-r-08-2024 in API requests to access the updated models. The models are accessible via Cohere’s hosted API and are also available on Amazon SageMaker. Plans to expand to additional cloud platforms are in the works. This cross-platform availability simplifies integration for teams already using AWS infrastructure and enables hybrid or multi-cloud deployment strategies.
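
As a rough illustration of pinning a request to one of the dated identifiers, the sketch below prepares an HTTP request without sending it. The endpoint URL, the `model` and `message` body fields, and the Bearer-token header are assumptions based on Cohere’s v1 Chat API, so check the current API reference before relying on them. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: Cohere's v1 Chat endpoint; verify against the current API reference.
CHAT_URL = "https://api.cohere.com/v1/chat"


def build_chat_request(api_key: str, message: str,
                       model: str = "command-r-plus-08-2024") -> urllib.request.Request:
    """Prepare (but do not send) a chat request pinned to a dated model identifier."""
    body = json.dumps({"model": model, "message": message}).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("YOUR_API_KEY", "Summarize our refund policy.",
                             model="command-r-08-2024")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request), which needs a valid key and network access.
```

Keeping the model name as a parameter makes it easy to run the same traffic against both identifiers during a migration test.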

Structured Outputs received updates to improve JSON generation reliability. The model follows preamble instructions more accurately now, reducing the need for post-processing or retry logic when extracting structured data from unstructured text. Multilingual RAG workflows benefit from automatic language detection and in-line citation generation, reducing the need for custom prompt engineering to handle non-English queries. The model’s reduced sensitivity to non-semantic prompt changes means fewer edge-case failures when queries are slightly rephrased or formatted differently by upstream systems.

Integration and API improvements:

  • Version identifiers updated to command-r-plus-08-2024 and command-r-08-2024
  • Availability expanded to Amazon SageMaker with additional cloud platform rollouts planned
  • Improved Structured Outputs for more reliable JSON generation and schema adherence
  • Enhanced multilingual RAG with automatic language handling and citation generation
  • Reduced prompt brittleness for more consistent outputs across minor query variations
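
More reliable JSON reduces the need for retry logic but does not remove the case for a guard. A minimal sketch of one, assuming the model reply arrives as a string and that a short list of required keys is enough validation for your use case (`parse_structured_reply` and `generate_with_retry` are hypothetical helper names):

```python
import json
from typing import Callable, Dict, Iterable, Optional


def parse_structured_reply(raw: str, required_keys: Iterable[str]) -> Optional[Dict]:
    """Return the parsed object if it is a JSON object with every required key, else None."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    if any(key not in parsed for key in required_keys):
        return None
    return parsed


def generate_with_retry(generate: Callable[[], str], required_keys: Iterable[str],
                        max_attempts: int = 3) -> Dict:
    """Call generate() until it yields valid structured output or attempts run out."""
    keys = list(required_keys)
    for _ in range(max_attempts):
        parsed = parse_structured_reply(generate(), keys)
        if parsed is not None:
            return parsed
    raise ValueError(f"no valid JSON after {max_attempts} attempts")


# Stubbed replies: the first is malformed, the second is valid.
replies = iter(["not json", '{"invoice_id": "A-17", "total": 42.5}'])
print(generate_with_retry(lambda: next(replies), ["invoice_id", "total"]))
```

Logging how often the first attempt fails before and after switching models is a simple way to see whether the reliability gain shows up in your own pipeline.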

Pricing and Access Changes

Cohere set pricing for the August 2024 release at $2.50 per million input tokens and $10.00 per million output tokens for command-r-plus-08-2024. The smaller command-r-08-2024 model, now upgraded to rival the previous Command R Plus in overall performance, costs $0.15 per million input tokens and $0.60 per million output tokens. This pricing structure gives teams a clear cost-performance trade-off: command-r-08-2024 offers equivalent capability to the prior flagship at dramatically lower cost, while command-r-plus-08-2024 provides higher capacity for workloads requiring maximum accuracy and reasoning depth.

Access expanded with availability on Amazon SageMaker, letting enterprise users deploy models within their existing AWS environments. Cohere stated plans to roll out availability to additional cloud platforms, though specific timelines and regions weren’t provided. Teams using Cohere’s hosted API can access both models immediately. The Safety Modes feature launched in beta, meaning teams should test governance behavior in non-production environments before enabling Strict or Contextual Mode in customer-facing applications.
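
One way to keep Safety Mode choices explicit while testing is to select the mode per application type in a single place. The sketch below assumes the Chat API accepts a `safety_mode` field with the values `STRICT` and `CONTEXTUAL`. The application names and the mapping are hypothetical and should reflect your own governance policy.

```python
from typing import Dict

# Assumption: the Chat API accepts a "safety_mode" field with these values.
SAFETY_MODES = {"STRICT", "CONTEXTUAL"}

# Hypothetical mapping from application type to mode; adjust to your own policy.
MODE_BY_APP = {
    "customer_support": "STRICT",
    "creative_writing": "CONTEXTUAL",
    "academic_research": "CONTEXTUAL",
}


def chat_payload(app: str, message: str,
                 model: str = "command-r-plus-08-2024") -> Dict[str, str]:
    """Build a request body whose safety mode is chosen by application type."""
    mode = MODE_BY_APP.get(app, "STRICT")  # unknown apps fall back to the tighter mode
    if mode not in SAFETY_MODES:
        raise ValueError(f"unsupported safety mode: {mode}")
    return {"model": model, "message": message, "safety_mode": mode}


print(chat_payload("creative_writing", "Draft a noir short story opening."))
print(chat_payload("unknown_app", "Hello"))
```

Defaulting unknown applications to the stricter mode is a design choice in this sketch, not a documented Cohere behavior.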

Pricing and access updates:

  • command-r-plus-08-2024: $2.50 per million input tokens, $10.00 per million output tokens
  • command-r-08-2024: $0.15 per million input tokens, $0.60 per million output tokens
  • Amazon SageMaker integration now live, with additional cloud platform expansions planned
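
The per-token prices above can be turned into a quick estimate for a given workload. This is a minimal sketch using only the prices listed in this post. The monthly token volumes are hypothetical and `estimate_cost` is an illustrative helper name.

```python
from decimal import Decimal

# USD per million tokens, taken from the prices listed above.
PRICES = {
    "command-r-plus-08-2024": {"input": Decimal("2.50"), "output": Decimal("10.00")},
    "command-r-08-2024": {"input": Decimal("0.15"), "output": Decimal("0.60")},
}
MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for a given token volume."""
    price = PRICES[model]
    return (Decimal(input_tokens) * price["input"]
            + Decimal(output_tokens) * price["output"]) / MILLION


# Hypothetical monthly workload: 200M input tokens and 50M output tokens.
for name in PRICES:
    print(name, estimate_cost(name, 200_000_000, 50_000_000))
```

For that hypothetical workload the estimate comes to $1,000 for command-r-plus-08-2024 and $60 for command-r-08-2024, which is the trade-off to weigh against any accuracy difference on your own tasks.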

Release History Overview

Cohere’s Command R series evolved through multiple releases, each adding performance, language support, or safety features. The August 2024 update represents the most significant performance and efficiency gain to date, but earlier releases established the model’s multilingual RAG capabilities and enterprise focus.

| Version | Release Month/Year | Key Changes |
| --- | --- | --- |
| command-r-plus-08-2024 | August 2024 | 50% throughput increase, 20% latency reduction, 50% hardware footprint reduction, Safety Modes beta, improved JSON generation |
| command-r-08-2024 | August 2024 | Upgraded to rival previous Command R Plus performance, fine-tuned coding and math, multilingual RAG across 23 languages |
| command-r-plus (prior) | Pre-August 2024 | Enterprise-focused flagship with multilingual support, structured outputs, and retrieval-augmented generation capabilities |
| command-r (prior) | Pre-August 2024 | Smaller model for cost-efficient workloads, multilingual support, basic RAG and tool-calling features |

The table above shows that the August 2024 release cycle introduced two new version tags and came with quantified throughput, latency, and hardware figures. Earlier Command R Plus versions established the model’s reputation for multilingual retrieval and citation quality, while the smaller Command R served cost-sensitive use cases. The 08-2024 update narrows the performance gap between the two tiers, with command-r-08-2024 now matching the prior flagship’s capability at a fraction of the cost.

Practical Applications After the Latest Update

The August 2024 improvements make Command R Plus faster and more reliable in retrieval-augmented workflows, structured data analysis, and multilingual applications. The 50% throughput gain and 20% latency reduction directly improve user experience in customer support chatbots, where response time affects satisfaction and agent productivity. Enterprise RAG systems benefit from improved in-line citation accuracy, reducing the risk of hallucinated references when synthesizing information from internal knowledge bases or regulatory documents.

Use cases enhanced by the 08-2024 update:

  • Enterprise RAG systems: Faster document retrieval and more accurate citations reduce hallucination risk in compliance, legal, and financial research workflows
  • Multilingual customer support: Automatic language detection and in-line citation across 23 languages simplify global support operations without custom prompt engineering
  • Code generation and analysis: Fine-tuned coding improvements make the model more reliable for generating snippets, reviewing pull requests, and explaining legacy code
  • Structured data extraction: Improved JSON generation and tabular data analysis streamline ETL pipelines, API integration, and report automation
  • Content moderation and safety: Safety Modes (Strict and Contextual) allow enterprise teams to configure governance trade-offs for customer-facing versus internal creative applications

Teams running high-volume workflows see the most immediate benefit from the throughput and hardware efficiency gains. A RAG system processing 10,000 queries per hour can now handle the same load on half the infrastructure, cutting cloud compute costs while maintaining or improving response quality. The reduced latency improves real-time interaction in chat applications, where users expect sub-second responses. For developers building on Cohere’s API, the improved JSON reliability and reduced prompt brittleness mean fewer edge cases and less post-processing logic, speeding up integration and reducing maintenance overhead.
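
The infrastructure claim is simple arithmetic once per-node capacity is known. In the sketch below the per-node capacity figure is hypothetical, and "half the infrastructure" is read as each node serving twice as many queries per hour.

```python
import math


def nodes_needed(queries_per_hour: int, queries_per_node_hour: int) -> int:
    """Round up to the number of nodes required to serve the load."""
    return math.ceil(queries_per_hour / queries_per_node_hour)


LOAD = 10_000  # queries per hour, the figure used above
OLD_CAPACITY = 500  # hypothetical queries per node-hour before the update
NEW_CAPACITY = OLD_CAPACITY * 2  # assumes half the footprint means double per-node capacity

before = nodes_needed(LOAD, OLD_CAPACITY)
after = nodes_needed(LOAD, NEW_CAPACITY)
print(before, after)
```

With these assumed numbers the node count falls from 20 to 10. Substituting measured per-node capacity from your own deployment gives a more realistic estimate.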

Final Words

To recap, Cohere released command-r-plus-08-2024 in August 2024 with higher throughput, lower latency, a smaller hardware footprint, broader multilingual RAG support, more reliable JSON generation, and Safety Modes in beta.

The post walked through benchmark improvements, API and integration changes, pricing and access updates, a release-history overview, and practical wins like faster RAG, more accurate citations, and more reliable code generation.

If you build on the model, point a test environment at the new model identifiers and compare cost, latency, and output quality on your own workloads before migrating.

FAQ

Q: What does Command R Plus do?

A: Command R Plus reasons over retrieved context and returns concise, instruction-following answers with in-line citations. The 08-2024 update improves retrieval-augmented generation accuracy, multilingual handling, and latency, and reduces hallucinations.

Q: What is Cohere Command R? What models does Cohere have? Does Cohere have a reasoning model?

A: Command R is Cohere’s retrieval-focused model. Cohere offers general-purpose Command models, Command R and Command R Plus for reasoning over retrieved context, and embedding models for search. Within that lineup, the Command R series is the one aimed at reasoning tasks.
