AI reasoning models — those that produce “chains-of-thought” (CoT) in text and reflect on their own analysis to try to catch errors midstream before outputting a response — are all the rage now thanks to the likes of DeepSeek and OpenAI’s “o” series.
Still, it’s pretty incredible to me how quickly the reasoning model approach has spread across the AI industry. This week brought word of yet another new model to try, this one from the mysterious yet laudably principled Nous Research collective of engineers, whose mission since launching in New York City in 2023 has been to make “personalized, unrestricted” AI models, often by fine-tuning or retraining open-source models such as Meta’s Llama series and those from French startup Mistral.
As posted on the Nous Research account on X and in the firm’s Discord channel, this new open reasoning model is called “DeepHermes-3 Preview,” and is described as an “LLM [large language model] that unifies reasoning and intuitive language model capabilities,” and allows the user to switch at will between longer reasoning processes and shorter, faster, less computationally demanding responses.
It’s an 8-billion-parameter (settings count) variant of Hermes 3, itself a variant of Meta’s Llama released by Nous back in August 2024. Sample exchanges show that it can enter into metacognition-like displays of thinking about itself and the role of AI compared to human consciousness, triggering something approaching an existential crisis in the model’s outputs.
Users can download the full model code on Hugging Face, as well as a version that’s been quantized (reduced in bit count) and saved in the GPT-Generated Unified Format (GGUF), which is designed to run model inference (generating outputs from the trained model, as opposed to training it) on consumer-grade PCs and servers.
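For those who want to try the quantized GGUF build locally, here is a minimal sketch using the llama-cpp-python bindings. The repository ID and quantization filename are assumptions for illustration; check Nous Research’s Hugging Face page for the exact names.

```python
# pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Assumed repo and filename pattern -- verify against the actual
# DeepHermes-3 GGUF listing on Hugging Face before running.
llm = Llama.from_pretrained(
    repo_id="NousResearch/DeepHermes-3-Llama-3-8B-Preview-GGUF",
    filename="*Q4_K_M.gguf",  # a 4-bit quantization suited to consumer RAM
    n_ctx=8192,               # context window for the session
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is the GGUF format?"}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```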
Nous today wrote that its researchers “hope our unique approach to user controlled, toggleable reasoning mode furthers our mission of giving those who use DeepHermes more steerability for whatever need they have.”
Building on Hermes 3: The data and training approach
DeepHermes-3 builds on the Hermes 3 dataset, a meticulously curated multi-domain dataset that Nous Research developed for the broader Hermes 3 series.
According to the Hermes 3 Technical Report released in August, this dataset is composed of approximately 390 million tokens spanning diverse instructional and reasoning-based domains.
The dataset is broken down into the following key categories:
- General instructions (60.6%): Broad, open-ended prompts similar to those found in general-purpose AI chat models.
- Domain expert data (12.8%): Specialized knowledge in fields like science, law and engineering.
- Mathematics (6.7%): Advanced problem-solving datasets aimed at improving numerical and logical reasoning.
- Roleplaying and creative writing (6.1%): Data designed to enhance storytelling and simulated dialogue.
- Coding and software development (4.5%): Code generation and debugging tasks.
- Tool use, agentic reasoning and retrieval-augmented generation (RAG) (4.3%): Training on function calling, planning and knowledge retrieval.
- Content generation (3.0%): Writing, summarization and structured output tasks.
- Steering and alignment (2.5%): Data focused on making the model highly steerable and responsive to user prompts.
In addition, the pseudonymous Nous Research team member @Teknium (@Teknium1 on X) wrote in response to a user on the company’s Discord server that the model was trained on “1M non cots and 150K cots,” or 1 million non-CoT outputs and 150,000 CoT outputs.
This data mixture supports DeepHermes-3’s unique ability to toggle between intuitive responses and deep, structured reasoning, a key feature that distinguishes it from other LLMs.
How toggleable reasoning mode works
DeepHermes-3 allows users to control its reasoning depth using a system prompt. The user must enter the following text before a prompt to “toggle on” the model’s reasoning mode:
“You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.”
When reasoning mode is enabled, the model processes information in long CoTs, allowing it to deliberate systematically before generating an answer.
This is achieved using the <think> tags specified in the system prompt, which enclose the model’s intermediate reasoning before it delivers its final answer.
In standard response mode, the model operates more like a traditional AI chatbot, providing quicker, intuition-based responses without deep logical processing.
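As a concrete illustration, here is a minimal sketch of toggling reasoning mode on with the Hugging Face transformers library. The model ID, sampling settings and token budget are assumptions for demonstration, not official Nous Research recommendations; the system prompt is the one quoted above.

```python
# pip install transformers torch accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID -- substitute the actual DeepHermes-3 repository name.
model_id = "NousResearch/DeepHermes-3-Llama-3-8B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The reasoning-mode system prompt quoted above. Dropping it (or using a plain
# system prompt) puts the model back in its faster, intuition-based mode.
reasoning_prompt = (
    "You are a deep thinking AI, you may use extremely long chains of thought to "
    "deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside <think> "
    "</think> tags, and then provide your solution or response to the problem."
)

messages = [
    {"role": "system", "content": reasoning_prompt},
    {"role": "user", "content": "A train leaves at 3:40 pm and arrives at 6:15 pm. How long was the trip?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Chains-of-thought can run long, so leave generous room for new tokens.
output = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```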
Performance insights and community feedback
Early benchmarking and community testing have provided key insights into DeepHermes-3’s capabilities:
- Mathematical reasoning: DeepHermes-3 scores 67% on MATH benchmarks, compared to 89.1% for DeepSeek’s R1-distilled model. While DeepSeek outperforms it in pure math tasks, Nous Research positions DeepHermes-3 as a more generalist model with broader conversational and reasoning skills.
- Multi-turn conversations: Some testers report that reasoning mode activates correctly on the first response, but may fail to persist in extended conversations. Community members suggest enforcing <think>\n at the start of each response, a method also used in DeepSeek-R1 (see the sketch after this list).
- Function calling: DeepHermes-3 supports tool use, although it was not explicitly trained to integrate reasoning mode and function calling simultaneously. Some users report that while combining both features improves accuracy in executing tools, results remain inconsistent.
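Here is a minimal sketch of that multi-turn workaround, reusing the tokenizer, model and messages from the earlier example. It relies on continue_final_message, a chat-template option in recent transformers releases that leaves the last assistant turn open, so generation continues from the injected <think> prefix rather than starting a fresh reply; treat it as an illustration of the community suggestion, not an official fix.

```python
# Pre-fill the next assistant turn with "<think>\n" so the model keeps
# reasoning on later turns, as suggested by community testers. In a real
# multi-turn chat, append the earlier assistant and user turns first.
messages.append({"role": "assistant", "content": "<think>\n"})

inputs = tokenizer.apply_chat_template(
    messages,
    continue_final_message=True,  # keep the assistant turn open for completion
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=2048)
# Decode only the newly generated continuation of the pre-filled turn.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```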
Nous Research is actively gathering user feedback to refine reasoning persistence and improve multi-turn interactions.
Deployment and hardware performance
DeepHermes-3 is available for testing on Hugging Face, with GGUF quantized versions optimized for low-power hardware. The model is compatible with vLLM for inference and uses Llama-Chat format for multi-turn dialogue.
One user reported a processing speed of 28.98 tokens per second on a MacBook Pro M4 Max, demonstrating that the model can run efficiently on consumer hardware.
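For server-style or batched inference, a minimal vLLM sketch might look like the following. The model ID is an assumption, and the LLM.chat helper (which applies the model’s own Llama-style chat template) requires a reasonably recent vLLM release.

```python
# pip install vllm
from vllm import LLM, SamplingParams

# Assumed model ID -- point this at the actual DeepHermes-3 checkpoint.
llm = LLM(model="NousResearch/DeepHermes-3-Llama-3-8B-Preview", dtype="bfloat16")

params = SamplingParams(temperature=0.6, max_tokens=1024)

# LLM.chat formats the messages with the model's chat template before generating.
outputs = llm.chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain, briefly, what a toggleable reasoning mode is."},
    ],
    params,
)
print(outputs[0].outputs[0].text)
```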
DeepHermes-3 is based on Meta’s Llama 3 model and is governed by the Meta Llama 3 Community License. While the model is freely available for use, modification and redistribution, certain conditions apply:
- Redistribution: Any derivative models or deployments must include the original license and prominently display “Built with Meta Llama 3.”
- Restrictions on model training: Users cannot use DeepHermes-3 (or Llama 3) to train other LLMs, except for derivative works explicitly based on Llama 3.
- Commercial licensing for large companies: Organizations with more than 700 million monthly active users must obtain explicit approval from Meta before using the model commercially.
- Acceptable use policy: Users must comply with Meta’s AI usage restrictions, which prohibit applications in areas like misinformation, surveillance and harmful content generation.
These redistribution rules and commercial limitations mean that DeepHermes-3, despite its availability on Hugging Face, is not fully open-source in the traditional sense. That sets it apart from Chinese rival DeepSeek’s hit R1 reasoning model, which is released under a permissive MIT License.
Looking ahead to Hermes 4
DeepHermes-3 was developed by @teknium, @emozilla, @Gifted Gummy Bee, @hjc-puro and @jsupha, with Nous Research crediting the open-source community for contributions to datasets, evaluation tools and model training.
Nous Research sees this preview model as a stepping stone toward the next major release, Hermes 4, which is expected to further refine its reasoning and conversational abilities.