German AI consulting firm TNG Technology Consulting GmbH has released DeepSeek-TNG R1T2 Chimera, a significantly faster variant of DeepSeek’s popular open-source reasoning model R1-0528. The new model preserves about 90% of the original’s performance on reasoning benchmarks while generating responses with 60% fewer tokens, which translates to roughly 2.5x faster inference per answer and dramatically lower compute costs for enterprises.
What you should know: R1T2 represents a breakthrough in AI model efficiency through TNG’s Assembly-of-Experts (AoE) methodology, which merges multiple pre-trained models without additional training.
- The model combines three parent models: DeepSeek-R1-0528, DeepSeek-R1, and DeepSeek-V3-0324, creating what TNG calls a “Tri-Mind” configuration.
- Unlike traditional training approaches, AoE selectively merges weight tensors from existing models, preserving reasoning capabilities while reducing verbosity (sketched in code after this list).
- R1T2 maintains 90-92% of R1-0528’s performance on reasoning benchmarks like AIME-24, AIME-25, and GPQA-Diamond while using only 40% of the output tokens.
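In weight-space terms, the merge is closer to checkpoint arithmetic than to training. The sketch below shows the basic operation, linear interpolation of matching weight tensors across same-architecture parents; the function name and the mixing coefficients are illustrative assumptions, not TNG’s published recipe.

```python
import torch

def interpolate_state_dicts(parents: list[dict[str, torch.Tensor]],
                            coeffs: list[float]) -> dict[str, torch.Tensor]:
    """Linearly interpolate matching weight tensors across parent checkpoints.

    Assumes every parent shares one architecture, so each tensor name
    appears in all state dicts with identical shapes.
    """
    assert abs(sum(coeffs) - 1.0) < 1e-6, "mixing coefficients should sum to 1"
    merged = {}
    for name in parents[0]:
        merged[name] = sum(c * p[name] for c, p in zip(coeffs, parents))
    return merged

# Illustrative "Tri-Mind" merge; these coefficients are invented for the sketch:
# merged = interpolate_state_dicts(
#     [r1_0528_weights, r1_weights, v3_0324_weights], coeffs=[0.5, 0.3, 0.2])
```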
How Assembly-of-Experts differs from Mixture-of-Experts: AoE is a model merging technique rather than an architectural design, setting it apart from the more common MoE approach.
- MoE models like DeepSeek-V3 conditionally activate different expert components during inference, with only a subset of experts active per token.
- AoE creates new models by interpolating weight tensors from multiple pre-trained models, focusing on merging the routed expert tensors responsible for specialized reasoning.
- TNG’s implementation retains efficient shared and attention layers from faster parent models while incorporating reasoning strength from the more capable ones, as the sketch below illustrates.
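To make the contrast concrete, here is a minimal sketch of a selective merge that interpolates only the routed-expert tensors and keeps everything else from the faster parent. The `.mlp.experts.` name pattern is an assumption modeled on DeepSeek-V3-style checkpoints, not TNG’s actual code.

```python
import torch

def selective_expert_merge(fast: dict[str, torch.Tensor],
                           strong: dict[str, torch.Tensor],
                           alpha: float = 0.6) -> dict[str, torch.Tensor]:
    """Blend routed-expert tensors from a reasoning-strong parent into a
    faster parent, leaving all other layers untouched.

    The ".mlp.experts." pattern is assumed, based on DeepSeek-V3-style naming.
    """
    merged = {}
    for name, tensor in fast.items():
        if ".mlp.experts." in name:
            # Routed expert weights: pull in reasoning strength
            merged[name] = (1 - alpha) * tensor + alpha * strong[name]
        else:
            # Shared experts, attention, embeddings: keep the fast parent
            merged[name] = tensor
    return merged
```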
In plain English: Think of MoE like a large company where different departments handle different tasks as needed—only the relevant teams work on each project. AoE is more like creating a new employee by combining the best skills from three existing employees, without having to train someone from scratch.
Performance benchmarks: The speed improvements come from dramatically reduced output verbosity rather than raw processing acceleration.
- R1T2 generates responses using approximately 40% of the tokens required by R1-0528, directly reducing inference time and compute load.
- The model is 20% more concise than the original DeepSeek-R1 while maintaining similar reasoning quality.
- TNG measures “speed” as output token count per answer, a practical proxy for both cost and latency; the arithmetic below makes the relationship explicit.
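The arithmetic behind those claims is straightforward: at a fixed per-token decode rate, an answer that needs only ~40% of the tokens finishes in ~40% of the time. The numbers below are hypothetical, chosen only to show the relationship.

```python
# Hypothetical average output length per answer (illustrative numbers only)
tokens_r1_0528 = 10_000
tokens_r1t2 = 0.4 * tokens_r1_0528  # R1T2 uses ~40% of the parent's tokens

# Generation time scales with output token count at a fixed decode rate,
# so each R1T2 answer finishes in ~40% of the wall-clock time.
speedup = tokens_r1_0528 / tokens_r1t2
print(f"Effective speedup per answer: {speedup:.1f}x")  # -> 2.5x
```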
What the AI community is saying: Early response from developers has been overwhelmingly positive, with industry leaders praising the technical achievement.
- “DAMN! DeepSeek R1T2 – 200% faster than R1-0528 & 20% faster than R1,” wrote Vaibhav Srivastav, a senior leader at Hugging Face, the popular AI model-sharing platform, in a post on X. “Significantly better than R1 on GPQA & AIME 24, made via Assembly of Experts with DS V3, R1 & R1-0528 — and it’s MIT-licensed, available on Hugging Face.”
Deployment considerations: The model is available under the MIT License, with some important limitations and regulatory considerations; a minimal loading sketch follows the list below.
- R1T2 is not recommended for function calling or tool use applications due to inherited limitations from its DeepSeek-R1 lineage.
- European users must assess compliance with the EU AI Act, whose obligations for general-purpose AI models apply from August 2, 2025.
- U.S. companies operating domestically face no EU AI Act restrictions, though provisions may apply if serving EU users.
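For teams that want to evaluate the model, here is a minimal loading sketch using Hugging Face transformers. The repository id matches TNG’s Hugging Face release but should be verified, and a model of this size realistically needs a multi-GPU serving stack (e.g., vLLM) rather than this toy snippet.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tngtech/DeepSeek-TNG-R1T2-Chimera"  # verify the repo id before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # DeepSeek-style architectures ship custom code
)

messages = [{"role": "user", "content": "Explain briefly: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```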
About TNG Technology Consulting: The 24-year-old German firm operates as a values-based consulting partnership with over 900 employees, including a high concentration of PhDs and technical specialists.
- Founded in January 2001 and based in Bavaria, TNG serves major enterprise clients across telecommunications, insurance, automotive, e-commerce, and logistics.
- The company actively contributes to open-source communities and research, with previous Chimera variants processing billions of tokens daily through platforms like OpenRouter and Chutes.
- TNG’s unique structure, grounded in operational research and self-management principles, supports a culture of technical innovation.
Why this matters for enterprises: R1T2 offers tangible benefits for technical decision-makers looking to balance AI performance with operational efficiency.
- Lower inference costs through reduced GPU time and energy consumption, especially valuable in high-throughput environments (see the back-of-the-envelope example after this list).
- High reasoning quality without the overhead of verbose responses, ideal for structured tasks requiring concise answers.
- Open MIT licensing allows full deployment control and customization within regulated or air-gapped environments.
- The AoE approach suggests a future where enterprises can build specialized AI variants by recombining existing model strengths rather than training from scratch.
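As a back-of-the-envelope illustration of the cost effect, the price and traffic figures below are placeholders, not real rates:

```python
# Purely hypothetical serving assumptions
price_per_1k_output_tokens = 0.002      # USD, placeholder rate
queries_per_day = 1_000_000
tokens_per_answer_r1_0528 = 8_000
tokens_per_answer_r1t2 = 0.4 * tokens_per_answer_r1_0528  # ~40% of the tokens

def daily_output_cost(tokens_per_answer: float) -> float:
    """Output-side spend for one day of traffic at the placeholder rate."""
    return queries_per_day * tokens_per_answer / 1_000 * price_per_1k_output_tokens

print(f"R1-0528: ${daily_output_cost(tokens_per_answer_r1_0528):,.0f}/day")  # $16,000
print(f"R1T2:    ${daily_output_cost(tokens_per_answer_r1t2):,.0f}/day")     # $6,400
# Cutting output tokens by 60% cuts output-side spend by the same 60%.
```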