Key Takeaway:
Chinese researchers at the Chinese Academy of Sciences (Institute of Automation, Beijing) introduced SpikingBrain1.0, a neuromorphic, brain-inspired model that mimics brain activity by firing only the neurons it needs. Running entirely on domestic MetaX chips, it reportedly achieves 25–100x speed-ups over conventional Transformer-based models while consuming less energy and requiring under 2% of the training data that standard systems need.
Brain-Inspired Model “SpikingBrain1.0” – Key Points
Brain-Inspired Design
SpikingBrain1.0 uses spiking computation with adaptive threshold neurons and event-driven spike coding (binary/ternary/bitwise), so neurons activate only when triggered, akin to the brain’s ~20 W efficiency. It replaces full global attention with linear and hybrid-linear attention emphasizing nearby context and compressed, continuously updated memory states, cutting unnecessary computation while maintaining accuracy.
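To make the mechanism concrete, here is a minimal, illustrative sketch of an adaptive-threshold spiking neuron emitting binary, event-driven spikes. The constants, function names, and subtraction-reset scheme are assumptions for illustration, not details from the paper.

```python
import numpy as np

def adaptive_threshold_spikes(activations, base_theta=1.0, adapt_rate=0.5, decay=0.9):
    """Convert a stream of continuous activations into binary spike events.

    The neuron fires (emits 1) only when its accumulated membrane potential
    crosses a threshold; each spike raises the threshold, so strong or
    frequent inputs get encoded with fewer, sparser events.
    """
    potential = 0.0          # membrane potential, integrates input over time
    theta = base_theta       # adaptive firing threshold
    spikes = []
    for x in activations:
        potential += x                       # integrate incoming activation
        if potential >= theta:               # event-driven: fire only on crossing
            spikes.append(1)
            potential -= theta               # reset by subtraction
            theta += adapt_rate              # adaptation: raise threshold after a spike
        else:
            spikes.append(0)
        theta = base_theta + decay * (theta - base_theta)  # threshold relaxes back
    return np.array(spikes)

acts = np.random.rand(32)                    # toy activation stream
s = adaptive_threshold_spikes(acts)
print(f"spike train: {s.tolist()}, sparsity: {1 - s.mean():.0%}")
```

Ternary and bitwise coding variants extend the same idea with signed or packed spike values; the event-driven principle, compute only on a spike, is what yields the power savings.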
Performance Gains
Tests showed major speed improvements:
The 7B model delivered a more-than-100× speed-up in Time to First Token (TTFT) on a 4-million-token input; a schematic of how TTFT is measured follows this list.
In another benchmark, it achieved a 26.5× speed-up when generating the first token from a 1-million-token context.
Two main versions were built: a 7B linear model and a 76B hybrid-linear MoE model that activates roughly 12B parameters per token (“A12B”). Both were trained on about 150 billion tokens, less than 2% of the data usually needed, yet still matched open-source Transformer baselines in performance.
A smaller 1B compressed variant, optimized for mobile CPUs, delivered a 15.39× speed-up on sequences of 256k tokens.
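For context, TTFT is simply the wall-clock delay between submitting a prompt and receiving the first generated token. The harness below shows the idea schematically; `generate_stream` is a hypothetical stand-in for any streaming inference API, not SpikingBrain’s actual interface.

```python
import time

def time_to_first_token(generate_stream, prompt):
    """Measure TTFT: elapsed time from prompt submission until the first
    output token arrives. `generate_stream` is any callable that yields
    tokens one at a time (a placeholder for a real engine's streaming API)."""
    start = time.perf_counter()
    for _ in generate_stream(prompt):        # stop at the very first token
        return time.perf_counter() - start
    return float("inf")                      # no tokens produced

# Toy usage: a dummy "model" that streams the prompt's words back.
ttft = time_to_first_token(lambda p: iter(p.split()), "a very long context ...")
print(f"TTFT: {ttft * 1e6:.1f} microseconds (toy example)")
```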
Hardware Independence
The models were built and trained on hardware from MetaX Integrated Circuits Co. (C550 GPUs) rather than Nvidia. Training ran stably for weeks on hundreds of MetaX chips, reaching 23.4% Model FLOPs Utilization (MFU) on the 7B model. The researchers also adapted vLLM for inference on MetaX, demonstrating an end-to-end non-Nvidia pipeline; a rough serving sketch follows.
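On a stock vLLM install, offline serving would look roughly like the sketch below. The model identifier is a placeholder assumption, and the MetaX-specific fork and custom kernels are not public API, so none of this is the authors’ exact setup.

```python
# Hypothetical serving sketch using vLLM's standard offline API.
# The model ID is a placeholder; the authors' MetaX-adapted vLLM and any
# custom spiking/linear-attention kernels are not shown here.
from vllm import LLM, SamplingParams

llm = LLM(model="BICLab/SpikingBrain-7B")            # placeholder model identifier
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize this 500-page contract: ..."], params)
print(outputs[0].outputs[0].text)
```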
Overcoming Current LLM Limitations
Traditional Transformer models get much slower and more memory-hungry as the input text gets longer:
Training cost grows very quickly (quadratically) with sequence length.
Memory use during inference grows steadily (linearly) with sequence length, because the key-value (KV) cache must store an entry for every previous token. The sketch below puts rough numbers on both trends.
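This back-of-the-envelope calculation plugs in illustrative (assumed) model dimensions; the formulas are the standard O(n²) attention-score and O(n) KV-cache estimates, not figures from the paper.

```python
# Illustrative scaling of full self-attention cost and KV-cache memory
# with context length. All dimensions are made up for the example.
d_model, n_layers, bytes_per_val = 4096, 32, 2       # fp16 values

for n_tokens in (4_096, 131_072, 4_000_000):
    attn_flops = 2 * n_tokens ** 2 * d_model         # score matrix: O(n^2)
    kv_cache = 2 * n_tokens * n_layers * d_model * bytes_per_val  # K and V: O(n)
    print(f"{n_tokens:>9,} tokens: attention ~{attn_flops:.2e} FLOPs/layer, "
          f"KV cache ~{kv_cache / 2**30:.1f} GiB")
```

Even at 128k tokens the assumed KV cache already exceeds a single GPU’s memory; at 4 million tokens both terms become prohibitive, which is the regime SpikingBrain targets.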
SpikingBrain reduces these problems by:
Using hybrid linear attention: the 7B model alternates attention types across layers (inter-layer hybrid), while the 76B model runs them in parallel within each layer (intra-layer hybrid), making long contexts more efficient. A minimal linear-attention sketch follows this list.
Applying efficient model conversion: it transforms heavy quadratic attention into lighter methods like sliding-window and low-rank linear attention, and uses MoE upcycling to turn dense weights into sparse experts, saving compute.
Exploiting 69.15% spike sparsity (roughly two-thirds of neuron activations are zero and can be skipped), which lowers power use and helps keep memory needs nearly constant, even with very long contexts (up to 128k tokens).
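To illustrate why linear attention keeps memory flat, the sketch below compresses the entire past into a fixed-size state that is updated once per token. The feature map, dimensions, and normalization are simplified assumptions; SpikingBrain’s actual kernels are more elaborate.

```python
import numpy as np

def linear_attention_decode(q_seq, k_seq, v_seq):
    """Decode with linear (kernelized) attention: the whole past is folded
    into a fixed d_k x d_v state S plus a normalizer z, so memory stays
    constant no matter how long the context grows."""
    d_k, d_v = k_seq.shape[1], v_seq.shape[1]
    S = np.zeros((d_k, d_v))                  # running sum of outer(k_t, v_t)
    z = np.zeros(d_k)                         # running sum of k_t (normalizer)
    phi = lambda x: np.maximum(x, 0) + 1e-6   # simple positive feature map
    outputs = []
    for q, k, v in zip(q_seq, k_seq, v_seq):
        q, k = phi(q), phi(k)
        S += np.outer(k, v)                   # update compressed memory state
        z += k
        outputs.append(q @ S / (q @ z))       # readout costs O(d_k * d_v) per token
    return np.array(outputs)

T, d = 1_000, 64                              # 1,000 tokens, constant-size state
q, k, v = (np.random.randn(T, d) for _ in range(3))
out = linear_attention_decode(q, k, v)
print(out.shape)                              # (1000, 64); state never grew with T
```

The design trade-off: the state is lossy compared with a full KV cache, which is why SpikingBrain mixes in sliding-window attention to preserve precise nearby context.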
Applications and Deployment Potential
Cited targets include long legal/medical document analysis, DNA sequence analysis, high-energy physics data, and other long-sequence workloads. Weeks-long stable training on hundreds of MetaX GPUs and successful long-context handling suggest readiness for scaled deployment within China’s domestic AI stack.
Open Resources, Public Access, and Validation Status
The work has not yet been peer-reviewed (arXiv, Sept 2025); code for SpikingBrain-7B is released (GitHub: BICLab/SpikingBrain-7B), and a public test page was announced in local media. Independent, energy-metered benchmarks and standardized evaluation suites are still needed to verify real-world efficiency beyond the reported TTFT and long-context metrics.
Leadership Statements and Ecosystem Context
Xu Bo, director of the Institute of Automation, framed SpikingBrain as a non-Transformer path that could inform next-generation neuromorphic chips with lower power consumption. The institute previously co-developed the “Speck” neuromorphic sensing-computing chip (reported in Nature Communications), which idles at roughly 0.42 mW, as part of China’s broader push into brain-inspired hardware.
Strategic Implications for China
By combining homegrown chips (MetaX) with brain-inspired architectures, SpikingBrain advances technological sovereignty amid U.S. export controls on advanced AI hardware. It strengthens China’s position in neuromorphic computing and offers a path to Nvidia-independent large-model training and inference.
Why This Matters:
SpikingBrain1.0 represents a technical and geopolitical pivot: a credible route to near-linear long-context efficiency and low-power inference on non-Nvidia platforms. If the results replicate, it lowers compute and data barriers for large models, accelerates on-device and sovereign AI stacks, and intensifies competition in both architecture and hardware innovation.
This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.