Technical Specifications

8C1

A 13B decoder-only model tuned for predictable capability and low-latency inference across internal tools and customer-facing products.

Architecture

Parameters: 13 Billion
Architecture: Decoder-Only Transformer
Layers: 40
Hidden Size: 5,120
Attention Heads: 40
Context Window: 16K tokens
Vocabulary: 32K BPE tokens
Training Tokens: 2.1 Trillion
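
As a rough consistency check on the figures above, the common estimate of about 12 x layers x hidden size squared for the transformer blocks, plus one token embedding matrix, can be evaluated directly. This is a sketch under assumptions the spec does not state: a conventional 4x feed-forward expansion, full multi-head attention, tied input and output embeddings, "32K" read as exactly 32,000 entries, and biases and normalization weights ignored. The function name is illustrative only.

```python
def estimate_parameters(layers: int, hidden: int, vocab: int) -> int:
    """Approximate weight count for a decoder-only transformer (assumptions in comments)."""
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    feed_forward = 8 * hidden * hidden   # two matrices, assuming a 4x expansion
    embeddings = vocab * hidden          # assuming tied input/output embeddings
    return layers * (attention + feed_forward) + embeddings


# Values from the spec sheet; VOCAB assumes "32K" means 32,000.
LAYERS, HIDDEN, HEADS, VOCAB = 40, 5120, 40, 32_000

total = estimate_parameters(LAYERS, HIDDEN, VOCAB)
head_dim = HIDDEN // HEADS  # width of each attention head

print(f"estimated parameters: {total / 1e9:.2f}B")
print(f"per-head dimension: {head_dim}")
```

Under these assumptions the estimate comes to roughly 12.7 billion weights, in line with the stated 13 billion, with a per-head dimension of 128.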

Performance

Time to First Token: 150 to 300 ms*
Throughput: 50 to 100 tokens/sec
Infrastructure: A100/H100 GPU Clusters
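
To see what these ranges imply for a full response, the time to first token and the generation time can be added together. The sketch below assumes the throughput figure is a steady per-request decode rate and uses a hypothetical 256-token response; neither assumption comes from the spec, and the function name is illustrative.

```python
def response_time_seconds(ttft_ms: float, tokens_per_sec: float, output_tokens: int) -> float:
    """Rough end-to-end time: wait for the first token, then generate at a constant rate."""
    return ttft_ms / 1000 + output_tokens / tokens_per_sec


OUTPUT_TOKENS = 256  # hypothetical response length, not from the spec

# Best case pairs the low latency bound with the high throughput bound, and vice versa.
best = response_time_seconds(150, 100, OUTPUT_TOKENS)
worst = response_time_seconds(300, 50, OUTPUT_TOKENS)

print(f"best case:  {best:.2f} s")
print(f"worst case: {worst:.2f} s")
```

For a response of this length, the stated ranges put the total at roughly 2.7 to 5.4 seconds, with generation time rather than first-token latency accounting for most of it.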

Benchmarks

MMLU: 49.5%*
HumanEval: 41.2%*
GSM8K Math: 43.9%*
TruthfulQA: 38.5%*
Code Generation: 44.9%*

*Performance metrics were measured on server infrastructure. Latency includes the network round trip and varies by location and connection quality. Benchmark scores are projected estimates based on model architecture and training approach.