Technical Specifications

8C1

A 13B decoder-only model tuned for predictable capability and low-latency inference across internal tools and customer-facing products.

Architecture

Parameters: 13 billion
Architecture: Decoder-only Transformer
Layers: 40
Hidden Size: 5,120
Attention Heads: 40
Context Window: 16K tokens
Vocabulary: 32K BPE tokens
Training Tokens: 2.1 trillion
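
As a rough sanity check, the listed dimensions can be tied back to the 13B figure. The sketch below is not the vendor's accounting: it assumes a standard Transformer block (four attention projections plus a two-matrix feed-forward layer with 4x expansion), tied input and output embeddings, "32K" read as 32,000, "16K" read as 16,384, full multi-head attention with no key/value sharing, and a 16-bit KV cache. Biases and normalization weights are ignored, and all function names are illustrative.

```python
# Back-of-envelope figures derived from the listed architecture.
LAYERS = 40
HIDDEN = 5120
HEADS = 40
VOCAB = 32_000    # assumption: "32K" read as 32,000
CONTEXT = 16_384  # assumption: "16K" read as 16,384


def estimate_params(layers: int, hidden: int, vocab: int) -> int:
    """Approximate weight count, ignoring biases and norm weights."""
    attention = 4 * hidden * hidden     # Q, K, V and output projections
    feed_forward = 8 * hidden * hidden  # assumption: two matrices, 4x expansion
    embeddings = vocab * hidden         # assumption: output head tied to input
    return layers * (attention + feed_forward) + embeddings


def kv_cache_bytes(layers: int, hidden: int, tokens: int,
                   bytes_per_value: int = 2) -> int:
    """Keys plus values for every layer and token (assumes no KV sharing)."""
    return 2 * layers * hidden * tokens * bytes_per_value


params = estimate_params(LAYERS, HIDDEN, VOCAB)
head_dim = HIDDEN // HEADS
cache = kv_cache_bytes(LAYERS, HIDDEN, CONTEXT)

print(f"estimated parameters: {params / 1e9:.2f}B")
print(f"head dimension: {head_dim}")
print(f"KV cache at full context: {cache / 2**30:.1f} GiB per sequence")
```

Under these assumptions the estimate comes to about 12.75 billion weights, consistent with the rounded 13B figure, with 128 dimensions per attention head. The KV-cache figure would be smaller if the model uses grouped-query attention or a quantized cache, neither of which is stated here.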

Performance

Time to First Token: 150 to 300 ms*
Throughput: 50 to 100 tokens/sec
Infrastructure: A100/H100 GPU clusters
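
The two latency figures can be combined into an approximate end-to-end response time. This is a minimal sketch under a simple model: total time is the time to first token plus the output length divided by throughput. It assumes the throughput figure is a per-request generation rate, which is not stated here, and `response_time_s` is a hypothetical helper name.

```python
def response_time_s(output_tokens: int, ttft_ms: float,
                    tokens_per_sec: float) -> float:
    """Time to first token plus steady-state generation time, in seconds."""
    return ttft_ms / 1000 + output_tokens / tokens_per_sec


# Bounds for a 500-token response using the ranges listed above.
fastest = response_time_s(500, ttft_ms=150, tokens_per_sec=100)
slowest = response_time_s(500, ttft_ms=300, tokens_per_sec=50)

print(f"500-token response: {fastest:.2f} s to {slowest:.2f} s")
```

For longer outputs the throughput term dominates, so the time to first token matters mostly for short, interactive replies.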

Benchmarks

MMLU: 49.5%*
HumanEval: 41.2%*
GSM8K (math): 43.9%*
TruthfulQA: 38.5%*
Code Generation: 44.9%*

*Performance metrics were measured on server infrastructure. Latency includes the network round trip and varies with location and connection quality. Benchmark scores are projected estimates based on the model architecture and training approach, not measured results.