Harnessing GPU Efficiency
at CPU Scale

We work at the intersection of mathematics and distributed systems algorithms
to make foundation models come alive on the CPU:
inference, fine-tuning, training, RAG, and agentic workloads.

contact@ziroh.com

Models Already CPU-fied

DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Qwen-32B

Llama 2 7B

DeepSeek-R1-Distill-Qwen-14B

Llama 3.2 1B

Llama 3.2 3B

Code Llama 7B

Code Llama 13B

Code Llama 34B

Qwen 1.5 7B

Qwen 2.5 0.5B

Qwen 2.5 1.5B

Qwen 2.5 3B

Qwen 2.5 14B

Qwen 2.5 32B

CodeQwen 1.5 7B

BERT

Phi 3 3.8B

Modern BERT

DeepSeek-R1-Distill-Qwen-1.5B

Llama 2 13B

Nous Hermes Llama2 13B

Llama 3 8B

Llama 3.1 8B

Qwen 2.5 7B

Phi-2 3B

Nous Hermes Llama2 7B

Phi 3.5 3.8B

DeepSeek-R1-Distill-Qwen-7B

Models on the way

T5
BLOOM-7B
Gemma 2 Base
Gemma 2 Instruct
Gemma Base
Gemma Instruct
Mistral Nemo 2407 Base
Mistral Nemo 2407 Instruct
Mistral V0.1 Base
Mistral V0.1 Instruct
SmolLM-135M
StarCoder2
TinyLlama v1.1
DBRX Instruct
Llama 3.1 Instruct
Mixtral 8x7B v0.1 Base
Mixtral 8x7B v0.1 Instruct
OPT-1.3B
Flan-T5
Mistral Nemo 12B
Falcon 3 
RoBERTa
Krutrim
BART
Mistral v0.2 
Mistral v0.3 
OpenELM-3B
Pythia
IndicLID
Llama 3.2 Vision Instruct
Qwen2 VL Instruct
FLUX.1-dev
Phi 3.5 Vision 
Whisper V3
HuBERT-large
SpeechT5-TTS
Wav2Vec2
WavLM-large
Moonshine
Conformers
Parler-TTS Mini
Stable Diffusion 3 Large
Playground v2.5 1024
DETR-ResNet50
ViT
AOT-GAN
BEiT
ConvNeXt-Base
ConvNeXt-Tiny
DDRNet23-slim
DeepLabV3-plus-MobileNet
DeepLabV3-ResNet50
DenseNet-121
Depth-Anything-v2-large-hf
DETR-ResNet101
DLA-102X
EfficientNet-b2
ESRGAN
Facial Attribute Detection
Facial Landmark Detection
FastSAM
FFNet
GoogLeNet
HRNet
Inception
LaMa
MediaPipe
MiDaS-V2
MNASNet
MobileNet
OpenPose
QuickSRNetLarge
Real-ESRGAN
ResNet
Segment-Anything-Model
SegFormer
ShuffleNet
XLSR
YOLO
LayoutLM
LLM2CLIP
DETR

Get in touch

contact@ziroh.com