Harnessing GPU Efficiency
at CPU Scale

We work at the intersection of mathematics and distributed systems algorithms
to make foundation models come alive on the CPU:
inference, fine-tuning, training, RAG, and agentic workloads.

contact@ziroh.com

Models Already CPU-fied

DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Qwen-32B

Llama 2 7B

DeepSeek-R1-Distill-Qwen-14B

Llama 3.2 1B

Llama 3.2 3B

Code Llama 7B

Code Llama 13B

Code Llama 34B

Qwen 1.5 7B

Qwen 2.5 0.5B

Qwen 2.5 1.5B

Qwen 2.5 3B

Qwen 2.5 14B

Qwen 2.5 32B

CodeQwen 1.5 7B

BERT

Phi 3 3.8B

ModernBERT

DeepSeek-R1-Distill-Qwen-1.5B

Llama 2 13B

Nous Hermes Llama2 13B

Llama 3 8B

Llama 3.1 8B

Qwen 2.5 7B

Phi-2 3B

Nous Hermes Llama2 7B

Phi 3.5 3.8B

DeepSeek-R1-Distill-Qwen-7B

Models on the way

Bloom-7b

Gemma 2 Base

Gemma 2 Instruct

Gemma Base

Gemma Instruct

Mistral Nemo 2407 Base

Mistral Nemo 2407 Instruct

Mistral V0.1 Base

Mistral V0.1 Instruct

SmolLM-135m

StarCoder2

TinyLlama_v1.1

DBRX Instruct

Llama 3.1 Instruct

Mixtral 8x7B v0.1 Base

Mixtral 8x7B v0.1 Instruct

OPT-1.3B

Flan-T5

Mistral Nemo 12B

Falcon 3

RoBERTa

Krutrim

BART

Mistral v0.2

Mistral v0.3

OpenELM-3B

Pythia

IndicLID

Llama 3.2 Vision Instruct

Qwen2 VL Instruct

FLUX.1-dev

Phi 3.5 Vision

Whisper V3

HuBERT-large

SpeechT5-TTS

Wav2Vec2

WavLM-large

Moonshine

Conformers

parler-tts-mini

Stable Diffusion 3 Large

Playground v2.5 1024

DETR-resnet-50

ViT

AOT-GAN

BEiT

ConvNext-Base

ConvNext-tiny

DDRNet23-slim

DeepLabV3-plus-MobileNet

DeepLabV3-ResNet50

DenseNet-121

Depth-Anything-v2-large-hf

DETR-ResNet101

Dla102x

EfficientNet-b2

ESRGAN

Facial-Attribute-detection

Facial Landmark detection

FastSAM

FFNet

GoogLeNet

HRNet

Inception

LaMa

MediaPipe

MiDaS-V2

MNASNet

MobileNet

OpenPose

QuickSRNetLarge

Real-ESRGAN

ResNet

Segment-Anything-Model

Segformer

ShuffleNet

XLSR

YOLO

LayoutLM

LLM2CLIP

DETR
