LLM Fine-Tuning

Saba LLM Auto-Train

Train smarter, not harder

Automated fine-tuning pipelines that turn base LLMs into production-ready models. Every stage, from dataset preparation to deployment, is automated and benchmarked.


Benchmark Results

Real Performance Gains

Measured improvements from our internal fine-tuning pipeline on Gemma 4 2B

67.4%

Lower Latency

Latency reduction from baseline to fine-tuned

2.51x

Higher Throughput

Tokens per second improvement

100%

Success Rate

Zero failed inferences in benchmark runs

Baseline

gemma4:e2b

Stock Gemma 4 2B via Ollama

Avg Latency 18,438 ms

Throughput 30.7 tok/s

Success Rate 100%

Fine-Tuned

saba-gemma4-2b

Custom fine-tuned by SabaTech Auto-Train

Avg Latency 6,000 ms

Throughput 77.0 tok/s

Success Rate 100%
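
For readers who want to check the headline numbers, here is a small sketch that recomputes them from the Baseline and Fine-Tuned figures shown above. The function names are ours, chosen for illustration. Because the displayed latencies are rounded, the recomputed latency reduction lands near 67.5% rather than exactly 67.4%; we assume the headline figure was derived from unrounded measurements.

```python
# Figures copied from the Baseline and Fine-Tuned cards above.
baseline_latency_ms = 18_438
tuned_latency_ms = 6_000
baseline_tok_s = 30.7
tuned_tok_s = 77.0


def latency_reduction_pct(before_ms, after_ms):
    """Percentage drop in average latency relative to the baseline."""
    return (before_ms - after_ms) / before_ms * 100


def throughput_ratio(before_tok_s, after_tok_s):
    """How many times more tokens per second the tuned model produces."""
    return after_tok_s / before_tok_s


reduction = latency_reduction_pct(baseline_latency_ms, tuned_latency_ms)
ratio = throughput_ratio(baseline_tok_s, tuned_tok_s)

print(f"Latency reduction: {reduction:.1f}%")
print(f"Throughput gain:   {ratio:.2f}x")
```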

Training Pipeline

From Base to Production

A streamlined 4-step pipeline that automates the entire fine-tuning journey

Dataset Prep

Collect, clean, and format training data. Automated deduplication, filtering, and quality scoring.
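
As a rough illustration of what this step involves, the sketch below drops incomplete rows, removes duplicates that differ only in case or whitespace, and filters on a quality score. It is not the Auto-Train implementation: the `prompt`/`response` field names, the length-based `quality_score` heuristic, and the `min_score` threshold are all assumptions made for the example.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so near-identical rows hash the same."""
    return " ".join(text.lower().split())


def quality_score(example):
    """Toy heuristic (hypothetical): longer responses score higher, capped at 1.0."""
    words = len(example["response"].split())
    return min(words / 20, 1.0)


def prepare(examples, min_score=0.25):
    """Clean, deduplicate, and filter a list of prompt/response dicts."""
    seen = set()
    kept = []
    for ex in examples:
        prompt = ex.get("prompt", "").strip()
        response = ex.get("response", "").strip()
        if not prompt or not response:
            continue  # drop incomplete rows

        # Hash the normalized pair; the separator keeps the two fields distinct.
        key_text = normalize(prompt) + "\x00" + normalize(response)
        key = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if key in seen:
            continue  # drop duplicates
        seen.add(key)

        cleaned = {"prompt": prompt, "response": response}
        score = quality_score(cleaned)
        if score >= min_score:
            kept.append({**cleaned, "score": score})
    return kept


answer = ("LoRA trains small low-rank adapter matrices "
          "instead of updating all model weights.")
raw = [
    {"prompt": "What is LoRA?", "response": answer},
    {"prompt": "what is  lora?", "response": answer},  # duplicate after normalization
    {"prompt": "Define GGUF", "response": "ok"},       # too short to pass the filter
    {"prompt": "", "response": "orphan"},              # missing prompt
]
dataset = prepare(raw)
print(len(dataset), "example(s) kept")
```

A production pipeline would likely swap the toy score for a learned or rule-based quality model and use fuzzy matching for near-duplicates.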

Fine-Tuning

LoRA/QLoRA training on the base model, with hyperparameter optimization, early stopping, and checkpointing.
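
To make the early stopping and checkpointing idea concrete, here is a minimal sketch of the decision logic on its own, with no training framework attached. The loss values are invented for illustration, and `patience` and `min_delta` are assumed names for the usual knobs; the actual pipeline settings are not shown here.

```python
def run_with_early_stopping(eval_losses, patience=2, min_delta=0.0):
    """Scan per-evaluation losses and stop once they stop improving.

    Returns the step whose checkpoint should be kept and the step at
    which training would have been halted.
    """
    best_loss = float("inf")
    best_step = None
    stale_evals = 0
    stopped_at = None

    for step, loss in enumerate(eval_losses):
        stopped_at = step
        if loss < best_loss - min_delta:
            # Improvement: this is where a checkpoint would be saved.
            best_loss, best_step = loss, step
            stale_evals = 0
        else:
            stale_evals += 1
            if stale_evals >= patience:
                break  # no improvement for `patience` evaluations in a row

    return {"best_step": best_step, "best_loss": best_loss, "stopped_at": stopped_at}


# Hypothetical evaluation losses, one per evaluation round.
losses = [1.90, 1.42, 1.31, 1.33, 1.35, 1.20]
result = run_with_early_stopping(losses, patience=2)
print(result)
```

With these made-up losses the run halts at step 4 and keeps the checkpoint from step 2, so the later improvement at step 5 is never reached. That is the usual trade-off when choosing a small patience value.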

Evaluation

Automated benchmark runs against baseline. Latency, throughput, and quality metrics comparison.
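
The three metrics named here can be collected with a very small harness. The sketch below is an assumption about shape, not the pipeline's actual evaluator: `generate` is a hypothetical callable that runs one prompt and returns the number of tokens produced, and `clock` is injectable so the timing logic can be tested without a real model.

```python
import time
from statistics import mean


def benchmark(generate, prompts, clock=time.perf_counter):
    """Run every prompt once and report latency, throughput, and success rate."""
    latencies_ms = []
    total_tokens = 0
    total_seconds = 0.0
    failures = 0

    for prompt in prompts:
        start = clock()
        try:
            n_tokens = generate(prompt)
        except Exception:
            failures += 1  # a failed inference counts against the success rate
            continue
        elapsed = clock() - start
        latencies_ms.append(elapsed * 1000)
        total_seconds += elapsed
        total_tokens += n_tokens

    runs = len(prompts)
    return {
        "avg_latency_ms": mean(latencies_ms) if latencies_ms else None,
        "tok_per_s": total_tokens / total_seconds if total_seconds > 0 else None,
        "success_rate": (runs - failures) / runs if runs else None,
    }


def stub_model(prompt):
    """Stand-in for a real model call; pretends each word is one token."""
    return len(prompt.split())


report = benchmark(stub_model, ["hello world", "fine tune me"])
print(report)
```

Running the same prompt set through `benchmark` once for the baseline and once for the fine-tuned model gives two reports that can be compared side by side.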

Deployment

Export to GGUF/Ollama format. Push to production with zero-downtime model swap.
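
Once a GGUF file exists, registering it with Ollama takes a short Modelfile. The sketch below only builds that file's text; the GGUF filename and the temperature value are placeholders we made up, and the zero-downtime swap itself is not shown.

```python
def build_modelfile(gguf_path, parameters=None):
    """Return Ollama Modelfile text that points at a local GGUF file."""
    lines = [f"FROM {gguf_path}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical output path from the quantization step.
modelfile = build_modelfile(
    "./saba-gemma4-2b.Q4_K_M.gguf",
    parameters={"temperature": 0.7},  # assumed value, for illustration only
)
print(modelfile)

# After saving the text to a file named Modelfile, the model can be
# registered from a shell with:
#   ollama create saba-gemma4-2b -f Modelfile
```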

Technology Stack

Built With

🧬

Gemma 4 2B

Base model from Google DeepMind

🦥

Unsloth

2x faster training, 50% less memory

🦙

Ollama

Local model serving & inference

🔧

Axolotl

Config-driven fine-tuning framework

⚙️

Llama.cpp

GGUF quantization & inference

🤖

OpenCode

Automated pipeline orchestration

Case Study

SabaTech Internal: Gemma 4 2B Fine-Tune

How we used our own pipeline to build a faster, cheaper inference model

Live Case Study

SabaTech (Internal)

Replacing llama.cpp Qwen with a custom fine-tuned Gemma 4 2B

We applied our own Auto-Train pipeline to fine-tune Gemma 4 2B for our internal agent workloads. The goal: reduce inference latency and cost while maintaining quality. The baseline model (gemma4:e2b via Ollama) served as our reference point. After automated LoRA training, evaluation, and GGUF quantization, saba-gemma4-2b delivered a 67.4% reduction in latency and 2.51x higher throughput — all within a single automated pipeline run.

Hours Training

12GB

VRAM Used

Q4_K_M

Quantization

Hour Deploy

"Within 2 hours of pipeline execution, we had a production-ready model with 67% better performance than stock. Zero manual intervention."

Ready to optimize your models?

Get a free consultation on setting up an automated fine-tuning pipeline for your use case.

Start Training: jmsabaris@sabatech.dev