Wietse Venema's blog


Run Gemma with Ollama on Cloud Run

I created a sample that shows how to deploy the Ollama API with gemma:2b on Cloud Run, running inference on CPU only.
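As a rough sketch of how such a container could be packaged (the actual Dockerfile in the sample may differ; the base image, port, and model tag here are assumptions):

```dockerfile
# Hypothetical sketch; check the sample repo for the real Dockerfile.
FROM ollama/ollama

# Cloud Run sends traffic to port 8080 by default;
# make Ollama listen there on all interfaces.
ENV OLLAMA_HOST=0.0.0.0:8080

# Pull the model at build time so the container doesn't
# download the weights on every cold start.
RUN ollama serve & sleep 5 && ollama pull gemma:2b

ENTRYPOINT ["ollama", "serve"]
```

Baking the model into the image trades a larger image for faster cold starts, which matters on a serverless platform like Cloud Run.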

Gemma is Google’s open model, built from the same research and technology used to create the Gemini models. The 2B version is the smallest one; I haven’t tried the 7B version yet.

Ollama is a framework that makes it easy for developers to prototype apps with open models, including Gemma. It comes with a REST API, and this sample deploys that API as a Cloud Run service.
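Deploying and calling the service might look roughly like this (a sketch, not the sample's exact commands; the service name, region, and resource settings are assumptions, and `SERVICE_URL` stands in for the URL Cloud Run prints after deploying):

```shell
# Deploy from source; CPU-only inference needs generous CPU and memory.
gcloud run deploy ollama-gemma \
  --source . \
  --region europe-west1 \
  --cpu 8 \
  --memory 16Gi \
  --no-cpu-throttling \
  --allow-unauthenticated

# Call Ollama's generate endpoint on the deployed service.
# Replace SERVICE_URL with the URL printed by the deploy command.
curl "$SERVICE_URL/api/generate" \
  -d '{"model": "gemma:2b", "prompt": "Why is the sky blue?", "stream": false}'
```

With `"stream": false` the API returns a single JSON object containing the full response, rather than a stream of partial tokens.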

Find it here: github.com/wietsevenema/samples/ollama-gemma