Running a private LLM on a CloudFerro virtual machine
by Mateusz Ślaski, Sales Support Engineer
Introduction
Running a Large Language Model (LLM) on your own virtual machine with a high-performance GPU offers several advantages:
- Privacy and Security: You maintain control over your data, reducing the risk of exposure associated with third-party platforms.
- Performance Optimization: You can optimize and configure your environment specifically for your workload, potentially achieving lower latency and faster processing times.
- Customization: You have the flexibility to adjust system resources and settings to tailor the performance and capabilities of the LLM to your specific needs.
- Cost Efficiency: By controlling the computing resources, you can manage costs more effectively, especially if you have fluctuating demands or take advantage of SPOT instances. Additionally, a VM with an LLM shared through an API between team members removes the need to equip each of them with a local GPU capable of running the LLM.
- Scalability: You can scale your resources up or down based on demand, allowing you to handle varying workloads efficiently.
- Reduced Dependency: Operating your own LLM reduces reliance on third-party infrastructure (in this case you would depend only on an independent cloud provider operating in Europe under EU law), giving you greater independence and control over its operation and maintenance.
- Access to Advanced Features: The cloud operator can provide high-performance GPUs that are difficult for smaller companies to purchase, so you can test and leverage advanced features and capabilities of LLMs that require significant computational power.
- Continuous Availability: You achieve high availability and reliability, as the virtual machine can be configured to meet uptime requirements without interruptions often associated with shared platforms.
What will you learn from this document?
- You will learn how to run a private Large Language Model (LLM) on a CloudFerro virtual machine using the self-hosted Ollama platform.
- You will start by creating a VM on the CREODIAS platform, selecting the appropriate GPU and AI-related options.
- Once you set up SSH access, you will verify the GPU visibility to ensure the NVIDIA drivers load correctly.
- You will then proceed with the Ollama installation and verify its ability to recognize the GPU.
- Furthermore, you will be guided on downloading and testing small LLM models from the Ollama Library.
- Next, you will get details on advanced configurations, including how to expose the Ollama API for network access and how to set up a reverse proxy with SSL certificates and Basic Authentication for added security.
- Additionally, you will address potential security considerations when you expose the API, either within a cloud tenant or publicly.
Manual procedure
VM creation
To create the VM, please follow this document:
How to create a new Linux VM in OpenStack Dashboard Horizon on CREODIAS
Please note that when performing the two steps below, you must choose the GPU and AI-related options.
1. When a source image is selected, please use one of the *_NVIDIA_AI images (two Ubuntu and one CentOS are available).
2. An instance must be created with one of the following flavors:
(as available at the end of March 2025)
- WAW3-1
- vm.a6000.1 (1/8 of shared A6000 card)
- vm.a6000.2 (1/4 of shared A6000 card)
- vm.a6000.4 (1/2 of shared A6000 card)
- vm.a6000.8 (full shared A6000 card)
- WAW3-2
Standard instances:
- gpu.h100 (One H100 card available)
- gpu.h100x2 (Two H100 cards available)
- gpu.h100x4 (Four H100 cards available)
- gpu.l40sx2 (Two L40s cards available)
- gpu.l40sx8 (Eight L40s cards available)
- vm.l40s.1 (1/8 of shared L40s card)
- vm.l40s.2 (1/4 of shared L40s card)
- vm.l40s.4 (1/2 of shared L40s card)
- vm.l40s.8 (full shared L40s card)
Spot instances:
- spot.vm.l40s.1 (1/8 of shared L40s card)
- spot.vm.l40s.2 (1/4 of shared L40s card)
- spot.vm.l40s.4 (1/2 of shared L40s card)
- spot.vm.l40s.8 (full shared L40s card)
- FRA1-2
- vm.l40s.2 (1/4 of shared L40s card)
- vm.l40s.8 (full shared L40s card)
- WAW4-1
- New GPU flavors for H100 and L40s NVIDIA GPUs will be available soon (around the end of April 2025).
This tutorial was prepared using the "vm.a6000.8" flavor and the "Ubuntu 22.04 NVIDIA_AI" image in the WAW3-1 region.
Accessing VM with SSH
To configure the newly created instance, we will access it using SSH.
Depending on the operating system you use on your local computer, choose one of the documents below:
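Whichever document you follow, the connection typically looks like this (assuming the default eouser account used by CloudFerro images and the key pair selected during VM creation; the key file path is a placeholder):
ssh -i /path/to/your_key eouser@VM_PUBLIC_IP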
GPU check
The first step is checking if the GPU is visible by the system and if NVIDIA drivers are loaded properly.
You should be able to run the command:
nvidia-smi
You should get output similar to the following:
Fri Mar 21 17:28:32 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTXA6000-48Q On | 00000000:00:05.0 Off | 0 |
| N/A N/A P8 N/A / N/A | 0MiB / 49152MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Please note that GPU memory usage is 0 MiB out of the amount available for the selected flavor, because the GPU is not in use yet.
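If you prefer a shorter check, nvidia-smi can also print only selected fields, for example:
nvidia-smi --query-gpu=name,driver_version,memory.total,memory.used --format=csv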
Ollama installation
According to the official instructions on the [Ollama download page for Linux](https://ollama.com/download/linux), it is enough to run a single installation script:
curl -fsSL https://ollama.com/install.sh | sh
You should see the following output, with the last message stating that Ollama detected the GPU.
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed.
Please note that this installation script not only downloads and installs packages, but also starts the Ollama web service locally.
If you execute the command:
systemctl status ollama
you will get output similar to this:
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2025-03-21 19:35:50 UTC; 2 days ago
Main PID: 110150 (ollama)
Tasks: 22 (limit: 135297)
Memory: 1.7G
CPU: 33.690s
CGroup: /system.slice/ollama.service
└─110150 /usr/local/bin/ollama serve
Mar 21 20:57:45 llm-tests ollama[110150]: llama_init_from_model: graph splits = 2
Mar 21 20:57:45 llm-tests ollama[110150]: key clip.use_silu not found in file
Mar 21 20:57:45 llm-tests ollama[110150]: key clip.vision.image_grid_pinpoints not found in file
Mar 21 20:57:45 llm-tests ollama[110150]: key clip.vision.feature_layer not found in file
Mar 21 20:57:45 llm-tests ollama[110150]: key clip.vision.mm_patch_merge_type not found in file
Mar 21 20:57:45 llm-tests ollama[110150]: key clip.vision.image_crop_resolution not found in file
Mar 21 20:57:45 llm-tests ollama[110150]: time=2025-03-21T20:57:45.432Z level=INFO source=server.go:619 msg="llama runner started in 1.01 seconds"
Mar 21 20:57:46 llm-tests ollama[110150]: [GIN] 2025/03/21 - 20:57:46 | 200 | 2.032983756s | 127.0.0.1 | POST "/api/generate"
Mar 23 19:36:29 llm-tests ollama[110150]: [GIN] 2025/03/23 - 19:36:29 | 200 | 59.41µs | 127.0.0.1 | HEAD "/"
Mar 23 19:36:29 llm-tests ollama[110150]: [GIN] 2025/03/23 - 19:36:29 | 200 | 538.938µs | 127.0.0.1 | GET "/api/tags"
To test our Ollama installation, we will download two small models from the Ollama Library:
ollama pull llama3.2:1b
ollama pull moondream
Each of them should give a similar output:
pulling manifest
pulling 74701a8c35f6... 100% ▕█████████████████████████████████████▏ 1.3 GB
pulling 966de95ca8a6... 100% ▕█████████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕█████████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕█████████████████████████████████████▏ 6.0 KB
pulling 4f659a1e86d7... 100% ▕█████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
success
Verify that they are visible:
ollama list
You should see them on the list:
NAME ID SIZE MODIFIED
moondream:latest 55fc3abd3867 1.7 GB 47 hours ago
llama3.2:1b baf6a787fdff 1.3 GB 2 days ago
Please test them by executing one or both of the commands below.
Remember that to exit the chat, you need to use the /bye command.
ollama run moondream
Or
ollama run llama3.2:1b
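Instead of starting an interactive chat, you may also pass a single prompt directly on the command line; Ollama prints the answer and exits:
ollama run llama3.2:1b "Why is milk white?"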
Now, if you execute the command:
nvidia-smi
again, you should see output similar to this:
Fri Mar 21 20:58:40 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTXA6000-48Q On | 00000000:00:05.0 Off | 0 |
| N/A N/A P8 N/A / N/A | 6497MiB / 49152MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1514 C /usr/local/bin/ollama 4099MiB |
| 0 N/A N/A 1568 C /usr/local/bin/ollama 2395MiB |
+---------------------------------------------------------------------------------------+
It shows the Ollama processes on the list, with total memory consumption being the sum of the loaded models.
As mentioned before, the Linux service with the Ollama API server should already be running in the background.
You may test it with the following Curl request:
curl http://localhost:11434/api/generate -d '{
"model": "moondream",
"prompt": "Why milk is white?"
}'
You will receive a stream of JSON response messages containing the model's answer:
{"model":"moondream","created_at":"2025-03-23T19:50:31.694190903Z","response":"\n","done":false}
{"model":"moondream","created_at":"2025-03-23T19:50:31.701052938Z","response":"Mil","done":false}
{"model":"moondream","created_at":"2025-03-23T19:50:31.704855264Z","response":"k","done":false}
{"model":"moondream","created_at":"2025-03-23T19:50:31.70867345Z","response":" is","done":false}
{"model":"moondream","created_at":"2025-03-23T19:50:31.712496186Z","response":" white","done":false}
{"model":"moondream","created_at":"2025-03-23T19:50:31.716349912Z","response":" because","done":false}
...
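The API streams tokens by default. If you prefer a single JSON object with the whole answer, disable streaming with the stream parameter:
curl http://localhost:11434/api/generate -d '{
  "model": "moondream",
  "prompt": "Why milk is white?",
  "stream": false
}'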
Bigger models
So far, to keep this tutorial fluent, we used small models of about 1 GB in size.
If we have a GPU with more memory, we may run tests using a bigger model. Let's try Llama 3.3, which is 42 GB in size.
When you type the name of a model in the search box of the Ollama Library, you get a list of models with this text in their names. Copy the model tag and use it locally.
You may download the model and then run it with a single command:
ollama run llama3.3:latest
Or only download the model for further usage:
ollama pull llama3.3:latest
Tag "llama3.3:latest" should be also used in Curl query when communicating with API.
Additional setup if necessary
If you execute the command:
ollama serve --help
you will see a list of environment variables that allow you to tune the configuration according to your requirements and the hardware used.
In the next section, we will set one of them.
Start ollama
Usage:
ollama serve [flags]
Aliases:
serve, start
Flags:
-h, --help help for serve
Environment Variables:
OLLAMA_DEBUG Show additional debug information (e.g. OLLAMA_DEBUG=1)
OLLAMA_HOST IP Address for the ollama server (default 127.0.0.1:11434)
OLLAMA_KEEP_ALIVE The duration that models stay loaded in memory (default "5m")
OLLAMA_MAX_LOADED_MODELS Maximum number of loaded models per GPU
OLLAMA_MAX_QUEUE Maximum number of queued requests
OLLAMA_MODELS The path to the models directory
OLLAMA_NUM_PARALLEL Maximum number of parallel requests
OLLAMA_NOPRUNE Do not prune model blobs on startup
OLLAMA_ORIGINS A comma separated list of allowed origins
OLLAMA_SCHED_SPREAD Always schedule model across all GPUs
OLLAMA_FLASH_ATTENTION Enabled flash attention
OLLAMA_KV_CACHE_TYPE Quantization type for the K/V cache (default: f16)
OLLAMA_LLM_LIBRARY Set LLM library to bypass autodetection
OLLAMA_GPU_OVERHEAD Reserve a portion of VRAM per GPU (bytes)
OLLAMA_LOAD_TIMEOUT How long to allow model loads to stall before giving up (default "5m")
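As an alternative to editing the unit file directly (as we do in the next section), any of these variables can be set through a standard systemd drop-in; a sketch that keeps models loaded for 30 minutes:
sudo systemctl edit ollama.service
# in the editor that opens, add:
# [Service]
# Environment="OLLAMA_KEEP_ALIVE=30m"
sudo systemctl restart ollama.service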
Exposing Ollama API for other hosts in the network - internal use
Edit the file with the Ollama service configuration (if necessary, replace vim with your editor of choice).
sudo vim /etc/systemd/system/ollama.service
By default, Ollama is exposed on localhost and port 11434, so it cannot be accessed from other hosts in the project. To change the default behavior, we add a line that makes Ollama expose its API on all interfaces and on a lower-range port. For this article, we choose port 8765. Add the line:
Environment="OLLAMA_HOST=0.0.0.0:8765"
in the [Service] section.
The updated file should look like this:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="OLLAMA_HOST=0.0.0.0:8765"
Environment="PATH=/opt/miniconda3/condabin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
[Install]
WantedBy=default.target
After this change, we have to reload systemd and restart the service.
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
And check if it is running properly.
systemctl status ollama.service
Now we can go to another VM in the same network and execute a similar Curl request, modified only by changing the IP address and port:
curl http://LLM_TEST_VM_IP:8765/api/generate -d '{
"model": "moondream",
"prompt": "Why milk is white?"
}'
Important remark:
If we expose the API directly in this way on another port, then the ollama command will not work. The message will be:
Error: could not connect to ollama app, is it running?
This is because the command uses the same API and tries to access it on the default port 11434.
We have to execute the command:
export OLLAMA_HOST=0.0.0.0:8765
or even add it to the ~/.bashrc file to make the change permanent.
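For example:
echo 'export OLLAMA_HOST=0.0.0.0:8765' >> ~/.bashrc
source ~/.bashrc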
API security
You have to consider one important thing: now the Ollama API is exposed not only to a single network, but also to all hosts in other networks in your project.
If this is not acceptable, you should consider some security settings.
- The first option is to create a separate external router and network according to this document:
How to create a network with router in Horizon Dashboard on CREODIAS
The API will still be exposed, but only inside a single network.
- If this is still not acceptable, use the guidelines from the next chapter, or restrict access at the security group level as sketched below.
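If the API should be reachable only from a specific subnet, the security group rule for the chosen port can be limited to that subnet instead of being open to 0.0.0.0/0. A minimal sketch using the OpenStack CLI (the group name and subnet are placeholders; attach the group to the VM instead of a wide-open rule):
openstack security group create ollama-api
openstack security group rule create --protocol tcp --dst-port 8765 --remote-ip 192.168.12.0/24 ollama-api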
Exposing Ollama API
In this case, we will leave the default API settings (localhost and port 11434). Instead, we add a reverse proxy that exposes the API on another port and optionally adds some authorization.
sudo apt install nginx
sudo apt install apache2-utils
Set the Basic Authentication password. You will be asked to type the password twice.
cd /etc
sudo htpasswd -c .htpasswd ollama
Exposing in cloud tenant
A simple NGINX configuration with Basic Authentication, but HTTP only.
For internal usage only!
We strongly advise against using it when exposing the API on the public Internet.
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 768;
}
http {
server {
listen 8765;
# Basic authentication setup
auth_basic "Restricted Area";
auth_basic_user_file /etc/.htpasswd; # File containing usernames and hashed passwords
location / {
proxy_pass http://127.0.0.1:11434;
}
}
}
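After saving the configuration, check its syntax and restart NGINX (the same applies to the HTTPS configuration shown later):
sudo nginx -t
sudo systemctl restart nginx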
Test Curl request:
curl -u "ollama:YOUR_PASSWORD" http://10.0.0.148:8765/api/generate -d '{
"model": "llama3.3:latest",
"prompt": "Who is Peter Watts?"
}'
Exposing API with encryption
Assign a public IP to your machine with Ollama using this guide:
How to Add or Remove Floating IP’s to your VM on CREODIAS
Obtain an SSL certificate for this IP or domain name and put it in two files on the VM:
/etc/ssl/certs/YOUR_CERT_NAME.crt
/etc/ssl/private/YOUR_CERT_NAME.key
Alternatively, generate a self-signed certificate.
It is enough for personal or small-team usage, but not if you want to expose the API to customers or business partners.
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 -keyout /etc/ssl/private/YOUR_CERT_NAME.key -out /etc/ssl/certs/YOUR_CERT_NAME.crt -subj "/C=PL/ST=Mazowieckie/L=Warsaw/O=CloudFerro/OU=Tech/CN=OllamaTest"
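You may verify the generated certificate, for example by printing its subject and validity dates:
sudo openssl x509 -in /etc/ssl/certs/YOUR_CERT_NAME.crt -noout -subject -dates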
A simple NGINX configuration with Basic Authentication and HTTPS:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 768;
# multi_accept on;
}
http {
server {
listen 8765 ssl;
server_name testing-ollama;
# Path to SSL certificates
ssl_certificate /etc/ssl/certs/YOUR_CERT_NAME.crt;
ssl_certificate_key /etc/ssl/private/YOUR_CERT_NAME.key;
# Basic authentication setup
auth_basic "Restricted Area";
auth_basic_user_file /etc/.htpasswd; # File containing usernames and hashed passwords
location / {
proxy_pass http://127.0.0.1:11434;
}
}
}
Test Curl request, accepting the self-signed certificate with the -k option:
curl -k -u "ollama:YOUR_PASSWORD" https://YOUR_IP_OR_DOMAIN:8765/api/generate -d '{
"model": "llama3.3:latest",
"prompt": "Who is Peter Watts?"
}'
Automated workflow with Terraform
Prerequisites / Preparation
Before you start, please read the documents:
- "Generating and Authorizing Terraform using a Keycloak User on CREODIAS" https://creodias.docs.cloudferro.com/en/latest/openstackdev/Generating-and-authorizing-Terraform-using-Keycloak-user-on-Creodias.html
- "How to Generate or Use Application Credentials via CLI on CREODIAS": https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-generate-or-use-Application-Credentials-via-CLI-on-Creodias.html We will use them to authenticate the Terraform OpenStack provider.
- Additionally, you may review:
- Official Terraform documentation: https://developer.hashicorp.com/terraform
- Terraform OpenStack Provider documentation: https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs
If necessary, you may also refresh some details about the manual management of projects, key pairs, networks, and security groups:
- https://creodias.docs.cloudferro.com/en/latest/networking/Generating-a-SSH-keypair-in-Linux-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-create-key-pair-in-OpenStack-Dashboard-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/networking/How-to-Import-SSH-Public-Key-to-OpenStack-Horizon-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-use-Security-Groups-in-Horizon-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/networking/How-to-create-a-network-with-router-in-Horizon-Dashboard-on-Creodias.html
Step 1 - Select or Create a Project
You may use the default project in your tenant (usually named "cloud_aaaaa_bb") or create a new one by following this document: https://creodias.docs.cloudferro.com/en/latest/openstackcli/How-To-Create-and-Configure-New-Project-on-Creodias-Cloud.html
Step 2 - Install Terraform
There are various ways to install Terraform; some of them are described in the documentation mentioned in the "Prerequisites / Preparation" chapter.
If you are using Ubuntu 22.04 LTS or newer and you do not need the latest Terraform release (for the Terraform OpenStack provider, it is not necessary), the easiest way is to use Snap.
First, install Snap:
sudo apt install snapd
Then install Terraform:
sudo snap install terraform --classic
Step 3 - Allowing Access to Project from Terraform
Now create Application Credentials.
Please follow the document: "How to Generate or Use Application Credentials via CLI on CREODIAS": https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-generate-or-use-Application-Credentials-via-CLI-on-Creodias.html
When you have them ready, save them in a secure location (e.g., a password manager) and fill in the variables in the "llm_vm.tfvars" file.
Step 4 - Prepare Configuration Files
As Terraform operates on the entire directory and automatically merges all "*.tf" files into one codebase, we may split our Terraform code into a few files to manage the code more easily.
- main.tf
- variables.tf
- resources.tf
- locals.tf
Additionally, we need three other files:
- llm_vm_user_data.yaml
- llm_api_nginx.conf
- llm_vm.tfvars
File 1 - main.tf
In this file, we keep the main definitions for Terraform and the OpenStack provider.
terraform {
required_version = ">= 0.14.0"
required_providers {
openstack = {
source = "terraform-provider-openstack/openstack"
version = "~> 1.51.1"
}
}
}
provider "openstack" {
auth_url = var.auth_url
region = var.region
user_name = "${var.os_user_name}"
application_credential_id = "${var.os_application_credential_id}"
application_credential_secret = "${var.os_application_credential_secret}"
}
File 2 - variables.tf
In this file, we will keep variable definitions.
variable os_user_name {
type = string
}
variable tenant_project_name {
type = string
}
variable os_application_credential_id {
type = string
}
variable os_application_credential_secret {
type = string
}
variable auth_url {
type = string
default = "https://keystone.cloudferro.com:5000"
}
variable region {
type = string
validation {
condition = contains(["WAW3-1", "WAW3-2", "FRA1", "FRA1-2", "WAW4-1"], var.region)
error_message = "Proper region names are: WAW3-1, WAW3-2, FRA1, FRA1-2, WAW4-1"
}
}
#Our friendly name for entire environment.
variable env_id {
type = string
}
# Key-pair created in previous steps
variable env_keypair {
type = string
}
variable internal_network {
type = string
default = "192.168.12.0"
validation {
condition = can(regex("^(10\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])|192\\.168\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))$", var.internal_network))
error_message = "Provide proper network address for class 10.a.b.c or 192.168.a.b"
}
}
variable internal_netmask {
type = string
default = "/24"
validation {
condition = can(regex("^\\/(1[6-9]|2[0-4])$", var.internal_netmask))
error_message = "Please use mask size from /16 to /24."
}
}
variable external_network {
type = string
default = "10.8.0.0"
validation {
condition = can(regex("^(10\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])|192\\.168\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))$", var.external_network))
error_message = "Provide proper network address for class 10.a.b.c or 192.168.a.b"
}
}
variable llm_image {
type = string
default = "Ubuntu 22.04 NVIDIA_AI"
}
variable llm_flavor {
type = string
}
variable llm_api_port {
type = number
default = 8765
}
variable llm_tag {
type = string
}
variable cert_data {
type = string
default = "/C=colar_system/ST=earth/L=europe/O=good_people/OU=smart_people/CN=OllamaTest"
}
File 3 - resources.tf
This is the most significant file where definitions of all entities and resources are stored.
resource "random_password" "ollama_api_pass" {
length = 24
special = true
min_upper = 8
min_lower = 8
min_numeric = 6
min_special = 2
override_special = "-"
keepers = {
tenant = var.tenant_project_name
}
}
output "ollama_api_pass_output" {
value = random_password.ollama_api_pass.result
sensitive = true
}
data "openstack_networking_network_v2" "external_network" {
name = "external"
}
resource "openstack_networking_router_v2" "external_router" {
name = "${var.env_id}-router"
external_network_id = data.openstack_networking_network_v2.external_network.id
}
resource "openstack_networking_network_v2" "env_net" {
name = "${var.env_id}-net"
}
resource "openstack_networking_subnet_v2" "env_net_subnet" {
name = "${var.env_id}-net-subnet"
network_id = openstack_networking_network_v2.env_net.id
cidr = "${var.internal_network}${var.internal_netmask}"
gateway_ip = cidrhost("${var.internal_network}${var.internal_netmask}", 1)
ip_version = 4
enable_dhcp = true
}
resource "openstack_networking_router_interface_v2" "router_interface_external" {
router_id = openstack_networking_router_v2.external_router.id
subnet_id = openstack_networking_subnet_v2.env_net_subnet.id
}
resource "openstack_networking_floatingip_v2" "llm_public_ip" {
pool = "external"
}
resource "openstack_networking_secgroup_v2" "sg_llm_api" {
name = "${var.env_id}-sg-llm-api"
description = "Ollama API"
}
resource "openstack_networking_secgroup_rule_v2" "sg_llm_api_rule_1" {
direction = "ingress"
ethertype = "IPv4"
protocol = "tcp"
port_range_min = var.llm_api_port
port_range_max = var.llm_api_port
remote_ip_prefix = "0.0.0.0/0"
security_group_id = openstack_networking_secgroup_v2.sg_llm_api.id
}
resource "openstack_compute_instance_v2" "llm_server" {
name = "${var.env_id}-server"
image_name = var.llm_image
flavor_name = var.llm_flavor
security_groups = [
"default",
"allow_ping_ssh_icmp_rdp",
openstack_networking_secgroup_v2.sg_llm_api.name
]
key_pair = var.env_keypair
depends_on = [
openstack_networking_subnet_v2.env_net_subnet
]
user_data = local.llm_vm_user_data
network {
uuid = openstack_networking_network_v2.env_net.id
fixed_ip_v4 = cidrhost("${var.internal_network}${var.internal_netmask}", 3)
}
}
resource "openstack_compute_floatingip_associate_v2" "llm_ip_associate" {
floating_ip = openstack_networking_floatingip_v2.llm_public_ip.address
instance_id = openstack_compute_instance_v2.llm_server.id
}
File 4 - locals.tf
In this file, we keep all values computed from the input data (variables, templates, etc.).
locals {
nginx_config = "${templatefile("./llm_api_nginx.conf",
{
ollama_api_port = "${var.llm_api_port}"
}
)}"
llm_vm_user_data = "${templatefile("./llm_vm_user_data.yaml",
{
llm_tag = "${var.llm_tag}"
cert_data = "${var.cert_data}"
ollama_api_pass = "${random_password.ollama_api_pass.result}"
nginx_config_content = "${indent(6, local.nginx_config)}"
}
)}"
}
File 5 - llm_vm_user_data.yaml
This is a template of the user data that will be injected into our instance hosting Ollama.
#cloud-config
package_update: true
package_upgrade: true
packages:
- vim
- openssh-server
- nginx
- apache2-utils
write_files:
- path: /etc/nginx/nginx.conf
permissions: '0700'
content: |
${nginx_config_content}
- path: /run/scripts/prepare_llm_vm
permissions: '0700'
defer: true
content: |
#!/bin/bash
curl -fsSL https://ollama.com/install.sh | sh
sleep 5s
systemctl enable ollama.service
systemctl start ollama.service
sleep 5s
export HOME=/root
ollama pull ${llm_tag}
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 -keyout /etc/ssl/private/ollama_api.key -out /etc/ssl/certs/ollama_api.crt -subj "${cert_data}"
sudo htpasswd -b -c /etc/.htpasswd ollama ${ollama_api_pass}
systemctl enable nginx
systemctl start nginx
echo 'Ollama ready!' > /var/log/ollama_ready.log
runcmd:
- ["/bin/bash", "/run/scripts/prepare_llm_vm"]
File 7 - llm_vm.tfvars
In this file, we will provide values for Terraform variables:
- os_user_name - Enter your username used to authenticate in CREODIAS here.
- tenant_project_name - Name of the project selected or created in step 1.
- os_application_credential_id
- os_application_credential_secret
- region - CloudFerro Cloud region name. Allowed values are: WAW3-1, WAW3-2, FRA1, FRA1-2, WAW4-1.
- env_id - Name that will prefix all resources created in OpenStack.
- env_keypair - Key pair available in OpenStack. You will use it to log in via SSH to the LLM machine if necessary, for example to use the model directly with the ollama run MODEL_TAG command.
- internal_network - Network class for our environment. Any of 10.a.b.c or 192.168.a.b.
- internal_netmask - Network mask. Allowed values: /24, /16.
- llm_flavor - VM flavor for our Ollama host.
- llm_image - Operating system image to be deployed on our instance.
- llm_tag - Tag from the Ollama Library of the model that we want to download automatically during provisioning.
- cert_data - Values for the self-signed certificate.
Some of the included data, such as credentials, are sensitive. So if you save this in a Git repository, it is strongly recommended to add the file pattern "*.tfvars" to ".gitignore".
You may also add to this file the variable "external_network".
Do not forget to fill in or update the variable values in the content below.
os_user_name = "user@domain"
tenant_project_name = "cloud_aaaaa_b"
os_application_credential_id = "enter_ac_id_here"
os_application_credential_secret = "enter_ac_secret_here"
region = ""
env_id = ""
env_keypair = ""
internal_network = "192.168.1.0"
internal_netmask = "/24"
llm_flavor = "vm.a6000.8"
llm_image = "Ubuntu 22.04 NVIDIA_AI"
llm_tag="llama3.2:1b"
cert_data = "/C=PL/ST=Mazowieckie/L=Warsaw/O=CloudFerro/OU=Tech/CN=OllamaTest"
Step 5 - Activate Terraform Workspace
A very useful Terraform functionality is workspaces. Using workspaces, you may manage multiple environments with the same code.
Create a directory for our project, place the configuration files prepared in Step 4 inside it, and enter it by executing the commands:
mkdir tf_llm
cd tf_llm
To initialize Terraform, execute:
terraform init
Then, check workspaces:
terraform workspace list
As output of the command above, you should see something like this:
* default
As we want to use a dedicated workspace for our environment, we must create it. To do this, please execute the command:
terraform workspace new llm_vm
Terraform will create a new workspace and switch to it.
Step 6 - Validate Configuration
To ensure the prepared configuration is valid, do two things.
First, execute the command:
terraform validate
Then execute Terraform plan:
terraform plan -var-file=llm_vm.tfvars
You should get, as output, a list of messages describing the resources that will be created.
Step 7 - Provisioning of Resources
To provision all the resources, execute the command:
terraform apply -var-file=llm_vm.tfvars
As with the plan command, you should get a list of messages describing the resources that will be created, but now ending with a question asking whether you want to apply the changes.
You must answer with the full word "yes".
You will see a sequence of messages about the status of provisioning.
Please remember that when the above sequence successfully finishes, the Ollama host is still not ready!
A script configuring Ollama and downloading the selected model is still running on the instance.
The process may take several minutes.
We recommend waiting about 5 minutes.
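Once you know the public IP (obtained in the next step), you can check whether provisioning has finished by looking for the marker file written at the end of the user-data script:
ssh -i ENV_KEY_PAIR eouser@LLM_VM_PUBLIC_IP 'cat /var/log/ollama_ready.log'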
Step 8 - Obtaining the VM IP and the Basic Authentication password
To obtain a public IP address of the created instance, use the following command:
terraform state show openstack_networking_floatingip_v2.llm_public_ip
The public IP of the host will be in the "address" field.
The Basic Authentication password may be displayed with the command:
terraform output -json
The password text will be under the "value" key.
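With Terraform 0.15 or newer, the password can also be printed directly by output name:
terraform output -raw ollama_api_pass_output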
Step 9 - Testing
You may use the LLM directly after accessing the created instance with SSH.
ssh -i ENV_KEY_PAIR eouser@LLM_VM_PUBLIC_IP
Then:
ollama run llama3.2:1b
If the instance is accessed from an application via the API, then the API test may be done with a Curl request similar to the previous ones:
curl -k -u "ollama:GENERATED_PASSWORD" https://PUBLIC_IP:8765/api/generate -d '{
"model": "llama3.2:1b",
"prompt": "Who is Peter Watts?"
}'
Step 10 - Removing resources when they are not needed
As a GPU instance is relatively expensive, we may completely remove it when it is not needed. By executing the command below, you remove only the VM instance; the rest of the resources will not be affected.
terraform destroy -var-file=llm_vm.tfvars -target=openstack_compute_instance_v2.llm_server
You may recreate it simply by running:
terraform apply -var-file=llm_vm.tfvars
Step 11 - Usage
That's all! You may use the created virtual machine with GPU and LLM of your choice.
Happy prompting with your own AI 🙂