Running private LLM on CloudFerro Virtual machine
by Mateusz Ślaski, Sales Support Engineer
Introduction
Running a Large Language Model (LLM) on your own virtual machine with a high-performance GPU offers several advantages:
- Privacy and Security: You maintain control over your data, reducing the risk of exposure associated with third-party platforms.
- Performance Optimization: You can optimize and configure your environment specifically for your workload, potentially achieving lower latency and faster processing times.
- Customization: You have the flexibility to adjust system resources and settings to tailor the performance and capabilities of the LLM to your specific needs.
- Cost Efficiency: By controlling the computing resources, you can manage costs more effectively, especially if you have fluctuating demands or take advantage of SPOT instances. Additionally, a VM with an LLM shared through an API among team members removes the need to equip each of them with a local GPU capable of running an LLM.
- Scalability: You can scale your resources up or down based on demand, allowing you to handle varying workloads efficiently.
- Reduced Dependency: Operating your own LLM reduces reliance on third-party infrastructure (in this case, you depend only on an independent cloud provider operating in Europe under EU law), giving you greater independence and control over its operation and maintenance.
- Access to Advanced Features: A cloud operator can provide high-performance GPUs that are difficult for smaller companies to purchase, so you can test and leverage advanced features and capabilities of LLMs that require significant computational power.
- Continuous Availability: You achieve high availability and reliability, as the virtual machine can be configured to meet uptime requirements without interruptions often associated with shared platforms.
What will you learn from this document?
- You will learn how to run a private Large Language Model (LLM) on a CloudFerro virtual machine using the self-hosted Ollama platform.
- You will start by creating a VM on the CREODIAS platform, selecting the appropriate GPU and AI-related options.
- Once you set up SSH access, you will verify the GPU visibility to ensure the NVIDIA drivers load correctly.
- You will then proceed with the Ollama installation and verify its ability to recognize the GPU.
- Furthermore, you will be guided on downloading and testing small LLM models from the Ollama Library.
- Next, you will get details on advanced configurations, including how to expose the Ollama API for network access and how to set up a reverse proxy with SSL certificates and Basic Authentication for added security.
- Additionally, you will address potential security considerations when you expose the API, either within a cloud tenant or publicly.
Manual procedure
VM creation
To create the VM, please follow this document:
How to create a new Linux VM in OpenStack Dashboard Horizon on CREODIAS
Please note that when performing these two steps, you must choose the GPU- and AI-related options.
1. When a source image is selected, please use one of the *_NVIDIA_AI images (two Ubuntu and one CentOS are available).
2. An instance must be created with one of the following flavors:
(as available at the end of March 2025)
- WAW3-1
  - vm.a6000.1 (1/8 of shared A6000 card)
  - vm.a6000.2 (1/4 of shared A6000 card)
  - vm.a6000.4 (1/2 of shared A6000 card)
  - vm.a6000.8 (full shared A6000 card)
- WAW3-2
  - Standard instances:
    - gpu.h100 (one H100 card available)
    - gpu.h100x2 (two H100 cards available)
    - gpu.h100x4 (four H100 cards available)
    - gpu.l40sx2 (two L40s cards available)
    - gpu.l40sx8 (eight L40s cards available)
    - vm.l40s.1 (1/8 of shared L40s card)
    - vm.l40s.2 (1/4 of shared L40s card)
    - vm.l40s.4 (1/2 of shared L40s card)
    - vm.l40s.8 (full shared L40s card)
  - Spot instances:
    - spot.vm.l40s.1 (1/8 of shared L40s card)
    - spot.vm.l40s.2 (1/4 of shared L40s card)
    - spot.vm.l40s.4 (1/2 of shared L40s card)
    - spot.vm.l40s.8 (full shared L40s card)
- FRA1-2
  - vm.l40s.2 (1/4 of shared L40s card)
  - vm.l40s.8 (full shared L40s card)
- WAW4-1
  - New GPU flavors for H100 and L40s NVIDIA GPUs will be available soon (~ end of April 2025).
This tutorial was prepared using the "vm.a6000.8" flavor and the "Ubuntu 22.04 NVIDIA_AI" image in the WAW3-1 region.
Accessing VM with SSH
To configure the newly created instance, we will access it using SSH.
Depending on the operating system you use on your local computer, choose one of the documents below:
GPU check
The first step is to check whether the GPU is visible to the system and whether the NVIDIA drivers are loaded properly.
You should be able to run the command:
nvidia-smi
And you should get the following output:
Please note that the GPU memory usage is 0 MiB of the amount available for the selected flavor, because the GPU is not used yet.
Ollama installation
According to the official instructions at the [Ollama download page for Linux](https://ollama.com/download/linux), it is enough to run a single installation script:
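At the time of writing, the official one-line installer looks like this (please verify the current command on the download page before running it):
curl -fsSL https://ollama.com/install.sh | sh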
You should see the following output, with the last message stating that Ollama sees the GPU.
Please note that this installation script not only downloads and installs the packages, but also starts the Ollama web service locally.
If you execute the command:
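Assuming the systemd unit created by the installer, a likely way to check the service is:
sudo systemctl status ollama.service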
you will get this output:
To test the Ollama installation, we will download two small models from the Ollama Library.
ollama pull llama3.2:1b
ollama pull moondream
Each of them should give a similar output:
Verify that they are visible:
ollama list
You should see them on the list.
Please test them by executing one or both of the commands below.
Remember that to exit the chat, you need to use the /bye command.
ollama run moondream
Or
ollama run llama3.2:1b
Now, if you execute the
nvidia-smi
command again, you should see output similar to this:
It shows the Ollama processes on the list, with memory consumption equal to the sum of the loaded models.
As mentioned before, the Linux service with the Ollama API server should already be running in the background.
You may test it with the following Curl request:
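A request along these lines should work; the /api/generate endpoint and JSON body follow the standard Ollama API, and the prompt is just an example:
curl http://localhost:11434/api/generate -d '{"model": "llama3.2:1b", "prompt": "Why is the sky blue?"}'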
You will receive a series of JSON response messages containing the model's answer.
Bigger size models
So far, to keep this tutorial fluent, we have used small models of about 1 GB in size.
If you have a GPU with more memory, you may run tests using a bigger model. Let's try Llama 3.3, which is about 42 GB in size.
When you type the name of a model into the search box of the Ollama Library, you get a list of models with this text in their name. Copy the model tag and use it locally.
You may start the download of the model and then run it with a single command, or only download the model for further usage:
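For Llama 3.3, using the tag mentioned below, the two variants would be:
ollama run llama3.3:latest
or
ollama pull llama3.3:latest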
Tag "llama3.3:latest" should be also used in Curl query when communicating with API.
Additional setup if necessary
If you execute the command shown below, you will see a list of environment variables that allow you to tune the configuration according to your requirements and the hardware used.
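One way to see them, assuming a standard Ollama installation, is the built-in help of the serve command:
ollama serve --help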
In the next section we will set up one of them.
Exposing Ollama API for other hosts in network - Internal use
Edit the file with the Ollama service configuration (if necessary, replace vim with your editor of choice).
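Assuming the default unit file path created by the Ollama installer, the command could be:
sudo vim /etc/systemd/system/ollama.service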
By default, Ollama is exposed on localhost, port 11434, so it cannot be accessed from other hosts in the project. To change the default behavior, we add a line setting Ollama to expose the API on all interfaces and a lower-range port. For this article, we choose port 8765.
Add it in the [Service] section, so the updated file would look like this:
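(a sketch based on the defaults created by the installer; only the added Environment line is essential, and other lines may differ between Ollama versions)
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="OLLAMA_HOST=0.0.0.0:8765"

[Install]
WantedBy=default.target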
After this change, we have to reload and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
And check if it is running properly.
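For example, by checking the service status:
systemctl status ollama.service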
If we now go to another VM in the same network and execute a similar Curl request, modified only by changing the IP address and port, we should receive the same kind of response.
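A sketch of such a request, with OLLAMA_VM_IP standing in for the private IP address of the Ollama host:
curl http://OLLAMA_VM_IP:8765/api/generate -d '{"model": "llama3.2:1b", "prompt": "Why is the sky blue?"}'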
Important remark:
If we expose the API directly on another port in this way, then the ollama command will not work. The message will be:
This is because the command uses the same API and tries to access it on the default port 11434.
We have to execute the command:
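Assuming the port 8765 chosen above, a likely form is:
export OLLAMA_HOST=127.0.0.1:8765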
Or even add it to the ~/.bashrc file to make the change permanent.
API security
You have to consider one important thing: the Ollama API is now exposed not only to a single network but also to all hosts in other networks in your project.
If this is not acceptable, you should consider some security settings.
- The first option is to create a separate external router and network according to this document:
  How to create a network with router in Horizon Dashboard on CREODIAS
  The API will still be exposed, but only inside a single network.
- If this is still not acceptable, then use the guidelines from the next chapter.
Exposing Ollama API
In this case, we will leave the default API settings (localhost, port 11434). Instead, we will add a reverse proxy that exposes the API on another port and, optionally, adds some authorization.
sudo apt install nginx
sudo apt install apache2-utils
Set the Basic Authentication password. You must retype the password twice.
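For example, with htpasswd from apache2-utils (the username ollama_user and the file path are assumptions; the path must match the NGINX configuration below):
sudo htpasswd -c /etc/nginx/.htpasswd ollama_user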
Exposing in cloud tenant
A simple NGINX configuration with basic auth, but HTTP only.
For internal usage only!
We strongly recommend against using it when exposing the API on the public Internet.
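(a minimal sketch, assuming the password file created above, external port 8765, and placement in a file such as /etc/nginx/conf.d/ollama_api.conf; Ollama itself stays on localhost:11434)
server {
    listen 8765;

    location / {
        auth_basic "Ollama API";
        auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
    }
}
After saving the file, test and reload NGINX with sudo nginx -t && sudo systemctl reload nginx.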
Test Curl request:
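(a sketch; OLLAMA_VM_IP, ollama_user, and YOUR_PASSWORD are placeholders)
curl -u ollama_user:YOUR_PASSWORD http://OLLAMA_VM_IP:8765/api/generate -d '{"model": "llama3.2:1b", "prompt": "Why is the sky blue?"}'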
Exposing API with encryption
Assign public IP to your machine with Ollama using this guide:
How to Add or Remove Floating IP’s to your VM on CREODIAS
Obtain an SSL certificate for this IP or domain name and put it into two files on the VM:
/etc/ssl/certs/YOUR_CERT_NAME.crt
/etc/ssl/private/YOUR_CERT_NAME.key
Or generate a self-signed certificate. That would be enough for personal or small-team usage, but not if you want to expose the API to customers or business partners.
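One common way to generate such a self-signed certificate (you will be asked for the certificate subject fields interactively):
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 -keyout /etc/ssl/private/YOUR_CERT_NAME.key -out /etc/ssl/certs/YOUR_CERT_NAME.crt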
A simple NGINX config with basic auth and HTTPS.
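(again a sketch under the same assumptions as before, now with the certificate paths from above)
server {
    listen 8765 ssl;

    ssl_certificate     /etc/ssl/certs/YOUR_CERT_NAME.crt;
    ssl_certificate_key /etc/ssl/private/YOUR_CERT_NAME.key;

    location / {
        auth_basic "Ollama API";
        auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
    }
}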
Curl test request, accepting the self-signed certificate with the -k option:
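(a sketch; PUBLIC_IP, ollama_user, and YOUR_PASSWORD are placeholders)
curl -k -u ollama_user:YOUR_PASSWORD https://PUBLIC_IP:8765/api/generate -d '{"model": "llama3.2:1b", "prompt": "Why is the sky blue?"}'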
Automated workflow with Terraform
Prerequisites / Preparation
Before you start, please read the documents:
- "Generating and Authorizing Terraform using a Keycloak User on CREODIAS" https://creodias.docs.cloudferro.com/en/latest/openstackdev/Generating-and-authorizing-Terraform-using-Keycloak-user-on-Creodias.html
- "How to Generate or Use Application Credentials via CLI on CREODIAS": https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-generate-or-use-Application-Credentials-via-CLI-on-Creodias.html We will use them to authenticate the Terraform OpenStack provider.
- Additionally, you may review:
- Official Terraform documentation: https://developer.hashicorp.com/terraform
- Terraform OpenStack Provider documentation: https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs
If necessary, you may also refresh some details about the manual management of: projects, key-pairs, networks, and security groups:
- https://creodias.docs.cloudferro.com/en/latest/networking/Generating-a-SSH-keypair-in-Linux-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-create-key-pair-in-OpenStack-Dashboard-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/networking/How-to-Import-SSH-Public-Key-to-OpenStack-Horizon-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-use-Security-Groups-in-Horizon-on-Creodias.html
- https://creodias.docs.cloudferro.com/en/latest/networking/How-to-create-a-network-with-router-in-Horizon-Dashboard-on-Creodias.html
Step 1 - Select or Create a Project
You may use the default project in your tenant (usually named "cloud_aaaaa_bb") or create a new one by following the document mentioned below. https://creodias.docs.cloudferro.com/en/latest/openstackcli/How-To-Create-and-Configure-New-Project-on-Creodias-Cloud.html
Step 2 - Install Terraform
There are various ways to install Terraform, some of them are described in the documentation mentioned in the "Preparation" chapter.
If you are using Ubuntu 22.04 LTS or newer and you do not need the latest Terraform release (for the Terraform OpenStack provider, it is not necessary), the easiest way is to use Snap.
First, install Snap:
sudo apt install snapd
Then install Terraform:
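The Snap package needs to be installed in classic mode:
sudo snap install terraform --classic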
Step 3 - Allowing Access to Project from Terraform
Now create Application Credentials.
Please follow the document: "How to Generate or Use Application Credentials via CLI on CREODIAS": https://creodias.docs.cloudferro.com/en/latest/cloud/How-to-generate-or-use-Application-Credentials-via-CLI-on-Creodias.html
When you have them ready, save them in a secure location (e.g., a password manager) and fill in the variables in the "llm_vm.tfvars" file.
Step 4 - Prepare Configuration Files
As Terraform operates on the entire directory and automatically merges all "*.tf" files into one codebase, we may split our Terraform code into a few files to manage the code more easily.
- main.tf
- variables.tf
- resources.tf
- locals.tf
Additionally, we need three other files:
- llm_vm_user_data.yaml
- llm_api_nginx.conf
- llm_vm.tfvars
File 1 - main.tf
In this file, we keep the main definitions for Terraform and the OpenStack provider.
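As an illustration, a minimal sketch of what main.tf could contain, assuming authentication with application credentials; the provider version constraint and the KEYSTONE_ENDPOINT placeholder are assumptions, not values from the original code:
terraform {
  required_providers {
    openstack = {
      source  = "terraform-provider-openstack/openstack"
      version = "~> 2.0"
    }
  }
}

provider "openstack" {
  # Use the Keystone endpoint for your region, as listed in the RC file
  # or clouds.yaml downloaded from Horizon.
  auth_url                      = "https://KEYSTONE_ENDPOINT/v3"
  region                        = var.region
  application_credential_id     = var.os_application_credential_id
  application_credential_secret = var.os_application_credential_secret
}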
File 2 - variables.tf
In this file, we will keep variable definitions.
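A fragment of how the variables could be declared; only a few of the variables described later for llm_vm.tfvars are shown, and the default value is an assumption:
variable "os_application_credential_secret" {
  type        = string
  description = "OpenStack application credential secret"
  sensitive   = true
}

variable "region" {
  type        = string
  description = "CloudFerro Cloud region, e.g. WAW3-1"
}

variable "llm_tag" {
  type        = string
  description = "Ollama Library tag of the model to pull during provisioning"
  default     = "llama3.2:1b"
}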
File 3 - resources.tf
This is the most significant file where definitions of all entities and resources are stored.
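For illustration only, a fragment showing how the Ollama instance itself could be declared; the resource and local names are assumptions, and the real file also has to define the network, router, security groups, and floating IP:
resource "openstack_compute_instance_v2" "llm_vm" {
  name        = "${var.env_id}-llm-vm"
  flavor_name = var.llm_flavor
  image_name  = var.llm_image
  key_pair    = var.env_keypair
  # Rendered cloud-init user data (see llm_vm_user_data.yaml)
  user_data   = local.llm_vm_user_data

  network {
    name = "${var.env_id}-network"
  }
}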
File 4 - locals.tf
In this file, we keep all values computed from any type of input data (variables, templates, ...).
File 5 - llm_vm_user_data.yaml
This is a template of the user data that will be injected into our instance hosting Ollama.
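A rough sketch of what such a template could contain, assuming cloud-config syntax and that ${llm_tag} is substituted by Terraform's templatefile function; the real template would also have to configure NGINX, Basic Authentication, and the certificate:
#cloud-config
runcmd:
  # Install Ollama and pre-pull the selected model
  - curl -fsSL https://ollama.com/install.sh | sh
  - ollama pull ${llm_tag}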
File 7 - llm_vm.tfvars
In this file, we will provide values for Terraform variables:
- os_user_name - Enter your username used to authenticate in CREODIAS here.
- tenant_project_name - Name of the project selected or created in step 1.
- os_application_credential_id
- os_application_credential_secret
- region - CloudFerro Cloud region name. Allowed values are: WAW3-1, WAW3-2, FRA1-2, WAW4-1.
- env_id - Name that will prefix all resources created in OpenStack.
- env_keypair - Keypair available in OpenStack. You will use it to log in via SSH to the LLM machine if necessary, for example to use a model directly with the
ollama run MODEL_TAG
command.
- internal_network - Network class for our environment. Any of 10.a.b.c or 192.168.b.c.
- internal_netmask - Network mask. Allowed values: /24, /16.
- llm_flavor - VM flavor for our Ollama host.
- llm_image - Operating system image to be deployed on our instance.
- llm_tag - Tag from the Ollama Library of the model that we want to download automatically during provisioning.
- cert_data - Values for the self-signed certificate.
Some of the included data, such as credentials, are sensitive. So if you save this in a Git repository, it is strongly recommended to add the file pattern "*.tfvars" to ".gitignore".
You may also add the variable "external_network" to this file.
Do not forget to fill or update variable values in the content below.
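For illustration, a filled-in file could look roughly like this; every value below is a placeholder to be replaced with your own data:
os_user_name                     = "your.name@example.com"
tenant_project_name              = "cloud_00000_1"
os_application_credential_id     = "APPLICATION_CREDENTIAL_ID"
os_application_credential_secret = "APPLICATION_CREDENTIAL_SECRET"
region                           = "WAW3-1"
env_id                           = "llm-demo"
env_keypair                      = "my-keypair"
internal_network                 = "192.168.10.0"
internal_netmask                 = "/24"
llm_flavor                       = "vm.a6000.8"
llm_image                        = "Ubuntu 22.04 NVIDIA_AI"
llm_tag                          = "llama3.2:1b"
# cert_data = { ... }  # values for the self-signed certificate, structure as defined in variables.tf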
Step 5 - Activate Terraform Workspace
A very useful Terraform functionality is workspaces. Using workspaces, you may manage multiple environments with the same code.
Create and enter a directory for our project by executing commands:
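For example (the directory name is only an example):
mkdir ollama-llm-vm
cd ollama-llm-vm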
To initialize Terraform, execute:
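The standard initialization command is:
terraform init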
Then, check workspaces:
terraform workspace list
As an output of the command above, you should see output like this:
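If no other workspaces have been created yet, the list typically contains only the default workspace, marked with an asterisk:
* default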
As we want to use a dedicated workspace for our environment, we must create it. To do this, please execute the command:
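For example, using a hypothetical workspace name llm:
terraform workspace new llm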
Terraform will create a new workspace and switch to it.
Step 6 - Validate Configuration
To ensure the prepared configuration is valid, do two things.
First, execute the command:
terraform validate
Then execute Terraform plan:
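Assuming the variable file prepared in step 4:
terraform plan -var-file=llm_vm.tfvars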
As output, you should get a list of messages describing the resources that would be created.
Step 7 - Provisioning of Resources
To provision all the resources, execute the command:
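Again assuming the variable file prepared in step 4:
terraform apply -var-file=llm_vm.tfvars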
As with the plan command, you should get a list of messages describing the resources that will be created, but this time followed by a question asking whether you want to apply the changes.
You must answer with the full word "yes".
You will see a sequence of messages about the status of provisioning.
Please remember that when the above sequence successfully finishes, the Ollama host is still not ready!
A script configuring Ollama and downloading the selected model is still running on the instance.
The process may take several minutes.
We recommend waiting about 5 minutes.
Step 8 - Obtaining VM IP and basic authorization password
To obtain a public IP address of the created instance, use the following command:
The public IP of the host will be in the "address" field.
The password for Basic Authentication may be displayed with the following command:
The password text will be at the key "value".
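If the address and the password are defined as Terraform outputs in resources.tf (an assumption about that file), a generic way to list them is:
terraform output
A single output can then be printed by name with terraform output OUTPUT_NAME, where OUTPUT_NAME is a placeholder for the name used in your code.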
Step 9 - Testing
You may use the LLM directly after accessing the created instance with SSH.
Then:
ollama run llama3.2:1b
If the instance is accessed from some application via the API, then the API test may be done using a Curl request similar to the previous ones:
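A sketch of such a request, where PUBLIC_IP, the Basic Authentication user and password, the exposed port, and MODEL_TAG are placeholders depending on your llm_api_nginx.conf and llm_vm.tfvars; -k accepts the self-signed certificate:
curl -k -u ollama_user:YOUR_PASSWORD https://PUBLIC_IP:8765/api/generate -d '{"model": "MODEL_TAG", "prompt": "Why is the sky blue?"}'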
Step 10 - Removing resources when they are not needed
As a GPU instance is relatively expensive, we may completely remove it when it is not needed. By executing the command below, you remove only the VM instance; the rest of the resources will not be affected.
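A sketch, assuming the instance resource address matches the resources.tf fragment shown earlier; adjust it to your actual code:
terraform destroy -target=openstack_compute_instance_v2.llm_vm -var-file=llm_vm.tfvars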
You may recreate it simply by running:
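That is, the same apply command as before:
terraform apply -var-file=llm_vm.tfvars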
Step 11 - Usage
That's all! You may use the created virtual machine with GPU and LLM of your choice.
Happy prompting with your own AI 🙂