I had an old Radeon RX 5500 sitting around and wanted to see if I could get ollama to use it. It is not officially supported, but there is a way to get it working.
Setup
This was set up on a clean install of Fedora 41. I'm using Open WebUI to access the models, but I won't go into detail on how that is set up or how to install Fedora. Instead, I'll explain how I got ollama using an officially unsupported Radeon RX 5500. This GPU only has 4GiB of RAM, but it works pretty well with smaller models like deepseek-r1:1.5b. Also, I'd like to point out that this setup is not containerized. I've added the versions of all software involved below in the Details section.
Installing Fedora Packages
The first step is to install the ROCm Fedora packages. One reason I chose Fedora is that it generally has up-to-date packages available for install. The ROCm version on Fedora 41 is 6.2, which supports the RX 5500.
larntz@fedora:~$ sudo dnf install rocminfo rocm-clinfo rocm-opencl rocm-hip
The versions that were installed are:
rocminfo-6.2.1-1.fc41.x86_64
rocm-clinfo-6.2.1-5.fc41.x86_64
rocm-opencl-6.2.1-5.fc41.x86_64
rocm-hip-6.2.1-5.fc41.x86_64
We're going to need some info from the output of rocminfo. The system I ran this on has an AMD CPU with integrated graphics, so I needed to pull information from rocminfo in order to configure ollama to use the discrete GPU.
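If you don't want to read the full report, a quick grep should surface just the fields we care about (a rough sketch; the labels match the output on my system but could vary between ROCm versions):
larntz@fedora:~$ rocminfo | grep -E 'Agent|Name:|Node:'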
On my system the RX 5500 was Agent 2:
*******
Agent 2
*******
Name: gfx1010
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 5500
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
<snip>
The important bits of info here are the name gfx1010 and the node 1. We need the name in order to select the correct driver, and the node if we want to force ollama to use this GPU.
Installing Ollama
The next thing I did was install ollama. The easiest way to do this is to run the command on the ollama download page. This is a curl command piped to a shell, but there are also manual install instructions. For testing purposes I chose to run the curl command.
larntz@fedora:~$ curl -fsSL https://ollama.com/install.sh | sh
Installing ollama this way will set it up to start automatically via systemd. Looking at the ollama logs (sudo journalctl -u ollama) at this point will show the following warning:
level=WARN source=amd_linux.go:378 msg="amdgpu is not supported (supported types:[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942])"
Not what we want. To fix this we'll need to set a couple of environment variables in the systemd unit and create a symlink so ollama's bundled rocBLAS falls back to a compatible library (gfx1030) for our gfx1010 card.
larntz@fedora:~$ sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat
larntz@fedora:~$ ls -l /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx1010.dat
lrwxrwxrwx. 1 root root 69 Jan 23 13:47 /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx1010.dat -> /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx1030.dat
Configuring Ollama
Next we need to edit the systemd unit for ollama. We need to tell it which driver to use and, in my case, which GPU to use. The systemd unit file is located at /etc/systemd/system/ollama.service
. To specify the driver to use set this environment variable HSA_OVERRIDE_GFX_VERSION=10.1.0
. In my specific case I am also setting ROCR_VISIBLE_DEVICES=1
to tell ollama which GPU to use. I’ve also added OLLAMA_HOST=0.0.0.0
so the ollama API is available over the network.
This is my ollama.service systemd unit:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/home/larntz/.local/bin:/home/larntz/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"
Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"
Environment="ROCR_VISIBLE_DEVICES=1"
Environment="OLLAMA_HOST=0.0.0.0"
[Install]
WantedBy=default.target
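As an aside, rather than editing the unit file directly (an ollama upgrade may rewrite it), the same variables can live in a systemd drop-in. This is a standard systemd alternative, not something I tested for this post:
larntz@fedora:~$ sudo systemctl edit ollama
Then add the environment variables to the override file it opens:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"
Environment="ROCR_VISIBLE_DEVICES=1"
Environment="OLLAMA_HOST=0.0.0.0"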
Now, after running sudo systemctl daemon-reload and sudo systemctl restart ollama, you should find the following line in the ollama logs (sudo journalctl -u ollama).
level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1012 driver=0.0 name=1002:7340 total="4.0 GiB" available="3.6 GiB"
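If you want to double-check that inference actually lands on the card, one option is to follow the logs while a model loads and watch for ROCm buffer allocations (the grep pattern is my own suggestion, not anything from the ollama docs):
larntz@fedora:~$ sudo journalctl -u ollama -f | grep -i rocm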
Details
- GPU: Radeon RX 5500 (4GiB)
- CPU: AMD Ryzen 5 5600G with Radeon Graphics (6 core, 12 threads)
- Distro: Fedora 41 Workstation
- Installed Fedora packages: rocminfo, rocm-hip, rocm-clinfo, rocm-opencl
- Ollama: 0.5.7
- Open WebUI: 0.5.6
Running Models
Given the limited memory on this card it's best to stick with small models. When reviewing the available models you'll notice a drop-down that lists the available model sizes. You'll want to choose smaller models that can fit in the 3.6GiB of memory available on the card. You can use larger models, but ollama will only load some of the layers onto the GPU, which will result in slower responses.
I did a quick test using three models, giving each one the same prompt: "write a fizz buzz function in go".
llama3.2:latest
This is a 3b-parameter model that is about 2GiB in size.
write a fizz buzz function in go
The response was generated at 44 tokens per second and took about 4 seconds.
Using the CPU the response was 19 tokens per second in 13 seconds.
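These rates come from ollama's own stats; to reproduce them, passing --verbose to ollama run should print timing details (including the eval rate in tokens per second) after each response:
larntz@fedora:~$ ollama run llama3.2:latest --verbose 'write a fizz buzz function in go'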
The result was:
package main

import "fmt"

// Function to print the FizzBuzz sequence up to n
func fizzbuzz(n int) {
    for i := 1; i <= n; i++ {
        if i % 15 == 0 {
            fmt.Printf("FizzBuzz\n")
        } else if i % 3 == 0 {
            fmt.Printf("Fizz\n")
        } else if i % 5 == 0 {
            fmt.Printf("Buzz\n")
        } else {
            fmt.Printf("%d\n", i)
        }
    }
}

func main() {
    // Test the FizzBuzz function
    fizzbuzz(100)
}
deepseek-r1:1.5b
This model has 1.5b parameters and is only 1.1GiB in size.
write a fizz buzz function in go
The response from this model was 57 tokens per second and took about 20 seconds, but it included a lot of explanation about what fizz buzz is and how to approach solving the problem. The deepseek-r1 models seem to do a lot of "thinking" before responding (15-20s per response).
Using the CPU the response was 36 tokens per second in 47 seconds.
package main

import "fmt"

func fizzbuzz(n int) {
    for i := 1; i <= n; i++ {
        if i % 15 == 0 {
            fmt.Printf("FizzBuzz")
        } else if i % 3 == 0 {
            fmt.Printf("Fizz")
        } else if i % 5 == 0 {
            fmt.Printf("Buzz")
        } else {
            fmt.Printf("%d", i)
        }
    }
}

func main() {
    fizzbuzz(100)
}
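For what it's worth, one way to force a CPU-only run for this kind of comparison (my suggestion, not necessarily how the numbers above were produced) is to zero out the num_gpu parameter in an interactive session:
larntz@fedora:~$ ollama run deepseek-r1:1.5b
>>> /set parameter num_gpu 0
>>> write a fizz buzz function in go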
deepseek-r1:7b
This model is too large to fit in the GPU RAM at 4.7GiB. When loading the model, ollama will output logs showing that it is not fully loaded onto the GPU.
Jan 23 16:09:32 fedora ollama[59636]: llm_load_tensors: offloading 17 repeating layers to GPU
Jan 23 16:09:32 fedora ollama[59636]: llm_load_tensors: offloaded 17/29 layers to GPU
Jan 23 16:09:32 fedora ollama[59636]: llm_load_tensors: CPU_Mapped model buffer size = 2180.12 MiB
Jan 23 16:09:32 fedora ollama[59636]: llm_load_tensors: ROCm0 model buffer size = 2280.33 MiB
write a fizz buzz function in go
The response from this model was quite a bit slower at 11 tokens per second, and it took almost 3 minutes to finish its response.
Using the CPU this model responded at 9 tokens per second and also took about 3 minutes to complete the response.
package main

func FizzBuzz(n int) {
    for i := 1; i <= n; i++ {
        if i%15 == 0 {
            print("FizzBuzz")
        } else if i%3 == 0 {
            print("Fizz")
        } else if i%5 == 0 {
            print("Buzz")
        } else {
            print(i)
        }
    }
}

func main() {
    FizzBuzz(15)
}
Conclusion
Using this GPU for small models is roughly twice as fast as the CPU alone. If you have a use case for small models this setup would probably work relatively well, and there's a decent speed increase for a relatively old GPU. Larger models that do not fit in the GPU RAM will likely not see much of a speed boost.