•Created by Sinlore Kain on 2/8/2025 in #⛅｜pods

Kobold.cpp - Remote tunnel loads before the model, causing confusion (possible off-product issue)

Here's the relevant piece of the log:

load_tensors: offloading 88 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 89/89 layers to GPU
load_tensors:          CPU model buffer size =   315.00 MiB
load_tensors:        CUDA0 model buffer size = 32487.19 MiB
load_tensors:        CUDA1 model buffer size = 32487.19 MiB
load_tensors:        CUDA2 model buffer size = 30636.42 MiB
load_all_data: no device found for buffer type CPU for async uploads
load_all_data: using async uploads for device CUDA0, buffer type CUDA0, backend CUDA0
Your remote Kobold API can be found at https://mock-up-cloudflare-link.trycloudflare.com/api
Your remote OpenAI Compatible API can be found at https://mock-up-cloudflare-link.trycloudflare.com/v1
======
Your remote tunnel is ready, please connect to https://mock-up-cloudflare-link.trycloudflare.com
.................................load_all_data: using async uploads for device CUDA1, buffer type CUDA1, backend CUDA1

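I'm not sure whether this affects smaller text models. I also tested the official Kobold build, and it seems to have the same issue. Did they move the remote tunnel startup to async? The main problem is that the tunnel is announced as ready while the model is still loading, so until loading finishes the link shows: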

 502 Bad Gateway
Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared

Everything functions properly once the model loads.
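
As a possible workaround on the client side, a script could poll the tunnel link and only start using it once it stops answering with 502. The following is a minimal sketch, not anything Kobold.cpp ships: the function names, the 30-minute limit, and the 10-second interval are all hypothetical choices, and it assumes that a plain GET returning HTTP 200 means the server behind the tunnel is up.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers land here, e.g. 502 while the origin is not up yet.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def wait_until_ready(url, max_wait=1800.0, interval=10.0,
                     get_status=http_status, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll url until it returns HTTP 200 or max_wait seconds have passed.

    get_status, sleep and clock are injectable so the loop can be tested
    without real network access or real waiting.
    """
    deadline = clock() + max_wait
    while True:
        if get_status(url) == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready("https://mock-up-cloudflare-link.trycloudflare.com/api/v1/model")` would then block until the link responds or the limit is reached. The `/api/v1/model` path is an assumption about which endpoint to probe; any URL on the tunnel that returns 200 only after the model has loaded would serve the same purpose.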
