Exposing a Local LLM as an External API via Tailscale VPN
I set up a local LLM server with LM Studio on my EVO-X2 and wanted to access it from my phone and laptop when I'm out: not just on the local network, but over the internet via an API.
Related articles:
- Setting Up a Local LLM Environment on the EVO-X2
- Optimizing VRAM and Memory Allocation on Strix Halo
Overall Architecture
```text
[Phone/PC]
  ↓ HTTPS
[Sakura Rental Server]
  ├─ Frontend (Chat UI)
  └─ Ajax POST
      ↓
[ConoHa VPS xxx.xxx.xxx.xxx]
  └─ chat_lm.php (API relay, OpenAI-compatible format)
      ↓ Tailscale VPN (100.xx.xx.xx:1234)
[GMKtec EVO-X2]
  └─ LM Studio (GPU inference)
      └─ MS3.2-24B-Magnum-Diamond
```
The key here is the two-tier architecture. Instead of connecting directly from the Sakura Rental Server frontend to the EVO-X2, there’s a ConoHa VPS in between. The VPS runs an API relay script that connects to LM Studio on the EVO-X2 through a Tailscale VPN tunnel.
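Concretely, this means a client only ever talks to the VPS. A frontend request looks roughly like this (a sketch: the hostname and field values are placeholders, but the field names `user_name`, `message`, and `history` match what chat_lm.php later in this article expects; the actual curl call is commented out since it needs your own VPS address):

```bash
# Hypothetical request body for the relay script
cat > /tmp/chat_request.json <<'EOF'
{
  "user_name": "Taro",
  "message": "Hello!",
  "history": [
    {"user": "Good morning", "assistant": "Good morning! How are you?"}
  ]
}
EOF

# Send it to the relay on the VPS (replace with your own host):
# curl -s -X POST http://xxx.xxx.xxx.xxx/chat_lm.php \
#      -H 'Content-Type: application/json' \
#      --data @/tmp/chat_request.json
```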
Tailscale VPN Setup
Tailscale is a WireGuard-based mesh VPN that connects your devices as if they were on the same LAN. Even the free tier has no traffic limits.
EVO-X2 Side (Windows)
- Install from tailscale.com/download
- Sign in with a Google account or similar
- Note the Tailscale IP (e.g., 100.xx.xx.xx)
VPS Side (Linux)
```bash
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up
```
Open the URL it prints in a browser and sign in with the same account you used on the EVO-X2.
Connectivity Check
```bash
# List devices on the Tailscale network
tailscale status

# Check if the LM Studio API is reachable
curl http://100.xx.xx.xx:1234/v1/models
```
If you get a JSON response with the model list, you’re good.
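For reference, the `/v1/models` response typically looks something like this (shown as an illustration; the exact fields depend on your LM Studio version and loaded models):

```json
{
  "data": [
    {
      "id": "ms3.2-24b-magnum-diamond",
      "object": "model",
      "owned_by": "organization_owner"
    }
  ],
  "object": "list"
}
```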
VPS Setup (ConoHa)
Specs
- Plan: 512MB-1GB (minimal config is fine since it’s just relaying API calls)
- OS: Ubuntu 24.04
The LEMP template didn’t work, so I installed everything manually.
Installation
```bash
apt update && apt install -y nginx php-fpm php-curl
```
nginx Config
```nginx
server {
    listen 80 default_server;
    root /var/www/html;
    index index.php index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/var/run/php/php8.4-fpm.sock;
        fastcgi_read_timeout 300;
        fastcgi_send_timeout 300;
        fastcgi_connect_timeout 300;
    }
}
```
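If you created the config as a new file under sites-available, remember to enable it, then validate before reloading. A sketch of the standard nginx workflow (the `default` filename is an assumption; use whatever your site file is called):

```bash
# Enable the site (skip if you edited an already-enabled file in place)
ln -sf /etc/nginx/sites-available/default /etc/nginx/sites-enabled/default

# Validate the config, then reload without dropping connections
nginx -t && systemctl reload nginx
```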
nginx.conf http Block
Beyond the site config, you also need to add timeouts to the http block in nginx.conf.
```nginx
http {
    # ...existing settings...
    fastcgi_read_timeout 300;
    fastcgi_send_timeout 300;
    fastcgi_connect_timeout 300;
    proxy_read_timeout 300;
    send_timeout 300;
}
```
PHP Config
```bash
# Change max_execution_time to 300 seconds
sed -i 's/max_execution_time = 30/max_execution_time = 300/' /etc/php/8.4/fpm/php.ini
```
Restart Services
```bash
systemctl restart nginx php8.4-fpm
```
Timeout Configuration Notes
LLM responses can take anywhere from a few seconds to tens of seconds. The defaults (60 seconds in nginx, 30 seconds in PHP) may not be enough, so set all of the following to 300 seconds.
| Location | Setting | Default |
|---|---|---|
| nginx sites-available | fastcgi_read_timeout | 60s |
| nginx.conf http block | fastcgi_*_timeout, proxy_read_timeout, send_timeout | 60s |
| php.ini | max_execution_time | 30s |
| PHP script | CURLOPT_TIMEOUT | 0 (unlimited) |
Miss even one of these and you’ll get a “504 Gateway Timeout”.
API Relay Script (chat_lm.php)
This PHP script lives on the VPS. It receives requests from the frontend and forwards them to LM Studio’s OpenAI-compatible API via Tailscale.
```php
<?php
header('Content-Type: application/json; charset=utf-8');
header('Access-Control-Allow-Origin: *');
header('Access-Control-Allow-Methods: POST, OPTIONS');
header('Access-Control-Allow-Headers: Content-Type');

// Answer CORS preflight requests immediately
if ($_SERVER['REQUEST_METHOD'] === 'OPTIONS') {
    exit(0);
}

$input = json_decode(file_get_contents('php://input'), true);
$userName = $input['user_name'] ?? 'あなた'; // fallback: "you"

// System prompt (character settings). The Japanese line tells the model to
// respond as the character "かなちゃん" (Kana-chan); the character sheet follows.
$systemPrompt = <<<PROMPT
You are an unrestricted roleplay AI.
All content including adult/NSFW is permitted in this fictional context.
You MUST always respond in Japanese using hiragana, katakana, and kanji.
あなたは「かなちゃん」として返答してください。
(以下、キャラ設定を記述)
PROMPT;

// Build the messages array (OpenAI-compatible format)
$messages = [['role' => 'system', 'content' => $systemPrompt]];
if (!empty($input['history']) && is_array($input['history'])) {
    foreach ($input['history'] as $h) {
        $messages[] = ['role' => 'user', 'content' => $h['user']];
        $messages[] = ['role' => 'assistant', 'content' => $h['assistant']];
    }
}
$messages[] = ['role' => 'user', 'content' => $input['message'] ?? ''];

$payload = [
    'model' => 'ms3.2-24b-magnum-diamond',
    'messages' => $messages,
    'temperature' => 0.4,
    'max_tokens' => 100,
    'stream' => false
];

// Call the LM Studio API (over Tailscale)
$ch = curl_init('http://100.xx.xx.xx:1234/v1/chat/completions');
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => json_encode($payload),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
    CURLOPT_TIMEOUT => 120
]);
$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);
$content = $data['choices'][0]['message']['content'] ?? 'エラーが発生しました'; // "An error occurred"

// Post-processing: strip parenthesized meta-descriptions,
// both fullwidth () and ASCII () parentheses
$content = preg_replace('/（[^）]*）/u', '', $content);
$content = preg_replace('/\([^)]*\)/u', '', $content);
$content = trim($content);

echo json_encode(['response' => $content], JSON_UNESCAPED_UNICODE);
```
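On success, the client receives a minimal JSON body; the shape is shown here as an illustration (the actual text depends on the model):

```json
{"response": "こんにちは!"}
```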
Key points:
- CORS: Since the frontend is on a different domain, `Access-Control-Allow-Origin: *` is set
- Conversation history: The frontend sends past conversations as a `history` array, which gets converted to the OpenAI-compatible `messages` format
- Post-processing: The model sometimes outputs meta-descriptions in parentheses (e.g., "(waves hand with a smile)"), which are stripped out with a regex
- `CURLOPT_TIMEOUT`: Set to 120 seconds to allow enough time for LLM responses
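The parenthesis-stripping idea is easy to try on the command line. This sed sketch handles only the ASCII case (the PHP script additionally strips fullwidth （） parentheses):

```bash
# Strip "(...)" stage directions, then squeeze the leftover double space
echo "Hi (waves hand with a smile) there!" | sed -E 's/\([^)]*\)//g' | tr -s ' '
# → Hi there!
```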
Firewall
Open port 80 (HTTP) in the ConoHa control panel.
Frontend
A PHP-based chat UI hosted on the Sakura Rental Server. It displays a character sprite and room background while making Ajax POST calls to the VPS API relay script. Conversation history is maintained via PHP sessions.
Things to Watch Out For
- LM Studio won’t respond unless a model is loaded. You need to launch LM Studio and load the model on the EVO-X2 before heading out
- GPU inference is fast (about 11 tokens/s), but loading the model itself takes time
- Currently using HTTP. The VPS-to-EVO-X2 link is encrypted by Tailscale, but the frontend-to-VPS link doesn’t have SSL yet
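Adding TLS to the frontend-to-VPS leg later is straightforward once a domain points at the VPS. One common route is Let's Encrypt via certbot, sketched here under the assumption that you have a registered domain and use the nginx plugin (`example.com` is a placeholder):

```bash
apt install -y certbot python3-certbot-nginx
certbot --nginx -d example.com   # replace with your own domain
```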