#Apple Silicon

49 articles

Tech · Apr 25, 2026 · updated · 11 min

Ling-flash-2.0 MXFP4 (bailing_moe) on SwiftLM + M1 Max 64GB: working config, support check, --stream-experts notes

Hands-on run of inclusionAI Ling-flash-2.0 (100B / 6.1B active, MXFP4 quant, 54.7GB) on SwiftLM via mlx-swift-lm on an M1 Max 64GB. Covers the bailing_moe + MXFP4 support check in mlx-swift, the startup surprise, and what --stream-experts actually saves.

Apple Silicon LLM MLX Local LLM Swift SwiftLM MoE MXFP4 Ant Group Experiment

Tech · Apr 24, 2026 · updated · 14 min

WAI-Illustrious v17 hands-on: hires fix auto-corrects hands and feet, 4 rating tags, do v16 LoRAs still work?

WAI-Illustrious SDXL v17 tested on M1 Max 64GB ComfyUI against v16 with the same seed. Hires fix now auto-corrects hands and feet, the four rating tags (general/sensitive/nsfw/explicit) still drive NSFW output, and v16-trained LoRAs mostly carry over, with one case where they don't.

AI Image Generation ComfyUI Stable Diffusion LoRA Apple Silicon Experiment

Tech · Apr 24, 2026 · 13 min

Running SwiftLM on M1 Max 64GB and Comparing It to Ollama and MLX-lm

A hands-on build and run of the Swift-based LLM inference server SwiftLM on an M1 Max 64GB. Covers Qwen3.6-35B-A3B and Qwen3.5-122B-A10B, with the same BST, BBS, and persona tests used in the existing Ollama and MLX-lm write-ups.

Apple Silicon LLM MLX Local LLM Swift SwiftLM MoE Experiment

Tech · Apr 23, 2026 · 21 min

Running open-notebook on M1 Max Without Docker or Cloud APIs, and Letting qwen3.6:35b Read Its Own Article

The NotebookLM clone open-notebook assumes Docker and cloud APIs by default. I installed SurrealDB natively, ran four processes in tmux, and wired everything through Ollama's qwen3.6:35b and bge-m3. I fed it the Qwen3.6 benchmark article I wrote this morning, and it answered with the correct numbers.

AI LLM Local LLM Ollama Qwen Apple Silicon RAG OSS Experiment

Tech · Apr 23, 2026 · 13 min

Qwen3.6-27B Dense vs Qwen3.6-35B-A3B MoE on M1 Max — MLX Was 2× Faster Than Ollama

Tried Qwen3.6-27B on both Ollama and MLX. Ollama couldn't load the VL-projector-embedded GGUF, while MLX ran it at 11 tok/s. Separately, 35B-A3B under MLX ran roughly 2× faster than the Ollama GGUF. Both models were also asked to build a BBS to gauge intent handling.

LLM Local LLM Qwen Ollama MLX Apple Silicon MoE Experiment

Tech · Apr 21, 2026 · updated · 11 min

Qwen3.6-35B-A3B on M1 Max via Ollama 0.20.6: 27 tok/s same as 3.5, but 13× thinking tokens

Hands-on Qwen3.6-35B-A3B (23GB 4bit GGUF) on M1 Max 64GB via Ollama 0.20.6. Generation speed stays at 27 tok/s — same as Qwen3.5-35B-A3B — but the same prompt produces 13× more thinking tokens. Multi-turn behavior, persona handling, and a three-tier NSFW probe included.

LLM Local LLM Qwen Ollama Apple Silicon MoE Experiment

Tech · Apr 17, 2026 · 10 min

Testing Z-Image i2i for Pixel Art Conversion

Z-Image has its own pixel art LoRAs, but can they actually convert photos to pixel art via i2i? Tested Z-Image Turbo and the base model, and compared both with Illustrious on M1 Max 64GB.

Z-Image Image Generation Apple Silicon Experiment

Tech · Apr 16, 2026 · updated · 13 min

WAI-Anima v1 vs WAI-Illustrious on M1 Max ComfyUI: brings Anima's atmospheric backgrounds but loses on tag control and character consistency

Tested WAI-Anima v1, Anima preview3-base, and WAI-Illustrious v160 side by side on M1 Max 64GB ComfyUI with the same seed and prompt. WAI-Anima inherits Anima's atmospheric lighting and natural running poses but still loses to WAI-Illustrious on tag control and character consistency. Includes an i2i pipeline test (denoise 0.5), ~275s generation times, and a look at how the Anima derivative ecosystem (WAI-Anima, CottonAnima, Kirazuri, RDBT) expanded in two months.

AI Image Generation ComfyUI Qwen Apple Silicon Stable Diffusion LoRA Experiment Anima WAI-Anima

Tech · Apr 16, 2026 · 14 min

How Far Has AMD ROCm Come in Catching Up to CUDA?

Based on EE Times' interview with AMD AI Software VP Anush Elangovan, we assess the ROCm vs CUDA ecosystem gap. Includes hands-on experience with ROCm breaking four times on Strix Halo, plus practical guidance on choosing between NVIDIA, AMD, and Apple Silicon.

AMD NVIDIA ROCm CUDA GPU AI Infrastructure PyTorch MLX Apple Silicon

Tech · Apr 14, 2026 · 10 min

Can Qwen Image Edit Convert Photos to Pixel Art?

Tested 5 approaches including Qwen Image Edit, JS color reduction, and Illustrious i2i + LoRA. Illustrious i2i alone turned out to be the fastest and lightest solution for pixel art conversion.

Qwen Image Generation Apple Silicon Experiment

Tech · Apr 14, 2026 · 10 min

Can Local Vision LLMs Extract RPG Stats from Character Art?

I tested local Vision LLMs (Gemma 3, Qwen2.5-VL, Llama 3.2 Vision, Gemma 4) to see if they could look at character illustrations and pixel art and generate RPG-style stats in JSON format.

AI Local LLM VLM Image Recognition Ollama Gemma Qwen Apple Silicon Experiment

Tech · Apr 2, 2026 · updated · 13 min

SwiftLM is a Swift-based LLM inference server that integrates TurboQuant and SSD streaming into Metal shaders

SwiftLM, an Apple Silicon–only MLX inference server, provides a native Metal implementation of TurboQuant V2+V3 hybrid KV‑cache compression and NVMe SSD expert streaming.

Apple Silicon LLM MLX Local LLM Inference Optimization KV Cache MoE Swift