How to reproduce NovelAI Precise Reference locally: ComfyUI + IP-Adapter
What NovelAI Precise Reference is
In February 2026, NovelAI released a feature called “Precise Reference.” In addition to the existing character reference, it added style reference, so the two can now be combined.
| Feature | Role |
|---|---|
| Character reference | Preserve the character’s appearance (face, hair, clothes, etc.) |
| Style reference | Preserve the art style, brushwork, and color palette |
Just by specifying reference images, you can keep both character and style consistent. Its main strength is that it is easy to use and does not require LoRA training.
NovelAI is still a cloud service, though, and every generation consumes Anlas. If you want to do something similar locally, IP-Adapter is the main option.
What IP-Adapter is
An image-prompting method developed by Tencent AI Lab. It extracts features from a reference image using CLIP Vision and injects them during generation. In practice it behaves a bit like a "single-image LoRA."
Here is the rough mapping to NovelAI Precise Reference:
| NovelAI feature | IP-Adapter |
|---|---|
| Character reference | IP-Adapter FaceID / Plus Face |
| Style reference | Standard IP-Adapter |
| Strength slider | weight parameter |
| Fidelity slider | start_at / end_at parameters |
Hardware environment
| Item | Spec |
|---|---|
| Machine | Mac Studio / MacBook Pro etc. (Apple Silicon) |
| Memory | 32 GB or more recommended (64 GB is comfortable) |
| Storage | 50 GB or more free space |
I plan to test on an M1 Max with 64 GB. Even with an SDXL model, IP-Adapter, and CLIP Vision loaded at once, memory should be sufficient. Reported generation speed is roughly on par with an RTX 3060: about 30-60 seconds per image.
Setup
1. Install ComfyUI
Install a version of ComfyUI that works on Apple Silicon.
```
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
```
2. Install ComfyUI_IPAdapter_plus
```
cd custom_nodes
git clone https://github.com/cubiq/ComfyUI_IPAdapter_plus.git
```
3. Download the required models
CLIP Vision models
Place them under ComfyUI/models/clip_vision/:
| Model | Purpose |
|---|---|
| CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | Works for both SD1.5 and SDXL |
| CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors | For SDXL, with higher accuracy |
Download source: Hugging Face - laion/CLIP-ViT-H-14-laion2B-s32B-b79K and laion/CLIP-ViT-bigG-14-laion2B-39B-b160k (these are LAION-trained encoders, not openai/clip-vit-large-patch14)
IP-Adapter models
Place them under ComfyUI/models/ipadapter/:
| Model | Purpose |
|---|---|
| ip-adapter-plus_sdxl_vit-h.safetensors | General use (good for style reference) |
| ip-adapter-plus-face_sdxl_vit-h.safetensors | Face-specialized (good for character reference) |
Download source: Hugging Face - h94/IP-Adapter
Base model (SDXL)
Place it under ComfyUI/models/checkpoints/. For anime-oriented work, these are good candidates:
- Animagine XL 3.1
- Counterfeit-V3.0
- Kohaku XL
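With the files in place, a quick stdlib-only script can confirm the layout before launching ComfyUI. This is a hypothetical helper, not part of ComfyUI; the paths and filenames mirror the tables above, and `COMFY_ROOT` should be adjusted to your install location.

```python
from pathlib import Path

# Root of your ComfyUI checkout -- adjust to your machine.
COMFY_ROOT = Path("ComfyUI")

# Expected files, mirroring the tables above.
EXPECTED = {
    "models/clip_vision": [
        "CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors",
        "CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors",
    ],
    "models/ipadapter": [
        "ip-adapter-plus_sdxl_vit-h.safetensors",
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",
    ],
}

def missing_models(root: Path) -> list[str]:
    """Return relative paths of expected model files that are absent."""
    missing = []
    for subdir, files in EXPECTED.items():
        for name in files:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing

if __name__ == "__main__":
    for path in missing_models(COMFY_ROOT):
        print(f"missing: {path}")
```

If the script prints nothing, every expected model is where the IPAdapter nodes will look for it.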
Workflow structure
Character reference only
```
[Load Image] -> [CLIP Vision Encode] -> [IPAdapter Face] -> [KSampler] -> [Save Image]
```
This is the basic setup. It extracts facial features from the reference image and reflects them in generation.
Character reference + style reference (NovelAI Precise Reference equivalent)
```
[Checkpoint] -> [IPAdapter Face] -> [IPAdapter] -> [KSampler] -> [Save Image]
                       ↑                 ↑
            [Character ref image]  [Style ref image]
```
Chain the two IP-Adapters in series on the model path: the first injects character features, the second injects style features.
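The series wiring can be sketched in plain Python: each IP-Adapter takes the model patched by the previous one and attaches its own reference, so chain order is application order. The `Model` class and `apply_ipadapter` function below are illustrative stand-ins, not ComfyUI APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    """Stand-in for the diffusion model being patched (not a real ComfyUI type)."""
    injections: list = field(default_factory=list)

def apply_ipadapter(model: Model, ref: str, weight: float) -> Model:
    """Return a copy of the model with one more reference injection attached,
    mimicking how each IPAdapter node patches the model and passes it along."""
    return Model(injections=model.injections + [(ref, weight)])

base = Model()
with_char = apply_ipadapter(base, "character_ref.png", 0.8)   # first: character
with_both = apply_ipadapter(with_char, "style_ref.png", 0.6)  # second: style

# The chained model carries both injections, in wiring order.
print(with_both.injections)
```

Because each node returns a patched copy, the KSampler only needs the final model output; the reference images never connect to the sampler directly.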
Parameter tuning
weight (strength)
This corresponds to NovelAI’s strength slider. It controls how strongly the reference image influences the output.
| Value | Effect |
|---|---|
| 0.5-0.7 | Light influence; high prompt freedom |
| 0.7-0.85 | Balanced; recommended range for faces |
| 0.85-1.0 | Strong influence; easily pulled toward the reference image |
If you set it too high, even the facial expression and pose from the reference can carry over, so be careful.
start_at / end_at
These control when IP-Adapter intervenes during sampling.
| Setting | Effect |
|---|---|
| start_at: 0, end_at: 1 | Active for the whole process (default) |
| start_at: 0, end_at: 0.8 | Later stages prioritize the prompt |
| start_at: 0.2, end_at: 1 | Early composition is left more to the prompt |
For style reference, setting end_at: 0.8 or so tends to make fine details follow the prompt more easily.
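start_at / end_at are fractions of the sampling schedule, and a small helper makes the mapping to concrete steps explicit. The function is illustrative (not an IP-Adapter API), assuming the node gates its effect by step range.

```python
def ipadapter_active_steps(total_steps: int, start_at: float, end_at: float) -> range:
    """Return the sampler steps (0-indexed) during which the IP-Adapter
    is applied, given start_at/end_at as fractions of the schedule."""
    first = round(total_steps * start_at)
    last = round(total_steps * end_at)
    return range(first, last)

# With 30 steps, end_at=0.8 leaves the last 6 steps to the prompt alone.
steps = ipadapter_active_steps(30, 0.0, 0.8)
print(len(steps))  # 24 of 30 steps under IP-Adapter influence
```

This is why end_at: 0.8 lets fine details follow the prompt: the reference stops steering precisely during the refinement steps at the end of sampling.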
Combining multiple references
When using character reference and style reference together:
| Reference type | Recommended weight |
|---|---|
| Character reference (Face) | 0.7-0.85 |
| Style reference (standard) | 0.5-0.7 |
If both are set too high, the image tends to break down. If one is strong, keep the other restrained.
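The guidance above can be encoded as a small sanity check; the thresholds are simply the recommended upper bounds from the table, and the function name is ours, not part of any IP-Adapter node.

```python
def check_combo(char_weight: float, style_weight: float) -> list[str]:
    """Warn about weight combinations that tend to break down,
    using the recommended ranges from the table above."""
    warnings = []
    if char_weight > 0.85:
        warnings.append("character weight above recommended 0.85")
    if style_weight > 0.7:
        warnings.append("style weight above recommended 0.7")
    if char_weight > 0.8 and style_weight > 0.65:
        warnings.append("both weights near their upper bounds; lower one of them")
    return warnings

print(check_combo(0.8, 0.6))   # within range: no warnings
print(check_combo(0.9, 0.8))   # both too high: multiple warnings
```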
Limitations
Multiple character references at once
As with NovelAI, if you try to use multiple character references simultaneously, their features get blended together. They are not handled as separate characters.
If you want multiple characters:
- Split regions with Regional Prompting
- Generate them individually with inpainting
- Combine with ControlNet to lock the pose
FaceID-series models
If you need more accurate facial consistency, the IP-Adapter FaceID series exists. However, it requires installing InsightFace, and Apple Silicon may need extra configuration.
NovelAI vs local IP-Adapter
| Item | NovelAI Precise Reference | Local IP-Adapter |
|---|---|---|
| Upfront cost | Subscription (about $25/month and up) | Free (hardware separate) |
| Generation cost | 5 Anlas per run | Electricity only |
| Setup | None | Required |
| Quality | Optimized with a dedicated model | Depends on model choice |
| Customization | Limited | High flexibility |
| Offline | No | Yes |
If you generate hundreds of images per month or more, building a local setup is worth considering.
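A rough break-even sketch, with every number a stated assumption: the 5 Anlas per run mentioned above, plus a hypothetical monthly Anlas allowance — check your actual plan for real figures.

```python
def images_covered(monthly_anlas: int, anlas_per_image: int = 5) -> int:
    """How many reference-based generations a monthly Anlas allowance covers,
    assuming the per-run cost stated in the comparison table."""
    return monthly_anlas // anlas_per_image

# Assuming a plan that grants 10,000 Anlas per month (hypothetical figure):
print(images_covered(10_000))  # 2000 images before buying extra Anlas
```

Past that allowance, each extra image has a marginal cost in the cloud, while the local setup's marginal cost is electricity alone — which is the crossover the paragraph above describes.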
Related articles
- SeaArt LoRA training guide - How to achieve character consistency with cloud LoRA training
- Building a LoRA training environment on an RTX 3060 - Local LoRA training workflow
- Making base body reference images for Gem with Flow - A reference-image workflow using Gemini Flow