
How to reproduce NovelAI Precise Reference locally: ComfyUI + IP-Adapter

What NovelAI Precise Reference is

In February 2026, NovelAI released a feature called “Precise Reference.” It adds a style reference alongside the existing character reference, and the two can now be combined.

| Feature | Role |
| --- | --- |
| Character reference | Preserves the character’s appearance (face, hair, clothes, etc.) |
| Style reference | Preserves the art style, brushwork, and color palette |

Just by specifying reference images, you can keep both character and style consistent. Its main strength is that it is easy to use and does not require LoRA training.

NovelAI is still a cloud service, though, and every generation consumes Anlas. If you want to do something similar locally, IP-Adapter is the main option.

What IP-Adapter is

IP-Adapter is an image-prompting method developed by Tencent AI Lab. It extracts features from a reference image with a CLIP Vision encoder and injects them into the generation process through added cross-attention layers. In practice it behaves a bit like a “single-image LoRA.”
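Conceptually, the injection works like an extra cross-attention pass over the image tokens, scaled by a weight (the paper calls this “decoupled cross-attention”). A minimal NumPy sketch of the idea; the dimensions and names are illustrative, not the actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens):
    # queries: (n_q, d) latent features; tokens: (n_t, d) conditioning tokens
    scores = softmax(queries @ tokens.T / np.sqrt(queries.shape[-1]))
    return scores @ tokens

def ip_adapter_step(latent_q, text_tokens, image_tokens, weight=0.8):
    # Decoupled cross-attention: text and image tokens get separate
    # attention passes, and the image branch is scaled by `weight`.
    return (cross_attention(latent_q, text_tokens)
            + weight * cross_attention(latent_q, image_tokens))

q = np.random.rand(16, 64)    # 16 latent positions, feature dim 64
txt = np.random.rand(77, 64)  # text embedding tokens
img = np.random.rand(4, 64)   # projected CLIP Vision image tokens
out = ip_adapter_step(q, txt, img, weight=0.8)
print(out.shape)              # (16, 64)
```

With `weight=0`, the image branch vanishes and you recover plain text conditioning, which is exactly why the strength slider works the way it does.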

Here is the rough mapping to NovelAI Precise Reference:

| NovelAI feature | IP-Adapter equivalent |
| --- | --- |
| Character reference | IP-Adapter FaceID / Plus Face |
| Style reference | Standard IP-Adapter |
| Strength slider | `weight` parameter |
| Fidelity slider | `start_at` / `end_at` parameters |

Hardware environment

| Item | Spec |
| --- | --- |
| Machine | Mac Studio / MacBook Pro etc. (Apple Silicon) |
| Memory | 32 GB or more recommended (64 GB is comfortable) |
| Storage | 50 GB or more of free space |

I plan to test on an M1 Max with 64 GB. Even with an SDXL model, IP-Adapter, and CLIP Vision loaded at once, memory should be sufficient. Reports suggest generation speed around RTX 3060 class, roughly 30-60 seconds per image.
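As a rough sanity check, summing ballpark fp16 file sizes (my own estimates, not measured values) suggests the full stack fits comfortably in unified memory:

```python
# Rough fp16 on-disk sizes in GB; ballpark assumptions, not measurements.
models_gb = {
    "SDXL checkpoint": 6.9,
    "CLIP-ViT-H vision encoder": 2.5,
    "ip-adapter-plus_sdxl_vit-h": 0.9,
    "ip-adapter-plus-face_sdxl_vit-h": 0.9,
}
total = sum(models_gb.values())
print(f"~{total:.1f} GB of weights")  # well under 32 GB, let alone 64 GB
```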

Setup

1. Install ComfyUI

Install ComfyUI; recent versions run on Apple Silicon out of the box via PyTorch’s MPS backend.

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py  # the UI is served at http://127.0.0.1:8188

2. Install ComfyUI_IPAdapter_plus

cd custom_nodes
git clone https://github.com/cubiq/ComfyUI_IPAdapter_plus.git

3. Download the required models

CLIP Vision models

Place them under ComfyUI/models/clip_vision/:

| Model | Purpose |
| --- | --- |
| CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | Works for both SD1.5 and SDXL |
| CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors | For SDXL, with higher accuracy |

Download sources: Hugging Face - laion/CLIP-ViT-H-14-laion2B-s32B-b79K and laion/CLIP-ViT-bigG-14-laion2B-39B-b160k

IP-Adapter models

Place them under ComfyUI/models/ipadapter/:

| Model | Purpose |
| --- | --- |
| ip-adapter-plus_sdxl_vit-h.safetensors | General use (good for style reference) |
| ip-adapter-plus-face_sdxl_vit-h.safetensors | Face-specialized (good for character reference) |

Download source: Hugging Face - h94/IP-Adapter

Base model (SDXL)

Place it under ComfyUI/models/checkpoints/. For anime-oriented work, these are good candidates:

  • Animagine XL 3.1
  • Counterfeit-V3.0
  • Kohaku XL

Workflow structure

Character reference only

[Load Image] -> [CLIP Vision Encode] -> [IPAdapter Face] -> [KSampler] -> [Save Image]

This is the basic setup. It extracts facial features from the reference image and reflects them in generation.
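In ComfyUI’s API (JSON) format, that graph looks roughly like the fragment below. The node class names follow recent ComfyUI and IPAdapter_plus releases but vary between versions, so treat them as placeholders rather than exact identifiers:

```python
# Sketch of the graph in ComfyUI's API (JSON) format. Node class names
# and input names may differ in your IPAdapter_plus version.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "animagine-xl-3.1.safetensors"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "char_ref.png"}},
    "3": {"class_type": "CLIPVisionLoader",
          "inputs": {"clip_name": "CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors"}},
    "4": {"class_type": "IPAdapterModelLoader",
          "inputs": {"ipadapter_file": "ip-adapter-plus-face_sdxl_vit-h.safetensors"}},
    "5": {"class_type": "IPAdapterAdvanced",
          "inputs": {"model": ["1", 0], "ipadapter": ["4", 0],
                     "image": ["2", 0], "clip_vision": ["3", 0],
                     "weight": 0.8, "start_at": 0.0, "end_at": 1.0}},
    # "6" onward: KSampler -> VAEDecode -> SaveImage, wired as usual.
}
```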

Character reference + style reference (NovelAI Precise Reference equivalent)

[Character ref image] -> [IPAdapter Face] -┐
                                           ├-> [KSampler] -> [Save Image]
[Style ref image]     -> [IPAdapter]      -┘

Connect two IP-Adapters in series: the model output of the first (character) node feeds into the second (style) node. The first injects character features, and the second layers style features on top.
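The series wiring can be sketched functionally: each adapter wraps the model, and the wrapped model is handed to the next. This toy Python version (stand-in numbers, not real tensors) shows how both contributions stack:

```python
def apply_ipadapter(model_fn, ref_tokens, weight):
    """Wrap a denoiser so it also responds to `ref_tokens` at `weight`.
    Toy stand-in for how the IPAdapter node patches the model."""
    def patched(x):
        return model_fn(x) + weight * sum(ref_tokens)
    return patched

base = lambda x: x            # stand-in for the SDXL denoiser
face_tokens = [0.5, 0.3]      # stand-in character embeddings
style_tokens = [0.2]          # stand-in style embeddings

m = apply_ipadapter(base, face_tokens, weight=0.8)  # first: character
m = apply_ipadapter(m, style_tokens, weight=0.6)    # second: style
print(m(1.0))  # base output plus 0.8*0.8 plus 0.6*0.2
```

Because each wrapper keeps the previous one inside it, the order of chaining does not lose either contribution; it just determines which adapter patches the model first.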

Parameter tuning

weight (strength)

This corresponds to NovelAI’s strength slider. It controls how strongly the reference image influences the output.

| Value | Effect |
| --- | --- |
| 0.5-0.7 | Light influence; high prompt freedom |
| 0.7-0.85 | Balanced; recommended range for faces |
| 0.85-1.0 | Strong influence; easily pulled toward the reference image |

If you set it too high, even the facial expression and pose from the reference can carry over, so be careful.

start_at / end_at

These control when IP-Adapter intervenes during sampling.

| Setting | Effect |
| --- | --- |
| start_at: 0, end_at: 1 | Active for the whole process (default) |
| start_at: 0, end_at: 0.8 | Later stages prioritize the prompt |
| start_at: 0.2, end_at: 1 | Early composition is left more to the prompt |

For style reference, setting end_at: 0.8 or so tends to make fine details follow the prompt more easily.
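Since start_at / end_at are fractions of the sampling schedule, it can help to see which concrete steps they cover. A small illustrative helper, assuming N evenly spaced steps:

```python
def active_steps(n_steps, start_at=0.0, end_at=1.0):
    """Return the sampler step indices where the adapter is applied.
    Steps are treated as evenly spaced fractions of the schedule."""
    return [i for i in range(n_steps)
            if start_at <= i / (n_steps - 1) <= end_at]

# 20-step sampling with end_at=0.8: the last ~20% of steps
# run without the adapter, so fine details follow the prompt.
print(active_steps(20, 0.0, 0.8))  # steps 0 through 15
```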

Combining multiple references

When using character reference and style reference together:

| Reference type | Recommended weight |
| --- | --- |
| Character reference (Face) | 0.7-0.85 |
| Style reference (standard) | 0.5-0.7 |

If both are set too high, the image tends to break down. If one is strong, keep the other restrained.
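One way to encode the “keep the other restrained” rule is a proportional cap on the combined weight. This is my own heuristic, not a feature of NovelAI or IPAdapter_plus:

```python
def balance_weights(face_w, style_w, cap=1.4):
    """Scale both weights down proportionally if their sum exceeds `cap`.
    The cap value is a personal starting point; tune it per model."""
    total = face_w + style_w
    if total <= cap:
        return face_w, style_w
    scale = cap / total
    return face_w * scale, style_w * scale

print(balance_weights(0.85, 0.7))  # sum 1.55 > 1.4, so both scale down
```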

Limitations

Multiple character references at once

As with NovelAI, if you try to use multiple character references simultaneously, their features get blended together. They are not handled as separate characters.

If you want multiple characters:

  1. Split regions with Regional Prompting
  2. Generate them individually with inpainting
  3. Combine with ControlNet to lock the pose
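For option 1, the core idea is restricting each character’s adapter to its own region with an attention mask (IPAdapter_plus exposes an attn_mask input for this, though wiring details vary by version). A minimal sketch of building left/right masks:

```python
import numpy as np

def split_masks(height, width):
    """Left/right binary masks so two character adapters
    each only influence half of the canvas."""
    left = np.zeros((height, width), dtype=np.float32)
    right = np.zeros((height, width), dtype=np.float32)
    left[:, : width // 2] = 1.0
    right[:, width // 2:] = 1.0
    return left, right

l, r = split_masks(1024, 1024)
print((l + r == 1.0).all())  # the two masks partition the canvas
```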

FaceID-series models

If you need more accurate facial consistency, there is the IP-Adapter FaceID series. However, it requires installing InsightFace, which may need extra build configuration on Apple Silicon.

NovelAI vs local IP-Adapter

| Item | NovelAI Precise Reference | Local IP-Adapter |
| --- | --- | --- |
| Upfront cost | Subscription (about $25/month and up) | Free (hardware aside) |
| Generation cost | 5 Anlas per run | Electricity only |
| Setup | None | Required |
| Quality | Optimized with a dedicated model | Depends on model choice |
| Customization | Limited | Highly flexible |
| Offline use | No | Yes |

If you generate hundreds of images per month or more, building a local setup is worth considering.