
How to reproduce NovelAI Precise Reference locally: ComfyUI + IP-Adapter

What NovelAI Precise Reference is

In February 2026, NovelAI released a feature called “Precise Reference.” It adds a style reference alongside the existing character reference, and the two can now be combined.

| Feature | Role |
| --- | --- |
| Character reference | Preserves the character’s appearance (face, hair, clothes, etc.) |
| Style reference | Preserves the art style, brushwork, and color palette |

Just by specifying reference images, you can keep both character and style consistent. Its main strength is that it is easy to use and does not require LoRA training.

NovelAI is still a cloud service, though, and every generation consumes Anlas. If you want to do something similar locally, IP-Adapter is the main option.

What IP-Adapter is

IP-Adapter is an image-prompting method developed by Tencent AI Lab. It extracts features from a reference image with a CLIP Vision encoder and injects them into the generation process through added cross-attention layers. In practice it behaves a bit like a “single-image LoRA.”
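Conceptually, the injection works like an extra cross-attention pass over the image tokens, scaled by a weight (the paper calls this “decoupled cross-attention”). A minimal NumPy sketch of the idea; the dimensions and names are illustrative, not the actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens):
    # queries: (n_q, d) latent features; tokens: (n_t, d) conditioning tokens
    scores = softmax(queries @ tokens.T / np.sqrt(queries.shape[-1]))
    return scores @ tokens

def ip_adapter_step(latent_q, text_tokens, image_tokens, weight=0.8):
    # Decoupled cross-attention: text and image tokens get separate
    # attention passes, and the image branch is scaled by `weight`.
    return (cross_attention(latent_q, text_tokens)
            + weight * cross_attention(latent_q, image_tokens))

q = np.random.rand(16, 64)    # 16 latent positions, feature dim 64
txt = np.random.rand(77, 64)  # text embedding tokens
img = np.random.rand(4, 64)   # projected CLIP Vision image tokens
out = ip_adapter_step(q, txt, img, weight=0.8)
print(out.shape)              # (16, 64)
```

With `weight=0`, the image branch vanishes and you recover plain text conditioning, which is exactly why the strength slider works the way it does.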

Here is the rough mapping to NovelAI Precise Reference:

| NovelAI feature | IP-Adapter equivalent |
| --- | --- |
| Character reference | IP-Adapter FaceID / Plus Face |
| Style reference | Standard IP-Adapter |
| Strength slider | `weight` parameter |
| Fidelity slider | `start_at` / `end_at` parameters |

Hardware environment

| Item | Spec |
| --- | --- |
| Machine | Mac Studio / MacBook Pro etc. (Apple Silicon) |
| Memory | 32 GB or more recommended (64 GB is comfortable) |
| Storage | 50 GB or more of free space |

I plan to test on an M1 Max with 64 GB. Even with an SDXL model, IP-Adapter, and CLIP Vision loaded at once, memory should be sufficient. Reports suggest generation speed around RTX 3060 class, roughly 30-60 seconds per image.
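As a rough sanity check, summing ballpark fp16 file sizes (my own estimates, not measured values) suggests the full stack fits comfortably in unified memory:

```python
# Rough fp16 on-disk sizes in GB; ballpark assumptions, not measurements.
models_gb = {
    "SDXL checkpoint": 6.9,
    "CLIP-ViT-H vision encoder": 2.5,
    "ip-adapter-plus_sdxl_vit-h": 0.9,
    "ip-adapter-plus-face_sdxl_vit-h": 0.9,
}
total = sum(models_gb.values())
print(f"~{total:.1f} GB of weights")  # well under 32 GB, let alone 64 GB
```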

Setup

1. Install ComfyUI

Install ComfyUI; recent versions run on Apple Silicon out of the box via PyTorch’s MPS backend.

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py  # the UI is served at http://127.0.0.1:8188

2. Install ComfyUI_IPAdapter_plus

cd custom_nodes
git clone https://github.com/cubiq/ComfyUI_IPAdapter_plus.git

3. Download the required models

CLIP Vision models

Place them under ComfyUI/models/clip_vision/:

| Model | Purpose |
| --- | --- |
| CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | Works for both SD1.5 and SDXL |
| CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors | For SDXL, with higher accuracy |

Download sources: Hugging Face - laion/CLIP-ViT-H-14-laion2B-s32B-b79K and laion/CLIP-ViT-bigG-14-laion2B-39B-b160k

IP-Adapter models

Place them under ComfyUI/models/ipadapter/:

| Model | Purpose |
| --- | --- |
| ip-adapter-plus_sdxl_vit-h.safetensors | General use (good for style reference) |
| ip-adapter-plus-face_sdxl_vit-h.safetensors | Face-specialized (good for character reference) |

Download source: Hugging Face - h94/IP-Adapter

Base model (SDXL)

Place it under ComfyUI/models/checkpoints/. For anime-oriented work, these are good candidates:

  • Animagine XL 3.1
  • Counterfeit-V3.0
  • Kohaku XL

Workflow structure

Character reference only

[Load Image] -> [CLIP Vision Encode] -> [IPAdapter Face] -> [KSampler] -> [Save Image]

This is the basic setup. It extracts facial features from the reference image and reflects them in generation.
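In ComfyUI’s API (JSON) format, that graph looks roughly like the fragment below. The node class names follow recent ComfyUI and IPAdapter_plus releases but vary between versions, so treat them as placeholders rather than exact identifiers:

```python
# Sketch of the graph in ComfyUI's API (JSON) format. Node class names
# and input names may differ in your IPAdapter_plus version.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "animagine-xl-3.1.safetensors"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "char_ref.png"}},
    "3": {"class_type": "CLIPVisionLoader",
          "inputs": {"clip_name": "CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors"}},
    "4": {"class_type": "IPAdapterModelLoader",
          "inputs": {"ipadapter_file": "ip-adapter-plus-face_sdxl_vit-h.safetensors"}},
    "5": {"class_type": "IPAdapterAdvanced",
          "inputs": {"model": ["1", 0], "ipadapter": ["4", 0],
                     "image": ["2", 0], "clip_vision": ["3", 0],
                     "weight": 0.8, "start_at": 0.0, "end_at": 1.0}},
    # "6" onward: KSampler -> VAEDecode -> SaveImage, wired as usual.
}
```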

Character reference + style reference (NovelAI Precise Reference equivalent)

[Character ref image] -> [IPAdapter Face] -┐
                                           ├-> [KSampler] -> [Save Image]
[Style ref image]     -> [IPAdapter]      -┘

Connect two IP-Adapters in series: the model output of the first (character) node feeds into the second (style) node. The first injects character features, and the second layers style features on top.
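The series wiring can be sketched functionally: each adapter wraps the model, and the wrapped model is handed to the next. This toy Python version (stand-in numbers, not real tensors) shows how both contributions stack:

```python
def apply_ipadapter(model_fn, ref_tokens, weight):
    """Wrap a denoiser so it also responds to `ref_tokens` at `weight`.
    Toy stand-in for how the IPAdapter node patches the model."""
    def patched(x):
        return model_fn(x) + weight * sum(ref_tokens)
    return patched

base = lambda x: x            # stand-in for the SDXL denoiser
face_tokens = [0.5, 0.3]      # stand-in character embeddings
style_tokens = [0.2]          # stand-in style embeddings

m = apply_ipadapter(base, face_tokens, weight=0.8)  # first: character
m = apply_ipadapter(m, style_tokens, weight=0.6)    # second: style
print(m(1.0))  # base output plus 0.8*0.8 plus 0.6*0.2
```

Because each wrapper keeps the previous one inside it, the order of chaining does not lose either contribution; it just determines which adapter patches the model first.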

Parameter tuning

weight (strength)

This corresponds to NovelAI’s strength slider. It controls how strongly the reference image influences the output.

| Value | Effect |
| --- | --- |
| 0.5-0.7 | Light influence; high prompt freedom |
| 0.7-0.85 | Balanced; recommended range for faces |
| 0.85-1.0 | Strong influence; easily pulled toward the reference image |

If you set it too high, even the facial expression and pose from the reference can carry over, so be careful.

start_at / end_at

These control when IP-Adapter intervenes during sampling.

| Setting | Effect |
| --- | --- |
| start_at: 0, end_at: 1 | Active for the whole process (default) |
| start_at: 0, end_at: 0.8 | Later stages prioritize the prompt |
| start_at: 0.2, end_at: 1 | Early composition is left more to the prompt |

For style reference, setting end_at: 0.8 or so tends to make fine details follow the prompt more easily.
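Since start_at / end_at are fractions of the sampling schedule, it can help to see which concrete steps they cover. A small illustrative helper, assuming N evenly spaced steps:

```python
def active_steps(n_steps, start_at=0.0, end_at=1.0):
    """Return the sampler step indices where the adapter is applied.
    Steps are treated as evenly spaced fractions of the schedule."""
    return [i for i in range(n_steps)
            if start_at <= i / (n_steps - 1) <= end_at]

# 20-step sampling with end_at=0.8: the last ~20% of steps
# run without the adapter, so fine details follow the prompt.
print(active_steps(20, 0.0, 0.8))  # steps 0 through 15
```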

Combining multiple references

When using character reference and style reference together:

| Reference type | Recommended weight |
| --- | --- |
| Character reference (Face) | 0.7-0.85 |
| Style reference (standard) | 0.5-0.7 |

If both are set too high, the image tends to break down. If one is strong, keep the other restrained.
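One way to encode the “keep the other restrained” rule is a proportional cap on the combined weight. This is my own heuristic, not a feature of NovelAI or IPAdapter_plus:

```python
def balance_weights(face_w, style_w, cap=1.4):
    """Scale both weights down proportionally if their sum exceeds `cap`.
    The cap value is a personal starting point; tune it per model."""
    total = face_w + style_w
    if total <= cap:
        return face_w, style_w
    scale = cap / total
    return face_w * scale, style_w * scale

print(balance_weights(0.85, 0.7))  # sum 1.55 > 1.4, so both scale down
```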

Limitations

Multiple character references at once

As with NovelAI, if you try to use multiple character references simultaneously, their features get blended together. They are not handled as separate characters.

If you want multiple characters:

  1. Split regions with Regional Prompting
  2. Generate them individually with inpainting
  3. Combine with ControlNet to lock the pose
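For option 1, the core idea is restricting each character’s adapter to its own region with an attention mask (IPAdapter_plus exposes an attn_mask input for this, though wiring details vary by version). A minimal sketch of building left/right masks:

```python
import numpy as np

def split_masks(height, width):
    """Left/right binary masks so two character adapters
    each only influence half of the canvas."""
    left = np.zeros((height, width), dtype=np.float32)
    right = np.zeros((height, width), dtype=np.float32)
    left[:, : width // 2] = 1.0
    right[:, width // 2:] = 1.0
    return left, right

l, r = split_masks(1024, 1024)
print((l + r == 1.0).all())  # the two masks partition the canvas
```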

FaceID-series models

If you need more accurate facial consistency, there is the IP-Adapter FaceID series. However, it requires installing InsightFace, which may need extra build configuration on Apple Silicon.

NovelAI vs local IP-Adapter

| Item | NovelAI Precise Reference | Local IP-Adapter |
| --- | --- | --- |
| Upfront cost | Subscription (about $25/month and up) | Free (hardware aside) |
| Generation cost | 5 Anlas per run | Electricity only |
| Setup | None | Required |
| Quality | Optimized with a dedicated model | Depends on model choice |
| Customization | Limited | Highly flexible |
| Offline use | No | Yes |

If you generate hundreds of images per month or more, building a local setup is worth considering.