
LoRA: Setting Up a Training Environment on an RTX 3060 Laptop (6 GB VRAM)

Introduction

In the previous article, I was planning to train on a Mac mini M4 Pro (24 GB), but circumstances left me with the base M4 instead.

The base M4 simply doesn't have enough memory to run training. So I decided to let an idle laptop (RTX 3060 Laptop, 6 GB VRAM) grind overnight.

This article summarizes how to set up a LoRA training environment on Windows 11 + RTX 3060 Laptop (6 GB VRAM). With only 6 GB of VRAM, SDXL is tough, so this guide targets an SD1.5-based model.


Environment

| Item | Spec |
| --- | --- |
| OS | Windows 11 Home |
| GPU | RTX 3060 Laptop (6 GB VRAM) |
| Storage | 1 TB SSD |

1. Install Prerequisites

1-1. Git

1-2. Python 3.10.6

Use this exact version (other versions often cause issues).

1-3. CUDA Toolkit 11.8

1-4. cuDNN 8.x (for CUDA 11.x)


2. Install kohya_ss

```shell
# Create a working folder
mkdir D:\AI
cd D:\AI

# Clone
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Run setup
.\setup.bat
```

During setup, answer:

  • CUDA version → 11.8
  • Install accelerate → Yes

3. Place Base Model(s)

Destination: D:\AI\models\

| Model | URL |
| --- | --- |
| Anything V5 | https://civitai.com/models/9409 |
| Counterfeit-V3.0 | https://civitai.com/models/4468 |

4. Prepare Training Data

Folder layout

```
D:\AI\datasets\my_character\
  └── 10_chara_name\         ← "<repetitions>_<trigger word>"
        ├── image001.png
        ├── image001.txt     ← caption
        ├── image002.png
        ├── image002.txt
        └── ...
```
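The `"<repetitions>_<trigger word>"` naming rule can be scripted when you set up several datasets. A minimal sketch (`make_dataset_dir` is my own helper, not part of kohya_ss):

```python
import os

def make_dataset_dir(root, repetitions, trigger):
    """Create the "<repetitions>_<trigger word>" folder kohya_ss expects."""
    path = os.path.join(root, f"{repetitions}_{trigger}")
    os.makedirs(path, exist_ok=True)
    return path

path = make_dataset_dir("datasets/my_character", 10, "chara_name")
```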

Repetition count guideline

  • 100 images × 10 repeats = 1,000 steps/epoch
  • If you have 50 images, use 20_chara_name to compensate
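The arithmetic behind those two bullets is easy to sanity-check (`steps_per_epoch` is a hypothetical helper; kohya_ss computes this internally):

```python
def steps_per_epoch(num_images, repetitions, batch_size=1):
    # One epoch covers every image <repetitions> times,
    # divided across batches.
    return (num_images * repetitions) // batch_size

print(steps_per_epoch(100, 10))  # 1000
print(steps_per_epoch(50, 20))   # 1000 -- fewer images, more repeats
```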

5. How to Write Captions

Basic structure

<trigger word>, <core attributes>, <variable details>

Examples

For the same character (a short black-haired, red‑eyed girl):

image001.txt (school uniform, smiling)

chara_name, 1girl, short black hair, red eyes, smile, school uniform, white shirt, blue skirt, upper body, simple background

image002.txt (casual outfit, neutral face)

chara_name, 1girl, short black hair, red eyes, casual clothes, hoodie, jeans, upper body, outdoor

image003.txt (swimsuit, side view)

chara_name, 1girl, short black hair, red eyes, swimsuit, bikini, from side, beach, summer

Tag categories

Fixed tags (shared across all images)

  • chara_name ← trigger word
  • 1girl
  • short black hair ← hairstyle
  • red eyes ← eye color

Variable tags (change per image)

  • Outfit: uniform, casual, swimsuit…
  • Expression: smile, serious, angry…
  • Composition: upper body, portrait, full body…
  • Background: simple background, outdoor, indoor…
  • Pose: standing, sitting, from side…

Tips for the trigger word

  • Coin a unique token that won’t collide with existing words
  • Examples: mycharaname, xyz_oc, aaa_character
  • If it’s too short, it can mix with other concepts

Notes

| NG | OK |
| --- | --- |
| black_hair | black hair (space-separated) |
| trigger,1girl | trigger, 1girl (space after comma) |
| Japanese tags | English tags only |
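The first two rules can be enforced automatically. A sketch of a caption normalizer, assuming the trigger word itself should be left untouched (`normalize_caption` and the `protected` parameter are my own, not kohya_ss features):

```python
def normalize_caption(caption, protected=("chara_name",)):
    """Underscores -> spaces, exactly one space after each comma.
    Tags listed in `protected` (e.g., the trigger word) are kept as-is."""
    tags = []
    for tag in caption.split(","):
        tag = tag.strip()
        if not tag:
            continue
        if tag not in protected:
            tag = tag.replace("_", " ")
        tags.append(tag)
    return ", ".join(tags)

print(normalize_caption("chara_name,1girl,black_hair"))
# chara_name, 1girl, black hair
```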

6. Auto-generate Captions → Manual Cleanup

Method 1: WD Tagger (built into kohya_ss)

  1. In kohya_ss, Utilities tab → “WD14 Captioning”
  2. Select the image folder
  3. Run → .txt files are generated per image
  4. Manually edit:
    • Add the trigger word to the beginning
    • Normalize character-specific feature tags
    • Remove unnecessary tags
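The first manual edit above (prepending the trigger word to every caption) is tedious by hand; a sketch that batches it, assuming UTF-8 caption files (`add_trigger` is my own helper):

```python
import glob
import os

def add_trigger(folder, trigger="chara_name"):
    """Prepend the trigger word to every .txt caption that lacks it."""
    for path in glob.glob(os.path.join(folder, "*.txt")):
        with open(path, encoding="utf-8") as f:
            caption = f.read().strip()
        if not caption.startswith(trigger):
            with open(path, "w", encoding="utf-8") as f:
                f.write(f"{trigger}, {caption}")
```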

Method 2: BooruDatasetTagManager

Workflow

  1. Auto-generate with WD Tagger (~5 minutes)
  2. Open in BooruDatasetTagManager
  3. Add the trigger word to all images at once
  4. Normalize character-specific tags (hair color, eye color)
  5. Fix obviously wrong tags
  6. Save

7. On Image Sizes

If the originals are large (e.g., 2800×2800), there is no need to resize them beforehand.

If you enable enable_bucket in kohya_ss, you can train with variable sizes as-is.

Settings:
- enable_bucket: ON
- resolution: 512
- bucket_no_upscale: ON
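If you drive sd-scripts with a dataset config file instead of the GUI, the same settings map onto TOML fields roughly like this (a sketch only; field placement follows sd-scripts' dataset config format, so verify against your installed version):

```toml
[general]
enable_bucket = true
bucket_no_upscale = true

[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "D:\\AI\\datasets\\my_character\\10_chara_name"
  num_repeats = 10
```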

8. Launch

```shell
cd D:\AI\kohya_ss
.\gui.bat
```

Open http://127.0.0.1:7860 in your browser.


9. Training Settings for 6 GB VRAM

| Item | Value |
| --- | --- |
| Network Rank (Dim) | 32–64 |
| Network Alpha | 16–32 |
| Batch Size | 1 |
| Resolution | 512 |
| Epoch | around 5 (adjust after checking) |
| Learning Rate | 1e-4 |
| Optimizer | AdaFactor or Lion |
| Mixed Precision | fp16 |
| Gradient Checkpointing | ✓ ON |
| Cache Latents | ✓ ON |
| enable_bucket | ✓ ON |
| bucket_no_upscale | ✓ ON |
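For reference, the kohya_ss GUI ultimately launches sd-scripts' train_network.py, and the table corresponds roughly to flags like these (a sketch, not a real run — the paths are placeholders; double-check flag names against your sd-scripts version):

```shell
accelerate launch train_network.py ^
  --pretrained_model_name_or_path D:\AI\models\anything-v5.safetensors ^
  --train_data_dir D:\AI\datasets\my_character ^
  --output_dir D:\AI\output ^
  --network_module networks.lora ^
  --network_dim 32 --network_alpha 16 ^
  --train_batch_size 1 --resolution 512 ^
  --max_train_epochs 5 --learning_rate 1e-4 ^
  --optimizer_type AdaFactor --mixed_precision fp16 ^
  --gradient_checkpointing --cache_latents ^
  --enable_bucket --bucket_no_upscale
```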

About epochs: Don’t start with 10. First run 5 and inspect the outputs. If it’s not enough, run more. With more images × repeats, overfitting happens more easily, so start conservative.


10. Training Data Guidelines

| Images | Result |
| --- | --- |
| 10–20 | Bare minimum; tends to overfit to specific poses |
| 30–50 | Practical |
| 50–100 | Very good |
| 100+ | More than enough |

Qualities of a good dataset

  • Variety of expressions
  • Variety of outfits
  • Variety of compositions (front, side, diagonal)
  • Variety of backgrounds
  • Stable line quality
  • Include a few full-body shots
  • Include a few close-up face shots

11. Handling 3‑view sheets

If you include a 3‑view sheet as-is, it will also learn the “3‑view layout”.

Fix: split and use individually

Cut out front/side/back and include them as separate images.

  • Learns the character traits
  • Doesn’t learn the 3‑view layout
  • Effectively adds three images’ worth of data
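The splitting itself is just geometry. A sketch that computes three equal crop boxes for a horizontal 3-view sheet (`three_view_boxes` is my own helper; apply the boxes with your image editor or library of choice):

```python
def three_view_boxes(width, height):
    """Split a horizontal 3-view sheet into three equal crop boxes
    (left, upper, right, lower) -- e.g., front / side / back panels."""
    w = width // 3
    return [(i * w, 0, (i + 1) * w, height) for i in range(3)]

print(three_view_boxes(2400, 1200))
```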

12. Training Time Estimate

RTX 3060 Laptop / 100 images / SD1.5 base:

  • About 1–2 hours to finish

Troubleshooting

CUDA out of memory

  • Lower Batch Size to 1
  • Turn ON Gradient Checkpointing
  • Lower Network Dim to 32

Loss becomes NaN

  • Lower the Learning Rate (e.g., 5e-5)
  • Change Mixed Precision to bf16

Outputs look blurry

  • Undertrained → increase epochs
  • Raise Network Dim (64–128)

Character doesn’t show up

  • Forgot to include the trigger word in the prompt
  • Mismatch between the training trigger word and the prompt

Closing

I expect the M4 Pro to arrive sometime next year—when it does, I’ll ask for a higher spec. Hopefully I can write the second part then.