# Textual Inversion - SDXL
This tutorial walks through a Textual Inversion training run with a Stable Diffusion XL base model.
## 1 - Dataset

For this tutorial, we'll use a dataset consisting of 4 images of Bruce the Gnome:

This sample dataset is included in the invoke-training repo under `sample_data/bruce_the_gnome`.
Here are a few tips for preparing a Textual Inversion dataset:
- Aim for 4 to 50 images of your concept (object / style). The optimal number depends on many factors, and can be much higher than this for some use cases.
- Vary all of the image features that you don't want your TI embedding to contain (e.g. background, pose, lighting, etc.).
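As a quick sanity check before training, you can count the images in your dataset directory. This is a minimal sketch (the `check_ti_dataset` helper is hypothetical, not part of invoke-training), assuming the flat image-folder layout that `IMAGE_DIR_DATASET` expects:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_ti_dataset(dataset_dir: str) -> int:
    """Return the number of images found, warning if outside the 4-50 guideline."""
    images = [
        p for p in Path(dataset_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS
    ]
    if not 4 <= len(images) <= 50:
        print(f"Warning: found {len(images)} images; 4-50 is a common starting range.")
    return len(images)
```

The 4-50 range here is just the guideline from the tips above, not a hard limit enforced by the trainer.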
## 2 - Configuration

Below is the training configuration that we'll use for this tutorial.

Raw config file: `src/invoke_training/sample_configs/sdxl_textual_inversion_gnome_1x24gb.yaml`

Full config reference docs: Textual Inversion SDXL Config
```yaml
# Training mode: Textual Inversion
# Base model: SDXL
# GPU: 1 x 24GB

type: SDXL_TEXTUAL_INVERSION
seed: 1
base_output_dir: output/bruce/sdxl_ti

optimizer:
  optimizer_type: AdamW
  learning_rate: 2e-3

lr_warmup_steps: 200
lr_scheduler: cosine

data_loader:
  type: TEXTUAL_INVERSION_SD_DATA_LOADER
  dataset:
    type: IMAGE_DIR_DATASET
    dataset_dir: "sample_data/bruce_the_gnome"
    keep_in_memory: True
  caption_preset: object
  resolution: 1024
  center_crop: True
  random_flip: False
  shuffle_caption_delimiter: null
  dataloader_num_workers: 4

# General
model: stabilityai/stable-diffusion-xl-base-1.0
vae_model: madebyollin/sdxl-vae-fp16-fix
num_vectors: 4
placeholder_token: "bruce_the_gnome"
initializer_token: "gnome"
cache_vae_outputs: False
gradient_accumulation_steps: 1
weight_dtype: bfloat16
gradient_checkpointing: True
max_train_steps: 2000
save_every_n_steps: 200
validate_every_n_steps: 200
max_checkpoints: 20
validation_prompts:
  - A photo of bruce_the_gnome at the beach
  - A photo of bruce_the_gnome reading a book
train_batch_size: 1
num_validation_images_per_prompt: 3
```
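To get a feel for what `num_vectors: 4` means: SDXL has two text encoders (CLIP ViT-L with 768-dim token embeddings and OpenCLIP ViT-bigG with 1280-dim token embeddings), and Textual Inversion learns `num_vectors` new token embeddings in each. A quick back-of-the-envelope calculation (plain Python, not invoke-training code):

```python
# SDXL text encoder token-embedding dimensions.
CLIP_L_DIM = 768   # CLIP ViT-L
CLIP_G_DIM = 1280  # OpenCLIP ViT-bigG

num_vectors = 4  # from the config above

# Trainable parameters in the TI embedding: num_vectors learned token
# embeddings in each of the two text encoders.
trainable_params = num_vectors * (CLIP_L_DIM + CLIP_G_DIM)
print(trainable_params)  # 8192
```

This is why TI checkpoints are only a few kilobytes: everything else in the model stays frozen. Note that the placeholder token will occupy `num_vectors` of the prompt's token positions.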
## 3 - Start Training

If you haven't already, install invoke-training.
Launch the Textual Inversion training pipeline:
```bash
# From inside the invoke-training/ source directory:
invoke-train -c src/invoke_training/sample_configs/sdxl_textual_inversion_gnome_1x24gb.yaml
```
Training takes ~40 mins on an NVIDIA RTX 4090.
## 4 - Monitor
In a new terminal, launch Tensorboard to monitor the training run:
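For example (assuming TensorBoard is installed, and pointing `--logdir` at the `base_output_dir` from the config above):

```bash
tensorboard --logdir output/bruce/sdxl_ti
```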
Access TensorBoard at localhost:6006 in your browser. Sample images will be logged to TensorBoard so that you can see how the Textual Inversion embedding is evolving.
Once training is complete, select the epoch that produces the best visual results. For this tutorial, we'll choose epoch 500:

*Screenshot of the TensorBoard UI showing the validation images for epoch 500.*
## 5 - Transfer to InvokeAI

If you haven't already, set up InvokeAI by following its documentation.
Copy the selected TI embedding into your `${INVOKEAI_ROOT}/autoimport/embedding/` directory. For example:

```bash
cp output/bruce/sdxl_ti/1702587511.2273068/checkpoint_epoch-00000500.safetensors ${INVOKEAI_ROOT}/autoimport/embedding/bruce_the_gnome.safetensors
```
Note that we renamed the file to `bruce_the_gnome.safetensors`. You can choose any file name, but the name becomes the token used to reference your embedding: in our case, we can refer to the new embedding by including `<bruce_the_gnome>` in our prompts.
Launch InvokeAI and you can now use your new `bruce_the_gnome` TI embedding! 🎉
*Example image generated with the prompt "a photo of `<bruce_the_gnome>` at the park".*