TL;DR

Fine-tune Stable Diffusion in the fastest way with a batch run on VESSL.

Description

DreamBooth personalizes a text-to-image diffusion model by fine-tuning it on a small set of images of a subject. The subject is bound to a unique identifier, and the model's semantic prior for the subject's class lets the fine-tuned model generate realistic images of that subject in new contexts. This enables tasks that were previously difficult, such as subject re-contextualization, text-guided view synthesis, appearance modification, and artistic rendering, while preserving the subject's key features.
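
To make the role of the unique identifier concrete, here is a minimal shell sketch of how a prompt can be assembled from an identifier and a class word. The variable names `identifier` and `context` are hypothetical and used only for illustration; the values mirror the `class_word` and `prompt` entries in the YAML on this page.

```shell
# Illustrative only: `identifier` and `context` are hypothetical names,
# not parameters of the training or sampling scripts.
identifier="sks"          # rare token that stands for the specific subject
class_word="pikachu"      # coarse class the subject belongs to
context="playing soccer"  # new context to render the subject in

# The identifier and class word appear together, followed by the context.
prompt="A photo of ${identifier} ${class_word} ${context}."
echo "$prompt"
```

Changing only `context` keeps the same subject while asking for a different scene.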

YAML

<aside> 💡 You need an A100 GPU to run this YAML. Please refer to Cluster Integrations.

</aside>

name: dreamboothstablediffusion
description: "Fine-tune Stable Diffusion in the fastest way with a batch run on VESSL."
image: nvcr.io/nvidia/pytorch:22.10-py3
resources:
  cluster: vessl-dgx-a100
  preset: gpu-1
import:
  /root/examples: git://github.com/vessl-ai/examples
export:
  /output: vessl-artifact://
run:
  - workdir: /root/examples/Dreambooth-Stable-Diffusion
    command: |
      conda env create -f environment.yaml 
      source activate ldm
      pip install Omegaconf
      pip install pytorch-lightning 
      mkdir data/
      wget https://github.com/prawnpdf/prawn/raw/master/data/fonts/DejaVuSans.ttf -P data/
      wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt
      python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume ./sd-v1-4-full-ema.ckpt -n "generate_pikachu" --no-test --gpus "0," --data_root ./dataset --reg_data_root ./reg --class_word "${class_word}"
      rm -rf ./logs/*.ipynb_checkpoints
      python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 2 --n_iter 4 --scale 10.0 --ddim_steps 100 --ckpt ./logs/*/checkpoints/last.ckpt --prompt "${prompt}"
      cp -r ./outputs /output

env:
  class_word: "pikachu"
  prompt: "A photo of sks pikachu playing soccer."
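
The run commands read `class_word` and `prompt` from `env`. The sketch below assumes these values reach the commands as ordinary shell variables, and shows how brace placement changes the string a script receives. The names `literal_braces` and `plain_value` are hypothetical.

```shell
# Assumption: the env entry is visible to the command as a shell variable.
class_word="pikachu"

# Braces outside the dollar sign are kept as literal characters.
literal_braces="{$class_word}"

# Braces after the dollar sign only delimit the variable name.
plain_value="${class_word}"

echo "$literal_braces"   # prints {pikachu}
echo "$plain_value"      # prints pikachu
```

Echoing a value this way before a long training run is a cheap check that the class word and prompt are passed through as intended.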

Input

Output
