Visual Programming: Compositional Visual Reasoning Without Training

TL;DR

Let an LLM solve your vision task without any training, using an interactive run on VESSL.

Description

VisProg is a neuro-symbolic approach that solves complex, compositional visual tasks from natural language instructions. Instead of training a task-specific model, it prompts a large language model with a few example programs to generate a modular, Python-like program for each instruction. It then executes that program step by step, calling off-the-shelf computer vision models, image processing routines, and Python functions. This lets VisProg handle tasks such as visual question answering and language-guided image editing without task-specific training. Adapting it to a new task mainly means writing instructions and a few example programs rather than collecting training data.
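Program generation relies on in-context learning. The LLM sees a few instruction/program pairs and then continues the text for a new instruction. The sketch below only assembles such a prompt and does not call any API. The example pair, the `Instruction:`/`Program:` layout, and the `build_prompt` helper are illustrative assumptions, not the exact prompt used by VisProg or the treasuraid/visprog repository.

```python
from typing import List, Tuple

# Hypothetical in-context demonstration: (instruction, program) pairs.
# The program uses one step per line in the form OUT=MODULE(arg=value,...).
EXAMPLES: List[Tuple[str, str]] = [
    (
        "Replace the ground with white snow",
        "OBJ0=SEG(image=IMAGE)\n"
        "OBJ1=SELECT(image=IMAGE,object=OBJ0,query='ground',category=None)\n"
        "IMAGE0=REPLACE(image=IMAGE,object=OBJ1,prompt='white snow')\n"
        "FINAL_RESULT=RESULT(var=IMAGE0)",
    ),
]


def build_prompt(instruction: str, examples: List[Tuple[str, str]] = EXAMPLES) -> str:
    """Concatenate the demonstrations, then the new instruction with an empty program slot."""
    parts = [
        f"Instruction: {ex_instruction}\nProgram:\n{ex_program}"
        for ex_instruction, ex_program in examples
    ]
    # The LLM is expected to continue the text after this final "Program:" line.
    parts.append(f"Instruction: {instruction}\nProgram:\n")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Replace man in black henley (person) with brick wall"))
```

Sending the returned string to an LLM completion endpoint would presumably yield a program in the same line-per-step format. The OpenAI API key configured in the YAML below appears to serve that step.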

YAML

💡 Replace "your openai api key" in the env section below with your actual OpenAI API key.

name: visprog
description: "Let an LLM solve your vision task without any training, using an interactive run on VESSL."
image: nvcr.io/nvidia/pytorch:22.10-py3
resources:
  cluster: aws-apne2
  preset: v1.v100-1.mem-52
run:
  - workdir: /root
    command: |
      test -n "$OPENAI_API_KEY" || { echo "OPENAI_API_KEY is not set" >&2; exit 1; }
      git clone https://github.com/treasuraid/visprog.git
  - workdir: /root/visprog
    command: |
      conda env create -f environment.yaml
      source activate visprog
      pip install vessl opencv-python-headless
      python script/image_editing.py
env:
  OPENAI_API_KEY: "your openai api key"
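The key reaches the run only through the env section. It can therefore help to fail fast when the key is missing or still set to the placeholder, without printing the secret to the run logs. The `check_openai_key` and `require_openai_key` helpers below are a hypothetical sketch, not part of the visprog repository. They could be called at the start of a script run in the second step, before `python script/image_editing.py`.

```python
import os
from typing import Mapping

# Placeholder string used in the env section of the YAML.
PLACEHOLDER = "your openai api key"


def check_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set and differs from the placeholder.

    The key itself is never printed or returned.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    return bool(key) and key != PLACEHOLDER


def require_openai_key(env: Mapping[str, str] = os.environ) -> None:
    """Raise an error whose message does not include the key when the check fails."""
    if not check_openai_key(env):
        raise RuntimeError("OPENAI_API_KEY is missing or still set to the placeholder value.")
```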

Input

Query: "Replace man in black henley (person) with brick wall"

[Input image]

Output

[Output image]
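VisProg answers a query like this by generating a short program and executing it one step at a time. Each step's output is stored in a named variable that later steps can reference. The sketch below mimics that execution loop with a hand-written program in the `OUT=MODULE(arg=value,...)` style. Several pieces are illustrative assumptions, not the repository's actual interpreter:

- the program text itself;
- mapping the parenthesized hint to `category='person'`;
- the stub `MODULES`, which return strings instead of running segmentation or inpainting models.

```python
import ast
from typing import Any, Callable, Dict, Tuple

# Hypothetical program for the query above (illustrative, not real LLM output).
PROGRAM = """\
OBJ0=SEG(image=IMAGE)
OBJ1=SELECT(image=IMAGE,object=OBJ0,query='man in black henley',category='person')
IMAGE0=REPLACE(image=IMAGE,object=OBJ1,prompt='brick wall')
FINAL_RESULT=RESULT(var=IMAGE0)
"""

# Stub modules: they return descriptive strings instead of running vision models.
MODULES: Dict[str, Callable[..., Any]] = {
    "SEG": lambda image: ["segment0", "segment1", "segment2"],
    "SELECT": lambda image, object, query, category: f"mask of '{query}' ({category})",
    "REPLACE": lambda image, object, prompt: f"{image} with {object} replaced by {prompt}",
    "RESULT": lambda var: var,
}


def parse_step(line: str) -> Tuple[str, str, Dict[str, Tuple[str, Any]]]:
    """Parse 'OUT=MODULE(key=value,...)' into (output name, module name, arguments).

    Each argument is tagged "var" (a reference to an earlier result) or
    "const" (a Python literal such as a string or None).
    """
    out_var, call_src = line.split("=", 1)
    call = ast.parse(call_src.strip(), mode="eval").body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError(f"not a module call: {line!r}")
    args: Dict[str, Tuple[str, Any]] = {}
    for kw in call.keywords:
        if isinstance(kw.value, ast.Name):
            args[kw.arg] = ("var", kw.value.id)
        else:
            args[kw.arg] = ("const", ast.literal_eval(kw.value))
    return out_var.strip(), call.func.id, args


def execute(program: str, image: Any) -> Any:
    """Run each step in order, keeping every named result in a shared state dict."""
    state: Dict[str, Any] = {"IMAGE": image}
    for line in program.strip().splitlines():
        out_var, module, args = parse_step(line)
        # Resolve variable references against earlier results; pass constants through.
        kwargs = {k: state[v] if kind == "var" else v for k, (kind, v) in args.items()}
        state[out_var] = MODULES[module](**kwargs)
    return state["FINAL_RESULT"]


if __name__ == "__main__":
    print(execute(PROGRAM, "input.jpg"))
    # → input.jpg with mask of 'man in black henley' (person) replaced by brick wall
```

In the real system each stub would be replaced by a model call, for example a segmentation model for SEG and an image inpainting model for REPLACE. The final value would then be an edited image rather than a string. Keeping every intermediate result in one state dictionary also makes each step inspectable after the run.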