Multimodal

A single model takes images and text as input and produces images and speech as output, working across modalities.

Section: advanced-techniques · scene id multimodal · tutorial 03-advanced-techniques/07-multimodal