
This new version integrates a Multimodal Diffusion Transformer (MMDiT) architecture, which uses separate sets of weights for the image and text representations, improving both text comprehension and image generation fidelity.
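To make the idea of separate weights per modality concrete, here is a minimal sketch of one toy block in NumPy. It is an illustration under stated assumptions, not the model's actual implementation: the class name `ToyMMDiTBlock`, the single attention head, the dimensions, and the random initialisation are all hypothetical, and the choice to let both token streams share one attention operation is an assumption based on how MMDiT is commonly described, not something stated above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyMMDiTBlock:
    """Toy single-head block (hypothetical): per-modality weights, joint attention."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        # Each modality gets its own projection matrices.
        self.w_qkv_img = rng.normal(0.0, scale, (dim, 3 * dim))
        self.w_qkv_txt = rng.normal(0.0, scale, (dim, 3 * dim))
        self.w_out_img = rng.normal(0.0, scale, (dim, dim))
        self.w_out_txt = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, img_tokens, txt_tokens):
        n_img = img_tokens.shape[0]
        # Project each stream with its own weights.
        q_i, k_i, v_i = np.split(img_tokens @ self.w_qkv_img, 3, axis=-1)
        q_t, k_t, v_t = np.split(txt_tokens @ self.w_qkv_txt, 3, axis=-1)
        # Assumption: concatenate both streams so every token attends to all tokens.
        q = np.concatenate([q_i, q_t], axis=0)
        k = np.concatenate([k_i, k_t], axis=0)
        v = np.concatenate([v_i, v_t], axis=0)
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        mixed = attn @ v
        # Split back and apply modality-specific output projections with residuals.
        img_out = img_tokens + mixed[:n_img] @ self.w_out_img
        txt_out = txt_tokens + mixed[n_img:] @ self.w_out_txt
        return img_out, txt_out


block = ToyMMDiTBlock(dim=8)
img = np.ones((16, 8))  # 16 image patch tokens (made-up sizes)
txt = np.ones((4, 8))   # 4 text tokens
img_out, txt_out = block(img, txt)
print(img_out.shape, txt_out.shape)
```

In this sketch the two streams never share a projection matrix, yet each token can still draw on information from the other modality through the joint attention step.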

