Model Zoo

This project involves two major categories of machine learning models: image generation and classification. Each model includes a link to its model card, providing documentation on the model's background, training specifications, and quirks.

An AI-generated image of an alpine landscape
A list of comma-separated tags describing an urban setting
Classifier Models

AI systems that automatically analyze and label images with descriptive tags.

Image Generation Models

Text-to-image AI systems that create visual content from descriptions.

Image Generators

These are models like Stable Diffusion 1.5, Stable Diffusion XL, and Flux.1 that create images from textual prompts. They transform text descriptions into visual content and are the primary focus of our investigation.

Image of a prism generated by Stable Diffusion 1.5
Image of a prism generated by Stable Diffusion XL
Image of a prism generated by Flux.1-dev

Stable Diffusion 1.5

The original breakthrough: 512px images generated from a compressed latent space, trained on 2B curated web images.
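
To make "compressed latent space" concrete, the arithmetic can be sketched as follows. This assumes the commonly documented Stable Diffusion 1.5 autoencoder settings (8x spatial downsampling and 4 latent channels). The function names `latent_shape` and `compression_ratio` are illustrative and not part of any library.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an RGB image.

    The defaults are assumptions based on the usual SD 1.5 autoencoder setup.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """Ratio of RGB pixel values to latent values for one image."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions, the diffusion process works on about 48 times fewer values than the full 512px RGB image contains, which is what keeps generation at this resolution tractable.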

Stable Diffusion XL

The quality leap: a two-stage, 6.6B-parameter pipeline delivering refined 1024px images with superior composition.

Flux.1

The next generation: a 12B-parameter transformer that replaces the U-Net architecture, delivering unmatched fidelity and accuracy.

Classifiers

These are the tagger models (WD EVA02-Large, WD ViT-Large, WD SwinV2, and WD ConvNext) that analyze, classify, and label images. They represent the other side of the AI image ecosystem: how machines "see" and categorize visual content. These models contribute to dataset creation and filtering, and they can reveal how bias manifests in automated image interpretation systems.

WD EVA02-Large Tagger v3

Advanced Transformer tagger with top-tier performance for high-quality anime images.

WD ViT-Large Tagger v3

Versatile Vision Transformer for fast and accurate anime tag automation.

WD SwinV2 Tagger v3

Efficient Swin Transformer V2 excelling at balanced, accurate image tagging.

WD ConvNext Tagger v3

Modern CNN tagger optimized for speed, reliability, and strong tagging results.
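
Taggers of this kind are typically multi-label classifiers: they score every tag in a fixed vocabulary independently and keep the tags whose score clears a threshold. The sketch below illustrates only that final step, turning scores into a comma-separated tag list. The tag names, scores, and the 0.35 threshold are made-up illustrative values, not output from any of the models listed here.

```python
def scores_to_tags(scores, threshold=0.35):
    """Keep tags whose score meets the threshold, highest score first."""
    kept = [(tag, score) for tag, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return ", ".join(tag for tag, _ in kept)


# Hypothetical per-tag probabilities for an urban scene (assumed values).
example_scores = {
    "building": 0.91,
    "street": 0.84,
    "night": 0.40,
    "tree": 0.12,
}

print(scores_to_tags(example_scores))  # building, street, night
```

The threshold trades recall for precision: raising it to 0.5 in this example would drop "night", which is one place where systematic differences in what a tagger "sees" can become visible.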