OminiKontext - Universal Two-Image Fusion Framework for Flux Kontext

A revolutionary training framework that teaches Flux.1-Kontext-dev how to combine any two images - products, clothing, objects, or characters. Train custom LoRA models for e-commerce, VTON, and creative applications using 3D RoPE embeddings.

[Figure: omini-kontext framework overview]

Introduction

OminiKontext is a training framework that teaches Flux.1-Kontext-dev how to intelligently combine two input images. Unlike traditional approaches that alter model architecture, OminiKontext leverages 3D RoPE embeddings to achieve precise reference-based image fusion with any type of content.

The framework enables users to train custom LoRA models that can seamlessly blend reference objects, products, characters, or clothing into target scenes. Whether you're working with e-commerce products, virtual try-on applications, or creative content generation, OminiKontext provides the tools to teach Kontext exactly how to merge your specific use case.

🚀 Live Demo Available on Replicate

📂 GitHub Repository

🛠️ ComfyUI Integration & Workflows

🎬 Video Demo on GitHub

Key Features & Capabilities

How Flux.1-Kontext Differs from Standard Flux.1-Dev

The key innovation in Flux.1-Kontext lies in its 3D RoPE embedding system, which fundamentally changes how images are processed compared to standard Flux.1-dev:

Standard Flux.1-dev: Uses 2D RoPE embeddings where image tokens are arranged in a simple 2D grid (width × height).

Flux.1-Kontext: Introduces a third dimension, creating layered token spaces (width × height × layers), similar to Photoshop's layer system. This allows multiple images to exist in separate layers that can be precisely controlled and blended.
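To make the layering concrete, here is a minimal sketch (illustrative only, not the framework's actual code) of how a third coordinate turns the flat (y, x) token grid into (layer, y, x) position ids, so a base image and a reference image can occupy separate layers within the same token sequence:

```python
# Illustrative sketch of 3D RoPE position ids: each token is identified
# by [layer, y, x] instead of the [y, x] grid used by standard Flux.1-dev.
import torch

def grid_ids(h_tokens: int, w_tokens: int, layer: int, y0: int = 0, x0: int = 0):
    """Position ids of shape (h_tokens * w_tokens, 3): [layer, y, x] per token."""
    ys, xs = torch.meshgrid(
        torch.arange(h_tokens), torch.arange(w_tokens), indexing="ij"
    )
    layer_col = torch.full_like(ys, layer)
    return torch.stack([layer_col, ys + y0, xs + x0], dim=-1).reshape(-1, 3)

base_ids = grid_ids(64, 64, layer=1)         # base image at [1, 0, 0]
ref_ids = grid_ids(32, 32, layer=2)          # reference on its own layer
ids = torch.cat([base_ids, ref_ids], dim=0)  # joint sequence fed to RoPE
```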

The Delta Coordinate System

OminiKontext leverages this 3D architecture through a delta coordinate system that gives precise, token-level control over reference placement:

delta = [L, Y, X]

Where:

  • L (Layer): Which layer the reference image occupies relative to the base
  • Y, X (Position): Top-left positioning within that layer (in token coordinates)
  • Token Conversion: delta = [L, Yp // 16, Xp // 16] where Yp, Xp are pixel coordinates

Key Mechanism:

  • Base image is always placed at layer 1 starting from [1, 0, 0]
  • Reference images can be positioned at [L+1, Y, X] for layer separation
  • Each token represents a 16×16 pixel patch for efficient processing
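A tiny helper (the function name is ours, not from the repo) makes the pixel-to-token conversion explicit:

```python
# Hypothetical helper: converts a desired pixel offset into a token-space
# delta using the 16x16-pixel-per-token patch size described above.
def pixel_delta_to_token_delta(layer: int, y_px: int, x_px: int) -> list[int]:
    """delta = [L, Yp // 16, Xp // 16], with Yp, Xp in pixels."""
    return [layer, y_px // 16, x_px // 16]

# Reference one layer above the base, shifted 256 px to the right:
delta = pixel_delta_to_token_delta(1, 0, 256)  # -> [1, 0, 16]
```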

Spatial vs Non-Spatial Control Modes

Spatial Control (delta = [1, Y, X]):

  • Places reference on a different layer than the base
  • Enables pixel-perfect positioning for products, clothing, or objects
  • Ideal for e-commerce photography and VTON applications
  • Example: delta = [1, 0, (base_width - subject_width) // 16] for right-edge placement

Non-Spatial Control (delta = [0, 0, 96]):

  • Places reference in the same layer with large offset
  • Model learns contextual blending without strict positioning
  • Perfect for natural integration where exact placement isn't critical
  • Commonly used for character insertion and creative applications
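In practice, the two modes reduce to how the delta is constructed. A short sketch under the conventions above (helper names are ours, not from the repo):

```python
# Spatial mode: separate layer plus explicit token coordinates.
def right_edge_spatial_delta(base_w_px: int, subject_w_px: int) -> list[int]:
    """Pin the reference to the base image's right edge on its own layer."""
    return [1, 0, (base_w_px - subject_w_px) // 16]

# Non-spatial mode: same layer, large fixed X offset; the model decides
# where the reference blends into the scene.
NON_SPATIAL_DELTA = [0, 0, 96]

print(right_edge_spatial_delta(1024, 512))  # -> [1, 0, 32]
```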

Advantages Over Traditional Image Stitching

Traditional approaches stitch images horizontally or vertically, which:

  • Ties subject position to base image dimensions
  • Creates inconsistent training data
  • Offers no precise XY control

OminiKontext's delta system:

  • Consistent Training: Fixed deltas ensure stable learning across examples
  • Precise Control: Token-level positioning accuracy
  • Efficiency: Avoids tokenizing empty white space through optimized cropping
  • Flexibility: Switch between spatial and contextual blending modes

Live Examples

Multi-Domain Integration Results

OminiKontext demonstrates versatile capability across different domains:

  • Product Placement: Seamlessly inserting products into lifestyle and commercial settings
  • Virtual Try-On: Natural clothing and accessory integration on models
  • Object Insertion: Precise placement of any object into target scenes
  • Character Integration: Cartoon and realistic character placement

Model Performance Comparison

When compared to vanilla Flux.1-Kontext-dev, OminiKontext shows significant improvements in:

  • Object integration quality across all domains
  • Spatial positioning accuracy and control
  • Scene coherence and natural blending
  • Reference image fidelity preservation
  • Training efficiency for custom use cases

Example Pretrained Models

OminiKontext comes with several example models showcasing different training approaches. These serve as both functional tools and training references for your custom models:

General Object Models

  • character_3000.safetensors: Character and object insertion with white backgrounds
    • Delta: [0,0,96]
    • Use case: Characters, mascots, simple objects
    • Settings: CFG=1.5, LoRA strength=0.5-0.7

Spatial Control Models

  • spatial-character-test.safetensors: Precise spatial positioning system
    • Delta: [1,0,0]
    • Use case: Exact placement control for any object type
    • Supports same-size reference images with corresponding placement

Product Integration Models

  • product_2000.safetensors: E-commerce and product placement
    • Delta: [0,0,96]
    • Use case: Product photography, lifestyle integration, VTON accessories
    • Ideal for commercial and marketing applications

Note: These models demonstrate the framework's versatility. You can train similar models for clothing (VTON), furniture, vehicles, or any object category using the same methodology.
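As a starting point, the example LoRAs can be loaded with diffusers' stock FluxKontextPipeline. One caveat: the reference-image conditioning and delta handling live in omini-kontext's own pipeline and ComfyUI nodes, which this sketch does not reproduce; the snippet below only shows base-model plus LoRA loading with the recommended settings, and the file and prompt names are placeholders:

```python
# Sketch only: loads the base model and an example LoRA with standard
# diffusers calls. The reference image + delta conditioning requires the
# omini-kontext pipeline / ComfyUI nodes and is NOT shown here.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("character_3000.safetensors", adapter_name="omini")
pipe.set_adapters(["omini"], adapter_weights=[0.6])  # strength 0.5-0.7

image = pipe(
    image=load_image("base_scene.png"),       # placeholder base image
    prompt="add the character to the scene",  # placeholder prompt
    guidance_scale=1.5,                       # CFG=1.5 per the notes above
).images[0]
image.save("output.png")
```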

Getting Started

Refer to the installation guide for detailed instructions on how to get started.