Summary of CMIS 4140 Control Net Prompting and Intro to LoRA Training 06/17/25
Main Ideas and Concepts
- Course Context and Introduction
  - This is the third class of CMIS 4140.
  - The instructor missed the previous class because of a project with the Atlanta Braves, hinting at future collaboration opportunities for students.
  - The class focuses on catching up and deepening understanding of ControlNet and LoRA training within ComfyUI and Flux.
  - These concepts apply broadly to Stable Diffusion workflows, not just Flux.
- ControlNet Overview
  - ControlNet guides AI image generation with reference images that are preprocessed into depth maps, poses, line art (Canny edges), and similar control maps.
  - It helps keep characters, backgrounds, poses, and art styles consistent across outputs.
  - Pose preprocessors (OpenPose, DW Preprocessor) extract body, face, and hand keypoints.
  - Multiple ControlNet modules can be chained, but overusing them can degrade image quality.
  - Depth maps are preferred for maintaining pose accuracy and image quality.
  - Canny edges provide line-art control, which is useful for artists who sketch.
  - ControlNet strength settings determine how strictly the AI follows the reference.
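ControlNet strength can be pictured as a multiplier on the correction that ControlNet adds to the base model's internal features: 0 ignores the reference and 1 applies it fully. The plain-Python sketch below illustrates that idea with toy numbers; `apply_control` and `apply_chain` are hypothetical helpers, and real implementations scale per-layer tensors inside the diffusion model rather than short lists. It also shows that chained ControlNets add their scaled corrections together, which is one way to picture why stacking too many can push an image off balance.

```python
def apply_control(base_features, control_residuals, strength):
    """Add a ControlNet-style residual to base features, scaled by strength.

    strength = 0.0 ignores the reference; 1.0 applies the full correction.
    (Toy lists stand in for the per-layer tensors a real model uses.)
    """
    if len(base_features) != len(control_residuals):
        raise ValueError("feature and residual lengths must match")
    return [b + strength * r for b, r in zip(base_features, control_residuals)]


def apply_chain(base_features, controls):
    """Chain several ControlNets (e.g. depth + pose): scaled residuals add up."""
    out = list(base_features)
    for residuals, strength in controls:
        out = apply_control(out, residuals, strength)
    return out


base = [0.5, -0.2, 1.0]   # toy base-model features
depth = [0.4, 0.4, -0.2]  # toy "depth" correction
pose = [0.1, -0.3, 0.2]   # toy "pose" correction

print(apply_control(base, depth, 0.0))  # reference ignored
print(apply_control(base, depth, 0.6))  # partial guidance
print(apply_chain(base, [(depth, 0.6), (pose, 0.6)]))  # two modules stacked
```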
- LoRA (Low-Rank Adaptation) Models
  - LoRAs are small, custom-trained add-on models loaded on top of a base model; they act as creative building blocks that represent characters, styles, objects, or locations.
  - The instructor provided example LoRAs of famous people and a product (Duke’s Mayonnaise) for academic use only (not commercial).
  - LoRAs enable consistent character generation and style control.
  - LoRAs can be blended or stacked, but results are best when limited to 3-4 at a time.
  - Training a LoRA involves sourcing 26-41 handpicked images showing varied poses, expressions, lighting, and framing.
  - Using CG characters or real photos is acceptable; the instructor recommends clean, legal datasets.
  - Overtraining (too many images or too many epochs) can cause unnatural, overly CG-looking results.
  - LoRAs trained on fewer, high-quality images tend to yield better, more natural results.
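The standard LoRA formulation explains both the "building block" idea and the stacking limit: each LoRA stores two small matrices per adapted layer, and their product, scaled by alpha / rank and by the user's strength setting, is added to the frozen base weight (W' = W + strength · (alpha / rank) · B·A). The `rank` here is what Flux Gym calls network dimension. This pure-Python sketch uses toy 2×2 weights; `merge_loras`, `lora_delta`, and the example values are hypothetical, not a ComfyUI or Flux API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def lora_delta(down, up, alpha, rank):
    """Low-rank update (alpha / rank) * (up @ down) for one layer.

    down (A) is rank x in_features; up (B) is out_features x rank.
    """
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(up, down)]


def merge_loras(weight, loras):
    """Add several LoRA deltas to a frozen weight, each with its own strength."""
    merged = [row[:] for row in weight]  # base weight itself is left untouched
    for down, up, alpha, rank, strength in loras:
        delta = lora_delta(down, up, alpha, rank)
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += strength * v
    return merged


W = [[1.0, 0.0], [0.0, 1.0]]                         # toy frozen base weight
character = ([[1.0, 2.0]], [[1.0], [0.0]], 1.0, 1)  # (down, up, alpha, rank)
style = ([[0.0, 1.0]], [[0.0], [1.0]], 1.0, 1)
merged = merge_loras(W, [(*character, 0.5), (*style, 1.0)])
print(merged)  # [[1.5, 1.0], [0.0, 2.0]]
```

Each extra LoRA adds another delta to the same weights, so with many stacked LoRAs the updates increasingly overlap and compete; this is one plausible reading of the advice to keep to 3-4.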
- Training LoRAs with Flux Gym on RunPod
  - Flux Gym is a web UI for training Flux LoRAs.
  - The RunPod cloud service is used to deploy GPU instances (e.g., an NVIDIA RTX 4090) for training.
  - The process involves:
    - Preparing and renaming images (using Bulk Rename Utility for batch renaming).
    - Uploading images and captions to Flux Gym.
    - Setting training parameters:
      - Learning rate: 5e-4
      - Save every N epochs: 1 (a LoRA file is saved after every epoch)
      - Network dimension (LoRA rank): 16 (default is 4)
      - Enable bucket: on (lets images of mixed resolutions and aspect ratios be used)
      - Minimum SNR gamma: 5
      - Multi-res noise discount: 0.3
      - Multi-res noise iterations: 6
      - Noise offset: 0.1
      - Train batch size: 2 (on an RTX 4090)
      - Max epochs: 8 (recommended to avoid overtraining)
    - Starting the training and monitoring progress.
    - Downloading the trained LoRA files (the epoch 6 and 7 files typically give the best results).
  - Training can be done locally or on cloud pods; a pod must stay running until training finishes.
  - The instructor emphasizes the importance of prompt engineering alongside LoRA training for best results.
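The epoch settings above translate into a number of optimizer steps, which is what determines how long a RunPod instance has to stay up. The sketch below assumes the common kohya-style approximation steps per epoch = ceil(images × repeats ÷ batch size); bucketing can add a few steps. The repeat count of 10 and the `TrainingPlan` class are hypothetical assumptions for illustration, not values given in class, so check the step count Flux Gym itself reports.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingPlan:
    """Class settings plus a hypothetical `repeats` value (an assumption)."""
    num_images: int
    repeats: int = 10            # ASSUMED repeats per image, not given in class
    batch_size: int = 2          # class setting for an RTX 4090
    max_epochs: int = 8          # class recommendation to avoid overtraining
    save_every_n_epochs: int = 1

    def steps_per_epoch(self) -> int:
        # Each epoch shows every image `repeats` times, `batch_size` at a time.
        return math.ceil(self.num_images * self.repeats / self.batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.max_epochs

    def saved_epochs(self) -> list[int]:
        # Epochs after which a LoRA checkpoint file is written.
        return list(range(self.save_every_n_epochs, self.max_epochs + 1,
                          self.save_every_n_epochs))


plan = TrainingPlan(num_images=30)
print(plan.steps_per_epoch())  # 150
print(plan.total_steps())      # 1200
print(plan.saved_epochs())     # [1, 2, 3, 4, 5, 6, 7, 8]
```

With a checkpoint saved every epoch, the epoch 6 and 7 files recommended in class are among the eight written, so both can be downloaded and compared before the pod is stopped.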
- Prompt Engineering and Workflow Tips
  - Use trigger words to call specific LoRAs in prompts (e.g., “Freeman” for the Morgan Freeman LoRA).
  - Structure prompts around shot type, character description, environment, and realism modifiers.
  - Use tools like ChatGPT to generate or refine prompts.
  - Seed searching (testing different random seeds) is usually needed to find the best output.
  - ControlNet can be toggled on or off to balance pose control against creative freedom.
  - Resolution settings (720p vs. 1080p) trade generation speed for detail.
  - Sampling steps (20-28) control the quality and level of detail of the output.
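The prompt structure above can be assembled programmatically so that only one part changes between test renders, and seed searching amounts to rendering the same prompt over a reproducible list of seeds. The sketch below is hypothetical: `build_prompt`, `seed_candidates`, the part ordering, the example wording, and the 32-bit seed range are illustrative assumptions, and the rendering itself is left to ComfyUI.

```python
import random


def build_prompt(trigger, shot, character, environment, modifiers):
    """Join parts in the order: shot type, trigger + character, environment, modifiers.

    Empty parts are skipped so a section can be dropped while testing.
    """
    parts = [shot, f"{trigger}, {character}", environment, ", ".join(modifiers)]
    return ", ".join(p.strip() for p in parts if p.strip())


def seed_candidates(count, base_seed=0):
    """Reproducible list of 32-bit random seeds to test one prompt against."""
    rng = random.Random(base_seed)
    return [rng.randrange(2**32) for _ in range(count)]


prompt = build_prompt(
    trigger="Freeman",
    shot="medium close-up portrait",
    character="older man in a grey wool coat, gentle smile",
    environment="rainy city street at night, neon reflections",
    modifiers=["photorealistic", "35mm film", "shallow depth of field"],
)
print(prompt)
for seed in seed_candidates(4):
    print(seed, prompt)  # queue each (seed, prompt) pair as a separate render
```

Keeping the seed list reproducible (fixed `base_seed`) means a promising seed can be re-rendered later after adjusting ControlNet, resolution, or step settings.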
- Ethics and Usage
  - The provided celebrity LoRAs are for academic use only, not commercial use.
  - Students are encouraged to create their own LoRAs from legal, personally sourced images.
  - LoRAs can represent people, places, art styles, or objects.
  - The instructor stresses ethical considerations and responsibility when using AI-generated likenesses.
- Industry and Career Perspective
  - AI tools like ControlNet and LoRA represent a new frontier in digital art and content creation.
  - Mastery of these tools can provide a competitive edge in a chaotic creative industry.
  - The instructor shares a personal motivation to democratize access to advanced creative tools at an affordable price.
  - Emphasizes the
Category: Educational