A New Workflow for Realistic Characters in Midjourney
As artists and creators, we often find ourselves at the intersection of innovation and tradition, especially in the realm of digital art. Midjourney has introduced a functionality that significantly alters this landscape by allowing users to directly integrate style and character references into their prompts. Previously, achieving this level of detail with characters was a cumbersome process involving the creation of extensive character sheets, prefer option sets, and meticulous image referencing. What started as a personal project soon turned into a refined workflow capable of producing highly realistic and artistically rich characters, pushing the boundaries of what’s possible with these tools.
In this post, I will take you through each stage of my development process, from the initial trials and errors to the perfected technique that has now become a cornerstone of my work.
Exploring the Roots: The Limitations of Text Prompting
My journey began with basic text prompts, which relied solely on verbal descriptions to generate images. While this method provided a foundation, it often resulted in characters that lacked depth and authenticity. Even with more detailed descriptions, I quickly encountered its inherent limitations.
Text prompts are straightforward: you describe a character in words, and the AI attempts to translate these descriptions into visual representations. Initially, this simplicity was appealing, allowing for quick and easy creation processes. But the more I worked with these prompts, the more I realized that the resulting images often missed the nuanced expressions, intricate details, and specific environments I envisioned.
Examples of basic prompting in Midjourney for a poster of Freddie Mercury:
/imagine A professional, iconic, retro-style digital painting of Freddy Mercury in his vibrant yellow jacket, captured in a timeless pose leaning back with a microphone stand. The scene is set in the vast Wembley Stadium, where Freddy appears dwarfed by the scale of the stadium's immense structure. The artwork features a gritty, textured brushstroke style, enhancing the dynamic and dramatic atmosphere of a live concert. The background shows the expansive stadium filled with a sea of adoring fans, under a slightly overcast sky, which adds to the nostalgic and monumental feel of the moment. The painting uses a pastel palette to soften the overall look, giving it a classic, vintage poster vibe. retro style, digital painting, iconic pose, gritty texture, pastel palette, yellow jacket, Wembley Stadium, large-scale background, overcast sky, concert atmosphere, vintage poster, dynamic composition, dramatic lighting, painterly brushstrokes, rock concert. --ar 3:4 --s 1000 --v 6.0 --style raw
A similar prompt for an image of Hozier
/imagine A visually striking pop art poster featuring the musician Hozier submerged underwater, inspired by the aesthetic of his Wasteland Baby album. The scene captures Hozier in a contemplative pose, gazing upward while effortlessly playing his guitar. The water around him is clear, with soft, diffuse light filtering through, creating a serene yet slightly gritty atmosphere reminiscent of classic pop art. Hozier's features are distinct: his long, curly hair floats around his face, and his expression is thoughtful, almost ethereal. The underwater setting adds a surreal quality to the image, emphasizing the themes of reflection and depth found in his music. pop art style, faded gritty texture, underwater setting, musician portrait, Hozier, long curly hair, playing guitar, contemplative expression, clear water, soft diffuse lighting, serene atmosphere, surreal quality, reflection theme, depth theme, album inspired --ar 3:4 --style raw
The characters generated from these prompts tended to be highly inaccurate and lacked dimensionality. This generic quality was far from the realistic and detailed characters I aimed to create, which should reflect the complexity of real human beings.
The turning point was when I started exploring Midjourney’s advanced features, specifically --cref (character reference) and --sref (style reference). These tools allowed for more precise control over the appearance and style of the generated characters, providing a pathway to more realistic and detailed outputs.
The --cref option enables the integration of specific character references into the generation process. By using an image of a real person or even a sketch as a reference, I could guide the AI more accurately to achieve the desired facial features, body language, and overall demeanor. Meanwhile, --sref allowed me to incorporate specific artistic styles into the creations. Whether I wanted the texture of oil paintings or the distinct lines of ink sketches, --sref made it possible.
Comparison of the above prompts with the added character and style parameters from reference images
With these tools, the characters started to exhibit a greater level of detail and authenticity. While they weren’t fully accurate, facial features were more faithful to the reference, the clothes showed texture, and the backgrounds were more than just vague settings: they were integral parts of the story each character told.
I started out by experimenting with separate images for each parameter. While this method provided some control over the outcome, the results (unsurprisingly) often lacked a unified aesthetic, and one parameter would sometimes override the other. The breakthrough came when I used the same image for both --cref and --sref. This approach ensured that the character’s features and the artistic style were drawn from the same source, leading to a more integrated and seamless representation.
Using two different images for style referencing:
A Detailed Breakdown of My Workflow
Step 1: Choosing an Image
The first and perhaps most critical step is selecting the right reference image. The choice of image greatly influences the fidelity and style of the final character. For each new project, I spend significant time curating images that not only inspire but also closely match the envisioned character in terms of features, style, and ambiance.
When choosing images, I consider:
- Relevance: It’s essential that the image aligns perfectly with the envisioned character and desired atmosphere. The selection process often involves a curated search through high-quality databases to find an image that reflects the specific traits and mood required for the project.
- Quality: High-resolution images are usually preferred because they provide the AI with a detailed visual input, essential for an accurate and richly detailed output. Clear imagery ensures that every nuance—from the texture of the skin to the subtleties of lighting—is captured.
- Expressiveness: Images that convey clear emotions or a distinct ambiance are particularly powerful. These images provide a strong narrative element, which enhances the overall impact of the artwork.
Step 2: Writing the Prompt & Implementing the Unified Reference Image Strategy
Once the reference images are selected, crafting a precise prompt is essential. The specificity of the prompt (particularly when using style parameters) greatly enhances the AI’s ability to generate results that closely mirror the reference. This involves describing not just the physical characteristics but also the mood, setting, and any particular stylistic elements that should be prominent. Describing an image in a different style or composition from the reference tends to confuse the model and produce unwanted outputs.
As mentioned previously, using the same image for both --cref and --sref ensures that the character’s features and the artistic style are drawn from the same source, leading to a more cohesive and authentic output.
The Benefits of The “Unified Reference Image Strategy”:
- Cohesive Visualization: By using a single image, the AI is not left to reconcile differing inputs for character and style, which can often lead to disjointed or conflicting elements in the artwork. A unified reference leads to a more integrated and harmonious result.
- Enhanced Realism: This strategy significantly enhances the realism of the character, as the style and physical depiction are drawn from the same source, maintaining consistency in lighting, texture, and atmosphere.
- Streamlined Process: It simplifies the workflow by reducing the number of decisions and adjustments needed during the creation process, allowing for a more efficient and reliable means of character design.
Example of a detailed prompt with --cref and --sref
/imagine A professional, iconic, retro-style digital painting of Freddy Mercury in his vibrant yellow jacket, captured in a timeless pose leaning back with a microphone stand. The scene is set in the vast Wembley Stadium, where Freddy appears dwarfed by the scale of the stadium's immense structure. The artwork features a gritty, textured brushstroke style, enhancing the dynamic and dramatic atmosphere of a live concert. The background shows the expansive stadium filled with a sea of adoring fans, under a slightly overcast sky, which adds to the nostalgic and monumental feel of the moment. The painting uses a pastel palette to soften the overall look, giving it a classic, vintage poster vibe. retro style, digital painting, iconic pose, gritty texture, pastel palette, yellow jacket, Wembley Stadium, large-scale background, overcast sky, concert atmosphere, vintage poster, dynamic composition, dramatic lighting, painterly brushstrokes, rock concert. --ar 3:4 --s 1000 --cref https://s.mj.run/rTWZ95Qg9c0 --sref https://s.mj.run/rTWZ95Qg9c0 --v 6.0 --style raw
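If you assemble prompts in a script before pasting them into Midjourney, the unified reference idea can be sketched roughly as follows. This is only an illustration: the function name `build_unified_prompt` and the example URL are hypothetical, and the defaults simply mirror the flags used in the prompt above.

```python
def build_unified_prompt(description, reference_url, aspect_ratio="3:4",
                         stylize=1000, version="6.0", raw=True):
    """Return a prompt string that uses ONE image for both --cref and --sref.

    Hypothetical helper: it only concatenates text and does not talk to
    Midjourney in any way.
    """
    parts = [
        description.strip(),
        f"--ar {aspect_ratio}",
        f"--s {stylize}",
        # Same URL for character and style, per the unified reference strategy.
        f"--cref {reference_url}",
        f"--sref {reference_url}",
        f"--v {version}",
    ]
    if raw:
        parts.append("--style raw")
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder URL; substitute the link to your own reference image.
    print(build_unified_prompt(
        "A retro-style digital painting of a singer on a stadium stage",
        "https://example.com/reference.png",
    ))
```

Passing a single `reference_url` makes it hard to accidentally mix two different images, which is the situation the strategy is meant to avoid.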
Step 3: Balancing Style and Character Weights
The next phase involves fine-tuning the influence of style versus character through the --sw (style weight) and --cw (character weight) parameters, which range from 0 to 1000 and 0 to 100, respectively. Proper adjustment of these settings is crucial for achieving the desired balance between the stylistic elements and the fidelity of the character’s portrayal.
- Style Weight (--sw): Adjusts how much the artistic style of the reference image influences the character. A higher value results in a stronger adherence to the style.
- Character Weight (--cw): Adjusts how much the physical details of the reference influence the character. A higher value means a more accurate portrayal of features.
Finding the sweet spot for these parameters often involves experimentation, starting with balanced values like --sw 50 and --cw 50, and adjusting based on preview results. For the most consistent results, I’d recommend using the “Vary Subtle” function when adjusting each parameter. Doing so keeps the desired pose and composition across prompts (by using the same image seed) while only affecting the influence of the style reference.
- Adjusting style and character weights
A professional, iconic, retro-style digital painting of Freddy Mercury in his vibrant yellow jacket, captured in a timeless pose leaning back with a microphone stand. The scene is set in the vast Wembley Stadium, where Freddy appears dwarfed by the scale of the stadium's immense structure. The artwork features a gritty, textured brushstroke style, enhancing the dynamic and dramatic atmosphere of a live concert. The background shows the expansive stadium filled with a sea of adoring fans, under a slightly overcast sky, which adds to the nostalgic and monumental feel of the moment. The painting uses a pastel palette to soften the overall look, giving it a classic, vintage poster vibe. retro style, digital painting, iconic pose, gritty texture, pastel palette, yellow jacket, Wembley Stadium, large-scale background, overcast sky, concert atmosphere, vintage poster, dynamic composition, dramatic lighting, painterly brushstrokes, rock concert. --ar 3:4 --s 1000 --cref https://s.mj.run/rTWZ95Qg9c0 --cw 50 --sref https://s.mj.run/rTWZ95Qg9c0 --sw 100 --v 6.0 --style raw
- Visual comparison of outputs with different --sw and --cw settings using the “Vary Subtle” function. Notice that the overall composition remains largely unchanged across each variation.
Depending on how closely you wish to adhere to the reference, you can adjust the weights accordingly. In my experience, the best balance between originality and accuracy came from a character weight of 50 and a style weight of 100. A stronger weight on either tended to produce more generic images, while lower weights lost the style altogether.
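To keep the experimentation organized, it can help to list the weight combinations you plan to try before you start. The snippet below is a minimal sketch of that idea, not part of Midjourney itself: the function name `weight_suffixes` is hypothetical, and the range checks simply encode the limits quoted in this step (0 to 100 for --cw, 0 to 1000 for --sw).

```python
from itertools import product

# Limits as stated in this post; treat them as assumptions if Midjourney changes.
CW_MIN, CW_MAX = 0, 100
SW_MIN, SW_MAX = 0, 1000


def weight_suffixes(cw_values, sw_values):
    """Return one '--cw X --sw Y' suffix for every combination of the inputs."""
    suffixes = []
    for cw, sw in product(cw_values, sw_values):
        if not CW_MIN <= cw <= CW_MAX:
            raise ValueError(f"--cw must be between {CW_MIN} and {CW_MAX}, got {cw}")
        if not SW_MIN <= sw <= SW_MAX:
            raise ValueError(f"--sw must be between {SW_MIN} and {SW_MAX}, got {sw}")
        suffixes.append(f"--cw {cw} --sw {sw}")
    return suffixes


if __name__ == "__main__":
    # A small grid around the values that worked for me (cw 50, sw 100).
    for suffix in weight_suffixes([25, 50, 75], [50, 100, 200]):
        print(suffix)
```

Each printed suffix can be appended to the same base prompt, so the only thing that changes between runs is the pair of weights.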
Step 4: Inpainting and Custom Zoom Techniques
After establishing a solid foundation with the unified reference image and precise parameter settings, the next steps involve using advanced techniques like inpainting and custom zoom to refine and enhance the artwork further.
Inpainting: Detail Enhancement and Correction
Inpainting is a technique that allows for localized edits and enhancements within an image, providing an opportunity to correct or enhance specific areas that may not have rendered as expected. This method is particularly useful for adjusting facial features, textures, or background elements that require more precision.
Using inpainting involves:
- Identifying areas of the image that need refinement, such as blurred details, incorrect textures, or unwanted artifacts.
- Crafting specific prompts that describe exactly how these areas should be adjusted, ensuring that the inpainting does not disrupt the overall style and coherence of the image.
- Applying the inpainting technique selectively, focusing on small sections to maintain control over the modifications without overwhelming the original design.
Custom Zoom: Seeing the Bigger Picture
Custom zoom is another powerful tool that lets artists zoom out from an image and extend the canvas around it. This technique is ideal for expanding and reframing your composition, along with placing additional narrative elements into the environment if needed.
Steps to effectively use custom zoom include:
- Specifying the amount you want the image to be zoomed out by (between 1x and 2x).
- Adding or changing environmental elements in the composition (e.g., people, buildings, clouds).
- Using the zoomed-out area to apply further refinements, such as texture enhancements which contribute to the overall realism and narrative within the image.
Both inpainting and custom zoom are integral to polishing the final image, allowing for an unprecedented level of detail and customization in digital artwork. By integrating these techniques, artists can not only correct minor flaws but also add layers of depth and complexity to their creations, pushing the boundaries of digital realism further.
Final Touches: Enhancing Images in Photoshop
After refining the images in Midjourney, I often take the resulting images into Photoshop for further customization. This stage is where personal creativity truly shines. Here, I can fine-tune color grading, adjust clarity and texture, and even incorporate typography for projects that require textual elements. Each of these adjustments is aimed at enhancing the unique qualities of the image, ensuring that it not only captures the original vision but also stands out with its own distinctive flair. This final phase of customization in Photoshop is crucial; it transforms solid digital artwork into a one-of-a-kind piece that resonates with its intended audience and tells a story.
Embracing A New Era of Artistic Possibilities
The exploration and implementation of the unified reference image strategy, detailed parameter adjustments, and advanced techniques like inpainting and custom zoom illustrate just how transformative this technology can be for artists. The potential applications for these tools are as vast as the imagination allows. From editorial and graphic design to concept art and illustration, the ability to craft detailed, realistic characters and environments can significantly impact various creative industries. Moreover, these advancements encourage a deeper engagement with technology as a creative partner, opening up dialogues and collaborations that enrich both the tech and art communities.
For those interested in learning more about generative art, the Midjourney documentation is an invaluable resource. It offers practical guidance on using AI for artistic creation, so that both new and seasoned artists can harness the full potential of these tools. Happy prompting!