“Intelligent” Image Generation: Crafting Visuals with AI

Sorab Ghaswalla
8 min read · Sep 7, 2023

Producing images, art, or graphics no longer requires a camera, a photographer, or a graphic designer.

Photo by Kamyar Ghalamchi on Unsplash

In today’s digital age, one aspect that has undergone a profound transformation is content creation. It has evolved far beyond traditional mediums such as writing or capturing still photographs. Technology, once a companion, has now moved to the forefront of content generation. Today, we find machines actively engaged in content creation.

Perhaps the earliest instances of technology aiding individual content creation can be traced back to the manual typewriter, or even earlier to quills, pencils, and pens.

The modern version of the pen came into existence in the early 1800s, followed by the still photo camera. The typewriter came into being towards the latter half of the 19th century. Between then and the early ’70s, the situation, by and large, remained the same, until television and then the PC became household names. The Internet followed the home computer, and then came the World Wide Web, the dotcom boom, the move away from analog to digital, followed by the Metaverse, virtual reality, augmented reality, and then artificial intelligence. Boom. Suddenly, the content landscape shifted to a digital realm where information could be created, stored, and shared electronically, enabling unprecedented speed, accessibility, and scalability in content production and distribution.

In the last 50 years or so, the world of content has seen a dimensional change, much of it propelled by computerization, blazing-fast connectivity, and pocket-friendly software. To think that it was barely 100 years ago when people were using ink-based styluses and paper, resorting to a printing press when they needed to reach the masses. In fact, a few of us in the content industry today can still claim to have used the typewriter before having hitched a ride on the bullet train of tech.

Each of these developments has left its own indelible mark on the world of content and creators. Today, content is no longer limited to the printed word, but instead is this heady mix of all sorts of things — from cool virtual reality adventures and interactive websites to podcasts, videos, and more. It’s always changing and pushing the limits of what we can do creatively.

Some of you may recall having read my piece on Generative Adversarial Networks (GANs), a hot subject in the worlds of artificial intelligence and neural networks.

Continuing that conversation, I shall talk about AI content generators today, especially those conjuring images from a mere handful of words, or text.

What Are AI Content Generators?

Some of you at least may have heard names like “Midjourney” or “DALL-E”. A few of you may have even used them. DALL-E 2 is built on a diffusion model, and Midjourney is widely believed to be as well. A diffusion model is a different kind of generative model from a GAN, not a type of one. These cutting-edge tools are taking content creation to a different plane altogether.

An AI content generator is a tool that uses artificial intelligence to create content. This content can be in the form of text, images, or even videos. AI content generators are trained on large datasets of text, code, or images, allowing them to learn the patterns of human language and creativity. So you have software writing out blogs, ad copy, research papers, almost everything.

But for the purposes of this newsletter, I shall delve only into those generators that create images or works of art based on text inputs, like Midjourney. There are now many in the market working on almost the same principle, and they are all equally fascinating.

In technology parlance, Midjourney is a text-to-image AI generator. Its makers have not published the full architecture, but tools of this kind generally pair a language model that reads the prompt with a diffusion model that produces the image. For the layman, it’s where you go, key in a few sentences, and lo, out comes an image, or many. Deleted from the equation: the camera and the person behind the lens!

Here’s how it basically works: The first step is to provide a text “prompt” that describes the image you desire. The text prompt should be as detailed as possible and specific about the features of the image you want to create. For example, you could say, “a painting of a cat sitting in a hat” or “a photorealistic image of a cityscape at night.”

The AI model then uses a language model (often called a text encoder) to convert the text prompt into a numerical representation. As most of you know by now, such language models are trained on massive datasets of text, allowing them to capture the nuances of human language.
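
To make the idea of “text to numbers” concrete, here is a minimal Python sketch. It is only a toy: real text-to-image systems use a learned text encoder that outputs rich vectors, and nothing here reflects Midjourney’s actual code. The function names (tokenize, token_to_id, encode_prompt) and the vocabulary size are assumptions made up for illustration.

```python
import hashlib

def tokenize(prompt):
    """Lowercase the prompt and split it into simple word tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in prompt.lower())
    return cleaned.split()

def token_to_id(token, vocab_size=10_000):
    """Map a token to a stable integer ID by hashing.

    This is a toy stand-in for a learned vocabulary (an assumption for
    illustration); real encoders look tokens up in a trained table.
    """
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % vocab_size

def encode_prompt(prompt, vocab_size=10_000):
    """Turn a text prompt into a list of integer IDs, one per word."""
    return [token_to_id(tok, vocab_size) for tok in tokenize(prompt)]

prompt = "a painting of a cat sitting in a hat"
ids = encode_prompt(prompt)
print(tokenize(prompt))
print(ids)
```

Note that the word “a” gets the same number every time it appears. A real encoder goes one step further and turns each ID into a long vector whose values capture meaning, which is what the image model actually consumes.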

The numerical representation is then used by what’s called a “diffusion model” to generate a series of images.

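The diffusion model works by starting from pure “noise” and gradually removing it, step by step, until the desired image emerges. (BTW, image noise is a random variation of brightness or color information in digital photos.) During training, the model sees images with noise progressively added and learns to undo that damage. At generation time, the first result will be blurry and abstract, but the image becomes more and more realistic with each denoising step. You are typically shown a few candidate images and can then choose the one that you like the best.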
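
For the curious, the noise idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not how Midjourney or any real product is implemented: the “image” is just 16 numbers, and the function oracle_noise_prediction cheats by looking at the clean image, standing in for the trained neural network that a real system would use (and which would be guided by the text prompt). A real generator also starts from pure random noise rather than from a noised copy of a known picture. All names and the noise schedule values are hypothetical.

```python
import math
import random

def alpha_bar_schedule(steps, beta_start=1e-4, beta_end=0.2):
    """Fraction of the original signal that survives after each noising step."""
    alpha_bars, running = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean image with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
          for p, n in zip(x0, noise)]
    return xt, noise

def oracle_noise_prediction(xt, x0, alpha_bar):
    """Stand-in for the trained network: it sees x0, which a real model cannot."""
    return [(x - math.sqrt(alpha_bar) * p) / math.sqrt(1.0 - alpha_bar)
            for x, p in zip(xt, x0)]

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def denoise(xt, x0, alpha_bars):
    """Reverse process: walk from the noisiest step back to a clean image."""
    errors = []
    for t in range(len(alpha_bars) - 1, -1, -1):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_noise_prediction(xt, x0, ab_t)
        # Estimate the clean image, then re-noise it to the next, milder level.
        x0_hat = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                  for x, e in zip(xt, eps)]
        xt = [math.sqrt(ab_prev) * p + math.sqrt(1.0 - ab_prev) * e
              for p, e in zip(x0_hat, eps)]
        errors.append(mean_abs_error(xt, x0))
    return xt, errors

rng = random.Random(0)
clean = [i / 15 for i in range(16)]  # a 16-"pixel" gradient standing in for an image
alpha_bars = alpha_bar_schedule(steps=50)
noisy, _ = add_noise(clean, alpha_bars[-1], rng)
restored, errors = denoise(noisy, clean, alpha_bars)
print(f"error before denoising: {mean_abs_error(noisy, clean):.3f}")
print(f"error after denoising:  {errors[-1]:.6f}")
```

The point to take away is the direction of travel: noise is added during training so the model can learn to remove it, and generation runs that process in reverse.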

These AI models are trained on a massive dataset of images, typically paired with text captions (there’s a lot of controversy around copyright, etc., but I shall save that for another day). This dataset allows the model to learn how words relate to visual patterns. Generally, the larger and better-curated the training set, the better the model will be at generating realistic and creative images.

Midjourney is a powerful tool that can be used to create various images. However, it is important to note that neither Midjourney nor its rivals are perfect. The images that come out can sometimes be blurry or unrealistic. To me, many of them still manage to look very “plasticky” and unreal. But the journey has started, and I am sure that, very soon, we’ll have some real-world type images.

Here are some additional details about how Midjourney works:

  • The Midjourney AI model is trained on a massive dataset of images collected from the internet. Midjourney has not published the details of this dataset, but collections of this kind typically draw on sources such as social media, stock photo websites, and art galleries.
  • The language component used by tools like Midjourney is also trained on a massive dataset of text, drawn from sources such as books, articles, and websites.
  • The diffusion model used by Midjourney is not a GAN, although the two are often confused. A GAN consists of two neural networks trained to compete against each other: a generator that produces images and a discriminator that tries to distinguish real images from fake ones. A diffusion model is instead a single network trained to remove noise from images step by step.
  • Midjourney is still under development, and the quality of the images that it generates can vary. The model is constantly being updated and improved, and the goal is to create a model that can generate realistic and creative images that are indistinguishable from real images.

Others Like Midjourney

Some of the AI content generators like Midjourney are:

  • DALL-E 2: OpenAI created this text-to-image diffusion model, which can generate realistic images from text.
  • Stable Diffusion: This is another text-to-image diffusion model. It is known for its high-quality images and its ability to generate images from complex and challenging prompts.
  • Jasper Art: This is a text-to-image generator that uses a generative pre-trained transformer model. It can generate a variety of different images, including paintings, drawings, and sketches.
  • Dream by WOMBO: This is a text-to-image app from WOMBO. Despite the similar name, it is not the same thing as Google’s DeepDream program. It can generate surreal and abstract images from text descriptions.
  • NightCafe: This is a text-to-image generator that offers several underlying algorithms, including GAN-based and diffusion-based ones. It can generate realistic and creative images from text descriptions.
  • AutoDraw: This is an AI-powered drawing tool from Google that turns your rough sketches into polished drawings.
  • Designs.ai: This is an AI-powered design tool that can help you create logos, posters, and other designs from your text descriptions.
  • StarryAI: This is an AI art generator app that creates artwork from your text prompts.
  • Craiyon: This is a free AI image generator that was formerly known as DALL-E mini. It is known for its simplicity and ease of use.

These are just a few of the many AI image generators that are available (not my endorsement, just knowledge). Imagine that almost all of them came online within the last year or so.

Photo by Austin Chan on Unsplash

Some Drawbacks of Text-to-image AI Generators

  • They can be inaccurate. AI text-to-image generators undergo training using extensive image datasets, yet they are susceptible to errors. Instances of these errors include generating images that deviate from the provided text description or producing images with blurriness or pixelation.
  • They can be biased. AI text-to-image generators are trained on datasets that are created by humans, and these datasets can reflect human biases. For example, an AI text-to-image generator may be more likely to generate images of white people than of people of color.
  • They can be used to develop hateful content. It’s crucial to recognize that AI text-to-image generators can be harnessed for the creation of harmful content, including images depicting violence or hate speech.
  • They can be expensive. Some AI text-to-image generators charge subscription or per-image fees, which can be a barrier for people who want to use them.
  • They can be difficult to use. Some AI text-to-image generators have a steep learning curve, which can be a barrier for people who are not familiar with technology.

Despite these drawbacks, AI text-to-image generators are powerful tools that can produce creative and interesting images.

What Do Other AI Content Generators Do?

  • Write blog posts and other long-form content
  • Create marketing copy, email campaigns, and other content collateral
  • Generate product descriptions, social media posts, and other marketing content
  • Translate from one language to another
  • Write creative content like poems, code, scripts, and music

How Precisely Do AI Content Generators Help?

That’s the million-dollar question: Just how do these AI generators help content creators? The answer:

AI content generators can be very helpful for businesses and individuals who need to create content quickly and easily. They can also be helpful for people who struggle with writing or who want to improve their writing skills.

But, at the same time, experience over the last 10 months has shown us that AI content generators are not perfect. They can sometimes make mistakes, and they may not always be able to generate original or creative content.

Here are some other points to consider when using AI content generators:

  • The quality of the content generated by AI content generators can vary depending on the tool. Some tools are better at generating certain types of content than others.
  • It is important to proofread and edit the content generated by AI content generators before publishing it. This is to ensure that the content is accurate and error-free.
  • AI content generators should not be used as a replacement for human creativity. They can be a helpful tool but should not be used to automate the entire content creation process.

Conclusion: AI content generators are powerful tools that can be used to create content quickly and easily. However, it is important to use them wisely and to proofread and edit the content before publishing it. They can be a great way to save time and improve your writing skills, but they should not be used as a replacement for human creativity.

Sorab Ghaswalla

An AI Communicator, tech buff, futurist & marketing bro. Certified in artificial intelligence from the Univs of Oxford & Edinburgh. Ex old-world journalist.