Sora Has Exponentially Improved AI Video Generation

OpenAI’s new Sora video model has just been announced, and boy is it impressive. Let’s take a brief look at what it is, what it is capable of, and its future.

Sora is the newest creation from OpenAI (makers of GPT and DALL-E) and has been making huge waves in the AI community. What it can do is leaps and bounds ahead of previous text-to-video models. Let’s examine what Sora is and what we can expect to see from it shortly.

What is Sora?

According to OpenAI, “Sora is an AI model that can create realistic and imaginative scenes from text instructions,” and boy are they being modest. Sora’s capabilities are miles beyond those of any previous text-to-video model. Sora was only recently announced and is being showcased on OpenAI’s website, and AI enthusiasts are already buzzing about it.

Some Fantastic Sora Examples

These examples are all taken from OpenAI’s website, and there are many more. I encourage you to visit and take a look at all of them for yourself.

Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
Prompt: Historical footage of California during the gold rush.

Knowing Its Weaknesses

The website also shows some of the model’s current limitations and weaknesses. There are some things it simply struggles to do. OpenAI states the following:

The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.

Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.
Weakness: Animals or people can spontaneously appear, especially in scenes containing many entities.
Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.
Weakness: In this example, Sora fails to model the chair as a rigid object, leading to inaccurate physical interactions.

Sora’s Reception

“Video generated with OpenAI’s Sora. Hard to wrap your mind around this.” (posted by u/gantork in r/singularity)

This was the first video that I, like many others, saw posted to Reddit, and it captivated me. The reflection is truly impressive, and the overall composition is mature. This goes well beyond anything seen in the short time that text-to-video has existed.

Twitter is also lively with discussion of Sora.

And Just Because

That is it. I simply wanted to show what I found to be quite impressive, and I am excited to see more of it in the future. AI has come a long way in a short time, and these examples are sure to get many people believing in its multitude of multimedia possibilities. OpenAI hasn’t done it all themselves, remember:
