The OpenAI artificial intelligence research lab has announced details of its latest technology, which promises major improvements to 3D rendering.
OpenAI, the company behind the text-to-image generator DALL-E, has now turned its attention to translating text prompts into 3D point clouds with a system it calls POINT-E.
According to a paper published by OpenAI, POINT-E “creates 3D models in just 1-2 minutes on a single GPU,” compared with other current solutions that can take hours and require multiple GPUs.
An excerpt from the paper describes where POINT-E currently stands among tools for building 3D models:
“Although our method is still not state-of-the-art in terms of sample quality, sampling is one to two orders of magnitude faster, offering a practical compromise in some use cases.”
It works in two stages: a text-to-image diffusion model first generates a single synthetic view of the object, and a 3D point cloud is then generated from that view. Point clouds are easier to synthesize, which accounts for the reduced GPU load, but they do not capture finer details, hence the trade-off mentioned in the paper.
A secondary model has been trained to mitigate some of these problems, but the paper explains that it can “sometimes miss thin/sparse parts of objects,” such as plant stems, giving the illusion of floating flowers.
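The two-stage pipeline described above can be sketched in code. This is an illustrative toy only: the function names, array shapes, and stub implementations below are assumptions for demonstration, not POINT-E’s actual API or models.

```python
import numpy as np

# Hypothetical stand-ins for the trained diffusion models described in
# the article; names and shapes are illustrative assumptions.

def text_to_image(prompt: str, size: int = 64) -> np.ndarray:
    """Stage 1 stand-in: a text-to-image diffusion model would render a
    single synthetic view of the object described by the prompt."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((size, size, 3))  # H x W x RGB

def image_to_point_cloud(image: np.ndarray, n_points: int = 1024) -> np.ndarray:
    """Stage 2 stand-in: an image-conditioned diffusion model would sample
    a coarse 3D point cloud consistent with the rendered view."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((n_points, 6))  # x, y, z plus r, g, b

def upsample_points(points: np.ndarray, factor: int = 4) -> np.ndarray:
    """Stand-in for the secondary model that densifies the coarse cloud;
    per the paper, thin or sparse structures can still be missed here."""
    return np.repeat(points, factor, axis=0)

view = text_to_image("a red chair")
coarse = image_to_point_cloud(view)
dense = upsample_points(coarse)
print(view.shape, coarse.shape, dense.shape)
```

Working on a few thousand points rather than a dense mesh or radiance field is what makes the approach fast on a single GPU, at the cost of the fine detail the paper concedes.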
OpenAI says it trained the system on several million 3D models and their associated metadata, though use cases remain quite limited for now.
One such example is rendering real-world objects for 3D printing, though as the technology develops and is refined, we are likely to see it in more advanced applications such as gaming and even television.
The project’s open-source code is available on GitHub.