Achieving reliable generative AI

The term “generative AI” has been all the buzz recently. Generative AI comes in several flavors, but common to all of them is the idea that the computer can automatically generate a lot of clever, useful content based on relatively little input from the user.

Much of the recent excitement has been fueled by visual generative AI systems, such as DALL·E 2 and Stable Diffusion, in which the machine generates novel images based on brief textual descriptions. Want an image of “a donkey on the moon reading Tolstoy”? Voila! In a few seconds, you get a never-before-seen image of this well-read, well-traveled donkey.

These systems provide endless fun, and they’re breathtaking. It’s hard to shake the feeling that they must be smart to understand your intent and creative enough to generate an aesthetically pleasing novel image based on it. Plus, there’s a compelling value exchange: You input a few words, and in return, get a picture that’s worth a thousand. Finally, intelligent, creative and useful AI!

Some truths about generative AI

But this is misleading, since it reinforces the idea of the computer doing all the work. If indeed all you want is any aesthetic image of an erudite donkey, chances are you’ll be satisfied with the output; there are many such images or parts thereof, and the systems are good enough to be able to produce one. But if you’re an artist, you have a more nuanced intent in mind, and at best, you’d use the generative system as an interactive tool to generate images based on many prompts you experiment with — and you’re likely to also massage the image yourself afterward.

This is even more striking in the case of textual generative AI — systems in which both the input and the output are text. Here too, the promise of models like GPT-3 suggests an ideas-to-text future in which the user jots down some key ideas, and the system takes over and does most of the writing. And indeed, current systems are impressive. They write poems, blog posts, emails, marketing copy — the list goes on. The systems can even sometimes produce long-form text that’s surprisingly coherent and on-message, and includes many correct and relevant facts not mentioned in the instructions.

Except when they don’t. And often, they won’t. In practice, textual generative AI, when deployed without proper controls, generates as much nonsense as it does useful content. The most notable recent example of this was Meta’s Galactica, which claimed the ability to generate insightful scientific content but was taken down after two days when it became apparent that it was producing as much pseudo-science as it did credible scientific content.

A brittle quality

The brittleness of textual generative AI was recognized early on. When GPT-2 was introduced in 2019, columnist Tiernan Ray wrote, “[GPT-2 displays] flashes of brilliance mixed with […] gibberish.” And when GPT-3 was released a year later, my colleague Andrew Ng wrote, “Sometimes GPT-3 writes like a passable essayist, [but] it seems a lot like some public figures who pontificate confidently on topics they know little about.”

Certainly, those of us working in the area have been well aware of this brittleness. In reality, the textual generative systems are, at best, used as idea generators, stirring the imagination of the human writer. My colleague Percy Liang, no stranger to generative AI, reports having used it in this mode when composing a speech for a wedding.

But reliably producing a complete, final text lies beyond the capabilities of current systems. As a well-known publisher recently complained to me, the time his company saved by using a certain generative system was offset by the time it needed to spend fixing the nonsense the system produced.

Limited impact (so far)

This brittleness of current generative AI limits its impact in the real world. To fully realize its potential, generative AI — especially the textual kind — must become more reliable. Several technological developments hold promise in this regard.

One is increasing the degree to which the output is firmly anchored in trusted sources. By “firmly anchored,” I don’t mean merely being trained on trusted sources (which is already an issue in current systems), but also that important parts of the output can be reliably traced back to the sources on which they were based. Current so-called “retrieval methods,” which access trusted text to help guide the output of the neural network, point in a promising direction.
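To make the idea of traceable output concrete, here is a minimal Python sketch of the retrieval pattern. It is an illustration under stated assumptions, not how any particular product works: the corpus, the source ids, the word-overlap ranking and all function names are hypothetical, and a real system would use a far stronger retriever and an actual language model to write the answer.

```python
import re

# Hypothetical trusted corpus (illustrative text only): source id -> passage.
TRUSTED_SOURCES = {
    "handbook-1": "The warranty covers parts and labor for two years.",
    "handbook-2": "Returns are accepted within thirty days of purchase.",
    "faq-7": "Shipping takes five business days within the country.",
}

STOPWORDS = {"a", "an", "and", "are", "for", "how", "is", "of", "the", "to"}


def tokenize(text):
    """Return the set of lowercase word tokens, minus common stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retrieve(query, sources, k=2):
    """Return up to k (source_id, passage) pairs ranked by word overlap."""
    query_words = tokenize(query)
    scored = []
    for source_id, passage in sources.items():
        overlap = len(query_words & tokenize(passage))
        if overlap:
            scored.append((overlap, source_id, passage))
    # Highest overlap first; ties broken by source id for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(source_id, passage) for _, source_id, passage in scored[:k]]


def build_grounded_prompt(query, hits):
    """Place the retrieved passages, tagged by id, in front of the question."""
    context = "\n".join(f"[{source_id}] {passage}" for source_id, passage in hits)
    return (
        "Answer using only the sources below and cite their ids in brackets.\n"
        f"{context}\nQuestion: {query}"
    )


def untraceable_citations(answer, hits):
    """Return ids cited in the answer that match no retrieved source."""
    retrieved = {source_id for source_id, _ in hits}
    cited = re.findall(r"\[([\w-]+)\]", answer)
    return [c for c in cited if c not in retrieved]
```

The last function is the part that corresponds to "firmly anchored": any bracketed id in the generated answer that does not point back to a retrieved passage can be flagged for a human to review.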

Another key element is increasing the degree to which the systems exhibit basic common sense and sound reasoning. Long-form text tells a story, and the story must have internal logic, be factually correct, and have a point. Current systems do not reliably deliver any of these.

The statistical nature of the neural networks that power the current systems enables them to produce cogent passages some of the time, but they inevitably fall off a cliff when pushed beyond a certain limit. They make blatant factual or logical errors and can easily veer off-topic.

Neural solutions

There are several strands of work aimed at mitigating this. They include purely neural approaches, such as so-called “prompt decomposition” and “hierarchical generation.” Other approaches follow the so-called “neuro-symbolic” direction, which augments the neural machinery with explicit symbolic reasoning.
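One plausible reading of "hierarchical generation" is to ask the model for an outline first and then generate each section from its own narrower prompt. The Python sketch below illustrates that reading only. The prompt wording, the function names and the `stub_model` stand-in are all assumptions made so that the example runs offline; in practice `generate` would wrap a call to a real text generator.

```python
from typing import Callable, List


def hierarchical_generate(
    topic: str, generate: Callable[[str], str], n_sections: int = 3
) -> str:
    """Generate an outline, then one passage per outline entry."""
    outline_prompt = (
        f"List {n_sections} section titles for an article about {topic}, one per line."
    )
    titles: List[str] = [
        line.strip() for line in generate(outline_prompt).splitlines() if line.strip()
    ][:n_sections]

    parts = []
    for title in titles:
        # Each sub-prompt is small and specific, which narrows what the
        # model is asked to get right in a single step.
        section_prompt = (
            f"Write one paragraph for the section '{title}' "
            f"of an article about {topic}."
        )
        parts.append(f"{title}\n{generate(section_prompt)}")
    return "\n\n".join(parts)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real text generator."""
    if prompt.startswith("List"):
        return "Background\nCurrent limits\nOutlook"
    return f"(text generated for: {prompt})"


if __name__ == "__main__":
    print(hierarchical_generate("generative AI", stub_model))
```

Because the outline is an explicit intermediate result, a product built this way could also show it to the user for correction before any long-form text is written.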

But I think the most important development will be the harmonization of product and algorithmic thinking. The temptation to “get something for nothing” seduces people into not providing enough guidance to the generative systems and demanding an output that is too ambitious.

Generative AI will never be perfect, and a good product manager understands the limitations of the underlying technology; she designs the product to compensate for those limitations and, in particular, crafts the best division of labor between the user and the machine. Galactica, as mentioned earlier, is actually an interesting engineering artifact. But asking it to reliably produce scientific papers is just too much.

Generative AI needs more guidance — if you don’t know where you’re going, you’ll never get there. The guidance can be given upfront, such as by an enriched set of prompts, but also interactively in the product itself.

Looking ahead

The jury is out on which combination of techniques will prove most useful, but I believe that the shortcomings of generative AI will be dramatically reduced. I also believe that this will happen sooner rather than later because of the enormous economic benefits of reliable textual generative AI.

Does that mean the end of human writing? I don’t believe so. Certainly, some aspects of writing will be automated. Already today, we can’t live without spell-checking and grammar correction software; copy editing has been automated. But we still write, and I don’t think that will change.

What will change is that, as we write, we’ll have built-in research assistants and editors (in the sense of a book editor, not the software artifact). These functions, which have been a luxury only the very few can afford, will be democratized. And that’s a good thing.

Yoav Shoham is the co-founder and co-CEO of AI21 Labs.
