A Computer Tried (and Failed) to Write This Article


For now, machine journalists should probably stick to box scores and basic weather reports.

None of this turned out how it was supposed to. 

Here I am, a human, writing a story assigned to a machine. If I’d published what the computer came up with, you’d be reading something like:

content that was that communications and everything that makes on a person what they’re are also to be in the Internet in the fact about it is that models are technologication of the same that its also from the most computer.”

A machine really did come up with that garbage. But let me back up for a second. 

A little over a year ago, I started asking around—among computer scientists at universities and tech companies, mostly—to see if someone would help me design and carry out a weird little experiment I had in mind. I wanted to train a machine to write like me. 

The idea was this: We’d give Robot Adrienne a crash course in journalism by having it learn from a trove of my past writings, then publish whatever Robot Me came up with. 

This isn’t a far-fetched idea, and not just because robots have a long track record of automating human labor.

There are already algorithms that can write stories. At The Los Angeles Times, there’s a program that reports breaking news about earthquakes. Bots can easily be programmed to write other basic stories—things like box scores and real estate listings, even obituaries.

In January, Wired had a news-writing bot produce a remembrance of Marvin Minsky, the artificial intelligence pioneer. The result was a little dry compared with the obituary for Minsky written by a human at The New York Times—but the machine version was decent.

Last year, NPR’s Scott Horsley raced a bot to report quarterly earnings for the diner chain Denny’s. Horsley’s version had more stylistic flourish, but the bot filed its story in two minutes flat—five full minutes before Horsley, basically an eternity in radio time.

Impressive as these news bots may be, I didn’t want Robot Adrienne to be a formulaic speed machine with a Mad Libsian approach to reporting. Instead, I wanted to use a neural network, a computer model inspired by the human brain, that could assess a bunch of input data, infer its structure, then represent that information with its own original output.

This approach could ostensibly produce something more complex and less predictable than the bots that spend their days filling in the blanks. Using a neural network to generate language works, on the simplest level, the way human babies learn language: they model input from adults, then generate their own output.

“If you think about that loop, babies are taking information in, encoding it, and expressive speech is the output,” said Kirsten Cullen Sharma, a neuropsychologist who focuses on child development. “This is how the brain develops, [and determines] what connections the brain decides to keep.”
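
To make that input-encode-output loop concrete, here is a toy next-character generator in Python. It is a deliberately minimal sketch, not a neural network: it only counts which character tends to follow which in a sample text, then samples new text from those tallies. The training sentence is invented for illustration, but the ingest-then-generate shape is the same one the real experiment depends on.

```python
# Toy illustration of the learn-from-input, generate-output loop.
# A real recurrent network infers far richer structure; this stand-in
# merely tallies which character follows which, then samples from them.
import random
from collections import Counter, defaultdict

corpus = "the machine read the stories and the machine wrote a story"

# "Training": count every adjacent character pair in the input stream.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

# "Generation": repeatedly sample a plausible next character.
def generate(seed="t", length=40):
    out = [seed]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break  # dead end: this character never appeared mid-stream
        chars, counts = zip(*options.items())
        out.append(random.choices(chars, weights=counts)[0])
    return "".join(out)

print(generate())  # e.g. "the storinerit and marea the wrone sto..."
```

Even this crude version produces the characteristic almost-English of a model starved for structure, a fair preview of what was about to happen to the Adrienne Corpus.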

My interest in a neural-net configuration for my experiment was somewhat selfish. These networks are so cool in part because when you give a machine a gob of data, that information is all the machine has to make sense of the world. So when you look at the eventual output, you have the potential to learn quite a bit about the input.

If a computer can learn how to write simply by analyzing one person’s work, that machine is theoretically in a position to reveal otherwise subtle idiosyncrasies about the input data—a particular writer’s structural conventions might become obvious, and other stylistic tics illuminated.

If a robot’s eventually going to take my job anyway, why not get it to help me become a better writer first?

Last fall, Google agreed to help me. This was great news. The company’s commitment to A.I. research is well documented: It has taught computers to dream, to defeat humans in abstract strategy games, and to compose music. Artificial intelligence is arguably at the heart of everything Google is and does.  

But after months of back and forth, with little explanation, the company said it wouldn’t be able to help me with my experiment after all. (Understandable, of course, that one of the world’s leading tech firms might have other priorities.) I had similar experiences with several other big tech companies that focus on artificial intelligence. 

Last month, my editor suggested I ask the writer and technologist Robin Sloan, one of those rare humans who always seems to be effortlessly dabbling in some crazy-brilliant side-project, yet is still somehow kind enough to maybe consider helping a random reporter obsessed with building a robot version of herself.

As it happened, Sloan had lately been conducting a similar language-generation experiment of his own: tinkering with a recurrent neural network and a massive chunk of text from the annals of science fiction as a way to see what kind of story a computer mind might write. 

So, I begged him to help me, and told him about the massive plain text doc of my writing I had all ready to go. 

“I am totally game to train an RNN on your Adrienne Corpus,” Sloan told me. “I would definitely set your expectations low—there is no way in which its output will seem like ‘a piece by Adrienne’—but at the same time, the things these networks capture—the ‘flavor’ of text—are, to me, still pretty remarkable.”

So, I sent Sloan the document, which contained painstakingly copy-and-pasted text from two years of published stories—almost all the stuff I’d written for The Atlantic since 2014, totaling 532,519 words. Sloan turned to an open-source Torch-RNN package—which you can find on GitHub, courtesy of the Stanford computer scientist Justin Johnson—and he got to work.

“One of the magical things about this moment in deep learning is that many of the core tools have been packaged up and made accessible to enthusiast-level programmers (like me), but that doesn’t mean it’s exactly straightforward,” Sloan explained. “I had to install and configure a scientific computing package called Torch and a bundle of sub-packages that specifically support neural networks.”

He also used a computer with a very fast GPU—experimentation still would have been possible with the kind of processor in a powerful laptop, for instance, but it would have taken a lot longer.

Here’s what it looks like when you train a neural network. (Robin Sloan)

It became clear pretty quickly that half a million words, or about 3 MB of text, wasn’t enough for the neural network to learn language the way I’d hoped.

When Sloan entered a prompt—“The most interesting thing is...”—here’s what the machine spit out in various iterations of the initial experiment:  

The most interesting thing is a different on the past with the distriction of the film of a different of the computer seems of the since something that it was have people in the companies of the most computer that their people of the find of surfect for the profession of the same -- and the prediction of the company state that makes it was something when they were not on the other more of the meanth for the computer that they were all the parence -- and for the way that it is a companies and one of the paternation of the long of the computer in the some hald for a really to the first that has product and computer and companies in for the way that it makes the first that was a stories of the person of a person of the way the first that have a term of the question of the most interesting the computer of where the computer that they are things about the companies that was interested to computer when they were all of the planet of the companies with the most in the way the part of the companies that their experience and…

Here’s another example:

The most interesting thing is the one of the could that experience. "In the paceported by the drate which the nettory ferers of the east desound, the Now that are not be a mabriates, macketing other way for distant to a still provides. The presive in the long the years of could design are a computer to a time in the used to the person replored the same that it for the decigual is behove probless. For mean. There was a few in pulling diefter on pook simple Moren the really complement had business looking with too was flay depressicled to the the truition of a part the concent of the computer who have the decention of a bick of a litten, For study releaser satelles piece,  and deference the recond in the world with a prite vises. The years like what's the play with it was not been to call the movion from the five samplo do make see planet as "more somethative to have a biggle than the finse counters to the mother take Google, eught in a releason in partable partically as the "explore of the more and the trach and sai …

The computer had ingested the Adrienne Corpus and produced mostly gobbledygook. This wasn’t exactly a surprise. 

“The deal with these networks—the thing that makes them so powerful, but also weirdly fragile—is that they know NOTHING at the outset of their training,” Sloan said. “It's not like you tell them, ‘OK, this is going to be English text, look for nouns and verbs, etc.’—it's just a stream of characters. And from that stream, it infers all this amazing structure. BUT the inference requires a lot of input.”
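
In practice, that stream is even more austere than it sounds. Here is a minimal sketch of the kind of encoding a character-level model consumes: the text reduced to integer ids, one per character, with everything else left for the network to infer. (This mirrors what a typical preprocessing step does; it is not Torch-RNN’s actual code.)

```python
# What "just a stream of characters" means: before training, the text
# becomes a sequence of integer ids. The network never sees words or
# grammar, only these numbers in order.
text = "The most interesting thing is..."

vocab = sorted(set(text))                        # every distinct character
char_to_id = {ch: i for i, ch in enumerate(vocab)}

ids = [char_to_id[ch] for ch in text]
print(vocab)  # [' ', '.', 'T', 'e', 'g', 'h', 'i', 'm', 'n', 'o', 'r', 's', 't']
print(ids)    # the raw stream the model actually trains on
```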

So we needed more words, but how many?

Sherjil Ozair, a computer scientist at Baidu Research’s Silicon Valley AI lab, told me I would probably need more text than I’d ever written. Doubling or tripling the existing sample size wouldn’t be enough.

“You’ll be able to generate something, but the story won’t be legible,” Ozair told me. “You really need an order of magnitude like 100 megabytes—but that data doesn’t exist, and that’s the fundamental problem. We have these methods that could work really well, but only if you have a tremendous amount of data.”

Increasing my sample size to 100 megabytes would mean going from 500,000 words to something like 18 million—or the equivalent of "War and Peace" 30 times in a row. That wasn’t going to happen, but Sloan was still willing to experiment with a much (much) leaner sample.
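
The back-of-envelope conversion is easy to check, assuming roughly 5.5 bytes per English word (letters plus spaces and punctuation, a rough average rather than a measured figure) and roughly 587,000 words for an English translation of War and Peace:

```python
# Rough math behind Ozair's "order of magnitude" problem. Both constants
# are assumptions: ~5.5 bytes per English word, and ~587,000 words in an
# English translation of "War and Peace."
bytes_per_word = 5.5
corpus_bytes = 100 * 1_000_000                  # the 100 MB Ozair suggested

words_needed = corpus_bytes / bytes_per_word
print(f"{words_needed / 1e6:.1f} million words")             # ~18.2 million
print(f"{words_needed / 532_519:.0f}x the Adrienne Corpus")  # ~34x
print(f"{words_needed / 587_000:.0f} War and Peaces")        # ~31
```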

After hours of piling on the megabytes—adding text from stories I’d written for The New York Times, The Washington Post, Slate, Gawker, Honolulu Civil Beat, and elsewhere dating back to around 2012—I was only up to about 725,000 words.

Sloan ran the experiment again, this time using the prompt, “She told me…”

She told me and more like modernings in our computer. Of course, this is not a human work. That's the web that's selfies that would be the moon is that they'll make it online. That's not only more importantly changed and most of them in all of those and questions about their factors. For example, as far several cases, all this kind of regulations for information—that's the person who painted itself that's the technological change of human process. And it's going to be the first electric companies that can make it into their context in places—is not a technological computer. Scientists reported her everything that's because there's also an interview that there's something like a story of the findings of their work. But that's what does it reported by their work. This is that Internet connections can complete the way that these people are searching to their cars and the passage of itself. And that's the situation of the library's first computer that was more likely to have all this person betwe…
 

There was a slight improvement, but the output still wasn’t exactly publishable—not even close. So, Sloan encouraged the model to be less conservative, riskier in its predictions about which letter to generate next in sequence.

“That prediction is hedged and probabilistic, not definite: at every step, the net establishes a probability for every possible next character,” Sloan explained.

When a model is less risky, it’s more likely to be repetitive. (That’s why, on the most conservative setting, the Adrienne Corpus produced sentences like this: “The most people who are all the story of the company and the story of the story of the story of the company and the first place.”)
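
That risk dial has a standard name in machine learning: sampling “temperature.” Low temperature sharpens the net’s next-character probabilities, so the likeliest letter wins nearly every time; high temperature flattens them, letting long shots through. Here is a minimal, self-contained sketch, using an invented three-character distribution:

```python
# A minimal sketch of the conservative-vs-risky knob: rescale the model's
# next-character probabilities by a temperature before sampling.
import math
import random

def sample(char_probs, temperature=1.0):
    chars = list(char_probs)
    # Divide each log-probability by the temperature, then exponentiate.
    # random.choices renormalizes the weights, so no further math needed.
    weights = [math.exp(math.log(char_probs[c]) / temperature) for c in chars]
    return random.choices(chars, weights=weights)[0]

probs = {"e": 0.5, "a": 0.3, "q": 0.2}   # invented next-character odds

print(sample(probs, temperature=0.2))    # almost always "e": safe, repetitive
print(sample(probs, temperature=2.0))    # the odds flatten: riskier choices
```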

But a model that’s too risky produces its own breed of nonsense. Here’s what the riskier model based on my larger sample of writing produced:

She told me that the first # years ago in a decade from the Times—even as well as much them. "The background change was created the world that was focused on a time that the pool can make the world." The format points that they weren't always "demonstrating and being solved with organizations over time. (Researchers have been spuns from the property source of the town of a being print people who are likely to try on the platform, and the Internet age is perfectly happening a diversity of a screen. In an excerpt, like himself, the people so we're really been right at the past year. "The world is long and browse discourse," George We said. "It was also a significant model that makes it interesting." March #, you can see the algorithms. The currency of the site servers shifted the first time when the GoPro does it is an infectious earthquakes and technology in the film in a myth. And the sweet was the biggest project, scientists were built before the internet, it was probably looking for this tech...

And here’s another example from a riskier model:

She told me to any new assault. "I was recorded in the world's book," said Marty McBring, a historian of Security Trust and Army CH for earlier his research in #. "The appeal of a little idea was when and what it's hard to release the most politicism they didn't use the brand of same-time. And they take a process of magnitude to the ground it was like before. I would come out with the classification of engineering, the same emergency and the design we didn't interact themselves so that a sort of three poys material. The larger device is more more than the error and the process of what we go online. That's why you can do something about the person more than the moon were encourages that Catholic levels of the U.S. Geological Handburg. But a colleague is about amerscare structures. They probably be able to only use much of the technical cortain-partace group. "We have a mere of earthquake and main the smoke -- which could be a health point and information to the computer behaviors of what he was now"

“In general, I think these big blocks of text don’t show the RNNs at their best—because, again, they have no sense of the larger structure of a piece of writing,” Sloan said. “It's the shorter ‘completions’ that are more compelling.” (You can see some of the autocompletes from Sloan’s sci-fi experiment on his website.)

And yet, there are still some delightful surprises in the text Robot Adrienne produced. I liked how the moon kept coming up, for instance, and I found myself lingering on made-up words like “somethative,” “macketing,” and “replored,” the last of which sounds vaguely French. (“As you can see, the model trained on ~3MB of text is still making a lot of spelling errors, and taking stabs at words like ‘technologication’ which I think is pretty cute,” Sloan wrote in an email after the first round of experimentation.)

In later rounds, using the larger dataset, Robot Me began to include quotes in its work, which is pretty good in terms of journalistic mimicry! Quotes like this one: 

“I was recorded in the world’s book,” said Marty McBring, a historian of Security Trust and Army CH for earlier his research in #.

And this one: 

“The world is long and browse discourse,” George We said. “It was also a significant model that makes it interesting.”

Even as the field of machine learning blossoms, language generation hasn’t fundamentally changed in recent decades. Jaime Carbonell, the director of the Language Technologies Institute at Carnegie Mellon University, told me that one of the first projects he worked on in the 1970s had to do with building a language generator. 

“It’s evolved, and it’s improved, but there hasn’t been what I would call a revolution in this particular field,” Carbonell said. “What the machines can do is express weather forecasts based on meteorological data, or a baseball narrative based on the scoresheets—pretty good writing that is almost indistinguishable from human writing when reporting facts in a formulaic style.”

“But the question that doesn’t have an answer,” he added, “is: Can a machine acquire general intelligence? So far, the answer has been ‘no.’ There are things like writing fiction or writing poetry where it’s not clear what it means for a computer to be able to do it, since the machine cannot directly experience the emotions you’re trying to convey. It’s difficult to fathom how you could generate genuine creative writing in that sense. There are categories of activities in which it doesn’t really make sense to train the computer to do it. You would just get some ersatz version of a human.”
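
The formulaic writing Carbonell says machines already handle well is, by contrast, startlingly simple under the hood: structured data poured into a fixed template, no learning involved. A minimal sketch, with invented game data:

```python
# Fill-in-the-blanks generation, the box-score style of news bot.
# There is no model here at all: just data slotted into a template.
game = {
    "winner": "Mariners", "loser": "Angels", "score": "5-3",
    "hero": "Nelson Cruz", "stat": "two home runs",
}

TEMPLATE = ("The {winner} defeated the {loser} {score} on Tuesday, "
            "powered by {hero}, who hit {stat}.")

print(TEMPLATE.format(**game))
```

No inference and no surprises: a bot like this is fast and reliable precisely because, unlike Robot Adrienne, it never has to figure out the language for itself.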

My favorite line from one of Robot Adrienne’s stories is, I think, this one: “Of course, this is not a human work.” 

Not exactly. Enabling a computer to teach itself to write, even once the machine gets the hang of it, actually is human work—even if human jobs are made obsolete as a result.

So far, the ersatz version of me has quite a bit to learn.