The latest and most ambitious attempt to turn literary taste into an algorithm
Recommending books is an art, replete with mysteries and moments of inexplicable grace. When I wrote about the topic last year, John Warner -- sometime "Biblioracle" at the website the Morning News -- reminisced happily about the time he "went out on a limb and recommended 'Gravity's Rainbow,' and the person said it 'changed my life.'"
The occasional triumph (and perhaps only a fellow recommender will appreciate just how sweet such instances can be) is inevitably balanced out by mortifying failures. Though it was over a decade ago, I'll never forget the time a friend chewed me out for suggesting she read Louise Erdrich's "The Beet Queen." It seemed the perfect choice after I'd ruminated on all the other novels she said she'd liked, but she complained that Erdrich's women characters were all "victims" who refused to do anything to improve their lot.
Can a task this ineffable be automated? Many seem to think so. For years, Amazon and other e-booksellers have offered their customers suggestions based on the purchases of other customers who bought the same books. But you don't necessarily read every book you buy from an online retailer (some are gifts) and you probably don't like every book you read, either. For that matter, the books you like best you may have bought elsewhere, or borrowed from a friend.
The latest and most ambitious attempt to automate book recommendations is the website BookLamp.org, launched last week. It's the public point of interaction with something called the Book Genome Project, an effort to "identify, track, measure, and study the multitude of features that make up a book using computational tools." BookLamp's oft-cited model is Pandora.com, a music-discovery service that allows users to create and listen to playlists based on a single song, artist or genre. Founder Aaron Stanton also cites OKTrends, a blog that crunches and analyzes data extracted from the OKCupid dating website, as an inspiration.
Stanton and company (they were students at the University of Idaho when the project began in 2003, and academics from several institutions still figure among their researchers and programmers) have fed the text of some 10,000 books into a custom-built software program. The program then identified certain recurring elements and clusters of elements. The designers in turn trained it to recognize those elements and determine how much of each one can be found in a given book.
If this sounds confusing, it is, a bit. On the phone, Stanton explained to me that he and his confederates feel that more traditional labels applied to books -- the genre classifications imposed by publishers and the categories of the Library of Congress' cataloging system -- are inadequate. BookLamp can not only provide a precise accounting of such literary qualities as "dialog," "density," "description" and "motion," it can also measure how much of the book involves "Sea Voyages" or "Pregnancy/ Motherhood/ Infant Care" or, for that matter, "Rocky & Dry Terrain/ Canyons." These metrics are included with 132 other "thematic ingredients" under a broad category the project's designers call "StoryDNA."
Such elements "aren't necessarily what the book's about, but they're present," Stanton told me. He added that although he would never say that Stephen King's "The Stand" is about "Vehicles/ Rural Travel/ Country Roads," nevertheless "you couldn't tell the story of 'The Stand' without referring to those things."
How prevalent such elements are influences a reader's experience. "A book with 10 percent vampires is a very different book than a book with 80 percent vampires," Stanton observed. BookLamp searches its database for titles with similar levels. (The idea that a book could be 10 percent or 80 percent vampires was difficult to wrap my mind around until I recalled the way a friend would complain whenever an episode of "Buffy the Vampire Slayer" suffered from "not enough vampires.")
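For readers who want the idea of "similar levels" made concrete, here is a minimal sketch in Python. It assumes each book has been reduced to the share of its text devoted to each theme. The titles, the numbers, the function names and the choice of straight-line (Euclidean) distance are all invented for illustration; this is not BookLamp's code and should not be read as its actual method.

```python
from math import sqrt

# Hypothetical theme profiles: the share of each book devoted to a theme.
# Titles and figures are made up for this example.
LIBRARY = {
    "Book A": {"Vampires": 0.80, "Romance": 0.10},
    "Book B": {"Vampires": 0.10, "Romance": 0.60},
    "Book C": {"Vampires": 0.75, "Romance": 0.15},
}


def distance(a, b):
    """Euclidean distance between two theme profiles.

    A theme missing from a profile counts as 0 percent of that book.
    """
    themes = set(a) | set(b)
    return sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in themes))


def most_similar(title, library, k=1):
    """Return the k titles whose theme levels are closest to the given title's."""
    target = library[title]
    scored = [
        (distance(target, profile), name)
        for name, profile in library.items()
        if name != title
    ]
    return [name for _, name in sorted(scored)[:k]]


if __name__ == "__main__":
    print(most_similar("Book A", LIBRARY))
```

In this toy library, the book that is 80 percent vampires lands next to the one that is 75 percent vampires, and far from the one that is only 10 percent. A real system would presumably weigh many more themes and might measure closeness quite differently.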
To use a familiar book as an example, Stieg Larsson's "The Girl With the Dragon Tattoo" comes up in the BookLamp database with high quantities of "Newspaper Reporting/ Journalism" and "Criminal Investigation/ Detective Work." Very true. But why does it also show significant levels of "Extended Family/ Cousins & Relatives" and the rather cryptic metric "Scheduling/ Elements of Time"?
"We make no claims to rightness," Stanton said, explaining that BookLamp is very much a work in progress and that one of the project's first priorities is to build a much larger database of analyzed books. (This will require the assistance of publishers.) "There are times when the system comes back with something I'd never thought it was capable of" -- such as identifying the works of Stephen King as being very similar to the works of Richard Bachman, a King pseudonym -- "and then there are times when I'm looking at it and thinking this suggestion doesn't make much sense."
Why not rely on the far more sophisticated if also unfathomable judgment of actual human beings, the same intuitive power that enabled John Warner to change a stranger's life by suggesting "Gravity's Rainbow"? Even the mechanized, crowd-sourced versions of such recommendations -- whether provided by Amazon or such social networking sites as GoodReads and LibraryThing -- have aided readers.
But, as Stanton points out, social networks can only tell you about books that other readers themselves already know about and have taken the trouble to read and review. They're heavily weighted toward books that are well-known and successful, as well as toward more recent titles. "Right now," Stanton says, "the world is not good at finding midlist or old books or books by first-time authors -- a huge portion of the pyramid of books out there." The Book Genome Project doesn't care about a book's critical reputation or sales history; all it wants to do is tally up exactly how much "Docks & Warehouses" or "Explicit Descriptions of Physical Intimacy" can be found between its covers. (Or both -- paging Edmund White!)
While some of the recommendations I elicited from BookLamp made sense (people who like Dickens do tend to like Wilkie Collins, too), others left me perplexed. I plugged in one of my favorite titles from last year, Gary Shteyngart's "Super Sad True Love Story," a satire set in New York City in the near future about the doomed affair between a middle-aged Russian immigrant and a much younger Korean-American woman. It ranks high in "Partying/ Deviance" (Shteyngart would be so proud!). What I got back was Robert Silverberg's "The Book of Skulls," apparently a thriller about four college buddies seeking an immortality-conferring tome guarded by a "mystical brotherhood" in the Arizona desert. Huh? (To be fair, Lara Vapnyar's "Memoirs of a Muse," just a little further down the list, is a perfect fit.)
Mostly, though, I appreciate how BookLamp persuades me to look at the books I like in a different light. Ask me why I love Haruki Murakami's fiction, and I probably wouldn't begin by praising his depictions of "Suburban Living/ Neighborhoods," but come to think of it, that's one of the aspects of "The Wind-Up Bird Chronicle" I remember most vividly and with particular pleasure. I don't recall an abundance of "Fashion/ Clothing/ Accessories" in the same book, but the next time I reread it, I'll be keeping my eyes peeled.