US: Controversy dogs Google's book-search library project

Google continues to encounter bumps in its Book Search Library Project

Taipei Times
Sunday, November 27, 2005

New York --- Twenty years ago, when Sidney Verba became director of the Harvard University Library, he thought there was a good chance he would enjoy a placid transition into retirement.

Placid is not the word Verba would use to describe his life now.

"Challenging" or "exciting" would better fit the bill, he said, choosing his words carefully.

Verba is overseeing the university's partnership with Google, which plans to create searchable digital copies of entire collections -- tens of millions of books -- at five leading research libraries.

The partnership is part of the controversial Google Book Search Library Project, which has provoked lawsuits by publishers and writers' groups that accuse Google of violating copyrights by scanning the books into Google's search database without the permission of the copyright holders.

The University of Michigan, Stanford University, the New York Public Library and Oxford University have also signed on with the Google project, which expects to scan 15 million books from the libraries.
 
For Verba, the decision to support Google's plan was not easy or obvious. He has a unique perspective on the legal and intellectual debate because his various professional roles connect him to every aspect of the creation and use of books.

"It's been dominating my life for the last year and a half," said Verba, a prominent political scientist who has been a professor at Harvard for more than 30 years. Even now, he is cautious about the implications of the ambitious project.

Security

Until two years ago, the congenial and energetic Verba was chairman of the board of Harvard University Press. And in that position, he witnessed mounting anxiety about the future of publishing, especially with the advent of digital texts.

"Scanning the whole text makes publishers very nervous," he said. "I have sympathy with that. They have to be assured there will be security, that no one will hack in and steal contents, or sell it to someone."

And as the author or co-author of 18 books, he understands the worry that Google's digitization project might cause writers to lose income and control of their work. Many of his own books are still in print.

But as a librarian and a teacher, he contends that the digital project will meet the needs of students who gravitate to the Internet -- and Google in particular -- to conduct their research. And he says he believes the project will aid the library's broader mission to preserve academic material and make it accessible to the world.

He was taken aback when Google was sued, first in September by a group of authors, then last month by five major publishers.

"It's become much more controversial than I would have expected," Verba said. "I was surprised by the vehemence."

For the time being, Harvard has confined the scanning of its collections largely to books in the public domain and limited the initial scanning to about 40,000 volumes. But it hopes eventually to scan copyrighted books as well, depending on the outcome of the legal dispute.

"The thing that consoles me," Verba said, "is Google's notion of showing only the snippets, which have everything to do with what's in the book, but nothing to do with reading the book."

Google's search of copyrighted works in the library collections allows users to see a limited amount of text surrounding the relevant search term.

But to make those snippets freely available on the Web, the books must be scanned in their entirety into Google's database to create a searchable index, which the lawsuits claim goes beyond what the fair use provision of copyright law allows.

Verba says he believes that showing small excerpts helps direct readers to books they would not know about otherwise, and could help spur sales.

Author's Choice

Patricia Schroeder, the former Colorado congresswoman who is president and chief executive of the Association of American Publishers, which is suing Google on behalf of the five publishers, has a far less sanguine view.

"Look, people should be able to search all this stuff, but it should be the author's choice and not Google's," Schroeder said. "You can't have a corporation just come in and say, `We're going to do this and it's good for you."'

But as an educator, Verba has watched his students shun libraries in favor of search engines and other electronic resources. In his courses, Verba has cast a skeptical eye on student papers thick with URLs in the bibliography.

"Everyone with a teenage kid is worried that the younger generation may believe that all knowledge is on Google," said Verba, who said he nagged his own students to use library books.

"But what this does," he said, referring to the Google project, "is take you to Google, which takes you to the library."

Yet when Sheryl Sandberg, a Google executive, first visited Harvard two years ago and put forth the idea of digitizing millions of books spread out over Harvard's more than 90 libraries, Verba was skeptical. The sheer magnitude of the task seemed staggering.

A Millennium

James Hilton, the interim university librarian at the University of Michigan, for example, said that he asked his staff a year ago to estimate how long it would take to digitize the library's 7 million volumes.

The answer was more than 1,000 years.

Then Google came along and offered hope that the project could be done within a decade.

"We are among the most aggressive of libraries doing their own digitizing," Hilton said. "Google thinks they'll be able to do it in six."

As for Harvard's own back-of-the-envelope calculations, "it would be incredibly expensive beyond anything we could imagine funding," Verba said. "I didn't think it could be done by anyone, including Google."

Vulnerability

One of his main concerns was the physical vulnerability of some of the older volumes. As custodian of his institution's materials, he worried that the physical handling of the books could damage them.

But he said he was impressed by Google's technical competence and the ambitious scope of the project. Still, he wanted to see more details, especially about the protection of the books themselves. He told Google to come back after it had worked out those fine points.

Google did return, some nine months later, details in hand.

"It was clear they had done their homework," said Verba, who was careful not to talk about parts of the project that fall under a nondisclosure agreement.

"They had designed a very efficient means of doing the digitization, in a nondamaging, cost-efficient way. And they were willing to invest a large amount of money," he said.

Although Google will not disclose its investment, outsiders have speculated that the company is spending more than US$200 million on the entire project.

Google has also cultivated an aura of mystery around its proprietary book-scanning technology.

Susan Wojcicki, Google's vice president for product management, who is overseeing the Google Book Search project, said the company had built its own scanners, which capture the image of the page using optical character recognition technology.

The scanning for Harvard's collection, she said, is taking place at Harvard's book depositories.

Some Google watchers think the company has developed an advanced page-turning scanning technology, while others think Google's scanners are more conventional, having workers turn the pages at hundreds of scanners.

Crain's Detroit Business reported last month that Google had leased a 3,720-square-meter warehouse in Ann Arbor to digitize the University of Michigan books.

Nathan Tyler, a Google spokesman, said Google was looking to expand its scanning facilities for Michigan, but did not have anything to announce.

"It's a fascinating time, and very confusing," Verba said of the copyright controversy.

"And if you ask me if I have a clear view of fair use, the answer is `no.' It's all up in the air," he added.

Another concern for the plaintiffs in the lawsuits is the second digital copy that Google gives to the libraries as part of each agreement.

But Verba said that those second copies will be used only for archiving and preservation, in keeping with a research library's charter.

"We think and hope it is legally the appropriate approach," Verba said of the Google project. "But we're taking it day by day."