More on Book Search PDF Downloads
So questions remain. Can Google legitimately restrict access to public domain works? And is it proper for Google to brand these copies and require that others not mess with them?
Search engines and copyright issues.
Why the change? Well, one factor was all the comments we got about how excited people were that Google Print would help them print out their documents, or web pages they visit -- which of course it won't.

So why the change back? The switch to Book Search probably had more to do with avoiding a lawsuit from the likes of the Authors Guild (too late) and with strengthening Google's fair use argument (Book Search certainly sounds more "fair use-y" than Print). If I were better at reading the tea leaves, I would tell you whether this suggests Google is feeling good or bad about its prospects of winning in court against the Authors Guild, but I'm not.
Boss: Do we have Google installed on our internet?
IT guy: We put it on your machine yesterday.
Boss: How many servers do you think Google has?
Lackey: Infinite.
Boss: Infinite? You’re a retard.
Producer: My friend went to the Galapagos Islands and was astounded. They have birds called blue boobies. Google “blue boobies”. You’ll see pictures of them.
Suit: I’m not searching for blue boobies on my computer. I’ll get called into the office for a talk.
Producer: Oh, I’ll do it...see?
Suit: Wow, who would have thought that wouldn't have brought up a porn site?
Engineer: I’m against Google Earth! The terrorists are using it! And the communists!
Appropriate: He ego-surfs on the Google search engine to see if he's listed in the results.
Inappropriate: He googles himself.
Appropriate: I ran a Google search to check out that guy from the party.
Inappropriate: I googled that hottie.
That's right, Google used the word "hottie" in an official legal warning letter.
However, like I said above, the Google strategy does not seem to deal with the fact that if the OED had not accepted Google's use as a verb, it might not be as widely used. We can quibble over whether the word was in regular usage before the OED condoned it, but once the OED does, it's official. And don't forget that a court will look to the dictionary for an ordinary meaning -- though I wonder if the OED counts, it being a foreign source of law after all. Suing the OED would be decidedly evil, but if there are no qualms about suing a web site like Word Spy, then I can't wait for someone to sue the OED (or Webster) for inducing others to genericize a trademark.

Digitizing all the world's books "was an idea of Sergey and Larry's from very early on," Wojcicki says. In fact, they were supposed to be working on a small library digitization project "when they wound up creating a search engine, which today we know as Google."

And there's also this, which is a good summary of the argument that if Book Search is illegal, then all Internet search is likely illegal as well:
I suppose I'll start feeling bad about the publishers having to opt out of Google's project once they start shrinkwrapping every book so I can't flip it open in a bookstore to see if it deals with what I was looking for. "Books" certainly are not free; they cost money because they exist as physical objects. But the ideas contained therein are a different matter, and so are the words themselves. Should a publisher or author be compensated for the fact that the word "greed" appears on page 21 of a book? Should that same person be compensated before allowing Google to share that information with me?

· The Web search analogy: This gets a bit complicated, but it's crucial to understanding the dispute over Google's library scanning. Wojcicki, Smith, Gerber and Google attorney Alexander Macgillivray -- whom Smith calls "our thought leader" on intellectual property issues -- all insist that there's very little difference between the basic functioning of their Web search engine and Book Search.
The comparison goes like this:
To index the Web, Google first sends out software programs called "crawlers" that explore the online universe, link by link, making copies of every site they find -- just as Book Search makes a digital copy of every book it can lay its hands on. Web sites are protected by copyright, so if you don't want your site indexed by Google and its search brethren, you can "opt out," usually by employing a nifty technological watchdog (a file called robots.txt) that tells search engines to bug off.
Ditto for books, Google argues: Publishers and authors can opt out by informing Google that they don't want their books scanned and made searchable.
The analogy carries a risk for Google. Former Wired editor Kevin Kelly, one of the most influential journalists covering the digital revolution, sums it up this way: "If they capitulate on this with the publishers, they jeopardize their entire ability to search the Web."
Google executives don't sound worried. "No judge is going to rule that Web search is illegal," Macgillivray says. Still, they're on the horns of a dilemma. To use the Web analogy in court is on some level to bet the company, however favorable the odds.
No need to fret, say the publishers: The analogy fails in any case.
Most Web sites, they point out, are designed to be free. Books are not. As for the "opt out" requirement, as one high-ranking publishing executive explains it -- he doesn't want to be named; odds are he'll be dealing with Google in the future -- publishing houses have already installed a perfectly good, low-tech version of robots.txt.
"It's called a price," he says.
In arguing for the importance of search engines as enablers of speech, it is necessary to discuss the history of the Internet. From the Internet's very beginnings, the emphasis has been on organizing the wealth of human knowledge into a useful, comprehensible format in order to serve a larger public interest.

One of the first calls for an information organizing system resembling the Internet came from Vannevar Bush. His Atlantic Monthly article of July 1945 entitled As We May Think[2] called attention to the problem of an ever-growing body of scientific research without a useful way to access it. He believed that for new research to be useful, it needed to be recorded, continually added to, and continually consulted. He further noted the inefficiencies of the indexes of his time:
Our ineptitude in getting at the record is largely caused by the artificiality of the systems of indexing. ... Having found one item, moreover, one has to emerge from the system and re-enter on a new path.
The human mind does not work this way. It operates by association. ... Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it. In minor ways he may even improve, for his records have relative permanency.
Presumably man's spirit should be elevated if he can better review his own shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his records more fully if he is to push his experiment to its logical conclusion and not merely become bogged down part way there by overtaxing his limited memory.
Bush introduced the concept of the memex, a microfilm device that could store all of a person's books and papers and be consulted when needed.[3] Though the memex was ultimately a flawed system, Bush's approach to organizing information led to the creation of hypertext, a system of automated cross-references that connect text in one document to another, related document.[4]
The precursor to the Internet was the Advanced Research Projects Agency Network (ARPANET), which was developed by the US Department of Defense and went online on October 29, 1969.[5] The first ARPANET link connected only a computer at UCLA to one at Stanford; the network grew to 231 hosts by 1981. Though it is often claimed that ARPANET was initially funded out of a desire to preserve a military command structure in the event of a nuclear attack, ARPANET Director Charles Herzfeld has said that the project grew out of frustration that only a few large, powerful research computers were available in the country, and out of an interest in giving researchers access to them despite geographic limits.[6] ARPANET provided the technological basis for "packet switching," the process that powers the network underlying the Internet today.[7]
The Internet progressed slowly from 1969 to 1990 as new computers trickled onto the network. During this time there was little way to find a particular document on the network without knowing exactly where it was or learning of it through word of mouth. Files were shared through the File Transfer Protocol[8] (FTP), which requires one user to upload a file onto a server, making it available to anyone who finds it. Many people uploaded their own files to their own servers, leaving information scattered across the network. Because there was no central clearinghouse for information, and no way to search across all the servers, finding what one was looking for was rather difficult.
Archie changed all that in 1990 when it became the first search engine.[9] Archie was created by Alan Emtage, a student at McGill University in Montreal; it worked by downloading the directory listings of public FTP servers and making the file names searchable.[10]
Several other search engines popped up after Archie. Gopher[11] was created by the University of Minnesota in 1991 as a menu-driven system for organizing and retrieving documents over the Internet, and its directories soon became searchable through companion tools such as Veronica[12] and Jughead.[13]
It was during this time, in 1991, that the first modern website was created by Tim Berners-Lee while he was working at the European Organization for Nuclear Research (CERN).[14] Berners-Lee is credited with creating what we know as the World Wide Web. The Web is a service that functions on top of the Internet; in this sense it is a system that rests on the existing network infrastructure, making the network more navigable. By combining hypertext and links with web pages viewable through web browsers, people were now able to create graphical web pages rather than mere directories of files, and could link to other documents without the permission of the host.[15] In 1993, after Gopher decided to charge users for access to its database, CERN declared that the Web would be free for anyone to use. Berners-Lee had originally been concerned with making his research available to a larger audience; the decision to make the Web free was made to ensure that its use would become widespread.[16]
The amount of content available began to increase steadily with the creation of the Web. The World Wide Web Wanderer[17] was created in 1993 to measure the growth of the Web, and it was the world's first Internet robot, a kind of program that came to be called a spider. A robot is an automated program that explores the Internet looking for web pages; the term spider was chosen to fit the metaphor of a web of pages connected by links. The spider travels through the Internet looking for web pages, sending copies of the pages back to its server and recording information about all the links contained therein. The information gathered by the Wanderer was used to create the first search engine, Wandex, in 1993.[18] Wandex itself was controversial in that its robots consumed a great deal of bandwidth while fulfilling their tasks, slowing down the servers they visited.
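To make the spider idea concrete, here is a minimal sketch in Python of what a robot like the Wanderer does: fetch a page, record its outbound links, and queue those links for later visits. The seed URL is hypothetical, and a real robot would also respect robots.txt and throttle its requests to avoid the bandwidth problems that made Wandex controversial:

    import re
    from collections import deque
    from urllib.parse import urljoin
    from urllib.request import urlopen

    def crawl(seed_url, max_pages=10):
        """Breadth-first spider: fetch a page, record its links, repeat."""
        seen = set()
        links = {}                   # page -> list of outbound links found
        queue = deque([seed_url])
        while queue and len(seen) < max_pages:
            url = queue.popleft()
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url).read().decode("utf-8", errors="ignore")
            except OSError:
                continue             # unreachable page; move on
            # Record every link on the page and queue it for a later visit.
            found = [urljoin(url, h) for h in re.findall(r'href="([^"]+)"', html)]
            links[url] = found
            queue.extend(found)
        return links

    # Hypothetical seed; a real spider starts from many known pages.
    # print(crawl("http://www.example.com/"))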
Based on Wandex's success, Archie Like Indexing for the Web (Aliweb) was introduced late in 1993.[19] Unlike Wandex, Aliweb relied on users to submit their sites to the index rather than using robots that would slow down the Web. This allowed users to customize how their site would be displayed to those searching for information. Aliweb was the first "modern" search engine in that it was more than an index or directory of pages or documents located on servers.[20] The information users supplied about their sites -- the first use of "meta-data"[21] to classify the contents of web pages -- was searched by keyword, returning results that matched the original query.

It was about this time that web search came to be seen as a profitable business, thanks to the growth of content and increased Internet use beyond academic research, resulting in an explosion of search engines and their functions. The first big innovation came from WebCrawler in 1994, the first search engine to perform full-text searches rather than searches of meta-data or page and document titles alone.[22] Later in 1994, Lycos introduced the first ranked relevance system for text searches, allowing the engine to return better matches by analyzing the full text of pages and documents.[23] Lycos was also, generally speaking, the first search engine to be commercially successful as a business.[24] Then came AltaVista in 1995, which both provided faster service and allowed users to search all of the sites that linked to a particular URL.[25] AltaVista also assisted users by offering a "tips" option that helped formulate an effective search, and it was the first engine to allow "natural language" searches as opposed to Boolean[26] searches. HotBot, created by Inktomi Corporation in 1996, further increased the speed with which results were returned, helping to increase its popularity. It was also the first search engine to use "cookies" to track user behavior and customize a user's experience.[27] HotBot used a computerized system to analyze links, traffic, and other factors to determine a site's place in its search results.[28]
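The leap from Aliweb's meta-data search to WebCrawler's full-text search is, at bottom, the leap to an inverted index: a structure that maps every word in every document back to the documents containing it. A toy sketch in Python, with made-up sample pages for illustration:

    from collections import defaultdict

    def build_index(documents):
        """Build an inverted index mapping each word to the set of
        documents that contain it. Meta-data search indexes only the
        short descriptions site owners submit; full-text search
        indexes every word on every page."""
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index[word].add(doc_id)
        return index

    docs = {
        "page1": "search engines index the web",
        "page2": "the web grows larger every day",
    }
    index = build_index(docs)
    print(index["web"])   # both pages match the query term "web"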
While the mid-1990s saw an increase in the number of search engines and in innovation, this did not necessarily equate to better search results. Because each search engine used its own unique formula for searching the web, results varied from one engine to the next. Further, the criteria for judging the relevancy of any given result were not very good, leading to many false positives. Given the technology of the time, there was almost too much information for the search engines to make useful.
Google was founded in 1998 and is currently the number one search engine and the second most visited web site on the Internet, with about 275,000 daily searches.[29] Google has set itself apart through its PageRank[30] system for returning relevant search results. PageRank relies on the "uniquely democratic nature of the web" by counting inbound links to a site as "votes." A page is deemed popular if it has many votes. Thus, by tabulating the number of votes a web page has and weighting those votes by the popularity of the voters, Google is able to return the most relevant results from the sites web users themselves have deemed popular. The theory is that individuals are good at deciding whether a page is relevant and will link to sites that provide useful information. Google is then able to sort through the vast number of web pages by harnessing the people power of web users, who themselves have determined which pages are relevant.
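The published PageRank algorithm is simple enough to sketch. What follows is a simplified toy version, not Google's production system: each page repeatedly splits its current rank among the pages it links to, so a vote from a popular page carries more weight than a vote from an obscure one. The 0.85 damping factor comes from the original PageRank paper, and the four-page web is invented for illustration:

    def pagerank(links, damping=0.85, iterations=50):
        """links maps each page to the pages it links to (its "votes")."""
        n = len(links)
        rank = {page: 1.0 / n for page in links}
        for _ in range(iterations):
            new_rank = {page: (1.0 - damping) / n for page in links}
            for page, outlinks in links.items():
                if not outlinks:
                    continue
                # A page's rank is split evenly among the pages it votes
                # for, so votes from highly ranked pages weigh more.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    if target in new_rank:
                        new_rank[target] += share
            rank = new_rank
        return rank

    # A toy four-page web: every other page links to "c", so "c" wins.
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(pagerank(toy_web))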
Throughout the history of search engines, from Vannevar Bush's vision of the memex to Google, the theme has been making sense of the information available to mankind. Information needs to be made useful for it to be used. Every technology has struggled to give people access to what they are looking for, and while Google is the best today, if history is any guide a new search engine will arise that makes search even more useful.
Another theme is the slow commercialization of search technology and the Internet as a whole. What started out as a government project to facilitate scientific research has turned into the free and public Internet. What started out as a way for academics to find papers and other documents from faraway universities has turned into a lucrative business in Internet search. While commerce may have changed how search engines are developed and implemented, it has not changed their underlying purpose of connecting people with the information they seek. The market created by the advent of Internet search is a market of finding answers to questions, and no matter what else changes, a search engine that fails to produce search results will fail to produce financial results.
One question that remains relevant is the causality dilemma in the development of search technology. Does better search technology help spur the creation of more content, or does the existence of more content force search technology to improve? What can be teased out from the history of search engines is that access to the Internet, coupled with the ability of the public to create content, appears to be the driving force in the growth of all content. When the Internet was largely an academic affair with few users, it grew slowly. Once the Web was introduced, the number of pages skyrocketed. Today the amount of new content online is accelerating, due in large part to the popularity of web logs (blogs) that empower any person to publish their thoughts online with a few clicks of the mouse. Better search engines allow these new publishers to find information to link to more easily, giving them source material to work with and allowing any Joe Public with an Internet connection to have as much research power at his fingertips as any researcher at a major university. It is possible that people would be creating as much content without effective search technology -- life off-line is interesting enough to keep a writer busy for a lifetime -- but the kind of progress envisioned by Bush, real progress based on the findings of others, would certainly suffer. The kind of creation Bush was concerned with has been characterized as the process of "glomming on,"[31] that is, appropriating information from other sources and using it as the building blocks for innovation and commentary. It is this kind of thinking about information that led to the creation of the Internet, and with it the creation of search engines, making the two inseparable, each dependent on the other for its utility.
Whether or not the creative process of “glomming on” will be allowed to flourish is something that will be decided in the courts over the next few years, barring Congressional action. Google understands that its relevance depends in large part on the answers it provides to those seeking information. However, Google’s ability to serve that information on request is being challenged in a number of areas.
[1] For an overview of search engine history, see History of Search Engines and Web History by Aaron Wall at http://www.search-marketing.info/search-engine-history/, A Brief History of Search Engines by Lee Underwood at http://www.webreference.com/authoring/search_history/, A History of Search Engines by Wes Sonnenreich at http://www.wiley.com/legacy/compbooks/sonnenreich/history.html, and Wikipedia entry for “search engine” at http://en.wikipedia.org/wiki/Search_engine.
[2] Vannevar Bush, As We May Think, The Atlantic Monthly, July 1945, Vol. 176, No. 1, 101-108. Available at http://www.theatlantic.com/doc/194507/bush.
[3] Bush, id.
[4] See Wikipedia entry for “hypertext” available at http://en.wikipedia.org/wiki/Hypertext (last checked 5.20.2006). Systems of what can be called hypertext made up the early “Internet” in the 1970s and 1980s. The introduction of the World Wide Web in 1990 by Tim Berners-Lee incorporated hypertext as a way to help researchers connect with each other from all over the world. The commonly known “link” is composed of hypertext, the text that serves as a marker, and the hyperlink, the actual connection between two documents.
[5] See Wikipedia entry for “arpanet” available at http://en.wikipedia.org/wiki/Arpanet (last checked 5.20.2006).
[6] ARPANET, id.
[7] See Wikipedia entry for “packet switching” available at http://en.wikipedia.org/wiki/Packet_switching (last checked on 5.20.2006). Packet switching increases the speed of communication as each message is broken down into numerous packets and each is sent across the network, each finding the fastest way to its target where the packets are reassembled and displayed for the user.
[8] See Wikipedia entry for “file transfer protocol” available at http://en.wikipedia.org/wiki/FTP_server (last checked 5.20.2006).
[9] See William Slawski, Just what was the first search engine? SEO by the Sea, 2.5.2006. Available at http://www.seobythesea.com/?p=106 (Last checked 5.20.2006).
[10] See Wikipedia for “archie search engine” available at http://en.wikipedia.org/wiki/Archie_search_engine (Last checked 5.20.2006).
[11] See Wikipedia for “gopher protocol” available at http://en.wikipedia.org/wiki/Gopher_protocol (Last checked 5.20.2006). Gopher was chosen as the name because (1) users instructed the system to “go for” information, (2) the menus were analogous to gopher holes, and (3) the gopher is the mascot of the University of Minnesota, where the protocol was developed.
[12] See Wikipedia for “veronica (computer)” available at http://en.wikipedia.org/wiki/Veronica_%28computer%29 (Last checked 5.20.2006). Veronica stands for Very Easy Rodent-Oriented Net-wide Index to Computer Archives and was most likely chosen as the name to fit in with Archie, based on the Archie comic books.
[13] See Wikipedia for “jughead (computer)” available at http://en.wikipedia.org/wiki/Jughead_%28computer%29 (Last checked 5.20.2006). Jughead stands for Jonzy's Universal Gopher Hierarchy Excavation And Display and was also probably chosen to fit with the Archie Comics theme.
[14] See Wikipedia for “Tim Berners-Lee” available at http://en.wikipedia.org/wiki/Tim_Berners-Lee (last checked 5.20.2006).
[15] See Wikipedia for “world wide web” under “origins” available at http://en.wikipedia.org/wiki/World_wide_web#Origins (last checked 5.20.2006).
[16] See Wikipedia for “history of the internet” under “a world library” available at http://en.wikipedia.org/wiki/History_of_the_Internet#A_world_library.E2.80.94From_gopher_to_the_WWW (last checked 5.20.2006).
[17] See Wikipedia for “web crawler” available at http://en.wikipedia.org/wiki/Web_crawler (last checked 5.20.2006).
[18] See Wikipedia for “search engine” under “history” available at http://en.wikipedia.org/wiki/Search_engine (last checked 5.20.2006).
[19] See Wikipedia for “aliweb” available at http://en.wikipedia.org/wiki/Aliweb (last checked 5.20.2006).
[20] See Historical Web Services available at http://www.greenhills.co.uk/mak/historical.html (last checked 5.20.2006).
[21] See Wikipedia for “meta data” available at http://en.wikipedia.org/wiki/Meta_data (last checked 5.20.2006). Meta data, literally “data about data” from Greek, is “structured, encoded data that describes characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities" (Committee on Cataloging Task Force on metadata Summary Report, http://www.libraries.psu.edu/tas/jca/ccda/tf-meta3.html, 1999). It functions on a web page much like a library card catalog card, providing general information about the source being sought and is in itself searchable. The manipulation of meta data can help ensure that a given site appears as a search result when a query matching a meta data term is entered.
[22] See WebCrawler Facts at http://www.thinkpink.com/bp/WebCrawler/History.html (last checked 5.20.2006).
[23] See Sonnenreich, supra, at Mellon-Mania: The Birth of Lycos.
[24] See Wikipedia entry for “history of the internet” at Finding What You Need, available at http://en.wikipedia.org/wiki/History_of_the_Internet#Finding_what_you_need.E2.80.94The_search_engine (last checked 5.20.2006).
[25] See Sonnenreich, supra, at Return of the DEC.
[26] See Wikipedia entry for “Boolean search” available at http://en.wikipedia.org/wiki/Boolean_search (last checked 5.20.2006). Named after the English mathematician George Boole, who defined an algebraic system of logic in the 1800s. Boolean searches require the use of the “and,” “or,” and “not” operators (among others) to help the system understand the order and importance of certain search terms. Natural language search does not require these operators, hence the name.
[27] See Sonnenreich, supra, at A Spider Named “Slurp!”: The Powerful HotBot. Inktomi, which powered HotBot, was subsequently purchased by Yahoo! in 2002, and its engine produces Yahoo!’s search results.
[28] See Underwood, supra, at Enter the Accountants.
[29] See Alexa at www.alexa.com. These are the Google stats on 5.20.2006 available at http://www.alexa.com/data/details/traffic_details?&compare_sites=&y=r&q=&size=medium&range=&url=http://www.google.com.
[30] Information on Google PageRank is available on Google’s website at http://www.google.com/technology/ (last checked 5.20.2006).
[31] See Jack M. Balkin, Digital Speech and Democratic Culture: A Theory of Freedom of Expression for the Information Society, 79 N.Y.U. L. Rev. 1 at 10 (2004). In discussing mass media’s effect on the Internet, Balkin argues that mass media acts as a bottleneck on the dissemination of ideas because of its monopoly on content relevant to the public. “Glomming on” represents the essence of Internet speech and is built into the Net’s architecture through linking, fundamentally changing speech with the ability to link to primary source proof to show the validity of one’s argument.
"It's part of their absolutist approach," said Joshua Kaufman, an attorney representing Agence France-Presse in the wire service's copyright dispute with Google. "I think they're afraid that if they give an inch, it becomes a slippery slope. It's all or nothing."Again, the crux of the debate left out of the fair use discussion is the added value that Google bestows on content. Without Google, or similar search engines, it becomes significantly more difficult to get traffic to get clicks to get advertising dollars or to find other ways to monetize content.
...
[Jessica] Litman said, however, that these cases are "high stakes" for the company. "If Google is wrong about fair use, it probably has to go out of business," she said.
...
"In one sense it's not surprising because I'm sure Google would like to have everything out there in the public domain--except its own indexes and search results--because that would make its life a lot easier. But it doesn't work that way," said [Russell] Frackman, who also was lead counsel in the lawsuit that shut down the original Napster.