In the search engine comparison project, many people observed a qualitative difference between the resources listed in scholarly research databases and those returned by web search engines. Not only did these research tools tend to return different numbers of results (with web searches returning far more results), but they also tended to return different kinds of material. These outcomes are not surprising, but it is worth considering the nature of these differences so that we may understand how each tool can be used most effectively. Let's consider differences in two areas: information domain and genre.
Most research databases compile resources from a narrow and well-defined band of information domains. By "information domain" I mean something like "topical area" or "discipline" or "kind of content." The information domain of most scholarly databases is limited to these:
Within this range, the information domain of an individual database is often narrowed even further:
The point is that each database narrowly specifies its information domain and uses some method of ensuring that material from other domains is excluded. In the case of scholarly article databases, for instance, only journals featuring relevant topics are indexed; further, only journals using a "peer review" process are included.
In scholarly database searches, it is up to the user to select the databases that are most likely to cover the desired information domains. This process can often be challenging, especially when a user's interests cross information domains.
For instance, if I am interested in understanding how nineteenth-century scientific views of race affected the emergence of technical genres in the natural sciences (which, coincidentally, I am), then I may need to search in a variety of information domains: research on biological science, scientific communication, technical communication, genre theory, racial theory, and even literature may be appropriate, because scholars in any of these areas might have addressed some portion of my research interest. The fact that I may plan to contribute to the information domain of "technical communication" doesn't necessarily mean that this information domain is the only (or the best) one to explore.
Web search engines are far more likely to find material in domains other than those listed above, including corporate, governmental, public, and personal domains. Furthermore, most Internet search engines do not distinguish among information domains. The search engine does not actively identify the subject matter of the web pages that it searches: if a page is on the Internet (or, more properly, if the search engine is able to access the page by whatever method it uses, whether crawler, spider, or submission database), it will be searched. As a result, an Internet search is likely to return material in a much wider spectrum of information domains.
Internet directories (like the Yahoo directory) attempt to sort web pages into information domains based on topic.
For practical purposes, "genre" refers to the type or purpose of a text (or some combination of type and purpose); roughly, it is the kind of communicative work that a text is intended to accomplish. Sonnets, novels, and classic rock songs are all examples of genres; so are instructions, shopping lists, obituaries, scientific research articles, and personal homepages. Most scholarly research databases contain material in one of the following genres:
There are two important characteristics of all of these genres. First, each genre has a built-in mechanism to ensure the authenticity of its content. The genres are, in a sense, "regulated" in a way that many articles are not. Articles in scholarly journals are regulated through a formal "peer review process": no article is published in a peer-reviewed journal unless a panel of anonymous reviewers recommends its publication. Scientific reports undergo a similar process. Likewise, newspaper articles base their credibility upon the reputation of the newspaper and the judgment of its editor, and court rulings are written only by judges and justices in the closely regulated judicial system.
This authentication process does not, of course, ensure the correctness of the texts in question. Peer reviewers, for instance, may agree that a scholarly article is worth publishing even if they strongly disagree with its conclusions. In fact, journal articles would typically not be published if there were no controversy surrounding their claims. After all, the point of research is to explore conflict, not merely to state unproblematic facts.
Now consider the genres included in web searches. Web searches not only cover a much larger number of genres, they also include genres with widely divergent standards of authenticity. The range of web genres that might be returned in a web search includes:
Some of these genres are highly regulated (e.g., organizational policy statements); others are less well-regulated (e.g., political manifestoes); still others may not even be designed for public consumption (e.g., personal emails that are accidentally posted to a discussion list). Whereas the basic purpose of a research article is clear (to contribute new knowledge or to test or challenge old knowledge within a specified research domain), the purpose of web items must be determined by users, case by case. Likewise, in each case, users must decide whether the item is authentic, reliable, relevant, and correct.
Is such material valuable? It depends on what you mean to do with it. Even an authentic, reliable, relevant, and correct policy statement is a qualitatively different thing from a scholarly research article. Both may be useful, but they will probably be useful in different ways. Consider the following items:
Both texts are probably grounded in careful research; both may be relevant to someone's research on multimedia copyright standards. The first, however, provides a different type of material from the second. As a research article, it deliberately participates in an ongoing "conversation" within a scholarly information domain, possibly the same domain in which the searcher wishes to participate. One would consider this material in many of the same ways in which we examined research articles in project 1.
The second item, a web tutorial, may be just as useful as the research article. But it is less likely to have arisen from an ongoing conversation in a scholarly information domain. It isn't trying to participate in such a conversation. Instead, it is trying to guide users' practices, or to provide policy advice to web developers, or to do some other fairly practical thing. As a researcher, you would probably use this resource as you would use the results of primary research: as a source of evidence about practices or policy rather than as a statement in an unfolding research conversation.