Wednesday, February 19, 2014

On the importance of subject description

In my last blog post, I touched on some of the basic differences between keyword and subject searches. In the library field, the notions of controlled vocabulary and subject headings have been under scrutiny and debate for some time, owing to the proliferation of natural language searching and the still-present difficulty of helping users understand how to properly use a catalog subject search. Two articles I recently encountered for my cataloging class break down some of the reasons why subject headings retain an important place in the library catalog, and suggest what part they might still play in the growing digital information world.

The first article, by Arlene Taylor and Tina Gross, entitled “What Have We Got to Lose? The Effect of Controlled Vocabulary on Keyword Searching Results,” points to the important role that subject headings and subject searches play in helping catalog users retrieve relevant resources. Subject headings not only provide a controlled method for searching, but also work indirectly, because their terms are matched by keyword searches. According to the study done by Taylor and Gross, if subject entries were eliminated from the catalog, or excluded from keyword searches, approximately one-third of the relevant resources would no longer turn up as hits. That is a significant number of items that would elude the user’s search. Additionally, because the matching terms sit in the subject field, they are more likely to point to quality resources related to the user’s search than terms found in, say, the summary field or the table of contents (which is sometimes entered into a record). This isn’t to dismiss keyword searching completely, however. Users at times appear to have difficulty understanding how to search for controlled-vocabulary subjects, or may not know exactly which terms to use for a given search. Because keyword searches are in some ways easier to understand, users sometimes prefer them. So it is not so much a matter of choosing between keyword searches and controlled-vocabulary searches; rather, it is a matter of ensuring that we infuse the precision of the latter into the usability of the former.
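
To make the "indirect" role of subject headings concrete, here is a minimal sketch in Python. It is not the method used in the study; the tiny catalog, its field names, the subject strings, and the `keyword_search` helper are all invented for illustration. It simply runs the same keyword search with and without the subject field and shows which records drop out.

```python
# Hypothetical toy catalog: records, field names, and subject strings are
# made up for illustration and are not taken from the study.
CATALOG = [
    {
        "title": "Silent Spring",
        "summary": "Effects of chemical sprays on wildlife.",
        "subjects": ["Pesticides -- Environmental aspects"],
    },
    {
        "title": "Pesticides and You",
        "summary": "A consumer guide.",
        "subjects": ["Pesticides -- Toxicology"],
    },
    {
        "title": "The Garden Handbook",
        "summary": "Planting, pruning, and pesticides.",
        "subjects": ["Gardening"],
    },
]


def keyword_search(records, term, fields):
    """Return titles of records whose chosen fields contain the term."""
    term = term.lower()
    hits = []
    for record in records:
        text_parts = []
        for field in fields:
            value = record.get(field, "")
            # Subject headings are stored as a list; other fields are strings.
            if isinstance(value, list):
                text_parts.extend(value)
            else:
                text_parts.append(value)
        if term in " ".join(text_parts).lower():
            hits.append(record["title"])
    return hits


with_subjects = keyword_search(CATALOG, "pesticides", ["title", "summary", "subjects"])
without_subjects = keyword_search(CATALOG, "pesticides", ["title", "summary"])

# Records that are only reachable because a cataloger assigned a subject heading.
lost = [title for title in with_subjects if title not in without_subjects]
print(lost)
```

In this toy data the one record whose title and summary never mention the search word is found only through its subject heading, which is the kind of loss the study measures on a real catalog.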


In “On the Subject of Subjects,” Arlene Taylor further advocates for the importance of utilizing subject headings and controlled vocabularies, as well as extending their use into the electronic domain and the Internet. Taylor points out that despite the pre-eminence of keyword searching among Internet users, keywords continue to be a poor option where precision of search results is desired. Not only does controlled vocabulary counteract this through the notion of specific entry, but it also widens the possible search parameters for a user by relating broader and narrower terms for a subject, as well as synonyms, near synonyms, and other valid variations. Controlled, subject-based description is not without its faults, however. Taylor acknowledges that there is certainly a place for keyword-based techniques, owing to their simplicity, lower cost to create, ease of (automated) maintenance, and ability to stay current. What she advocates for the digital environment is a system in which less important, ephemeral resources are indexed automatically using keyword technologies, while those intended for long-term use fall under controlled vocabulary systems.
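
As a rough illustration of how a controlled vocabulary relates variant, broader, and narrower terms, here is a minimal Python sketch. The one-entry thesaurus, its headings, and the helper names `resolve_heading` and `suggest_terms` are hypothetical and are not drawn from any real subject heading list.

```python
# Hypothetical miniature thesaurus: one authorized heading with its variant
# terms ("use for") and its broader and narrower headings.
THESAURUS = {
    "Automobiles": {
        "use_for": ["cars", "motorcars", "autos"],
        "broader": ["Motor vehicles"],
        "narrower": ["Sports cars", "Electric vehicles"],
    },
}


def resolve_heading(user_term):
    """Map a user's word to the authorized heading, or None if unknown."""
    wanted = user_term.strip().lower()
    for heading, entry in THESAURUS.items():
        variants = [heading] + entry["use_for"]
        if wanted in (variant.lower() for variant in variants):
            return heading
    return None


def suggest_terms(user_term):
    """Return the authorized heading plus broader and narrower headings."""
    heading = resolve_heading(user_term)
    if heading is None:
        return {}
    entry = THESAURUS[heading]
    return {
        "heading": heading,
        "broader": entry["broader"],
        "narrower": entry["narrower"],
    }


print(suggest_terms("cars"))
```

A user who types "cars" is led to the single authorized heading and offered wider and narrower places to look, which a plain keyword match cannot do on its own.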

It is interesting to note that since the latter article’s publication almost 20 years ago, subject-based vocabulary control does not seem to have taken a strong foothold in the online environment, likely for very obvious reasons - cost, speed, currency, and the ability to automate. While resources such as WorldCat and the Digital Public Library of America, among others, show what online, linked resources can accomplish with controlled vocabularies, the world of digital information in general exhibits a pretty low degree of accurate description, much to the chagrin of searchers and their millions upon millions of hits. Is there a better way? Surely now that Pandora's box has been ripped open, it will be all but impossible to stuff everything back in.

-Gross, T, and AG Taylor. 2005. "What have we got to lose? The effect of controlled vocabulary on keyword searching results." College & Research Libraries 66, no. 3: 212-230. Social Sciences Citation Index, EBSCOhost (accessed February 18, 2014).

-Taylor, Arlene G., 1941-. 1995. "On the subject of subjects." Journal Of Academic Librarianship 21, 484-491. Education Full Text (H.W. Wilson), EBSCOhost (accessed February 18, 2014).

1 comment:

  1. Subject description is always a pain ... it's so costly and it's difficult to program a computer to do it!

    However, Google's indexing algorithm does a pretty good job computing what a document is about, but those are documents from the open Web where there's lots of redundancy. I'm interested in the long term viability of human subject indexing in the context of projects like DPLA...

    --Dr. MacCall
