Tuesday, February 25, 2014

Dublin Core: Format

Having officially submitted my ranked list of Dublin Core elements to our professor for assignment, I was more than a little surprised about some of the less obvious ones that managed to squirm their way up toward the top of my list. Date, as I discussed in my last blog, ended up being my first choice, but the equally unexpected Format element did not follow too far behind. Now when you normally hear the word "format" in the Digital Age, the idea of specific types of computer files is probably something that comes to mind - Word documents, PowerPoint presentations, mp3s, mpegs, and jpegs all being notable examples we encounter daily. It's a safe bet that a Format element is likely to include this type of technical information, but as I discovered, it also goes beyond that in its task of helping to describe the nature of an informational resource.

The Dublin Core Format element gives metadata creators a place to record the specific physical and digital qualities of the information resource being described. For digital items, this includes details such as file formats, programming languages, file sizes, digital dimensions (e.g. resolution), and encoding standards. When describing physical materials, Format would similarly cover information such as physical dimensions, duration, and the specific medium carrying the resource (e.g. audio cassette, 35mm photograph). Like many of the Dublin Core elements, Format also has several refinements, or qualifiers, that enable greater specificity in the level of description, including IMT (Internet Media Type), Extent, and Medium.
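
To make the three qualifiers a little more concrete, here is a minimal Python sketch of how Format values for a single item might be gathered. The function name, the dictionary keys, and the sample file are my own illustrations rather than any official Dublin Core serialization; the only outside help is the standard library's mimetypes module, which guesses an Internet Media Type from a file extension.

```python
import mimetypes

def describe_format(filename, extent, medium=None):
    """Collect Format-related values for one resource (illustrative keys only)."""
    # guess_type returns (type, encoding); type is None for unknown extensions
    media_type, _ = mimetypes.guess_type(filename)
    record = {
        "format.imt": media_type or "unknown",  # Internet Media Type
        "format.extent": extent,                # size, dimensions, or duration
    }
    if medium is not None:
        record["format.medium"] = medium        # physical carrier, if any
    return record

# Hypothetical item: a digital scan of a 35mm photograph
photo = describe_format("harbor_scan.jpg", "3000 x 2400 pixels",
                        medium="35mm photograph")
```

A real project would follow whatever record structure the repository software expects; the point is only that IMT, Extent, and Medium answer three different questions about the same item.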

While this type of description may seem to be more on the mundane side than something like the Subject tag, it nevertheless still plays a vital role in the information that it conveys to users. Knowledge of specific file formats or encoding standards, for example, can alert users to the need for certain types of software in order to access the desired information resource. Description of the physical extent or medium for an item can similarly help a user decide whether a specific information object adequately fits criteria they are looking for. Format is thus a fairly integral aspect of describing both digital and physical materials, and an intriguing possibility to work with.

Sunday, February 23, 2014

Dublin Core: Date

As I'm going about trying to order the DC elements I would like to work with, it is interesting to see some of the complexity, and problems, with elements that would appear to be fairly straightforward and simple. Take the Date element for instance...not a real difficult item to wrap your head around, right? Just put the copyright date or something? Well, no, it's not quite THAT simple. Dublin Core actually allows several qualifiers to be used in conjunction with Date, such as Created, Valid, Available, Issued, and Modified, in order to specify the type of temporal information being entered in a metadata record. Thus, you can use DateIssued for something like the publication year of a work, or DateCreated for when an electronic document was first generated, if you need more specificity than the general Date element can provide.

Another consideration for the entry of date information includes the exact format or sequence with which it is entered into a record. For example, today's date can be described:
February 23, 2014
February 23, '14
Feb 23, 2014
23 February, 2014
2/23/14
2/23/2014
23/2/14
23/2/2014
2-23-14
2.23.14
And that just scratches the surface of the variations possible, not to mention factoring in foreign languages. So which format does a metadata creator use to ensure that the date is adequately understood by potential users? Accounting for regional differences in date presentation, and the fact that computers can't even interpret some of the variations, there is no perfect solution. The W3CDTF encoding scheme is one possible solution, and prescribes the use of the YYYY-MM-DD format for single dates. Whichever method is settled upon by the metadata creator, however, the importance of consistency in date entry cannot be overstated.
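
For the curious, here is a rough Python sketch of what enforcing that consistency could look like: it tries a handful of the patterns listed above and rewrites whatever it recognizes as YYYY-MM-DD. The pattern list and the function name are my own, and the sketch assumes month-first order for numeric dates (the US convention), so it is an illustration under those assumptions rather than a complete solution.

```python
from datetime import datetime

# Input patterns this sketch recognizes; illustrative, not exhaustive.
# Numeric dates are assumed to be month-first (US convention).
KNOWN_PATTERNS = [
    "%B %d, %Y",   # February 23, 2014
    "%b %d, %Y",   # Feb 23, 2014
    "%d %B, %Y",   # 23 February, 2014
    "%m/%d/%Y",    # 2/23/2014
    "%m/%d/%y",    # 2/23/14
    "%m-%d-%y",    # 2-23-14
    "%m.%d.%y",    # 2.23.14
]

def to_w3cdtf(text):
    """Return the date as YYYY-MM-DD, or None if no pattern matches."""
    for pattern in KNOWN_PATTERNS:
        try:
            parsed = datetime.strptime(text.strip(), pattern)
        except ValueError:
            continue  # this pattern did not fit; try the next one
        return parsed.strftime("%Y-%m-%d")
    return None

normalized = to_w3cdtf("Feb 23, 2014")    # "2014-02-23"
day_first = to_w3cdtf("23/2/2014")        # None: 23 is not a valid month
```

Notice that the day-first version only fails because 23 cannot be a month. Something like 2/3/14 would be silently read as February 3, which is exactly the kind of ambiguity that makes a single agreed-upon format so valuable.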

Thursday, February 20, 2014

What Dublin Core element would YOU like to work with?

Our digital imaging project has been assigned, and we've got Dublin Core elements spinning around in our heads. But what to choose, what to choose?

Like many of my LS 566 classmates, I'm currently trying to figure out how to make some sense out of ranking the 15 Dublin Core elements in order of preference to work with for our big semester project. A whole Dublin Core element all to myself, should be easy, right? Pick the easy one, you dolt! Well, some of them are theoretically simpler than others, but they all have their wrinkles and quirks, and I don't foresee any one path carrying significantly less work than any other...not that that is a great way of going about choosing to begin with.

So, what do we have to work with here...I know I left a list of Dublin Core elements lying around here somewhere.

Voila!:
-Title
-Creator
-Subject
-Description
-Publisher
-Contributor
-Date
-Type
-Format
-Identifier
-Source
-Language
-Relation
-Coverage
-Rights

If I'm going to be spending a lot of time on this project, and I have no reason to suspect I won't, then I might as well get my money's worth and pick something I'm actually interested in working with. Subject is an obvious choice, right off the bat, as I have enjoyed subject description in the past, and am always fascinated with trying to figure out what an item is about. Perhaps I can bring some elements of controlled vocabulary to it as well?

Title is pretty straightforward, but when dealing with items that don't necessarily have an already-defined title, it posits the chance of complications.

Creator...oooo, there's another one where some possibility of vocabulary control exists.

Lot of tough choices ahead, but I'm sure there will be some extremely valuable experience going along with any element I ultimately get assigned. What would you pick?

Wednesday, February 19, 2014

On the importance of subject description

In my last blog, I touched a little bit on some of the basic differences between keyword and subject searches. For the library field, the notions of controlled vocabulary and subject headings have been under scrutiny and debate for some time, owing to the proliferation of natural language searching and the still-present difficulty in helping users understand how to properly use a catalog subject search. Two articles I recently encountered for my cataloging class break down some of the reasons why subject headings retain an important place in the library catalog, and suggest what part they might still play in the growing digital information world.

The first article, by Arlene Taylor and Tina Gross, entitled, “What Have We Got to Lose? The Effect of Controlled Vocabulary on Keyword Searching Results,” points to the important role that subject headings and subject searches play in facilitating retrieval of relevant resources for catalog users. Subject headings not only provide a controlled method for searching, but also function indirectly by showing up in keyword searches. According to the study done by Taylor and Gross, if the subject entry were to be eliminated from the catalog, or withdrawn from keyword searches, approximately one-third of related resources would no longer turn up as hits. This is a significant number of items that would elude the user’s search. Additionally, since the terms in question are located in the subject field, they are more likely to lead to quality resources related to the user’s search than terms found in, say, the summary field or table of contents (which are sometimes entered into a record). This isn’t to completely discard keyword searching, however. Users appear to have difficulty understanding how to search for controlled-vocabulary subjects, at times, or may not know exactly which terms to utilize for a given search. As keyword searches are easier, in some ways, to understand, they are sometimes preferred by users. So it is not so much a matter of choosing between keyword searches and controlled vocabulary searches as it is ensuring that we infuse the precision of the latter into the usability of the former.


In “On the Subject of Subjects,” Arlene Taylor further advocates for the importance of utilizing subject headings and controlled vocabularies, as well as extending their use into the electronic domain and the Internet. Taylor points out that despite the pre-eminence of keyword searches among Internet users, keywords continue to be a poor option where precision of search results is desired. Not only does controlled vocabulary counteract this through the notion of specific entry, but it also serves to increase the possible search parameters for a user through the relating of broader and narrower terms for a subject, as well as synonyms, near synonyms, and other valid variations. Controlled, subject-based description is not without its faults, however. Taylor expresses that there is certainly a place for keyword-based techniques, owing to their simplicity, lower cost to create, ease of maintenance (automated), and ability to stay current. What is advocated for the digital environment is a system in which less important, ephemeral resources can be indexed automatically using keyword technologies, while those intended for long-term use can fall under controlled vocabulary systems.

It is interesting to note that since the latter article’s publication almost 20 years ago, subject-based vocabulary control does not seem to have taken a strong foothold in the online environment, and likely for very obvious reasons - cost, speed, currency, and the ability to automate. While resources such as WorldCat and the Digital Public Library of America, among others, show what online, linked resources can accomplish with controlled vocabularies, the world of digital information in general exhibits a pretty low degree of accurate description, much to the chagrin of searchers and their millions upon millions of hits. Is there a better way? Surely now that Pandora's box has been ripped open, it will be all but impossible to stuff everything back in.

-Gross, Tina, and Arlene G. Taylor. 2005. "What have we got to lose? The effect of controlled vocabulary on keyword searching results." College & Research Libraries 66, no. 3: 212-230. Social Sciences Citation Index, EBSCOhost (accessed February 18, 2014).

-Taylor, Arlene G., 1941-. 1995. "On the subject of subjects." Journal Of Academic Librarianship 21, 484-491. Education Full Text (H.W. Wilson), EBSCOhost (accessed February 18, 2014).

Sunday, February 16, 2014

Keyword versus Subject Searching

One of the major differences between a controlled metadata system like a library catalog, and the search engines that we utilize every day on the internet, is the way in which subject metadata is created, indexed, and searched for. A library catalog, for instance, runs on the idea of a controlled vocabulary, whereby items are allocated pre-defined subject headings by human catalogers (e.g. History--Byzantine Empire). This gives the item a fairly detailed and accurate description, and facilitates the reliable search and retrieval of information resources that library users are familiar with. Search engines, on the other hand, largely rely on automated indexing, and utilize a keyword, or natural language, style of search (e.g. "What year did Jesse Owens win his gold medals?"). The difference between the two systems is fairly well reflected in the results one will see for a typical internet search - millions of hits, and the retrieval of an extraordinary number of web sites that have absolutely no relevance to the original search...oh, and don't forget porn sites. This blog entry isn't meant to make a claim in favor of one or the other, for what I've described is a gross simplification. Controlled vocabulary, subject-based searches and keyword, natural language searches both have their role in the metadata world, along with their relative strengths and weaknesses.

If the differences between keyword and subject searches are new to you, the George Mason University Libraries provides a fairly helpful guide that notes the differences between the two search methods, and the instances in which each should be used. While the information pertains mostly to the use of keyword and subject searches in a library catalog, it can be equally applied to online resources, discovery services, and search engines. There is also a brief, but helpful, section on how to use Boolean operators (AND, OR, NOT, etc.) to get the most out of complex searches. Searching may seem to be a very basic skill, but there are actually quite a few tricks that are useful for those looking to spend more time reading relevant articles and pages, and less time browsing pages upon pages of irrelevant search results. With more and more information resources moving into the digital environment, the ability to quickly and efficiently search for specific items will only become more valuable.
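
If Boolean operators feel abstract, a small Python sketch may help. The three-record catalog below is entirely made up, and real catalogs and search engines are far more sophisticated, but the set operations show what AND, OR, and NOT each do to a list of results.

```python
# Hypothetical mini-catalog: record title -> set of indexed terms.
CATALOG = {
    "Byzantium: A Short History": {"byzantine", "empire", "history"},
    "Art of the Byzantine Empire": {"byzantine", "empire", "art"},
    "A History of the Roman Empire": {"roman", "empire", "history"},
}

def matching(term):
    """Return the set of titles whose indexed terms include the given term."""
    return {title for title, terms in CATALOG.items() if term in terms}

both = matching("byzantine") & matching("history")   # byzantine AND history
either = matching("byzantine") | matching("roman")   # byzantine OR roman
without = matching("empire") - matching("art")       # empire NOT art
```

AND narrows the results to records containing both terms, OR widens them to records containing either, and NOT removes anything carrying the unwanted term.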

Friday, February 14, 2014

Digital Public Library

A classmate of mine posted a recent blog discussing the Digital Public Library of America, or DPLA. Though I'd seen the DPLA referenced here and there in articles read for other classes, I never really took the time to check it out until now. Having now rectified that oversight, I can say that the DPLA looks to be a fantastic resource, combining some of the best elements one would expect to see in an online library: picture galleries, archival collections, and, of course, a vast number of digitized books from some of the more noteworthy institutions across the United States. Additionally, the DPLA boasts features that give the collection a distinctly modern and digital flavor: apps, games, virtual bookshelves, discovery services, and yes, even a way to make historical Lolcats.

One area that is of particular interest in scanning the DPLA is viewing the way that item records and metadata are utilized. While books, electronic documents, archival resources, and visual media are all normally described using different standards and protocols, the DPLA brings all its resources under a unified internal metadata schema. Unlike some other schemas, however, DPLA's also appears to integrate a number of controlled vocabularies and thesauri, which means that the system retains some of the precision in search and retrieval that one would expect to find in other vocabulary-controlled systems, like a library catalog. Though undoubtedly far from perfect, I think the DPLA represents some interesting possibilities for the potential of library-type collections in an online environment, bringing together some of the structured, organized, and controlled features of library collections with the vast amount of information and interactivity available in the digital world.

Wednesday, February 12, 2014

The Ins and Outs of File Naming

While metadata can often come in forms that are increasingly technical and complicated, there are also many forms that are simple to use and understand, and encountered by many of us on an almost daily basis. However, even simplicity and accessibility provide no guarantee that metadata will be used properly, as is evidenced by a form of metadata we are almost all familiar with, and commonly misuse...electronic file names. Rather than take full advantage of this gem of an organizational, and retrieval, tool for our computers and devices, we instead clog up document folders with endless batches of "Untitled" documents, and incredibly useful file descriptions such as "receipt", "Agenda," or "list." If you, like me, have failed to show adequate appreciation for this easy-to-use metadata tool, then take heart that the State Library of North Carolina has come to rescue us with a series of four short videos on File Naming Guidelines.

The SLNC covers a variety of basic topics related to file naming, including why it is important, how to alter existing file names, and some best practices to follow, as well as some to avoid. Though some of the information is fairly intuitive and already familiar to users, the videos do provide suggestions and warnings that otherwise may not have been considered. For example, failing to create unique file names for automatically-named files (e.g. digital photos uploaded from a camera) could lead to important files or documents being overwritten and lost, though most modern operating systems have some measure of protection against such an eventuality. Also of interest are the suggestions for characters to avoid in creating file names, including most special characters (e.g. !, ?, /, $, and the like, due to their use in programming languages), spaces (the underscore special character is an acceptable alternative), and capital letters (software often makes no distinction between case). Among the best practices I found most helpful was the recommendation to include a consistently formatted date in file names, which can provide valuable context when searching through old files.
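
As a quick illustration, the suggestions above can be sketched in a few lines of Python. This is my own interpretation rather than anything shown in the videos: the function name is made up, and the YYYY-MM-DD date prefix (which keeps hyphens in the name) is an assumption on my part, not necessarily the layout the SLNC recommends.

```python
import re
from datetime import date

def clean_file_name(description, extension, on_date):
    """Build a name like 2014-02-12_board_meeting_agenda.txt."""
    name = description.strip().lower()        # avoid capital letters
    name = re.sub(r"\s+", "_", name)          # spaces become underscores
    name = re.sub(r"[^a-z0-9_-]", "", name)   # drop other special characters
    ext = extension.lower().lstrip(".")
    # Date prefix in YYYY-MM-DD form (my assumption) so names sort by date
    return f"{on_date.isoformat()}_{name}.{ext}"

example = clean_file_name("Board Meeting: Agenda!", "TXT", date(2014, 2, 12))
# example is "2014-02-12_board_meeting_agenda.txt"
```

Putting the date first means an alphabetical sort of the folder is also a chronological one, which fits the point about dates providing context when digging through old files.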

Though not discussed in the videos, I feel that the guidelines provided would also be particularly useful to organizations concerned with long-term preservation, or storage, of electronic files. Naming practices have a considerable impact on the ability of an archivist or records custodian to make sense of the original use, function, organization, and order of electronic documents. Absent a logical and consistent naming scheme, attempts to organize the documents into a meaningful collection can be severely compromised.