Lost in the Stacks

Tuesday, April 29, 2014

Indexing Project: The search for perfect metadata

One of the struggles I've experienced in indexing my assigned digital images has been determining how much time to allocate to each record. Since this is a "Final Project" for a class, there's a natural tendency to spend quite a bit more time on the process and ensure that every field is entered "perfectly," although I don't find this indicative of the way we are likely to encounter metadata creation in a professional environment. Certainly, in my day-to-day role as a school librarian, I don't have the spare time to get "too fine" with my cataloging backlog (I hardly have the time to catalog at all), nor did my experience at JPL, entering and describing documents in a digital repository, give me an excess of time to really pore over and deeply research the individual entries. Real-world metadata description often puts a premium on speed that is largely absent from class exercises. That's not to say that emphasis isn't also placed on the accuracy and quality of entries, but these facets all need to be balanced when entering data in a work environment.

Now, this isn't to say that I didn't put a grade A effort into my image description. As someone who over-thinks, over-writes, and then over-thinks some more, I bellyached plenty over my fifteen assigned images. But at a certain point, I also recognized (thanks to the sentiments of Dr. MacCall) that there isn't going to be perfection in our entries, and that the subjective nature of much of object description means that there are always going to be ways in which entries could be improved or tweaked. There is also the reality, beyond our control to an extent, that those searching through our records are going to be thinking of terms and phrases that are wholly different from those we conceived of, and it is difficult to ensure that all such perspectives are properly represented in our metadata entries.

I think there's a trend in our culture to think that doing something "well enough," or accepting "less than perfect," signifies laziness, a lack of ambition, or an acceptance of mediocrity. While it's true that we don't want to produce poor quality metadata, it's important not to lose sight of what it is we're producing. Metadata is a tool, a representation of a work utilized to find a desired information resource. It is meant to be USED. In order for that to happen, however, we need to be willing to "cut the cord," and let it come to life. In doing so, and turning it over for users to scrutinize, we may discover points for improvement and editing that we never would have considered on our own. It's a learning process, and one that we should approach humbly, with an open mind, and with an ability to adapt. Don't get so concerned with creating excellent metadata that you end up creating no metadata. For an over-thinker like me, this is a tremendous challenge, but one I will continue to apply myself to meeting as I gain more and more experience with metadata, cataloging, and other descriptive processes.

This will be my last blog for LS 566, Metadata...I really just can't believe Mr. Anti-Social Media managed to crank out 42 this semester! If you've been following my posts, I appreciate you taking the time, and thanks to those who have left comments. Best of luck to the rest of my SLIS compadres in their future studies and professional endeavors.

Wednesday, April 23, 2014

Ephemerality and the web

So there I am, just about ready to present my digital repository for Metadata class...well, scratch that, I had been done with my presentation materials for a bit of time in advance, but as with everything, I tweak, edit, revise and redo right up until the end. As I surf back over to Northwestern University's World War II Poster Collection, my chosen repository, I am met with a startling discovery...it's GONE! OK, not gone completely, but it would appear that after 17 years, the decision was made to finally migrate the old repository over to a new software platform, wiping out quite a bit of the existing metadata from the looks of it, and instantly negating any of the information I was prepared to present on the collection. Surely I am not the first student this has happened to, and we have all certainly encountered the ephemeral nature of content on the web at some point, with favorite sites there one moment and gone forever just a few days later.

It is inevitable that any online content meant to last for an extended period of time (is that an oxymoron?) will undergo changes and transitions at some point in its life cycle. Guessing the specific reasons for such changes is a fairly interesting exercise, and is definitely informed by a lot of what I learned during this semester. For the World War II Poster Collection, the interface is now more modern, the navigation options more intuitive, and the overall aesthetic is definitely of a higher quality. That's certainly of benefit to the site just on visual and presentation improvements. Beyond that, there are undoubtedly changes "under the hood" that have made a change in software and style warranted. How does the site handle linked data, and is it now designed to be integrated with other NWU online collections? What types of elements or features are now usable, or desired, that may not have been possible to represent in the old software? Will these changes allow the collection to be harvested by sites such as DPLA? Will the metadata records still be converted from their original MARC format, or will they be built as original records from the ground up? What schema will it follow? What content standards will it use? As the collection's migration still seems in its infant stages, it will be interesting to follow the progression of how its metadata records are built, how items will be grouped, structured, and searched for, and what extra features (if any) will be implemented.

This whole experience has definitely been an educational one for me. While my professor can attest to my panic upon first realizing my repository had been reborn, there is definitely a lesson to be learned in terms of the need to be flexible, adaptable, and open to changes in the world of metadata, as well as the online environment as a whole. That may seem like an obvious characteristic to some, but I think we sometimes take for granted the nature of the information we pull from the web on a daily basis, and the ability to continue doing so day after day. At the least, I got a really great glimpse at a pretty interesting digital collection of World War II-related posters and some of the technical aspects related to its presentation and use of metadata, and now I get to see the whole process started anew.

Wednesday, April 16, 2014

When do I use the Relation element?

Yesterday, I blogged about beginning our digital image indexing, and made some predictions as to which metadata elements might be a bit of a breeze, and which ones might throw me for a loop. While I've already gone through and finished up the indexing for a good portion of the elements, a few of the more time-intensive ones have been cast aside for the time being (subject, description, title), and one I just really don't know what to do with...the Relation element.

Relation is usually used to indicate associations of an item with other items. This theoretically could be for a variety of reasons: part of the same collection, part of a specifically constructed series, created by the same person, having the same context. The problem is trying to figure out exactly what type of relation should be applied to my images (if any), and to a certain extent, where do I draw the line in determining what is "related" to them.

In my football images, for example, I have 5 that were taken during the 1975 Alabama/Mississippi (Ole Miss) game. Now do these images relate to all images taken from that 1975 Alabama/Mississippi game? One could possibly make an argument that they do, although the enormity of including all such images is daunting. Do they relate to the immediate grouping of 5 that were my assigned images? Not necessarily, as their only common thread in that regard is that I am their indexer (which I will humbly recognize as not significant).

How about my John DePol images? According to a reference sheet we have, many of them belong to broad groupings and categories, such as "Books" or "blk_arch." Should I mention all the other images from such groupings under Relation? Again, the scope of images involved makes this a potentially frightening proposition to consider repeating several times.
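One way to picture the scoping problem is a minimal sketch in Python. It assumes (and this is only an assumption, not anything from the class guidelines) that each record carries a single Relation value naming the group it belongs to, rather than a list of every sibling image. The image identifiers below are invented for illustration; the group labels echo the ones mentioned above.

```python
from collections import defaultdict

# Hypothetical image records: the ids are made up for this sketch,
# and "group" stands in for whatever grouping the guidelines settle on.
images = [
    {"id": "fb_1975_01", "group": "1975 Alabama vs. Ole Miss"},
    {"id": "fb_1975_02", "group": "1975 Alabama vs. Ole Miss"},
    {"id": "depol_01", "group": "Books"},
    {"id": "depol_02", "group": "Books"},
    {"id": "depol_03", "group": "blk_arch"},
]


def relation_by_image(records):
    """Give each image one Relation value: the name of its group."""
    return {rec["id"]: rec["group"] for rec in records}


def members_by_group(records):
    """Derive the related images on demand instead of storing them in every record."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["group"]].append(rec["id"])
    return dict(groups)


relations = relation_by_image(images)
groups = members_by_group(images)

# Each record stays short, and the full list of siblings is a lookup away.
for image_id, group in relations.items():
    siblings = [other for other in groups[group] if other != image_id]
    print(f"{image_id}: Relation = {group!r}, siblings = {siblings}")
```

Under that assumption, the size of a group stops being frightening: adding a sixth or sixtieth image to a group changes nothing in the existing records. Whether Relation should point at a group at all is exactly the open question.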

So when SHOULD we use Relation and what exactly should we use it for? Hopefully some kind souls, including Relation guideline people, can swing by and shed some light for us.

Tuesday, April 15, 2014

The Great Digital Image Indexing Phase Begins

We've moved past guidelines and are on to indexing our assigned images. I'll be looking at 10 football images (5 from 1975, 5 from 2010) and 5 art images from the DePol Collection, which have already been uploaded to the Omeka Repository. As I approach the indexing process, it is interesting to guess which elements might be easiest to enter, and which are going to be a bit more on the challenging side. It reminds me of the same debate over which elements I wanted to work with, and which I didn't.

Elements I think MAY be simpler to deal with:
Date Created and Date Modified-My own elements (for the DePol Collection), the temporal information for both creation and modification/digitization should be pretty easy to locate and enter.
Source-Seems to relate to derivation of the digital image. Has a drop-down menu, BIG plus.
Identifier-Depends on the specific guidelines, but this may be something like the image's unique file name.
Rights-I see another drop-down menu, at least for football. I like drop-down menus.
Format-File formats are easy to find, and particulars such as the resolution are located in the file properties.
Type-Generally standard categories of resource types from which to choose.

Elements that MAY NOT be so simple to deal with:
Title-Will really be sticking to the guidelines for this one. I don't relish trying to think up titles on my own-creativity NOT a strong suit.
Subject-Errrr....depends how much flexibility we have in the guidelines. There are a LOT of subjects out there. Good controlled vocabulary will make things much simpler though. I feel much more comfortable on the football images with this one than the art images. I love art, but am anything but an expert at it.
Description-More for the art images than the football images. See above about art expertise (lack thereof).
Abstract-Summary of the resource...I have major issues with brevity in my writing.
Relation-Many of these images are related to each other, so it will be interesting to see how this element is handled in the guidelines.

So there are my predictions going into the indexing process. We'll just have to see what ends up being more challenging than expected, what is simpler than expected, and which elements just gave me a headache. I have a lot of faith in my classmates and their guidelines, however, so am really looking forward to this aspect of the project. I think it will be informational, and maybe even a little enjoyable.

Resource description and the one-to-one principle

One of the fundamental challenges in using metadata to describe digitized images and objects is discerning exactly what it is you are describing. This isn't necessarily in a "I have no idea what that's a picture of" sense (though there's also that problem), but more so in determining which version of an object you are entering information for. For example, if you have a digital picture of the Mona Lisa, what types of information would you expect to be displayed as metadata? For some, perhaps many, myself included, we would expect to see the title "Mona Lisa," the creator or artist listed as "Leonardo da Vinci," or "Da Vinci, Leonardo"...we might even expect to see dimensions for the original painting, information about the materials utilized, and a date in the 1500s. Easy enough, right?

If we were to take that same digital image, however, and then use it on our website, blog, or in a school assignment, what would we cite? More than likely, we would be listing the site we took the image from, the date it was copyrighted, and perhaps even who held the rights to the image. Here's where metadata for digital items starts to get complicated, because that image of the Mona Lisa is actually a distinctly different entity than the Mona Lisa itself. Both have unique creators, formats, sizes/dimensions, creation dates, and locations, among other descriptive information. You can see why this may start to get confusing for metadata creators and users alike.

The principle that guides us (sort of) through this confusing situation is known as the One-to-One Principle. According to the DCMI Wiki, the One-to-One Principle "dictates that a metadata description should refer to just one resource." The Wiki goes on to elaborate that the principle was created to ensure "that distinct resources be identified, distinguished, and described separately, as in the case of the original "Mona Lisa", created in 1506 by Leonardo da Vinci, and a photograph of "Mona Lisa" created in 2008 by John Smith" (I really did pick the Mona Lisa at random, honest). The point seems fairly clear, however: digital images of original materials should have metadata that reflects the digital image's creation, not that of the object being captured. The reasons for this are fairly compelling, actually, as it provides attribution of the digital image to the person and circumstances that actually created it. If we were to ignore this contribution, and focus on the original object, we actually obliterate the digital image as a distinct informational object. When we consider the proliferation of digital and digitized documents, ensuring that we preserve the integrity of distinct informational objects should be of prime importance.
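To make the idea concrete, here is a minimal sketch in Python of what "one description per resource" could look like for the Wiki's own example. The Record structure, the identifiers, and the format values are all assumptions invented for illustration; only the titles, creators, and dates come from the DCMI example quoted above. The photograph's record points to the painting through a relation value instead of borrowing the painting's creator and date.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One description for exactly one resource (hypothetical structure)."""
    identifier: str
    title: str
    creator: str
    date: str
    format: str
    relation: list = field(default_factory=list)


# The original painting, described on its own terms.
painting = Record(
    identifier="painting-001",      # made-up identifier
    title="Mona Lisa",
    creator="Leonardo da Vinci",
    date="1506",
    format="oil painting",          # illustrative value only
)

# The photograph is a distinct resource with its own creator and date.
photograph = Record(
    identifier="photo-001",         # made-up identifier
    title="Photograph of the Mona Lisa",
    creator="John Smith",
    date="2008",
    format="image/jpeg",            # illustrative value only
    relation=[painting.identifier],
)

catalog = {rec.identifier: rec for rec in (painting, photograph)}


def related_records(record, records_by_id):
    """Follow a record's relation values to the records they name."""
    return [records_by_id[i] for i in record.relation if i in records_by_id]


for original in related_records(photograph, catalog):
    print(f"{photograph.title} ({photograph.creator}, {photograph.date}) "
          f"depicts {original.title} ({original.creator}, {original.date})")
```

Keeping the two records separate means neither set of facts overwrites the other, and a display layer could still choose to show both together for a user who came looking for da Vinci rather than John Smith.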

In practice, however, the One-to-One Principle may not be so cut and dried. The DCMI Wiki makes the point that the notion can be very subjective, and indeed, it is sometimes difficult to determine what exactly constitutes a unique resource. Beyond those technicalities, however, there is also the problem that conforming to the principle can sometimes be confusing for users. Does someone looking for a picture of the Mona Lisa want to see the details of John Smith's photograph, or do they want the details of the famous da Vinci painting? Do the preferences of the user really matter in describing information resources (be careful, consider why we do it in the first place)? Is there a way to reflect both sets of information, without overloading users (some metadata elements can serve the purpose, but there's a limit to what you probably should include in them)? What do you think about the merits of the One-to-One Principle? What information do you want to see when you are browsing through digital items?

Tuesday, April 8, 2014

Viewing description through different lenses

One of the really intriguing aspects about this current semester has been the opportunity to view the description of information resources through the differing perspectives of three different classes. Cataloging, Metadata, and Archival Arrangement and Description may come at the process of resource description from different motivations and viewpoints, but are all ultimately concerned with identifying what a resource is, and enabling its access by outside users.

Cataloging is the most technical and complex of the three courses. It is in some ways the most traditional descriptive process of the field, carrying on the legacy of inventorying, classification, and ordering that has been a hallmark of libraries for centuries. Though the rules that guide the practice have recently undergone a significant change, cataloging's attachment to the MARC standard keeps it a relatively formulaic and predictable practice. Cataloging requires an understanding of specific tag locations, subfields, and indicators, as well as specific content standards and data entry formats. While mastering these protocols and features can make cataloging a more difficult, or time-consuming, description method, it also tends to produce catalogs and indexes that are cohesive and consistent, resulting in relatively accurate searches. Though the new RDA cataloging rules place more focus on the relationships between information resources, cataloging has traditionally focused on description of single items.

Archival description, on the other hand, has tended to be a bit more flexible, conceptually. Having cycled through emphases on simple inventories, item-level descriptions, and the now more common collection-level finding aids, archival description has tended to focus on how to adequately identify and provide access to aggregations and collections of resources, rather than single items. For archival purposes, data such as the context of creation, provenance, and function of information resources, as well as the relationship of those resources to each other and the collection as a whole, have often outweighed their content in importance. While archival description has codified its practices in standards semi-derived from the world of library cataloging, there is more flexibility in how they tend to be implemented. As with library cataloging, the ascendancy of digital documents and information resources will signal some pretty significant changes to archival descriptive practices in the future.

As for Metadata, well, in many ways it is a concept that encompasses both of the others. In a very general way, metadata is the essence of what library cataloging and archival description are about, but in another sense, it represents part of the future for both practices. Digital metadata bears the advantage of not being bound to specific encoding standards, content standards, or software platforms, making it an incredibly flexible and simple tool to utilize. By the same token, it can be made as rigorous and specific in its scope as a user or corporate entity desires, thereby retaining the degree of control and consistency that one would expect to find in a library catalog, for instance. Features such as the ability to easily link data or images make it especially valuable to archives and special collections, providing users the ability to visually explore the relationships between items, rather than hunt for them in stored boxes.

Despite the obvious advantages, however, the big question mark is whether institutions will buy into a collective metadata system (e.g. the Semantic Web) in the same way that they have for MARC and linked cataloging. Can libraries make the big jump from MARC and corporate catalogs to a shared, centralized metadata system? Can such a system similarly interlink the holdings and records of archives and special collections? Interesting times ahead seeking out the answers to these questions, that's for sure.

Wednesday, April 2, 2014

Title metadata for unnamed photographs

A classmate recently blogged about some of his struggles tackling the Title metadata element for use with the football photographs repository we will be indexing. The images in question were photographs taken from University of Alabama football games during the 1975 and 2009 seasons. Now the Title element would normally seem to be one of the easier tags to provide information for, but when you actually consider the nature of most photographs, most are not likely to possess an official title. As the same classmate pointed out while advocating for the use of the Title element on this project, a title (name) is a core element in providing identity to an object, but it also plays a role in facilitating discovery and retrieval of the item in an online environment. In the absence of a pre-defined title, then, creating a suitable name for an item becomes an extremely important part of the indexing process.

Creating a unique title is definitely not an exact science. Trying to get too creative or interpretive in the forming of a title can make the item relatively inaccessible, or confusing, for potential searchers. On the other hand, getting too descriptive with a title can flood what should be a relatively simple access point with too much detail and information. On the whole, however, I believe that in this type of situation, a descriptive title is probably the best method of assigning a title, if not necessarily the simplest. Providing details such as specific player numbers or names can be a very useful aspect of a title, such as: "Greg McElroy throws pass versus Florida International." Trying to discern where to draw the line when including player names and numbers can be difficult, though, when facing group shots where no specific player is the focus of the photograph. Additionally, if multiple photographs of the same player performing an action are present in the collection, further information will be needed in order to disambiguate them.
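As a rough illustration of the disambiguation problem, here is a minimal Python sketch. It assumes a "subject, action, versus opponent" title pattern modeled on the McElroy example, and it assumes that identical titles get a "(1 of 2)"-style suffix. Both conventions are hypothetical, not taken from any indexing guidelines, and the "Number 22" entry is invented for the example.

```python
from collections import Counter


def build_title(subject, action, opponent):
    """Assemble a descriptive title from its parts (hypothetical pattern)."""
    return f"{subject} {action} versus {opponent}"


def disambiguate(titles):
    """Append '(n of m)' to any title that appears more than once."""
    totals = Counter(titles)
    seen = Counter()
    result = []
    for title in titles:
        if totals[title] > 1:
            seen[title] += 1
            result.append(f"{title} ({seen[title]} of {totals[title]})")
        else:
            result.append(title)
    return result


titles = [
    build_title("Greg McElroy", "throws pass", "Florida International"),
    build_title("Greg McElroy", "throws pass", "Florida International"),
    # Invented entry: a player identified only by jersey number.
    build_title("Number 22", "carries ball", "Florida International"),
]

for title in disambiguate(titles):
    print(title)
```

A numeric suffix keeps titles unique without adding much detail, though it tells a searcher nothing about how the two photographs differ; a quarter, a yard line, or a timestamp would carry more meaning if that information were available.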

This issue is a pretty intriguing problem, and it's even something that any of us with personal photo collections that we are trying to name can relate to. I will be interested to see how these issues are sorted out in the indexing guidelines.