Lost in the Stacks: January 2014

Friday, January 31, 2014

Metadata and Classical Music

As an aficionado, and burgeoning collector, of classical music, the topic of metadata in digital music collections has become a sort of pet project of mine. Having recently gone through the process of re-importing my entire classical collection (a small one, admittedly, around 1000 songs), I can attest to the fact that attempting to adequately manage the metadata of such a collection is a labor of love. Inconsistent data entry, missing or incorrect metadata, and some unique problems that hamper classical listeners in comparison to those with mainly rock/pop collections, all conspired against my efforts to bring some much needed order and cohesion to my music files.

In a recent blog, LS 566 classmate Mary Elizabeth Watson touched on the subject of metadata and digital classical music collections, and, with reference to the Naxos Blog on classical music, pointed to some of the more significant hurdles that classical listeners must face in organizing their digital collections. Some of the problematic areas mentioned include: the lack of authority control across metadata entries, the incompatibility of metadata tags and their functionality across varying music software and devices, and the purposeful misuse of metadata in order to work around those functionality issues.

One of the fundamental difficulties classical listeners face when dealing with their digital collections is that, by and large, digital music platforms, devices, and software were made with the classical genre as a bit of an afterthought, if thought of at all. Software such as iTunes emphasizes Artists, Albums, and especially single Songs as primary categories, which have limited, or differing, usefulness to a classical listener, who may prefer organizing and searching collections by Composer or complete Work/Piece (e.g. Beethoven's Symphony #5 rather than just the 2nd Movement). Audio quality, perhaps not a primary consideration when listening to the latest Katy Perry single, but paramount when trying to squeeze every ounce out of Mahler, is often compromised by the lack of support for certain lossless file formats. In other instances, the incompatibility of certain metadata tags (e.g. the Disc # tag in iTunes) across players/devices makes attaining even the proper playback order for a piece impossible. While there are a lot of software options out there, all of them seem to have their drawbacks: they may be overly simplistic or overly complex, work poorly with your purchased music device, fail to support downloaded music file types, or suffer from any of a myriad of other problems.
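To make the Composer/Work idea a little more concrete, here is a minimal Python sketch of how a player could group tracks by composer and complete work, then restore playback order from disc and track numbers. The tag names (`composer`, `work`, `disc`, `track`) and the sample entries are my own illustrative assumptions, not the schema of iTunes or any other program.

```python
from itertools import groupby

# Hypothetical tag values for illustration; a real collection would
# supply these from the tags stored in each music file.
tracks = [
    {"composer": "Beethoven", "work": "Symphony No. 5", "disc": 1, "track": 2,
     "title": "II. Andante con moto"},
    {"composer": "Mahler", "work": "Symphony No. 2", "disc": 2, "track": 1,
     "title": "V. Im Tempo des Scherzos"},
    {"composer": "Beethoven", "work": "Symphony No. 5", "disc": 1, "track": 1,
     "title": "I. Allegro con brio"},
    {"composer": "Mahler", "work": "Symphony No. 2", "disc": 1, "track": 1,
     "title": "I. Allegro maestoso"},
]

def playback_key(t):
    # Composer and work first, then disc number, then track number.
    return (t["composer"], t["work"], t["disc"], t["track"])

ordered = sorted(tracks, key=playback_key)

# Group the sorted tracks into complete works, movements in playback order.
works = {
    key: [t["title"] for t in group]
    for key, group in groupby(ordered, key=lambda t: (t["composer"], t["work"]))
}

for (composer, work), movements in works.items():
    print(composer, "-", work, ":", movements)
```

If a device ignores the disc number, the two Mahler tracks above would both sort as "track 1," which is exactly the kind of playback-order problem multi-disc works run into.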

While the picture may look slightly bleak for classical listeners, there's actually great reason for optimism. There has never seemed to be a better selection of recordings available, at a better price, thanks in part to digital distribution. Listeners have unprecedented control over the organization and description of their personal collections, depending on how much time they want to sink into the process. And on the software front, platforms have gradually begun to include more and more features that are beneficial to classical collectors in particular (e.g. iTunes' improved functionality for multi-disc sets, options for gapless playback, lossless audio compression, etc.). But there is still a good deal of work to be done (and a true software platform/device built for the classical genre wouldn't be bad either), and one of the areas that can definitely be improved is musical metadata. Next blog will take a look at one of those problem areas...authority control.

Tuesday, January 28, 2014

Metadata Opportunities for Librarians

As I inch closer toward completion of my MLIS, thoughts invariably turn to what comes next in my career. Though I have a great deal of interest in a traditional library or archival job, including areas such as reference or cataloging, I am also increasingly aware of the possibilities for library and information professionals outside the traditional walls of our field. In a recent blog post, LS 566 classmate Molly Porter discussed her work as an archivist for NASA's Marshall Space Flight Center, and some of the non-traditional and metadata-related activities that increasingly characterize her work. Examples include the "crafting of a metadata strategy for the center's film and media migration project," continued development and updating of web content, and increased social-media activities of varying kinds.

Having had the fantastic opportunity to experience a similar work environment during an internship with NASA's Jet Propulsion Laboratory, I can echo the increasing role that metadata plays in the world of information professionals, and the excellent chance it affords us to branch out from our traditional role inside the library. The tasks I was responsible for were two-fold: metadata description of digitized documents for a topical digital archive, and the digitization of project documents and reports, along with their subsequent entry into the project's document repository. The former job had me manipulating metadata in a manner that was distinctly similar to the controlled process of library cataloging, including the application of document titles, authors, and even subject headings (created from an archive-specific taxonomy). The latter project, by contrast, bore more similarities to records management and preservation, and any controls placed on the metadata entered were a result of my own preferences. This task, especially, illustrated to me just how important consistency of metadata entry can be to the organization of a database or records repository, and how much responsibility the metadata creator bears for it.

It is exciting that library and information professionals are increasingly afforded the opportunity of metadata-related jobs, not just with scientific organizations such as NASA, but also in the corporate world, with health and medical organizations, and countless other environments in which the manipulation and organization of metadata, records, and digital documents require our background. Would love to hear from any of you that regularly work with metadata, or have been able to find your librarian skills in demand outside the library walls.

Sunday, January 26, 2014

Metadata Concepts: MARC

Over the course of my LS 566 blogging, I hope to be able to touch on some basic metadata concepts, schemas, standards, and other tools that are important to the library and metadata fields. For the first of these posts, I'd like to focus on an integral component of metadata in a library setting: MARC.

MARC, short for MAchine-Readable Cataloging, is a data format standard first developed during the 1960s, and still used today in its MARC 21 form. MARC provided information professionals with a format which enabled the creation, use, and transmission of electronic catalog records for libraries, and thus made the use of computerized library catalogs, and the sharing of catalog records, viable. As both a national and international standard for bibliographic data, MARC has had a tremendous impact on the cataloging world, and is virtually synonymous with the concept of library catalog records.

In form, MARC is a series of 3-digit, numerical fields, or tags, with each such tag representing a piece of bibliographic information, such as the title, author, or subjects for a given work. Tags are modified through the use of indicators and subfields, which allow the entry of additional information, or denote that the field be used or read in a certain way (e.g. one indicator can control whether a title is given its own searchable entry in the library catalog, while another tells the computer to skip a certain number of characters at the start of the entry, etc.).

A very simple example of a single MARC tag:

245 14 $aThe Three Musketeers / $cby Alexandre Dumas ; with an introduction by Allan Massie

In this example:
-245 is the field for the title of the book
-14 is a series of two indicators: the 1 noting that the title should be added to the catalog, the 4 that the first 4 characters of the entry ("The" and the space after) should be skipped (articles like "The" and "A" make organizing and searching for materials problematic, and so are ignored by the catalog)
-$a is the subfield in which the title of the work is entered
-$c is the subfield for the statement of responsibility, which in this case, includes the author and writer of the introduction
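
For the programmers in the audience, the breakdown above can be sketched in a few lines of Python. This is only an illustration that parses the human-readable display line shown in this post; it is not the actual MARC transmission format, and the `parse_marc_field` helper is a name I made up for the example. It also assumes each subfield code appears only once in the field.

```python
import re

def parse_marc_field(line):
    """Split a display-style MARC field line into tag, indicators, subfields."""
    # The first two space-separated pieces are the tag and the indicator pair.
    tag, indicators, rest = line.split(" ", 2)
    # Each subfield starts with '$' followed by a one-character code.
    subfields = {
        code: value.strip()
        for code, value in re.findall(r"\$(\w)([^$]*)", rest)
    }
    return {
        "tag": tag,
        "ind1": indicators[0],
        "ind2": indicators[1],
        "subfields": subfields,
    }

field = parse_marc_field(
    "245 14 $aThe Three Musketeers / $cby Alexandre Dumas ; "
    "with an introduction by Allan Massie"
)

# The second indicator of field 245 gives the number of leading
# characters to skip when filing the title.
skip = int(field["ind2"])
filing_title = field["subfields"]["a"][skip:].rstrip(" /")

print(field["tag"], field["ind1"], field["ind2"])
print(filing_title)
```

Dropping the first 4 characters of subfield $a leaves the title filed under "Three" rather than "The," which is the whole point of that second indicator.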

Though complicated to use and understand, at times, MARC has had a tremendous impact on the development of modern cataloging. While it has served the library community well for the better part of 40 years, the possibility of change is on the horizon. Among other factors, a new set of cataloging rules (RDA) is steering the cataloging world in a new direction, and it is questionable whether MARC is the best fit going forward. I hope to touch on this subject in a future blog.

For more information about the MARC standard, there are two great resources online: the MARC Standards page, through the Library of Congress, and OCLC's Bibliographic Formats and Standards page.

Monday, January 20, 2014

Metadata as Historical Record

When attempting to properly discern the events and peoples of the past, historians are dependent upon a wide variety of historical artifacts and evidence to develop their understanding. Primary source documents, such as chronicles, records, letters, and journals, often provide us with the clearest picture, and yet can be terribly unreliable at times, whether due to embellishment, mistake or misunderstanding, the advancement of a specific agenda, or even just a lack of proper context. While this doesn't diminish the importance, and relevance, of primary source documents, it does explain the importance of collecting alternative sources of evidence, whether physical artifacts, such as pottery, architecture, and tools, or otherwise. These items are capable of giving us incredible insight into the movements, actions, and capabilities of historical peoples, while also providing valuable context to the information contained in any source documents. Is it just possible that, for our generation, personal metadata serves just such a role in helping to document our activities, behaviors, and values?

To turn back, again, to the "Guardian guide to your metadata," there is no denying the value that collected personal metadata has in tracking human behavior and activity. Information such as who we call, where we are when we talk on the phone, what we search the internet for, what webpages we visit, the subjects of our emails, and much more, provides tremendous insight into how we conduct our daily lives, and the information we value. Time stamps and locations of our interactions with the digital world illustrate our movements throughout the physical one, and can even be used to determine who we may have encountered and interacted with. The hidden value in such information is that it is presented without bias, without filter, and without distortion. It is simply a record of who we are, what we have done, who we have known, and even what we have thought, freed from the prejudices of our perspective, or that of an observing chronicler. What better canvas is there to work from if you are a historian seeking to understand a people and a culture?

It may be difficult for us to recognize the value that such information could hold in the future. For us, now, it is an intrusion and a violation of our privacy, and something to be averted, if at all possible. Assuming the information survives long enough to be of use to future generations, it would be interesting to know whether they would be similarly concerned with the context in which such personal data was collected. Would they too be outraged, or would they look upon it the way in which we cherish each new artifact and record that provides insight into the workings of the past?

Friday, January 17, 2014

It's 3 a.m., Do you know where your metadata is?

It seems like the siege on our privacy escalates on a daily basis. Hacked cell phones, compromised financial and personal information at popular shopping destinations, and the snooping of our own government security agencies all serve as constant reminders of our vulnerability in a digital world. While we're all aware of the dangers that viruses, malware, keyloggers, and poor passwords pose to us, there is a more subtle threat to our privacy that often escapes notice...metadata.

The "Guardian guide to your metadata," inspired primarily by the controversial NSA surveillance program, provides an interesting look at the way in which our metadata is collected, and the ways in which it can be used against us. Helpfully contained in the article is a listing of the various types of devices and technologies that we utilize on a daily basis, and the types of metadata taken from them. For example, did you know that information such as your search queries, search results, and the webpages you visit are all elements of metadata collected when you utilize the internet? Or how about the phone numbers of people that you call or receive calls from, the length of your phone conversations and the time they took place, and even the location that the phone call occurred from?

If this isn't sobering enough, the article provides a case study in just how our metadata can compromise our personal privacy, as illustrated by the case of General David Petraeus.

While most of us are rightly horrified by the prospect of our privacy being invaded through the collection of metadata, there is a rather interesting notion to be explored in how this information creates a sort of historical record of modern times. That is a topic I hope to look at a little closer in a future blog.

Tuesday, January 14, 2014

Objectivity in Metadata Creation

Classmate James Mitchell, in his recent blog post, "Socially Constructed Metadata?", poked at the nature of metadata creation and raised some interesting ideas about whether constructed metadata reflects objective realities, or is merely reflective of the conditioning, experiences, and perceptions of its creator. While his post ostensibly focused on the technical issues of metadata interoperability, I was far more intrigued by the possible issues presented by the conflicting ideas of metadata creators themselves.

Having been brought up largely in the history field, the question of objective vs. subjective reality and the quest for accurate description of events or objects is one that I am fairly familiar with. Historiography, a standard class for the undergraduate history student, deals with the idea of author bias (e.g. experiences, perspectives, moral and social values, language, etc.), and causes one to question the ability to take sources at face value, since they are always viewed through the lens of a biased observer. Postmodern perspectives will even question our ability to ascertain the "truth" of events with any certainty, as any record of the event is inevitably interpreted through the inherent perspective or bias of the recorder.

Apply these same ideas to the field of metadata, another field of seemingly-objective description in the hands of biased describers, and one can ask the same questions, and ostensibly arrive at some of the same conclusions. Whether one believes in the objectivity or relativity of constructed metadata, however, I think that simply exploring these issues should help us to be aware that we all bring innate preconceptions, prejudices, bias, and perspectives with us to the description table. Understanding that is an important part of overcoming our differing perspectives and striving toward a greater degree of consistency, uniformity, and objectivity in the work that we do.

Monday, January 13, 2014

Metadata: Metauseful or Metacrap?

The notion of widespread use of descriptive metadata for everything from webpages and search engines to digital media and online stores is undoubtedly an information organizer’s dream. All manner of internet navigation being guided by the use of accurate and consistent metadata would streamline search processes, eliminate undesirable results, and bring some semblance of order to the world of chaos that can often characterize the web.

As all of us have doubtlessly experienced, however, such ideas belong to the realm of fantasy. Web searches are messy, retrieving millions of results, with most users lacking the time or endurance to parse for relevant entries after the first couple of pages. Sometimes we are deliberately led astray, receiving hits for sites that clearly have nothing to do with the item being searched for. Why does this happen? Why has the potential of metadata failed to bring order to the digital environment when its possibilities are so promising?

I find that journalist Cory Doctorow’s 2001 essay, “Metacrap: Putting the torch to seven straw-men of the meta-utopia,” provides a fairly succinct, accurate, and humorous summary of the reasons why the meta-dreamland is one not likely to ever reach the shores of reality. I won’t summarize the whole of the essay here, but will comment on a couple of points he makes.

For me, one of the more problematic areas that Doctorow discusses is the misuse of metadata for the purpose of attracting our attention. Whether malicious and outright false, or simply exaggerating or “enhancing” the properties of an object or site, the fact is that metadata on the internet, at times, takes on all the properties of intrusive and misleading advertising we’ve witnessed across other forms of media. From mail spam with maliciously-intended titles to the deliberate misuse of high priority/relevance metadata terms, the digital environment is anxiously competing for our attention. That the purposeful misuse of terms destroys the integrity of the metadata in question is of no concern to the creator, so long as it gets us to their site/product. A sad result of this trend is that website metadata fields are virtually ignored by most major search engines of today, depriving internet users of a tool that could have been invaluable in enabling expedient and highly accurate search results.

Another area of the essay I found particularly relatable dealt with how people's laziness contributes to inferior-quality metadata. Sometimes it is simple negligence or omission of information, sometimes it is not taking a few moments to ensure that the entered data is correct. Either way, the “casual” approach to metadata can play a significant role in compromising its quality. In my work for a digital archive in the local area, I was struck by the difference between the quality of metadata used by the archive itself, and the lack thereof in the document repository utilized elsewhere by the company to house all its current files. The lack of standard naming protocols, visible dates, file naming methods, attribution of the content creator, and even of a standard file format spells N-I-G-H-T-M-A-R-E for the person eventually responsible for collecting and archiving the documents. Establishing such protocols and standards beforehand, and ensuring that content creators understand and follow them, would have averted much future work.

“Metacrap” is a good read, and won’t take but a few minutes of your time, so be sure to check out the link provided above. I know this article has been popular with my fellow LS 566 travelers. Would love to hear some of your thoughts and insights on whether the meta-utopia is truly as unachievable as Doctorow would have us believe. If you’re not of the SLIS crowd, your opinions and insights are equally welcome, as it’s always great to hear how people outside the field view the topic.

Thursday, January 9, 2014

Cataloging vs. Metadata?

Drawing from my relatively meager knowledge of metadata, at this point, I gather it's a fairly common question as to how the concept differs from cataloging. Both are methods of assigning descriptive data to an object using prescribed standards for the purpose of facilitating the location and collocation (placing like items together) of said items. While cataloging is commonly used in reference to libraries and the records of their collections, and metadata often seems applied in relation to digital media, I find myself questioning how much of this is due to traditional associations and naming practices. There is certainly no reason why the information contained in a library catalog could not be considered metadata, but is the application of the term cataloging to non-library media equally acceptable?

Are cataloging and metadata interchangeable ideas, or are they unique and distinct in their form and function? I have seen it mentioned that the word "metadata" is employed simply to make the idea of cataloging more acceptable to types of people that would otherwise look down upon the activity. Does this sound plausible? For what reasons would someone find the notion of "cataloging" unappealing, but not have similar qualms about "creating metadata?" I'd love to hear your thoughts.

Wednesday, January 8, 2014

A Renewed Excuse to Blog

Spring Semester is here, and with it comes a reason to pick the blog back up off the floor: a class on Metadata. For those unfamiliar with the topic, metadata is commonly defined as simply being "data about data." Though the definition is lacking, the concept is not an easy one to describe. Metadata encompasses the various information that describes specific items or objects (often digital), such as who created it, what format it is in, how many pages it has, physical dimensions, the title, and much more. Though metadata is not exclusive to digital formats and an online environment (our textbook suggests that information such as that found on the outside of grocery packaging can be considered metadata), that is often the medium in which we discuss it. For simplicity's sake, however, and for the purposes of most of my posts, let's consider metadata as the descriptive information about a digital object, such as a digital image or an mp3 file (this latter format I hope to explore much further in future posts).

I am grateful for the excuse to start posting again. I have a tendency to feel that, lacking anything significant, or scholarly, to say, there is nothing that I am able to contribute. Here's hoping that I can prove myself wrong.

Also, please feel free to follow me on Twitter @jdkeyes_1.