Research Tools for the Internet Archivist…
For my inaugural post I thought I’d write something that will hopefully be of some use, or, at the very least, of interest to anyone who has stumbled upon this blog. The topic: internet-based archival tools for the modern academic!
Assuming I still have your attention, let’s begin. Before I get to the matter at hand, a bit of context is probably necessary. As a (recently appointed) television scholar I watch a lot of TV. Along with scholarly reading, this tends to form the bulk of my research. However, in recent years there’s been a noticeable shift in terms of what television scholars actually study; indeed, there is an ongoing debate about what actually constitutes “television”. Whereas the TV series was once the main, and sometimes only, point of scholarly discussion, increasingly academics are exploring and examining the prolific constellation of paratexts that tend to accompany series today. For example, Paul Grainge’s edited collection Ephemeral Media: Transitory Screen Culture from Television to YouTube (to which I am a contributor) includes excellent analyses of television idents, adverts, and various other texts that exist beside, between, beyond or below the main text itself. Likewise, Jonathan Gray’s full-length study Show Sold Separately: Promos, Spoilers and other Media Paratexts represents another excellent example of this shift in critical attention.
However, one issue that is rarely addressed within this growing body of research is the process of research itself. Certain paratexts are easy to come by (videogames, Twitter feeds, episode recaps, production stills, etc.). Considering that they often serve as promotional materials, it makes sense that they should be easy to track down. However, many of these materials are only available in certain geographic areas and/or for a short window of time. How then is a UK viewer writing about a US series (or vice versa) meant to research the ways in which a television programme is promoted in other territories? Even when television scholars do have access to these materials, how exactly are we meant to gather and store them? Thanks to DVD box sets, television series have, in the words of Derek Kompare, become tangible objects. But what about their associated paratexts? Occasionally these are bundled onto the DVD box set itself (such as Lost: Missing Pieces) but this isn’t always the case. Consider the AMC website for Mad Men, which features a plethora of paratexts and changes on a weekly if not daily basis. Moreover, the site features content that doesn’t necessarily translate to DVD. How, then, are scholars interested in these types of materials meant to keep up to date if the content changes so regularly? And what about non-official texts related to these series, such as journalism, blogs, or fan-fic, which are also important for this type of research? While many of these sources tend to get archived, sometimes they simply vanish into the digital ether. Even if these materials remain available, when we try to find them later on down the line they will often have been relegated to the bottom of the barrel by search engines, which tend to privilege “fresh” content over older material.
So, as you can see there are a number of problems for people researching the internet, or researching on the internet. In light of this, I’d like to explore a few of the tools I use or have used in my endless battle with the ephemerality of the World Wide Web.
One of the most useful resources for archiving material online is Zotero. Zotero works as a stand-alone application or as a plug-in for most popular web browsers. So what is it? The company liken their software to “a personal research assistant”, which essentially gathers content from the web (or even files stored on your computer) and organises it neatly into a library that syncs across all of your devices. Let me illustrate this with an example: you’re browsing The Guardian’s website and you come across an article relevant to your research. You have several options: you can bookmark the page in your browser or in a bookmarking plug-in such as Delicious; you can log in to your Guardian account and save the page as a “clipping”; or you can add it to Zotero. Unlike bookmarking or saving to your personal account (which only some websites offer), Zotero creates a “snapshot” of the page so that even if/when it disappears from the face of the internet, it’s still available for you to view (stored locally on your computer). To be more precise, it creates an offline version of the page complete with original links. So, imagine you were researching the promotional materials around a series such as Mad Men and you visited the website today. If you bookmarked the page and returned a day, a week, or a month later, chances are the site would look different every time. However, if you saved the page through Zotero it would remain as it was when you first visited – a sort of website time capsule. Whilst this feature may not be important for everyone, it’s certainly one of the most useful when it comes to my own research.
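For anyone curious about what a “time capsule” amounts to in practice, here is a minimal sketch of the general idea in Python. It is not how Zotero itself stores snapshots; the function names and the file-naming scheme are my own assumptions, chosen only to show that each capture is kept as a separate, dated file rather than overwriting the last one. Capture times are assumed to be in UTC.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def snapshot_path(folder, url, captured_at):
    """Build a file path that records both the page and the capture time."""
    # A short hash of the URL keeps file names safe and groups captures of one page.
    page_id = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    # Assumption: captured_at is already expressed in UTC.
    stamp = captured_at.strftime("%Y%m%dT%H%M%SZ")
    return Path(folder) / f"{page_id}_{stamp}.html"


def save_snapshot(folder, url, html, captured_at=None):
    """Write one capture of a page to disk without touching earlier captures."""
    captured_at = captured_at or datetime.now(timezone.utc)
    path = snapshot_path(folder, url, captured_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path
```

The `html` argument would typically come from fetching the page yourself. Note that this sketch saves only the page source, so images and stylesheets are not preserved; a dedicated tool handles those details for you.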
Adding a page to Zotero is only the start – once you’ve done this you can then add metadata, i.e. “tags”, that will help you filter your content by keyword at a later date. For example, if I were writing something about television nostalgia (my current research interest), I would simply type this into Zotero and it would show me all the relevant content that I’d ever saved.
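To make the idea of tagging a little more concrete, here is a small Python sketch of filtering a library by tag. It is purely illustrative and says nothing about Zotero’s internals: the `SavedItem` class, the `filter_by_tag` function and the example entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SavedItem:
    """One saved page: a title, its address, and any tags attached to it."""
    title: str
    url: str
    tags: set = field(default_factory=set)


def filter_by_tag(library, tag):
    """Return the items carrying the given tag, ignoring case and stray spaces."""
    wanted = tag.strip().lower()
    return [
        item for item in library
        if wanted in {t.strip().lower() for t in item.tags}
    ]


# Hypothetical library entries, for illustration only.
library = [
    SavedItem("Review of a period drama", "http://example.com/a", {"Television Nostalgia", "reviews"}),
    SavedItem("Interview with a showrunner", "http://example.com/b", {"interviews"}),
]

for item in filter_by_tag(library, "television nostalgia"):
    print(item.title)
```

The point is simply that a tag is a label attached to an item, and a keyword search is a matter of keeping the items whose labels match.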
Although I haven’t used recent versions (due to switching browser a while ago) it appears that Zotero can now handle moving images – this is especially useful if, like me, you are interested in preserving more than just images and words.
The list of features in Zotero is truly impressive and there isn’t nearly enough room to explore them all here. In short, if you’ve never used Zotero, I highly recommend it.
Another programme that offers similar features to Zotero (but isn’t free) is EndNote. Already well established within academia, EndNote started out life as a citation manager but recently added online archiving to its list of features. Personally I’ve never used EndNote, but if you’re interested ProfHack have written a comprehensive blog post in which they compare EndNote with Zotero.
As with all of the resources discussed here, Zotero has its disadvantages (for example, it isn’t compatible with every website, Twitter being one), yet it remains one of the most convenient and practical ways of gathering and archiving materials online.
Awesome Screenshot (Browser Extension)
Another (somewhat less sophisticated) method for archiving online content is to use a screen capture plug-in. There are many available but I find Awesome Screenshot (together with Chrome) the most flexible and user-friendly. There’s much less to say about Awesome Screenshot given that it’s really quite straightforward in terms of what it does. Quite simply, it will capture any webpage open in your browser, converting it into an image that can be saved remotely or locally. Perhaps the most useful feature, however, is the option to annotate pages before saving them. All in all, a much more straightforward piece of software than Zotero but very useful nonetheless.
Archive.org (a.k.a. the Wayback Machine):
Archive.org is not strictly a tool, but could be considered more of a resource. The great thing about the site is that it allows you to “go back in time”. While applications like Zotero or Awesome Screenshot enable you to save material as and when you encounter it online, Archive.org allows you to search for snapshots of websites from the past. For example, one of my favourite television shows is HBO’s The Sopranos (which finished back in 2007). Say that I decided to do some research on the series and wanted to see what the website looked like during its original run: Archive.org makes this possible. As long as you know the original URL – and even if you don’t, it’s pretty easy to guess – you can see how the site looked in years gone by. Continuing with the example of The Sopranos, here is what a search of http://www.hbo.com/sopranos yields on Archive.org (incidentally, I used Awesome Screenshot to capture this):
As you can see from the graph across the top, the site was first archived way back in 1999 (when the series was originally broadcast). The bar graph also provides a nice indication of the popularity of the site – assuming that the number of times it was crawled (i.e. indexed by Archive.org) is based on traffic. Having said that, the site was indexed heavily in 2005 when The Sopranos was on a hiatus. In any case, let’s take a look at an example of the page:
Here’s the earliest available snapshot of the website, taken April 27th, 1999 – not long after the first season had finished its initial run.
One of the most useful things about Archive.org is that it doesn’t simply capture screenshots of a website; rather, it captures pages with all their original links (much like Zotero). Although some of these links will inevitably be long gone, remarkably many of them still work (as they too were indexed by Archive.org). In the example above, I can access the fan forums to see what people were saying about the series during (or, in this case, just after) its original broadcast – a very valuable resource for nerdy media historians such as myself!
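If you would rather look up snapshots from a script than through the website, the Wayback Machine offers an availability endpoint that returns the capture closest to a given date. The following Python sketch uses that endpoint with the Sopranos address and the April 1999 date mentioned above. The function names are my own, and the response handling assumes the JSON layout of “archived_snapshots” containing a “closest” entry, so treat it as a starting point rather than a finished tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp):
    """Build the query for the capture closest to a date given as YYYYMMDD."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url, 'timestamp': timestamp})}"


def closest_snapshot(response):
    """Pull (snapshot URL, timestamp) out of a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(url, timestamp):
    """Ask for the capture nearest to the given date (needs network access)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))


if __name__ == "__main__":
    # The address and date are the ones used in the example above.
    print(lookup("http://www.hbo.com/sopranos", "19990427"))
```

Keeping the URL building and the response parsing separate from the network call means the two pieces can be checked without going online, which is handy if you are working through a long list of addresses.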
Whilst the tools that I’ve discussed in this post have been particularly useful to me as a television scholar, I imagine that they are useful for academics more broadly and it’s my hope that by sharing these you might discover some new ways to conduct or organise your research.
If you have any suggestions or experiences please feel free to comment below. I’d love to hear from you.
Posted on August 16, 2013, in Internet Archiving, JP Kelly, Paratexts, Research, Royal Holloway, Television History and tagged Archival, Archive.org, Archiving, Awesome Screenshot, Ephemeral Media, HBO, Internet, Mad Men, Paratexts, Research, Royal Holloway, Television, Television History, The Sopranos, Twitter, Zotero.