PDA 2012: Media Types
Second morning session: media types. Let’s see what we have to talk about now.
Processing and Delivering Email Archives in Special Collections using MUSE (Peter Chan, Stanford University)
Email archiving is important, but there are many challenges: copyright and privacy, sensitive information, description, and delivery. So how do you process and archive emails in bulk? Description is especially important and difficult because an email archive needs useful metadata and description before people can actually use it.
MUSE is a project at Stanford for archiving email and actually doing something useful with the emails. It can do sentiment analysis and also group analysis. [I’ve used this before and it is quite fun.] It can also show image attachments as a slideshow. Lots of very cool improvements on MUSE since the last time I used it.
Processing emails with MUSE: edit the pre-built lexicon, screen for sensitive information and mark it for restriction, and group emails by known projects, conferences, etc. These MUSE functions can be used to create usable archives at the institutional level. Delivery: metadata about the emails goes on the web via summary information, sentiment visualizations, etc., while individual emails and attachments are delivered in the reading room. Gaps: sophisticated search, original view via the creator’s email folders/tags, delivery mode for metadata, lexicons, and foreign language support.
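To make the lexicon screening step a bit more concrete, here is a minimal sketch in Python. It is not MUSE's code: the `LEXICON` entries, the `Email` class, and the `screen` function are all hypothetical names invented for illustration, and simple whole-word matching is an assumption about how such a screen could work.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicon (category -> terms). MUSE has its own pre-built
# lexicons; these entries are placeholders for illustration only.
LEXICON = {
    "health": ["diagnosis", "surgery", "medication"],
    "financial": ["salary", "ssn", "bank account"],
}


@dataclass
class Email:
    subject: str
    body: str
    restricted: bool = False
    matched: set = field(default_factory=set)


def screen(emails, lexicon):
    """Mark every email that contains a lexicon term as restricted."""
    # One case-insensitive, whole-word pattern per lexicon category.
    patterns = {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        for category, terms in lexicon.items()
    }
    for email in emails:
        text = f"{email.subject}\n{email.body}"
        for category, pattern in patterns.items():
            if pattern.search(text):
                email.matched.add(category)
        email.restricted = bool(email.matched)
    return emails


if __name__ == "__main__":
    inbox = [
        Email("Lunch", "See you at noon"),
        Email("Re: offer", "Your salary letter is attached"),
    ]
    for email in screen(inbox, LEXICON):
        print(email.subject, email.restricted, sorted(email.matched))
```

Editing the lexicon, as described in the talk, would correspond here to changing the term lists before screening; a real workflow would still need an archivist to review whatever gets flagged.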
parallel-flickr (Aaron Straup Cope)
Link to information: parallel-flickr appendix
“For all intents and purposes, no one backs up their photos.” Flickr has a lot of trust from users and people just assume that their photos will always be there. But we really need backups because every system fails at some time.
parallel-flickr uses the Flickr API to pull out the photos and photo information. The source files are then fed into the database and then uploaded to the website. It can also pull in photos you favorite on Flickr. You can use parallel-flickr just for yourself or for sharing with others.
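As a rough illustration of the "fed into the database" step, here is a small Python sketch. It assumes the photo records have already been fetched from the Flickr API; the table layout, the record fields, and the `open_archive` and `store_photos` helpers are hypothetical and are not parallel-flickr's actual schema or code.

```python
import json
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a local archive database with one photos table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS photos ("
        " id TEXT PRIMARY KEY, title TEXT, is_fave INTEGER, meta TEXT)"
    )
    return db


def store_photos(db, records, is_fave=False):
    """Insert or update one row per photo, so re-running a backup is safe."""
    rows = [
        (r["id"], r.get("title", ""), int(is_fave), json.dumps(r, sort_keys=True))
        for r in records
    ]
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT OR REPLACE INTO photos VALUES (?, ?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    db = open_archive()
    # Made-up records standing in for API responses.
    store_photos(db, [{"id": "1", "title": "Harbor"}, {"id": "2", "title": "Fog"}])
    store_photos(db, [{"id": "3", "title": "Someone else's photo"}], is_fave=True)
    print(db.execute("SELECT COUNT(*) FROM photos").fetchone()[0])
```

Keeping the full record as JSON next to a few indexed columns is one way to avoid losing photo information that the backup does not yet know how to display.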
Note: Seems very interesting and important, but I’m just not following this talk. I need to go through his extra information after the conference to get a better handle on this.
Pinboard.in was founded in 2009 and holds 9 million archived bookmarks and 4 TB of stored web content. “The search engine does not replace the need for your own bookmarks.” Archive bookmarks because link rot is a large problem. By archiving your bookmarks, you’ll be able to get to what you want (you can sign up for this extra service through Pinboard). Challenges to getting the content: adversarial servers (paywalls/authentication, sessions, streaming content, geoidiocy), desperate advertisers (hyperpagination, interstitials, URL shorteners, IP law), and inner platform effect (dynamic loading, infinite load, #!hashbang URLs, third-party comments, Flash).
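Two of the challenges listed above, URL shorteners and #!hashbang URLs, can be spotted from the bookmark URL alone. The Python sketch below shows one way to flag them before attempting to archive a page. It is not how Pinboard works; the `archiving_flags` function and the short `SHORTENER_HOSTS` list are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Illustrative and deliberately incomplete list of shortener hosts.
SHORTENER_HOSTS = {"bit.ly", "t.co", "goo.gl", "tinyurl.com"}


def archiving_flags(url):
    """Return labels for URL features that make a page harder to archive."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    flags = []
    if host in SHORTENER_HOSTS:
        # The real target is only known after following redirects.
        flags.append("url-shortener")
    if parts.fragment.startswith("!"):
        # The fragment is never sent to the server; content loads via script.
        flags.append("hashbang")
    return flags


if __name__ == "__main__":
    for url in (
        "https://twitter.com/#!/pinboard",
        "http://bit.ly/abc123",
        "https://example.com/page",
    ):
        print(url, archiving_flags(url))
```

The other challenges in the list (paywalls, sessions, interstitials, dynamic loading) only show up once a crawler actually requests the page, so a URL check like this covers a small part of the problem.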
Take Home Message
Email archiving is important and MUSE makes these large email archives actually usable (and makes fun visualizations). MUSE is still being developed, but is already a cool and useful project. Back up your data and files. Check out Pinboard for archiving your bookmarks.