Monday, February 27, 2006

code4lib 2006

[2006-02-29]
Accessing the code4lib IRC channel via Visual IRC: the server is chat.freenode.net and the channel is #code4lib. A list of common commands can be found at this irchelp.org page. To join the channel, simply enter: "/join #code4lib".
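For the curious, here is roughly what any IRC client (Visual IRC included) sends to the server when you register and type "/join". This is a minimal sketch assuming the plain-text message format of the IRC RFCs (NICK, USER, and JOIN lines, each ending in CRLF); the nickname and real name are made up, and no network connection is made.

```python
def irc_registration_lines(nick: str, realname: str, channel: str) -> list[str]:
    """Build the raw IRC lines a client sends to register and join a channel.

    Each message is COMMAND followed by parameters; a final parameter that may
    contain spaces is prefixed with ':'; every message ends with CRLF.
    """
    if not channel.startswith("#"):
        raise ValueError("channel names on most networks start with '#'")
    return [
        f"NICK {nick}\r\n",
        # USER <user> <mode> <unused> :<realname> (the common RFC 2812 form)
        f"USER {nick} 0 * :{realname}\r\n",
        # This is what "/join #code4lib" becomes on the wire.
        f"JOIN {channel}\r\n",
    ]


# Hypothetical nickname; writing these lines to a socket connected to the
# server would register the client and join the channel.
for line in irc_registration_lines("cataloger42", "A Cataloger", "#code4lib"):
    print(repr(line))
```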

Here's the conference schedule.


My Notes

Dan Chudnov helped organize the conference. Talking with him and reading his blog postings on the subject made it clear that this was a great opportunity to make connections and find points of common interest between catalogers and library Web developers.

I was especially interested in going as a member of the Public Interfaces Committee (PIC) and chair of the Catalog Department Vision and Direction Task Force (VTF).

The VTF charge contains several elements, but most relevant right now is the bit about “helping to engage cataloging staff in discussions of new vision and promoting new expanded vision”.

Given that staff size is unlikely to increase while staff responsibilities continue to expand, we will necessarily rely more and more on information technology to increase productivity. Moreover, the library OPAC faces growing competition from the likes of Amazon, Google, Yahoo, etc. (or ‘Amazoogle’, if you prefer), and libraries need to do a better job of providing popular features like “Search within a book”, spelling correction/“did you mean” suggestions, relevancy ranking, and book reviews if they want to stay competitive. We also need to master link-resolving standards such as OpenURL, which is key to metadata interoperability and seamless digital libraries, and to apply FRBR ideas at the deepest level (e.g., exploiting the power of citation analysis to relate digital objects and to rank them by relevance). In short, this is what our users really want and need from us. Persistent, unique, context-sensitive identifiers underlie the whole enterprise of semantic interoperability.

Another part of our charge is “assessing staff needs and building new expanded expertise in existing staff”. Attending these kinds of conferences can help build our expertise in non-MARC metadata standards and in the ways they are being used to provide new library services.

Here's the rationale as I put it in my travel requisition:


In my capacity as chair of the Catalog Department Vision and Direction Task Force, I have been studying the rise of non-MARC library applications, metadata interoperability, and the growing convergence of traditional cataloging with Web-based information technology. The code4lib conference engages these developments in a highly authoritative, direct, and practical manner, ranging from a visionary keynote address by OCLC's chief research scientist, to hands-on exercises on new metadata tools led by some of the profession's most intelligent and creative practitioners. The goal of my task force is to analyze current metadata trends and opportunities, and provide the best possible support to my department head as she prepares her staff for the future. I believe my participation in the code4lib conference will help me achieve this goal.


In general, I believe that collaboration with information technologists and Web developers will be key to our success.

What Kind of Conference was This?

The first-ever code4lib conference was held February 15 through 17, 2006, in Corvallis, Oregon. It was worthwhile for several reasons. Most importantly, it brought together many of the best innovators currently working in library software development and systems administration.


I was impressed with the way the conference was organized. The whole event (from the initial call for proposals through the 3 days on the ground in Oregon) was planned within a 3-month period, on a shoestring budget, and largely by means of a dedicated code4lib IRC (Internet Relay Chat) channel.

The format of the conference was unusual. The two 45-minute keynote addresses, fifteen 20-minute presentations, and two 1½-hour breakout sessions were familiar enough, but the 3 rounds of “lightning talks” (where perhaps 30 speakers each gave a 5-minute presentation) were something I hadn't seen before. I liked the fact that people could sign up to deliver their lightning talks literally at the last minute. Since there were only 80 people in attendance altogether, and a substantial number of them ended up on stage presenting their ideas, the whole event became much more interactive, generating an unusually high level of audience ‘buy-in’ and commitment.

Selected Topics of Interest

Evergreen Team

The keynote address for the first day was given by the team developing Evergreen, the open source library system for the PINES consortium in Georgia. The project started out as a Y2K bug-correction effort, but its goal then changed to fostering a completely open-source ILS, independent of commercial vendors. Why is this important? The PINES consortium decided to break its dependency on commercial software vendors, which means it can more rapidly implement customer-driven features and re-purpose its data without first having to ask permission. Having access to all the source code means they can forge ahead with R&D and adopt plug-ins as needed. Collaboration with cataloging staff would be ideal, I think, because decisions about how to structure metadata will have a great effect on what programmers and database administrators can do later.

Dan Chudnov imagined Connecting Everything with unAPI and OPA. He stated that “Remix culture is unstoppable,” and invoked the metaphor of the dial tone: it is always there when you pick up the phone, no matter what model of phone you have, and everyone is connected. He hopes to do something similar with APIs, so that copying and pasting from one Web app to another becomes effortless, sort of like a desktop ‘clipboard’ but at the API level (?). He wants to start off by developing a ‘copy’ function. What is unAPI? It's a 2-page spec, based on ROGUE 05 rules, with an emphasis on code that works. The idea is to provide URIs for microformats that identify objects on Web pages, with <link> for auto-discovery, i.e., HTML-embedded URI metadata. The example given was a Flickr page with concert ticket stub images, parsed in OPA; multiple choice somehow allowed for a MODS role in Flickr. Dan sees this as a possible replacement for COinS PMH.
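The auto-discovery part can be pictured with Python's standard html.parser: the page advertises its unAPI server in a <link> element, and each object on the page carries an identifier a client can hand back to that server. The markup conventions assumed here (rel="unapi-server" on the link, and class="unapi-id" with the identifier in the title attribute) follow the unAPI drafts as I understand them and may not match exactly what Dan presented; the sample page and URL are hypothetical.

```python
from html.parser import HTMLParser


class UnapiDiscovery(HTMLParser):
    """Collect the unAPI server URL and object identifiers embedded in a page.

    Assumes <link rel="unapi-server" href="..."> in the head, and elements
    with class="unapi-id" whose title attribute holds an identifier.
    """

    def __init__(self):
        super().__init__()
        self.server = None
        self.ids = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "unapi-server":
            self.server = a.get("href")
        classes = (a.get("class") or "").split()
        if "unapi-id" in classes and a.get("title"):
            self.ids.append(a["title"])


# Hypothetical page fragment with two identified objects.
page = """
<html><head>
  <link rel="unapi-server" type="application/xml" title="unAPI" href="http://example.org/unapi">
</head><body>
  <abbr class="unapi-id" title="photo:1234">Ticket stub</abbr>
  <abbr class="unapi-id" title="photo:5678">Another stub</abbr>
</body></html>
"""
finder = UnapiDiscovery()
finder.feed(page)
print(finder.server, finder.ids)
```

A client (like OPA) would then ask the advertised server which formats it can supply for each identifier.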



Jeff Young discussed the OCLC WikiD (Wiki/Data) project, with an emphasis on the need to exploit OpenURL 1.0, which “gives us a single consistent API for performing any and all services that reference these items”. See the WikiD demo at http://alcme.oclc.org/wikid/ and the project page at http://www.oclc.org/research/projects/wikid
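To make “a single consistent API” a bit more concrete: an OpenURL 1.0 request in Key/Encoded-Value (KEV) form is just a query string describing the referent, appended to a resolver's base URL. The sketch below builds one for a book, using keys from the Z39.88-2004 book format as I understand them (url_ver, ctx_ver, rft_val_fmt, rft.btitle, rft.isbn, rft.au); the resolver address, title, and ISBN are hypothetical.

```python
from urllib.parse import urlencode


def openurl_for_book(resolver: str, title: str, isbn: str, author: str = "") -> str:
    """Return an OpenURL 1.0 (Z39.88-2004) KEV link describing a book."""
    pairs = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", title),
        ("rft.isbn", isbn),
    ]
    if author:
        pairs.append(("rft.au", author))
    # urlencode percent-encodes values: spaces become '+', ':' becomes %3A.
    return resolver + "?" + urlencode(pairs)


print(openurl_for_book("http://resolver.example.edu/openurl",
                       "Example Book", "0123456789"))
```

The same description can be sent to any resolver; only the base URL changes, which is what makes it attractive as a common interface.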

Jim Robertson
In one of my favorite presentations (partly because I think I actually understood most of it), Jim Robertson of the New Jersey Institute of Technology addressed the topic Lipstick on a Pig: 7 Ways to Improve the Sex Life of your OPAC. He talked about his efforts to tweak the NJIT Voyager implementation to include: (1) book covers; (2) book reviews; (3) live circulation usage history; (4) a recommendation engine (e.g., “others who borrowed this book also borrowed the following titles …”); (5) RSS feeds of journal tables of contents; (6) live librarian support integrated into the OPAC; and (7) short, durable links (PURLs) to specific items. This is done partially through ColdFusion and lots of data imported en masse from Amazon. (Amazingly, Amazon doesn't seem to mind, and there are books that teach you how to exploit its data for all it's worth, e.g., O'Reilly's Amazon Hacks.) One slide exclaimed: “Don't catalog. Resolve!”
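Item (7) and the “Don't catalog. Resolve!” slogan rest on a simple indirection: the published link never changes, and a resolver maps it to wherever the item currently lives, answering with an HTTP redirect. Here is a minimal sketch of that mapping using only the standard library; the paths and target URLs are hypothetical, and a real PURL server adds persistence, authentication, and maintenance tools on top.

```python
from http import HTTPStatus

# Hypothetical table: durable path -> current location. When an item moves,
# only this table is edited; the links people have bookmarked stay the same.
PURLS = {
    "/item/12345": "http://catalog.example.edu/cgi-bin/record?id=12345",
    "/item/67890": "http://repository.example.edu/objects/67890",
}


def resolve(path: str) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, headers) for a request to a durable link."""
    target = PURLS.get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND, {}
    # 302 Found tells the browser to follow the Location header.
    return HTTPStatus.FOUND, {"Location": target}


status, headers = resolve("/item/12345")
print(int(status), headers["Location"])
```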

Robby Robson
of EduWorks Corp. discussed Standards, Reusability and the Mating Habits of Learning Content. I didn't get much out of this one. Something about the need for SCOs (Sharable Content Objects), and an SCO editor that can convert XML into DHTML, converting motion pictures into still shots? Getting things released from Adobe format lock-up (e.g., as illustrated through the helicopter example?).

We then spent 1.5 hours in various breakout sessions. I attended the one that expanded on Dan Chudnov's talk. The group asked for help meeting the ROGUE 05 deadline and getting a new release of unAPI out to the public, and was mostly interested in improving the ‘copy’ feature. The heavy lifting seems to be based on HTTP status code 300 (“Multiple Choices”). This may be the closest I came to a hands-on learning experience, but mostly I was confused during this session.
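Writing it out helped me see where status code 300 fits. Below is a sketch of a tiny request handler: asked about an identifier without a format, the server lists the formats it can supply and answers 300 (Multiple Choices); asked for a specific format, it returns that representation. It follows the request pattern of the unAPI specification as I understand the later published version, which may differ from the draft discussed in the session; the 404 and 406 codes, record store, identifiers, and format names are assumptions for illustration.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical record store: identifier -> {format name: (MIME type, content)}
RECORDS = {
    "photo:1234": {
        "mods": ("application/xml", "<mods>...</mods>"),
        "dc": ("application/xml", "<oai_dc:dc>...</oai_dc:dc>"),
    }
}


def unapi(id_=None, fmt=None):
    """Return (status, content_type, body) for an unAPI-style request."""
    if id_ is None:
        # No identifier: list every format the server knows about.
        names = sorted({name for formats in RECORDS.values() for name in formats})
        entries = "".join(f'<format name={quoteattr(n)} type="application/xml"/>' for n in names)
        return 200, "application/xml", f"<formats>{entries}</formats>"
    formats = RECORDS.get(id_)
    if formats is None:
        return 404, "text/plain", "unknown identifier"
    if fmt is None:
        # Identifier but no format: offer the choices with 300 Multiple Choices.
        entries = "".join(
            f"<format name={quoteattr(n)} type={quoteattr(mime)}/>"
            for n, (mime, _) in sorted(formats.items())
        )
        return 300, "application/xml", f"<formats id={quoteattr(id_)}>{entries}</formats>"
    if fmt not in formats:
        return 406, "text/plain", "format not available for this identifier"
    mime, content = formats[fmt]
    return 200, mime, content


status, ctype, body = unapi("photo:1234")
print(status, body)
```

A ‘copy’ tool would make the second kind of request, let the user (or a preference) pick from the 300 response, and then fetch the chosen format.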

Lightning Talks also had a hands-on quality to them. Some of the 5-minute talks were simply demos of solutions to coding problems. COinS (ContextObjects in Spans) were discussed, including the fact that Open WorldCat is to have them by March. Other things I didn't quite understand. Raymond Yee talked about Scholar's Box, which allows users to build a personal collection and create simple slide shows out of disparate digital objects. The LITA journal editor invited article submissions. Edward Corrado talked about the fact that patrons are not necessarily patronizing our OPACs, so we need to bring catalog data to where they already are, e.g., courseware and portals, via RSS 2.0 generated from the catalog by a Perl script, e.g., streaming data about new books via RSS as a push technology.
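Edward Corrado's idea of pushing new-book data out via RSS is easy to picture. He described a Perl script; the sketch below uses Python instead, with hypothetical catalog records and URLs, and builds a minimal RSS 2.0 feed (a channel with title, link, and description, plus one item per new book) using the standard xml.etree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical "new books" pulled from the catalog.
new_books = [
    {"title": "Example Title One", "link": "http://catalog.example.edu/record/1",
     "description": "Hypothetical call number QA76.9 .E93"},
    {"title": "Example Title Two", "link": "http://catalog.example.edu/record/2",
     "description": "Hypothetical call number Z699 .E93"},
]


def new_books_feed(books):
    """Return an RSS 2.0 document (as a string) listing newly added books."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "New Books"
    ET.SubElement(channel, "link").text = "http://catalog.example.edu/"
    ET.SubElement(channel, "description").text = "Titles recently added to the catalog"
    for book in books:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = book[field]
    return ET.tostring(rss, encoding="unicode")


print(new_books_feed(new_books))
```

Served at a stable URL and regenerated nightly, a feed like this can be dropped into courseware or a campus portal without anyone visiting the OPAC.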

Thom Hickey
Chief Scientist at OCLC (http://hickey-to.oa.oclc.org.8080), spoke about 1,000 Lines of Code, and other topics from OCLC Research. See www.errol.oclc.org/laf/n82-54463.html. A lightweight OAI harvester in 50 lines of Python really works. One idea Hickey was exploring is “Google Suggest”-style lookup, which dynamically anticipates your intentions as you type; he is trying to apply it to VIAF (?). There is also a project with the Phoenix Public Library using FRBR to collect records into works, and a VIAF browser that matches phrases following the Google Suggest model, with top categories generated from Dewey XML. Increased speed is achieved by placing data in in-memory tables (though this is not scalable).
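The claim that a working OAI harvester fits in about 50 lines of Python is easy to believe. Below is my own minimal sketch (not Hickey's code) of the core loop: issue ListRecords, collect the records, and keep following the resumptionToken until the repository stops returning one. The verb, metadataPrefix, and resumptionToken parameters and the OAI-PMH 2.0 namespace come from the protocol; the repository URL is hypothetical, and the fetch function is pluggable so the logic can be tried without a network. Error responses such as noRecordsMatch simply yield nothing here.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    """Fetch a URL and return the raw response bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url, prefix="oai_dc", fetch=http_get):
    """Yield every <record> element from an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI + "record")
        token = root.find(f".//{OAI}resumptionToken")
        # An absent or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Usage (hypothetical repository):
# for rec in harvest("http://repository.example.edu/oai"):
#     print(rec.find(f"{OAI}header/{OAI}identifier").text)
```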

Colleen Whitney
discussed Generating Recommendations in OPACs: Initial Results and Open Areas for Exploration. The basic idea: “patrons who checked this out also checked these other things out”, based on analysis of circulation records, with the possibility of weighting faculty data more heavily. Returning 3 to 5 recommendations was considered most useful. Under the hood are AJAX, Perl, etc. A limitation of the circulation-analysis technique is that it is only useful for the 25% of the collection that actually circulates. The project is Mellon-funded. It doesn't catch STM usage, since that usage is largely online and through article aggregators, and so is not tracked through the circulation module. Another problem is that only 30% of bib records have ISBNs.
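The “patrons who checked this out also checked out …” idea reduces to counting co-occurrences in circulation histories. Here is a minimal sketch, assuming anonymized (patron, patron type, item) loan records; the double weight for faculty loans, the alphabetical tie-breaking, the sample data, and the default of 3 results are my own illustrative choices, not the project's actual parameters.

```python
from collections import defaultdict

# Hypothetical anonymized loans: (patron id, patron type, item id)
LOANS = [
    ("p1", "student", "A"), ("p1", "student", "B"),
    ("p2", "faculty", "A"), ("p2", "faculty", "C"),
    ("p3", "student", "A"), ("p3", "student", "B"), ("p3", "student", "D"),
]
WEIGHTS = {"faculty": 2.0, "student": 1.0}  # assumption: faculty loans count double


def recommend(item, loans=LOANS, limit=3):
    """Return up to `limit` items most often borrowed by patrons who borrowed `item`."""
    items_by_patron = defaultdict(set)
    weight_of = {}
    for patron, ptype, it in loans:
        items_by_patron[patron].add(it)
        weight_of[patron] = WEIGHTS.get(ptype, 1.0)
    scores = defaultdict(float)
    for patron, items in items_by_patron.items():
        if item in items:
            for other in items - {item}:
                scores[other] += weight_of[patron]
    # Highest weighted count first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [it for it, _ in ranked[:limit]]


print(recommend("A"))
```

The sketch also shows the limitation mentioned in the talk: an item that never circulates appears in no loan record, so it can neither receive nor generate recommendations.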

Ryan Chute of LANL dissected the Anatomy of aDORe, including a discussion of “XMLtape”, i.e., a concatenation of valid XML content. This is a lower-level repository, i.e., the ‘plumbing’ that the rest of us shouldn't really have to worry about. It was a highly technical discussion of (for me) largely unfamiliar territory that still managed to hold my attention.
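As far as I understood it, the XMLtape idea (many XML records concatenated into one large file) becomes practical with an offset index, so a single record can be read back without parsing the whole tape. The sketch below only illustrates that principle: the element names (tape, record) and the separate offset index are my own hypothetical choices, not the actual aDORe XMLtape schema.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def write_tape(records):
    """Concatenate (identifier, xml_string) records into one XML 'tape'.

    Returns the tape bytes and an index mapping identifier -> (offset, length).
    """
    out = io.BytesIO()
    index = {}
    out.write(b"<tape>\n")
    for ident, xml in records:
        chunk = f"<record id={quoteattr(ident)}>{xml}</record>\n".encode("utf-8")
        index[ident] = (out.tell(), len(chunk))
        out.write(chunk)
    out.write(b"</tape>\n")
    return out.getvalue(), index


def read_record(tape, index, ident):
    """Parse one record straight from its byte range, skipping the rest of the tape."""
    offset, length = index[ident]
    return ET.fromstring(tape[offset:offset + length])


tape, index = write_tape([("rec1", "<title>First</title>"),
                          ("rec2", "<title>Second</title>")])
print(read_record(tape, index, "rec2").find("title").text)
```

The tape as a whole is still one well-formed XML document, so it can also be processed sequentially by ordinary XML tools.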

Raymond Yee
on Teaching the Library and Information Community How to Remix Information. Yee is Technology Architect, Interactive University; Lecturer, School of Information at UC Berkeley; and developer of Scholar's Box (http://www.sims.berkeley.edu:8000/academics …). He stressed the importance of “learning by doing”. Step one: “Learn one application really well”, then move on to remixing projects. Project-based learning is important. Flickr was chosen for class projects because it has a great API and is the poster child for Web 2.0; see the Flickr hacks book just published. The Flickr gmap button grabs a map; then use the flyover button (via Google Earth). He mentioned that Ann Arbor has its catalog within an XML wrapper.

Roy Tennant
The Case for Code4Lib 501(c)(3), i.e., registering with the IRS as a tax-exempt, charitable organization, which would facilitate applications for Mellon, IMLS, etc. grants and provide some liability protection. Drawbacks include lots of paperwork, regulations, and IRS scrutiny. The topic continued in the breakout sessions, where the consensus seemed to be to practice benign neglect for the time being. Jeremy opined that it's better to wait until the water is boiling before throwing the pasta into the pot; right now, it's only simmering. We were also asked to consider a partnership with ResCarta (John Sarnowski), or a cooperative structure such as OCLC. In the breakout, there seemed to be much enthusiasm for something based on the Rare Book School model.


Lightning Talks 2 included info on bookmarklets that pair softcover with hardcover ISBNs (ask Jeffrey Young for more information; a FRBR application?). Another talk (Aaron) was on getting vendors to give us more access to our own data so we can perform our own metrics, renegotiating relationships and access to vendor subscription databases, such as what Ryan says he already has with Thompson Elsevier. NJIT's Jim Robertson talked about exposing faculty research via the Scopus DB, where the vendor allowed a modification in the contract: a win-win situation, with good publicity for Scopus and good for faculty. Keren Combs talked about branding issues, ColdFusion, MySQL, WordPress, adding pages to Movable Type, user categories …. Sarnowski (ResCarta) spoke on the importance of standardizing image production and metadata storage for libraries and archives; the foundation's work is sustained by a Gelatt Family grant. He mentioned that in 1995, 3.5 million images were generated by the Making of America collection, but different metadata formats are creating silos, and we're not using enough open standards; the foundation is trying to fill that gap. Also: Terry Reese on the latest enhancements to MarcEdit, with the default now being UTF-8, the ability to change the language of the interface (thanks to volunteers), lots of crosswalks, an OAI harvesting editor, and a new/enhanced Z39.50 utility. MarcEdit can also be used as a .NET library. Finally, dchud on why we should support free software foundations (consider freenode, which hosts the code4lib IRC channel for free).
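Pairing softcover and hardcover editions by ISBN starts with normalizing the numbers, since the same book may show up in 10- or 13-digit form. Below is a minimal sketch of the standard ISBN-10 to ISBN-13 conversion (check the old check digit, prefix 978, and recompute the check digit with alternating weights 1 and 3). Actually matching different editions of a work needs an external service and is not shown here.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit."""
    digits = isbn10.replace("-", "").replace(" ", "").upper()
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    # Validate the ISBN-10 check digit: weights 10..1, sum must be divisible by 11.
    values = [int(c) for c in digits[:9]] + [10 if digits[9] == "X" else int(digits[9])]
    if sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11:
        raise ValueError(f"bad ISBN-10 check digit: {isbn10!r}")
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```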

Tigran Zargaryan
Yerevan State University, head of Automation, spoke on Practical Aspects of Implementing Open Source in Armenia: localizations of open source software such as Mozilla, OpenOffice 2.0, Moodle (learning management system), Greenstone, phpBB (?), ILIAS (learning management system), the KDE 3.5.1 desktop environment (i.e., instead of Windows), etc. It was a nicely done presentation, and inspiring, given what he's accomplished with limited resources.

Casey Bisson on What Blog Applications Can Teach Us About Library Software Architecture: “I love OpenSearch.” The Amazon API leads to a substitute OPAC; his choice is a WordPress OPAC, which takes advantage of the entire WordPress API, making it easy to bring COinS, plus Amazon and Delicious info, into the mix.
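COinS works by embedding an OpenURL ContextObject in an empty HTML span with class Z3988, so that link-resolver browser tools can find it on any page, a WordPress OPAC included. Here is a minimal sketch of generating such a span for a book; the book data are hypothetical, and the KEV keys are the Z39.88-2004 book format keys as I understand them.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, isbn, author=""):
    """Return a COinS <span> carrying an OpenURL 1.0 KEV description of a book."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", title),
        ("rft.isbn", isbn),
    ]
    if author:
        pairs.append(("rft.au", author))
    # The KEV string goes in the title attribute; '&' must become '&amp;' in HTML.
    return f'<span class="Z3988" title="{escape(urlencode(pairs), quote=True)}"></span>'


print(coins_span("Example Book", "0123456789", "Doe, Jane"))
```

Because the span is empty, it is invisible to readers; only tools that look for class Z3988 act on it.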

Lightning Talks 3 included Hickey's announcement of an open source software for libraries contest. Also a “native-XML database” demo from Al Cornish (Northwest Digital Archives Project), supporting XPath and indexing; see Ronald Bourret's Web site for more details.
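The appeal of a native-XML database is querying records by path instead of flattening them into tables. Python's xml.etree supports only a limited XPath subset and no persistent indexing, so the sketch below merely illustrates the kind of path query such a database answers; the finding-aid fragment is hypothetical, with loosely EAD-flavored element names and no namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of finding-aid fragments.
DOC = """
<collection>
  <ead id="wa-001"><did><unittitle>Logging company records</unittitle>
    <unitdate>1890-1930</unitdate></did></ead>
  <ead id="or-002"><did><unittitle>Fishing cannery photographs</unittitle>
    <unitdate>1910-1950</unitdate></did></ead>
</collection>
"""

root = ET.fromstring(DOC)
# Path query: every unittitle under any ead record.
titles = [t.text for t in root.findall(".//ead/did/unittitle")]
# Attribute predicate: the date of the record whose id is "or-002".
match = root.find(".//ead[@id='or-002']/did/unitdate")
print(titles, match.text)
```

A real native-XML database would keep indexes on paths like these so the query does not have to scan every document.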


Wednesday, February 01, 2006

OSS for Libraries (Middletown)

Signed up for a workshop on Open Source Software for Libraries (3/17/2006) at the Middletown Library Service Center Computer Lab. The instructor is Brian Kissela, systems administrator at Mt. Holyoke College Library. Need to remember directions to the Middletown Service Center.
