Friday, December 30, 2005

XML, MODS, etc.

Random notes on applying XSLT to XML docs...


Install Saxon in C. saxon8.jar is the principal file and should be named on the class path (cf. Web Logic or better Dynamic Apps): "The class path is normally represented by an environment variable named CLASSPATH: see your Java documentation for details. Note that the JAR file itself (not the directory that contains it) should be named on the class path." (from installation notes).

Save xml and xsl files in "data" folder under SAXON, then point Firefox to the xml and the processing takes place automatically. Trying to use Saxon itself though is a problem. Its author Michael Kay wrote the book (see below) but also some articles for IBM developerWorks. One is called "What Kind of Language is XSLT" and the other is "Saxon: Anatomy of an XSLT Processor".


Having a real hard time retracing the steps from Patrick Yott's XML class. Some files missing. Try replacing with samples from Rutgers page. I think the key challenges are these: (1) Save MARC data file in a program-readable, parsable format; (2) Convert MARC to MARCXML or MODS using Terry Reese's MarcEdit or some other mapping tool; (3) Apply stylesheet to convert XML to HTML.
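A rough sketch of what step (3) has to accomplish, written in plain Python rather than as a stylesheet (Python's standard library has no XSLT processor, so this swaps in hand-written code for the transformation). The sample record is made up for illustration, and `subfield` and `records_to_html` are hypothetical helper names; the only thing taken from the MARCXML format itself is the namespace and the datafield/subfield layout.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical one-record MARCXML sample; a real file would come out of step (2).
MARCXML = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Kay, Michael.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">XSLT 2.0 programmer's reference /</subfield>
      <subfield code="c">Michael Kay.</subfield>
    </datafield>
  </record>
</collection>"""

# MARCXML elements live in this namespace, so every path step needs the prefix.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or '' if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, NS)
    return node.text if node is not None and node.text else ""


def records_to_html(marcxml):
    """Build a bare-bones HTML list, one item per MARC record."""
    root = ET.fromstring(marcxml)
    items = []
    for record in root.findall("marc:record", NS):
        title = escape(subfield(record, "245", "a"), quote=False)
        author = escape(subfield(record, "100", "a"), quote=False)
        items.append(f"<li>{title} {author}</li>")
    return "<ul>" + "".join(items) + "</ul>"


print(records_to_html(MARCXML))
```

A stylesheet would express the same mapping declaratively, with one template per field instead of a loop.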

Voyager can save bibs as .mrc files to desktop: Voyager--Options--Preferences--Folders/Files--Save to Local File--File--:\documents and settings\dlovins\desktop\avantgarde\export.mrc

Terry Reese's MarcEdit program can do things like convert MARC files (with .mrc extension) into MARCXML.

W3C's XML page; note XSL working group page, and then the bit on XSL transformations might be worth reading more closely.

Here's XML course home page courtesy, I think, of Brown University Library Center for Digital Initiatives (CDI). This is distillation of 5-day ARL workshop, including XML, XSLT, Perl, and Web architecture. Contact: Yott is librarian, head of CDI, heavily invested in XML (insufficient funds to purchase Encompass). Recommends getting rid of OPAC, build our own simple version via xml, thinks we'll be getting there eventually. Get rid of III.

Further reading: Michael Kay's book XSLT 2.0 Programmer's Reference; ask Patrick for style sheets. visit Subscribe to xml4lib. ("has fabulous tutorials"). Pat teaches ARL-sponsored course, 3 days intensive and far more expensive.

Worth checking LC Standards page on regular basis, e.g., XML MARC instance, with which one could, as Brown digital library does, take a MARC record out of the OPAC and run it through style sheets to convert MARCXML to MODS.

XSLT: Extensible Stylesheet Language [for] Transformations. Handled through stylesheet engine, takes source xml file, with xslt file, and spits out browser-readable html files. This class to use SAXON stylesheet engine, obtained through SourceForge, written by Michael Kay, worth buying his book on XSLT ("only book I keep on my desk"). At Brown, since we're PHP based, we use something called Sablotron, also free, from Can be compiled into PHP on Webserver or command line.

XML not good tool for all tasks, e.g., OED takes gigabyte of XML. But for smaller docs it's good. What Yale does for EAD database is just put html bitstreams on their own servers keeping original xml files in 'vault'. Otherwise one could create HTML on the fly through stylesheet engine, so permanent HTML doesn't really exist. Problem with OED example is that it forces user's browser to do heavy lifting.

Class example: pie.xml is called 'source tree', while HTML documents are instances of 'result tree'.

Today's goal is to get handle on rudiments of XSLT documents.

Every XML doc has one thing in common: root element, i.e., the only element with no parent element. It can have any number of children. Children, grandchildren, and so on down are called 'descendants'. Going the other direction you get 'ancestors'. There are siblings but no cousins. Paradoxically, attributes (circles in Venn diagram) are not 'children' of attached element; but element *is* parent of attached attribute.
Use "Wonkavator".
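To make the family vocabulary concrete, here is a small sketch using Python's standard xml.etree.ElementTree. The pie document is a made-up stand-in (the contents of the class's pie.xml are not recorded in these notes), and ElementTree keeps no parent pointers, so the `parent_of` map is built by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pie.xml; element and attribute names are invented.
PIE = """<pie type="apple">
  <crust>butter</crust>
  <filling>
    <fruit>apples</fruit>
    <spice>cinnamon</spice>
  </filling>
</pie>"""

root = ET.fromstring(PIE)

# Children: iterating an element yields its direct child elements only.
children = [child.tag for child in root]

# Descendants: iter() yields the element itself first, then everything below it.
descendants = [el.tag for el in root.iter()][1:]

# Ancestors: walk upward through a child -> parent map built from the tree.
parent_of = {child: parent for parent in root.iter() for child in parent}
fruit = root.find("filling/fruit")
ancestors = []
node = fruit
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# Siblings: the other children of the same parent.
siblings = [el.tag for el in parent_of[fruit] if el is not fruit]

print(children)     # ['crust', 'filling']
print(descendants)  # ['crust', 'filling', 'fruit', 'spice']
print(ancestors)    # ['filling', 'pie']
print(siblings)     # ['spice']
print(root.attrib)  # {'type': 'apple'}
```

The attribute shows up in `root.attrib` but never in the list of children, which matches the point about attributes not being children of their element.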

Imaginary node behind root element called "root node", which is its parent. This is truly the one thing that all xml docs have in common, since it doesn't really exist, little box with diagonal lines. As with the root directory in Unix, root node is represented by slash. Two dots and a slash (../) are the way to go up a level in the directory tree. XPath: go up, go up, and from here, down into title [../../title].
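A quick way to try the "up, up, down into title" path, assuming Python's xml.etree.ElementTree, which supports only a subset of XPath. The document below is invented for illustration. One limitation to note: ElementTree cannot climb above the element a search starts from, so the path is run from the top element and first walks down to the section; a full XPath engine such as Saxon's would evaluate ../../title directly from the section as context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are made up for illustration.
DOC = """<book>
  <title>Pies of the World</title>
  <chapter>
    <section>
      <para>Start here.</para>
    </section>
  </chapter>
</book>"""

root = ET.fromstring(DOC)

# Down to <section>, then up to <chapter>, up to <book>, down into <title>.
title = root.find("chapter/section/../../title")
print(title.text)  # Pies of the World

# Started at <section> itself, the same relative path would have to leave
# the searched subtree, which ElementTree does not allow.
section = root.find("chapter/section")
print(section.find("../../title"))  # None
```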

Style sheet usually begins (though this is not required) with search for root node. Stylesheet is based on templates. Says: look for root node and apply template. My template for root node might be: print out "hi mom". Or a template can invoke other templates, or activate "Wonkavator". Default rules (keep going through the tree) only apply to elements, finally ending with text. Attributes are not elements, so only a specific rule would allow attribute text to be printed. Otherwise, text is default child of its element.
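The interplay of templates and default rules can be mimicked in a few lines of Python. This is only a model of the behavior, not an XSLT processor: `apply_templates` and `default_rule` are hypothetical names, templates are keyed on element names rather than real match patterns, and the pie document is a made-up stand-in for the class file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pie.xml.
PIE = '<pie type="apple"><crust>butter</crust><filling>apples</filling></pie>'


def apply_templates(element, templates):
    """Use the element's own template if there is one, else the default rule."""
    if element.tag in templates:
        return templates[element.tag](element, templates)
    return default_rule(element, templates)


def default_rule(element, templates):
    """Mimic the built-in rules: copy text, keep going into child elements, skip attributes."""
    parts = [element.text or ""]
    for child in element:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")  # text that follows the child inside its parent
    return "".join(parts)


root = ET.fromstring(PIE)

# No templates: only the text comes out, run together; the attribute is ignored.
print(apply_templates(root, {}))  # butterapples

# A template for the top element that ignores the tree entirely.
print(apply_templates(root, {"pie": lambda el, t: "hi mom"}))  # hi mom

# A template for <crust> overrides the default rule for that element only.
crust_only = {"crust": lambda el, t: "[crust: " + default_rule(el, t) + "] "}
print(apply_templates(root, crust_only))  # [crust: butter] apples

# Attribute text appears only when a template asks for it explicitly.
with_attr = {"pie": lambda el, t: el.get("type") + " pie: " + default_rule(el, t)}
print(apply_templates(root, with_attr))  # apple pie: butterapples
```

In a real stylesheet the "hi mom" template would match the root node (/) rather than the top element, and the call back into the default processing would be an apply-templates instruction.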

Templates are children of stylesheets, never children of other templates.

For "Run XSLT Transformation" clip, don't use extensions, since system-supplied. HTML file name is arbitrary. In current example, browser output is "blivet", everything smooshed together. Only spaces to appear are those that are embedded in PCDATA. Templates override default rules; but default rules can come in handy.

"Wonkavator knows every node in the source tree"
