In our last post we discussed the four XML pieces that form the content of Linus Pauling Day-by-Day. Once properly formatted, these disparate elements are then combined into a cohesive, web-ready whole using a set of rules called Extensible Stylesheet Language Transformations (XSLT). This post will delve more deeply into XSLT and how we use it on the Linus Pauling Day-by-Day site.
Creating the Various HTML Pages
A single XSLT file controls the majority of the actions and templates that make up Linus Pauling Day-by-Day. Several Windows batch files, one for each year in the Calendar, control the calling of the calendar stylesheet with the calendar year XML. After some initial processing, the stylesheet begins to output all of the various pages. First, the main Calendar Homepage is created. This can be redundant when processing multiple years in a row, but ensures that the statistics and links are always up-to-date. Second, the Year Page is created. This pulls in extra data from the Pauling Chronology TEI file, specific to the year, displays a summary of the Paulings’ travel for that year, provides a snapshot image for the year, and lastly presents the activities for the year that don’t have a more specific date. An additional page is created that provides a larger image and some metadata for the snapshot document.
Then, acting on the months present in the data file, a Month Page is created. This presents the first calendar grid (how this is done is explained in the next section) followed by all of the days in that month that have activities and document data. If there is data about where the Paulings traveled, that is displayed in the calendar grid to give a visual overview of their itineraries. Finally, a Day Page is created for each day present within a month. The stylesheet simply acts on the days listed in the calendar XML file, and does not try to figure out how many days are in a given month in a given year. This style of letting the data drive the processing, instead of always looping over a fixed size or range, is characteristic of the declarative, functional style of programming that XSLT encourages. The Day Page features a large thumbnail of a document or photo, a smaller calendar grid, with travel information for the day displayed below if present, and then the activities and documents for that day.
Another page is then created with larger images of the document, additional pages if present, and some metadata about the document. As most of the output pages have a similar look and feel, there is a set of templates that handle the calendar navigation menu. Depending on what context the navigation is needed for (Year Page, Month Page, etc.), the output can adjust accordingly.
Computing the Calendar Grids
An important part of the look and feel of the project from early in the planning stage was to have familiar and user-friendly navigation for the large site, which meant that the traditional calendar grid would play a big role.
However, the data that we’ve amassed at the day level doesn’t have any information like the day of the week. Beyond the 7 days of the week that January 1 could fall on, the grids for each month are complicated by whether or not a year is a leap year, resulting in 14 combinations that don’t occur in a regular pattern. If you’ve ever looked at a perpetual calendar, you’ll know that the grids are deceptive in how simple they look.
An initial design goal was to have the calendar stylesheet be able to handle all of the grid variations with only minimal help. We didn’t really want to have to prepare and store lots of information for each month of each year that the project would encompass. For each year, all the stylesheet needs is the day of the week that January 1 falls on (a number 1-7, representing Sunday-Saturday in our case), which is stored in the year XML, and then it can take that and the year and figure out all of the month grids. Fitting the algorithm for this into the XSLT stylesheet code is one of the more complex coding projects we’ve worked on.
It took several hundred lines of code, but we haven’t needed to mess with it since it was first written, even as we’ve added years and expanded the project. With the help of new features in XSLT version 2 and several more years of experience, the code could be rewritten to be cleaner and more efficient. However, because it was so reliable, time on stylesheet development was spent elsewhere.
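To give a feel for the underlying arithmetic, here is a minimal sketch in Python rather than XSLT. It is not the code used on the site, and the function names are our own invention for illustration. It assumes the same inputs described above: the year, plus the day of the week for January 1 as a number 1-7 (Sunday-Saturday).

```python
def is_leap_year(year):
    """Gregorian leap year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def month_lengths(year):
    """Number of days in each of the 12 months of the given year."""
    february = 29 if is_leap_year(year) else 28
    return [31, february, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_grid(year, month, jan1_weekday):
    """Build the calendar grid for one month (hypothetical helper).

    jan1_weekday is 1-7 for Sunday-Saturday, as stored in the year XML.
    Returns a list of weeks; each week is a list of 7 cells holding a
    day number, or None for a blank cell before or after the month.
    """
    lengths = month_lengths(year)
    # Days that elapse between January 1 and the 1st of this month.
    days_before = sum(lengths[:month - 1])
    # Column (0 = Sunday) in which the 1st of the month falls.
    first_column = (jan1_weekday - 1 + days_before) % 7
    cells = [None] * first_column + list(range(1, lengths[month - 1] + 1))
    # Pad the final week out to a full row of 7 cells.
    cells += [None] * (-len(cells) % 7)
    return [cells[i:i + 7] for i in range(0, len(cells), 7)]


# Example: 1954 began on a Friday (6 when Sunday is 1).
for week in month_grid(1954, 2, 6):
    print(week)
```

The only stored value needed per year is the weekday of January 1; everything else follows from the month lengths and the leap year rule, which is why a single number in the year XML is enough.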
Travel Summary Display
In the calendar XML data, each day can contain a basic <travel> element that states where the Paulings were on that day. This information is gleaned mostly from travel itineraries and speaking engagements, but also from correspondence or manuscript notes, and usually represents a city. On the Year page, we wanted to present a nice summary for the year of where the Paulings traveled to, and how long their trips were. Because the data was spread across days, and not already grouped together, it was a challenge at first to get trip totals that were always accurate and grouped appropriately. Using the new grouping features in XSLT version 2, the various trips could be grouped together appropriately, and then displayed in order. A range could be computed using the first and last entry of the group, and linked to the day page for the first entry. Now if you see an interesting place that the Paulings visited, you can go directly to the first day they were there, and see what they were doing that day. If more than one day was spent in a given place, a total is displayed showing how many days they were there.
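The grouping idea can be sketched in a few lines of Python. This is an illustration only, not the XSLT used on the site: itertools.groupby stands in for the XSLT 2 grouping features, and the function name, the sample dates and the place names are all made up. It also assumes the days arrive in date order and that a day without travel data carries None.

```python
from datetime import date
from itertools import groupby


def summarize_trips(days):
    """Collapse runs of adjacent days in the same place into trips.

    days: list of (date, place) tuples in date order, where place is
    None for a day with no travel information (an assumption here).
    Returns one dict per trip with the place, first day, last day and
    the number of days in the run.
    """
    trips = []
    # groupby only merges *adjacent* items with the same key, so a
    # return visit to the same city later in the year is a new trip.
    for place, run in groupby(days, key=lambda day: day[1]):
        if place is None:
            continue
        run = list(run)
        trips.append({
            "place": place,
            "start": run[0][0],   # the summary links to this day's page
            "end": run[-1][0],
            "days": len(run),
        })
    return trips


# Hypothetical sample data, not taken from the calendar.
sample = [
    (date(2001, 3, 1), "City A"),
    (date(2001, 3, 2), "City A"),
    (date(2001, 3, 3), None),
    (date(2001, 3, 4), "City B"),
    (date(2001, 3, 5), "City A"),
]
for trip in summarize_trips(sample):
    print(trip["place"], trip["start"], trip["end"], trip["days"])
```

Grouping adjacent entries, rather than all entries with the same place name, is what keeps two separate visits to one city from being merged into a single trip.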
METS for Documents and Images
The last post covered what METS records are and how we are using them. Because the files are fairly complex and have additional data that we aren’t using, the calendar stylesheet abstracts away the unneeded info and complexity. A temporary data structure is used to store the data needed, and then the calendar stylesheet refers to that in its templates, instead of dealing directly with the METS record and the descriptive MODS metadata. This approach is also used in the stylesheets for our Documentary History websites, and portions of that code were repurposed for the calendar stylesheet.
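As a rough illustration of the idea, here is a Python sketch that flattens a METS record into a small dictionary. It is not our stylesheet code: the sample record, the choice of fields and the function name are assumptions made for the example, and real records carry much more than this. The METS, MODS and XLink namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
    "xlink": "http://www.w3.org/1999/xlink",
}

# A tiny, made-up METS record for illustration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:mods="http://www.loc.gov/mods/v3"
    xmlns:xlink="http://www.w3.org/1999/xlink" OBJID="example-doc-01">
  <mets:dmdSec ID="dmd1"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods>
      <mods:titleInfo><mods:title>Example letter</mods:title></mods:titleInfo>
      <mods:originInfo><mods:dateCreated>1954-02-01</mods:dateCreated></mods:originInfo>
    </mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:fileSec><mets:fileGrp USE="image">
    <mets:file ID="f1"><mets:FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></mets:file>
    <mets:file ID="f2"><mets:FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></mets:file>
  </mets:fileGrp></mets:fileSec>
</mets:mets>"""


def summarize_mets(xml_text):
    """Pull out only the fields a page template needs (hypothetical set)."""
    root = ET.fromstring(xml_text)
    href = "{%s}href" % NS["xlink"]
    return {
        "id": root.get("OBJID"),
        "title": root.findtext(".//mods:title", default="", namespaces=NS),
        "date": root.findtext(".//mods:dateCreated", default="", namespaces=NS),
        # Page images in document order.
        "pages": [loc.get(href) for loc in root.iterfind(".//mets:FLocat", NS)],
    }


print(summarize_mets(SAMPLE_METS))
```

Once the record has been reduced to a structure like this, the page templates only need to know about a handful of simple fields and never touch the METS or MODS markup directly.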
As covered in the last post, the transcripts are stored in individual TEI Lite XML files. In our calendar year XML data, a simple <transcript> element added to a listing conveys the ID of the transcript file. The stylesheet can then take this ID, retrieve the file, apply the formatting templates, and output the result to the HTML pages. We use XML Namespaces to keep the different types of source documents separate, so that the XSLT stylesheet can apply formatting to only one specific type. So, if we wanted to apply some styling to the title element, we could make sure that it was only the title element from a TEI file, not from a METS file. This allows us to have a group of formatting templates for TEI Lite files in a separate XSLT file, which can be imported by the calendar stylesheet, and none of its rules and templates will affect anything but the TEI Lite files. Since the code for the TEI Lite files and transcripts was already written for earlier projects, very little stylesheet code (fewer than 10 lines) was needed to add the ability to display nice, formatted transcripts.
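The namespace point is easy to see in a small Python sketch. This is an illustration rather than our XSLT: the sample document and function name are made up, and it assumes the transcripts use the standard TEI namespace and that the title in a METS record lives in its embedded MODS metadata.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"   # assumed namespace for the transcripts
MODS_NS = "http://www.loc.gov/mods/v3"   # descriptive metadata inside METS

# A made-up document mixing two kinds of title element.
SAMPLE = """<bundle xmlns:tei="http://www.tei-c.org/ns/1.0"
                    xmlns:mods="http://www.loc.gov/mods/v3">
  <tei:title>Transcript of a letter</tei:title>
  <mods:title>Letter, page images</mods:title>
</bundle>"""


def tei_titles(xml_text):
    """Return the text of title elements in the TEI namespace only."""
    root = ET.fromstring(xml_text)
    # The namespace is part of the element name, so the MODS title,
    # which has the same local name, is never matched.
    return [element.text for element in root.iter("{%s}title" % TEI_NS)]


print(tei_titles(SAMPLE))
```

In XSLT the same effect comes from writing template match patterns with a namespace prefix, so a template written for the TEI title simply never fires on a title from another vocabulary.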