Creating The Pauling Catalogue: Special Features

An image of young Pauling, used as an illustration in Biographical subseries 1.

[Part 8 of 9]

Thanks in part to a number of special features that have been incorporated into the published Pauling Catalogue, the finished product is far from a simple listing of archival holdings.

For starters, each volume contains an introduction by a major historian of science, a member of the Pauling family or a staff member of the OSU Libraries Special Collections. Authors include two of Pauling’s biographers, Robert Paradowski and Tom Hager, as well as Robert Olby, the pre-eminent historian of DNA and the author of a forthcoming biography of Francis Crick. Mary Jo Nye, OSU history professor emeritus and a recent recipient of the Sarton Medal, also contributed a text, as did Linus Pauling, Jr., Linda Pauling Kamb and Barclay Kamb.

Volume One contains a forty-five page Timeline, enhanced with dozens of full-color illustrations, that chronicles the remarkable lives of Linus and Ava Helen Pauling. The Timeline was written by Robert Paradowski and, prior to its appearance in The Pauling Catalogue, had been available only in a very rare Japanese publication titled Linus Pauling: A Man of Intellect and Action (so rare, in fact, that the only copy listed in WorldCat is the one residing in the OSU Libraries Special Collections). Short of the various Pauling biographies that have been written over the years, the Paradowski Timeline is, perhaps, the authoritative encapsulation of the Paulings’ life and work. Its inclusion is a terrific boon to The Pauling Catalogue.

An excerpt from the Paradowski Timeline, which appears in Volume 1 of The Pauling Catalogue.

Volume Two includes sixteen illustrated pages of extracts from Linus Pauling’s Oregon Agricultural College diary, written by the young freshman during the first months of his undergraduate pursuits in 1917 and 1918.  As noted in the introduction to this appendix:

Perhaps the most interesting of all the personal narratives in the Pauling collection is the sixty-three page “Diary (So-Called)” that a young Linus kept from August 1917 through the first several months of his freshman year at Oregon Agricultural College. The OAC diary provides an unusually candid glimpse into the life and personality of a typically uncertain teenager as he leaves the familiarity of home in pursuit of an advanced education. Along the way the reader learns of a photography-processing business that Linus and two friends attempt to establish, and likewise of a minor burn “caus[ing] the formation of blisters fully 1/3 cm. diameter on each of the four fingers of my dextrum.”

Indeed, the OAC diary contains a wide array of the young Pauling’s thoughts and adventures: the happy accident of quite randomly finding a slide rule while walking through a field; the palpable fear summoned in anticipation of impending undergraduate studies; the first pangs of a developing crush on an OAC co-ed named Irene Sparks, whom Linus quickly anoints as “the girl for me.”

A sample of Pauling's OAC diary. Though his track and field pursuits did not yield much fruit, Pauling would indeed make the acquaintance of Troy Bogart, a fellow member of the Gamma Tau Beta fraternity (later to become Delta Upsilon).

Each of the six volumes contains at least eight pages of color illustrations, as well as a full index listing of all illustrations that appear in a given volume. Volume Six concludes with a Technical Note and a Colophon, which explain the processes used in creating The Pauling Catalogue and which have served as the foundation for many of the technical blog posts developed in this series.

The Pauling Catalogue

Ultimately, it is our hope that these special features combine to add value to the finished project and to form a reference work that is as complete as it is authoritative.

The Pauling Catalogue is available for purchase at http://paulingcatalogue.org

Creating The Pauling Catalogue: Page Design

Example text and image from The Pauling Catalogue scrapbooks series

[Part 7 of 9]

Once the publication’s text had been encoded and its illustrations selected, the next major challenge in creating The Pauling Catalogue was the actual design of the publication, page-by-page and volume-by-volume.  This process was carried out chiefly through the skillful implementation of Adobe’s InDesign software.

Once the raw text of the publication had been marked up in XML, the catalogue’s textual content was ready to import into InDesign. The result of this import, however, was a large mass of mostly unformatted text. As much as possible, formatting characteristics were assigned to groups of text based on a given group’s location in the XML hierarchy. In this way, specific sets of data were styled automatically through a predetermined set of rules specifying font, color and spacing.

The illustration below is an example of the output generated by this process. The top level of the hierarchy for this series is the series itself, “Biographical.” The second level of the hierarchy is the subseries, in this case “Personal and Family.” The third level is a box title, and the fourth level is a folder title. The illustration depicts the styling characteristics that were assigned to levels three and four of the hierarchy in the Biographical series: all box titles were formatted in red, all folder titles were formatted in black, and each had its own spacing rules.

Personal and Family

An example of the styling output in Biographical: Personal and Family
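
The idea of styling text by its depth in the hierarchy can be sketched in a few lines of Python. This is only an illustration, not the project's actual InDesign workflow: the element names, box and folder titles, and spacing values below are all hypothetical, and only the red/black color assignment comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and titles are illustrative only,
# not the catalogue's real schema.
SAMPLE = """
<series title="Biographical">
  <subseries title="Personal and Family">
    <box title="Box 1.001">
      <folder title="Folder 1.1"/>
      <folder title="Folder 1.2"/>
    </box>
  </subseries>
</series>
"""

# Style rules keyed by depth in the hierarchy (1 = series ... 4 = folder).
# Colors follow the post; the spacing numbers are assumed for illustration.
STYLES = {
    3: {"color": "red", "space_before_pt": 12},
    4: {"color": "black", "space_before_pt": 3},
}


def styled_lines(element, depth=1):
    """Yield (depth, title, style) for every element, depth-first."""
    yield depth, element.get("title"), STYLES.get(depth, {})
    for child in element:
        yield from styled_lines(child, depth + 1)


rows = list(styled_lines(ET.fromstring(SAMPLE)))
for depth, title, style in rows:
    print(depth, title, style)
```

Because the rule is attached to the level rather than to any individual line, changing one entry in the style table restyles every box or folder title at once.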

Formatting the publication’s illustrations was a significantly more complex proposition. During the initial design phase, a placeholder image template was inserted on each page of The Pauling Catalogue. Each template consisted of three “boxes” meant to hold printing material (one for the illustration, one for the illustration’s catalogue identification number and one for its caption) as well as two additional “boxes” for non-printing design notes: an instructional note telling the designer where the image should be located on a given page, and an abbreviated description of the image, used to generate the illustration indices published at the back of each volume. Identifier and caption information for each illustration was imported directly into these image templates from a series of master Excel spreadsheets. For pages not containing an illustration, the placeholder templates were later removed.

A representation of the multiple "boxes" comprising the image template used to format each of the 1,200+ illustrations incorporated into The Pauling Catalogue.

A great deal of image correction was likewise conducted to remove flaws — dust speckles, for instance — from the selected illustrations.

A particularly extreme example of the image correction often required in the formatting of the illustrations used in The Pauling Catalogue.

A few original graphics were also created for this project, most notably the Pauling Catalogue badge designed for presentation on the cover of each volume.

The Pauling Catalogue badge and the Pauling Papers logo -- both are used as design elements throughout The Pauling Catalogue.

Manual corrections were made to minimize “widows” and “orphans,” and a few additional manual changes were made directly in InDesign to correct small problems that would not be efficient to address in XML or XSL.

The Pauling Catalogue

All design decisions were made with the overarching, two-pronged goal of this project kept in mind: 1) to disseminate scholarly information in a clean and useable manner and 2) to create a product that is aesthetically pleasing, browseable and of interest to a broad audience. While the primary market for The Pauling Catalogue is presumed to be academic libraries and history departments, we feel that the finished product is likewise at home on the coffee table or living room book shelf.

The Pauling Catalogue is available for purchase at http://paulingcatalogue.org

Creating The Pauling Catalogue: Typography and Proofreading

A sample of the Pauling Catalogue page layout. Vol 1, pg. 47

[Part 6 of 9]

As work on The Pauling Catalogue moved further in the direction of what would become the finished product, one surprisingly difficult set of decisions requiring action concerned the typography of the set’s 1,700+ pages.

After much research, two typefaces – Palatino Linotype and Myriad Pro – and ten fonts were purchased for use in the publication. The purchase of these multiple font options was prompted by the need for a vast library of “special characters” (e.g. certain scientific symbols and non-Roman alphabetic characters) for use throughout the project. As mentioned earlier in this series, coping with the challenges presented by special characters was in part enabled by the use of XML.

XSL was likewise enlisted in the battle against rogue special characters. Part of what is depicted in the illustration below is the report that was generated by a custom XSL transform written to search for unsupported characters in the hundreds of thousands of lines of XML code that comprise The Pauling Catalogue dataset. Characters for which no supporting font library could be found were displayed by the report as the symbols which have been highlighted in pink. This XSL-based approach worked effectively in identifying problematic areas of text within draft versions of the six-volume set.

An example of the XSL reports used to locate "missing" special characters.  Inset are two examples of special characters created by hand.
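
The project's report was produced by an XSL transform that is not reproduced here. As a rough stand-in, the same kind of check can be sketched in Python under a simplifying assumption: a font's coverage is modelled as a plain set of characters. The `SUPPORTED` set and the sample lines are hypothetical.

```python
import unicodedata

# Assumption: the characters covered by the purchased fonts are modelled
# as a simple set. A real check would read the fonts' character maps.
SUPPORTED = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 .,;:()[]'\"-/"
    "\u2082\u2192"  # subscript two, rightwards arrow
)


def unsupported_characters(lines, supported=SUPPORTED):
    """Return (line_number, character, unicode_name) for each uncovered character."""
    report = []
    for number, line in enumerate(lines, start=1):
        for char in line:
            if char not in supported:
                name = unicodedata.name(char, "UNKNOWN")
                report.append((number, char, name))
    return report


# Hypothetical sample text; U+10D5 is the Georgian letter "vin".
sample = ["H\u2082O \u2192 ice", "Georgian letter: \u10d5"]
report = unsupported_characters(sample)
for entry in report:
    print(entry)
```

Reporting the line number alongside each character mirrors the purpose of the original report: it points the proofreader to the exact spot in the draft that needs attention.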

It is worth noting too that, even with two typefaces and ten fonts at his disposal, the project team’s graphic designer was still, in a few instances, forced to create certain glyphs by hand. Two such examples are spotlighted above: the Georgian letter “vin” and the scientific “double-arrow” symbol representing a system in equilibrium.

A great deal of proofreading was already built into the catalogue files as a result of nearly two decades’ worth of editing and spellchecking in WordPerfect. Six local drafts of The Pauling Catalogue prototype were printed out on the OSU campus over the eighteen months that the editorial staff spent developing and refining the project. Each of these drafts was line-edited by the indispensable Special Collections student staff, with special attention paid to anomalies caused by special characters. An example of the reams of notes that the students compiled is included here.

Our students did an outstanding job of proofing the six local-version drafts.

Image captions, page headers and prefatory materials were closely reviewed by the editorial staff.  That said, a few big mistakes nearly made their way into the finished project.  Can you spot the error below?  We didn’t until our review of the project bluelines – the last possible point at which changes could realistically be made!

The Pauling Catalogue

The Pauling Catalogue is available for purchase at http://paulingcatalogue.org/

Creating The Pauling Catalogue: More than One-Thousand Illustrations

In 1931 Linus Pauling was the first recipient of the American Chemical Society's A.C. Langmuir Award, an annual recognition of the best young chemists in the U.S. This cartoon was published in the Double Bond, Jr., a satirical newspaper produced in conjunction with the A.C.S. meeting that year.

[Part 5 of 9] The Pauling Catalogue contains over 1,200 illustrations in its 1,700+ pages of text. The long process underlying the selection of these images was based upon two fundamental guiding principles.

First, it was the goal of the editorial team that The Pauling Catalogue be used to display certain of the more important documents and artifacts held within the Pauling Papers.  Accordingly, annotated reproductions of such noteworthy items as Rosalind Franklin’s famous “Photo 51,” Watson and Crick’s original DNA structure typescript, and Pauling’s legendary “peace placard” are all included.

Of nearly equal importance was the desire to use image descriptions to tell some of the fascinating but less well-known stories embedded within the Pauling biography. Part of the archivist’s mission is to provide context for the documents held within their collections. The editorial team sought to achieve this end by composing extensive captions for a number of illustrations that, on the surface, would not seem to be especially interesting.

Two fascinating examples are included below.

From the Pauling publications bibliography in Volume I:

From the Pauling Honors and Awards listings in Volume III:

In certain other instances, custom illustrations were created by the project team for exclusive inclusion in The Pauling Catalogue.  This composite view of many of Pauling’s medals, plaques and certificates is a perfect example:

18 awards composite

The source images for this illustration are freely available on the web at the Linus Pauling: Awards, Honors and Medals website. The composite image was created using an Excel spreadsheet and a custom PerlScript, which randomized the images. Once randomized, the images were then imported into an InDesign grid with this final composite graphic as the output. Image courtesy of Eric Arnold.
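
The original randomization was done with a spreadsheet and a Perl script that are not shown here. The underlying step, shuffling a list of image names and dealing them into grid rows, might look like the following Python sketch. The file names, the six-column layout and the seed are assumptions made for illustration.

```python
import random


def randomized_grid(image_names, columns, seed=None):
    """Shuffle image names and split them into rows of `columns` items."""
    rng = random.Random(seed)      # a seed makes the layout repeatable
    shuffled = list(image_names)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return [shuffled[i:i + columns] for i in range(0, len(shuffled), columns)]


# Hypothetical file names for an 18-image composite.
names = [f"award_{n:02d}.tif" for n in range(1, 19)]
grid = randomized_grid(names, columns=6, seed=1)
for row in grid:
    print(row)
```

Fixing the seed is a convenience rather than a requirement: it lets the same "random" arrangement be regenerated if the grid needs to be rebuilt later.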

Finally, image series were included throughout the publication to great effect. The following example is particularly interesting in its depiction of the wide variety of content included in the Ava Helen and Linus Pauling Papers:

An example of the remarkable diversity of content- and format-types in the Pauling collection.

Illustrations were selected, scanned and organized using Excel spreadsheets. Each spreadsheet contained information on a selected item’s catalogue identification number, its location as an illustration within the published catalogue and the caption text written for the image.

An example of the Excel spreadsheets used to establish intellectual control over the 1,200+ illustrations used in The Pauling Catalogue.

Documents were scanned with a goal of achieving a minimum print resolution of 300 dots per inch, meaning that certain very small artifacts (slides, for example) required very high scan resolutions – upwards of 2400 dots per inch. As a result, the final tally of 1,200+ image scans required a sizeable amount of storage space – more than 36 gigabytes in total.
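
The relationship between print resolution and scan resolution is simple arithmetic: the smaller the original relative to its printed size, the higher the scan resolution must be. A minimal sketch follows; the one-inch original and eight-inch printed width are hypothetical figures chosen only to show how a number like 2400 can arise.

```python
def required_scan_dpi(original_width_in, printed_width_in, print_dpi=300):
    """Scan resolution needed so the enlarged print still reaches `print_dpi`."""
    enlargement = printed_width_in / original_width_in
    return print_dpi * enlargement


# Hypothetical case: an original about one inch wide, printed eight inches wide.
slide_dpi = required_scan_dpi(original_width_in=1.0, printed_width_in=8.0)

# An original printed at its own size needs only the print resolution.
same_size_dpi = required_scan_dpi(original_width_in=8.0, printed_width_in=8.0)

print(slide_dpi, same_size_dpi)
```

For a sense of scale, 36 gigabytes spread across roughly 1,200 scans works out to an average on the order of 30 megabytes per image.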

A peek at the file directory structure for a portion of the images scanned and used in The Pauling Catalogue

The Pauling Catalogue

Close to 350 hours were logged discerning and negotiating copyright permissions for items not controlled by the OSU Libraries. This process was made all the more difficult by the fact that many of the items in the Pauling photo collection are classified as “orphan works,” i.e. images for which little or nothing is known concerning copyright provenance. The project team’s rule of thumb was to conduct due diligence in pursuing contact information for any illustration, no matter how old.

In other instances, archival context was added to image scans to enhance a given illustration’s fair-use characteristics.

Lastly, a small number of illustrations were purchased for one-time print use. (Which means, unfortunately, that we can’t show them off here!)

The Pauling Catalogue is available for purchase at http://paulingcatalogue.org

Creating The Pauling Catalogue: Formatting Text with XML and XSL

A depiction of the text formatting cycle used in the creation of The Pauling Catalogue

[Part 4 of 9] One of the earliest and most pressing questions that the project team had to answer in constructing The Pauling Catalogue was how to go about formatting the text of such a massive document. The catalogue had been generated over many years as a series of WordPerfect word processing documents. While the word processing interface worked nicely in developing working documents, moving the catalogue data out of WordPerfect and into a flexible format more suitable to a professional printing operation was a significant challenge.

Ultimately it was decided to format the text data using Extensible Markup Language (XML) and Extensible Stylesheet Language Transformations (XSLT).

XML is an encoding schema that adds machine-readable value to existing data. Using a series of tags applied hierarchically throughout a given data set, XML greatly enhances one’s ability to manipulate data in useful, uniform ways. This manipulation of XML-encoded data is implemented using XSLT. In a nutshell, XSL transformations consist of sets of rules that locate specific pieces of data and then either order the data pieces in a certain prescribed way or hide the data pieces entirely.

The seventeen series that make up The Pauling Catalogue were each encoded in XML and manipulated – sometimes subtly and sometimes severely – using XSLT. The importance of this process to the creation of the end product is difficult to overstate. A perfect illustration of the power of XML and XSLT is provided by the Pauling Personal Library series in Volume 6 of The Pauling Catalogue.  Linus and Ava Helen Pauling’s personal library contains over 4,000 volumes and the published bibliography of all these items is 178 pages long. The XML mark-up for each book is shown here:

The XML encoding schema for one of the 4,000+ books in the Pauling Personal Library

When the personal library was originally encoded for display on the web, all of the volumes that make up the series were arranged according to Library of Congress classification number. As the details of The Pauling Catalogue publication were being determined, a decision was made that the books in the Personal Library would be more useful to users of a paper reference if each item were presented alphabetically by author’s last name.

Carrying out this re-sort process by hand would have taken a very long time, as each book listing would have required human “cutting and pasting” intervention to reorganize the records from call number order to alphabetical order. However, because the content of the personal library had been described in XML, which is a machine-readable format, a series of new XSLT rules was instead used to automate the re-sort:

A series of rules written in XSL was used to re-sort the Pauling Personal Library arrangement.
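
The project's own XSLT rules appear only in the illustration. In XSLT, this kind of reordering is typically expressed with the `xsl:sort` element. To show the principle in runnable form, here is a Python sketch that performs an equivalent re-sort on a tiny, entirely hypothetical set of records. The element names, call numbers, authors and titles are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical records in call-number order; nothing here is taken from
# the real Personal Library data.
LIBRARY = """
<library>
  <book><callno>QD 100</callno><author>Young, C.</author><title>Third Title</title></book>
  <book><callno>QH 200</callno><author>adams, B.</author><title>First Title</title></book>
  <book><callno>TX 300</callno><author>Miller, A.</author><title>Second Title</title></book>
</library>
"""


def sort_by_author(root):
    """Reorder <book> children alphabetically by author, ignoring case."""
    books = sorted(
        root.findall("book"),
        key=lambda book: book.findtext("author", default="").casefold(),
    )
    root[:] = books  # replace the children with the sorted list
    return root


root = sort_by_author(ET.fromstring(LIBRARY))
authors = [book.findtext("author") for book in root]
print(authors)
```

The records themselves are never edited. Only their order changes, which is why a re-sort that would be tedious by hand becomes a short rule once the data is machine-readable.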

Consequently, a process that would have taken many days, if not weeks, to conduct “by hand,” was instead completed with a few hours of nimble XSLT coding.  The resulting differences from the version 2 proof to the version 6 proof of The Pauling Catalogue are immediately apparent:

From draft version 2 to draft version 6 of the publication, significant arrangement changes were made to the Pauling Personal Library

Another major benefit of XML is the standard’s support for special characters.  When developing content in HTML, web authors have traditionally been required to describe special characters (e.g. scientific symbols or non-Roman alphabetic characters) using character entities.

For example, if one wished to insert a subscript number 2 into their text, HTML would require that the author use the character entity &#8322; to display the symbol in a web browser. XML, on the other hand, uses tags that are both human- and machine-readable to describe and format a subscript 2 (see illustration below).

The situation is similar for symbols such as an arrow: HTML requires the character entity &#8594; while XML “understands” and will output an arrow symbol entered into a properly-formed XML document. This enhanced support of special characters encoding was terrifically helpful in the formatting of the Pauling Research Notebooks series, which contains a great number and variety of special characters:

An example of the special characters mark-up used in the Pauling Research Notebooks series. XML's support of special characters encoding is significantly more intuitive and elegant than are the character entity requirements specified by HTML.
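
Numeric character references take the form of an ampersand, a hash sign, a decimal code point and a semicolon. The short Python sketch below decodes the two references discussed above and shows that an XML parser treats a literal arrow and its numeric reference as the same text. The `formula` element name is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Decode the two numeric character references mentioned in the post.
subscript_two = html.unescape("&#8322;")  # U+2082, subscript two
arrow = html.unescape("&#8594;")          # U+2192, rightwards arrow

# An XML document may carry the arrow either literally or as a reference.
# A parser yields the same text in both cases.
literal = ET.fromstring("<formula>A \u2192 B</formula>").text
referenced = ET.fromstring("<formula>A &#8594; B</formula>").text

print(subscript_two, arrow, literal == referenced)
```

Being able to type the symbol directly into the source keeps heavily annotated text, such as research notebook transcriptions, far easier to read and proof than a run of numeric codes.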

The Pauling Catalogue

While XML and XSLT provided a strong platform for the formatting of The Pauling Catalogue text, the 1,200+ illustrations inserted throughout the six-volume publication presented a new and varied set of challenges.  The processes required to cope with these issues will be the subject of our next post in this series.

The Pauling Catalogue is available for purchase at http://paulingcatalogue.org

Contents of The Pauling Catalogue

A brief overview of the six volumes that comprise The Pauling Catalogue

[Part 3 of 9]
The Pauling Catalogue is a mammoth publication — six volumes, more than 1,700 pages and over 1,200 illustrations, the entirety of which is held in a slipcase and weighs in at over twenty pounds per set.  The six volumes are effectively a detailed outline of the Ava Helen and Linus Pauling Papers, a 4,400 linear foot collection that has been arranged and described using a schema of seventeen disparate intellectual series.  These seventeen series — the “meat” of The Pauling Catalogue — are detailed below the jump.

The contents of The Pauling Catalogue

The Acquisition and Cataloging of the Ava Helen and Linus Pauling Papers

Part of Linus Pauling's collection of molecular models, stored in the OSU Libraries Special Collections closed-stack area.

[Part 2 of 9]

The Ava Helen and Linus Pauling Papers consist of over 500,000 items requiring some 4,400 linear feet of storage space in the Oregon State University (OSU) Libraries Special Collections. The collection began arriving in Corvallis shortly after Pauling announced, in April 1986, that he would be donating his personal archive to OSU, his undergraduate alma mater. A large initial accession of roughly 100,000 items was received shortly thereafter.

Over the remaining eight years of Pauling’s life, a few thousand documents would be transferred annually to the collection. Perhaps the chief reason why Pauling waited until his eighty-fifth birthday to designate a repository for his papers was the simple fact of his high rate of scientific activity, which continued relatively unabated up to within weeks of his death (Pauling’s bibliography includes eleven articles that were published posthumously). As such, when selecting files to be shipped, Pauling’s primary criterion was the applicability of a given material set to his current research. Likewise, little ceremony was bestowed upon some of the more valuable artifacts in the collection. For example, Pauling sent his two Nobel medals to OSU without any forewarning – the unmarked parcel in which they were shipped arrived late on a Friday afternoon and sat unprotected in the building’s loading dock until the start of business the next Monday morning.

One particular item, however, did require a great deal of care in transport: Dr. Pauling’s office chalkboard. A tangle of chemical formulas, project notes and scores of names, Pauling’s chalkboard presented an especially daunting shipping and conservation challenge. Novel ideas were solicited from the archival community, but none proved satisfactory – spraying the board with an aerosol sealant would push the chalk dust into the porous surface; protecting the board with Plexiglas would electrostatically draw dust off of the board and toward the glass. Ultimately Ockham’s Razor – one of Pauling’s favorite rules of thumb – won out. A crate with custom foam padding was built and the board was transported with great care taken that it not be tilted out of the horizontal. The method worked. No noticeable dust was lost in transit and to this day the board hangs on display in a locked mock-up of Pauling’s office adjacent to the Special Collections reading room.

The Pauling chalkboard, housed on permanent display in the OSU Libraries Special Collections.

With Pauling’s death in August 1994, large quantities of his work once again began moving from California to Corvallis – a total summing close to 350,000 items. This mass of paper came primarily from three sources: Pauling’s large oceanside ranch at Big Sur, California; a smaller apartment that Pauling kept on the campus of Stanford University; and the offices of the Linus Pauling Institute of Science and Medicine, then located in Palo Alto, California. These major pieces of Pauling’s archive were, however, being shipped to a facility not yet large enough to accommodate them. Indeed, the full Ava Helen and Linus Pauling Papers would not be housed under a single roof until the winter of 1998, when the Valley Library expansion project greatly enlarged the Special Collections space.

Early in the cataloging process it was decided that the Pauling Papers were of such significance that it would be prudent, even necessary, to create container listings and finding aids of comprehensive detail. As a result, all of the collection is cataloged, at minimum, on the folder level, and significant portions have been described on the item level. The amount of work that has gone into this process is summed up quite nicely with a simple object lesson. In 1991, to honor Pauling’s ninetieth birthday, a preliminary catalog of holdings, 305 pages in length, was issued. Move ahead fifteen years to the completed Pauling Catalogue, and one is confronted by a tome which runs to 1,852 pages, without illustrations, in eight-point type.

The Pauling Catalogue

In an attempt to make these 1,852 pages as user-friendly as possible, The Pauling Catalogue has been organized both by material type and by subject, and is further subdivided into seventeen sections. For the purposes of this publication project, these seventeen sections have been organized into six volumes.  The contents of these six volumes will be the subject of our next post in this series.

The Pauling Catalogue is available for purchase at http://paulingcatalogue.org