Default Text Structure for TEI Documents

This chapter describes the default high-level structure for all TEI documents. Most of the base tag sets described in part II simply embed the framework defined in this chapter, and a few redefine it with minor modifications; this chapter is therefore relevant to every kind of TEI document. For further details on the overall structure of the TEI document type definitions, in particular the use of base and additional tag sets, see chapter .

TEI texts may be regarded either as unitary, that is, forming an organic whole, or as composite, that is, consisting of several components which are in some important sense independent of each other. The distinction is not always entirely obvious: for example a collection of essays might be regarded as a single item in some circumstances, or as a number of distinct items in others. In such borderline cases, the encoder must choose whether to treat the text as unitary or composite; each may have advantages and disadvantages in a given situation.

Whether unitary or composite, the text is marked with the text tag and may contain front matter, a text body, and back matter. In unitary texts, the text body is tagged body; in composite texts, where the text body consists of a series of subordinate texts or groups, it is tagged group. The overall structure of any text, unitary or composite, is thus defined by the following elements:

text: contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, or a corpus sample.
front: contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found before the start of a text proper.
body: contains the whole body of a single unitary text, excluding any front or back matter.
group: contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc.
back: contains any appendixes, etc. following the main part of a text.

The overall structure of a unitary text is a text element containing front matter (if any), a body, and back matter (if any), in that order.
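The unitary structure just described can be sketched as follows. This is a minimal illustration only, written as well-formed XML with explicit end-tags (an assumption about syntax); the comments stand in for real content.

```xml
<text>
  <front>
    <!-- front matter for the text, if any -->
  </front>
  <body>
    <!-- body of the unitary text -->
  </body>
  <back>
    <!-- back matter for the text, if any -->
  </back>
</text>
```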

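The overall structure of a composite text made up of two unitary texts is a text element whose group contains two subordinate text elements, each with its own front matter, body, and back matter; the composite text may also carry front and back matter of its own.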
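A composite text of this kind might be sketched as below. As before, this is an illustrative outline in well-formed XML, with comments standing in for content; it is not a reproduction of any formal declaration.

```xml
<text>
  <front>
    <!-- front matter for the composite text as a whole -->
  </front>
  <group>
    <text>
      <front><!-- front matter for the first text --></front>
      <body><!-- body of the first text --></body>
      <back><!-- back matter for the first text --></back>
    </text>
    <text>
      <front><!-- front matter for the second text --></front>
      <body><!-- body of the second text --></body>
      <back><!-- back matter for the second text --></back>
    </text>
  </group>
  <back>
    <!-- back matter for the composite text as a whole -->
  </back>
</text>
```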

Each of these elements is further described in the following subsections. text, body and group are formally declared as follows:

Elements front and back are separately declared, as further discussed in sections and . Textual elements, such as paragraphs, lists or phrases, which nest within these major structural elements, are discussed in chapter (for elements common to all kinds of document) and in part II (for elements specific to a particular base). The group element, used for composite texts, is further discussed in section .

Divisions of the Body

In some texts, the body consists simply of a sequence of low-level structural items, referred to here as components or component-level elements (see further section ). Examples in prose texts include paragraphs or lists; in dramatic texts, speeches and stage directions; in dictionaries, dictionary entries. In other cases sequences of such elements will be grouped together hierarchically into textual divisions and subdivisions, such as chapters or sections. The names used for these structural subdivisions of texts vary with the genre and period of the text, or even with the whim of the author, editor or publisher. For example, a major subdivision of an epic or of the Bible is generally called a book, that of a report is usually called a part or section, and that of a novel a chapter, unless it is an epistolary novel, in which case it may be called a letter. Even texts which are not organized as linear prose narratives, or not as narratives at all, will frequently be subdivided in a similar way: a drama into acts and scenes; a reference book into sections; a diary or day book into entries; a newspaper into issues and sections, and so forth.

To cater for this variety, these Guidelines propose that all such textual divisions be regarded as occurrences of the same neutrally named elements, with an attribute type used to categorize elements independently of their hierarchic level. Two alternative styles are provided for the marking of these neutral divisions: numbered and un-numbered. Numbered divisions are named div0, div1, div2, etc., where the number indicates the depth of this particular division within the hierarchy, the largest such division being div0, any subdivision within it being div1, any further sub-sub-division being div2 and so on. Un-numbered divisions are simply named div, and allowed to nest recursively to indicate their hierarchic depth. The two styles may not be combined within a single front, body or back element.

Un-numbered Divisions

The following element is used to identify textual subdivisions in the un-numbered style:

div: contains a subdivision of the front, body, or back of a text.

As a member of the class divn this element has the following additional attribute:

type: specifies a name conventionally used for this level of subdivision, e.g. act, volume, book, section, canto, etc.

Using this style, the body of a text containing two parts, each composed of two chapters, might be represented by two div elements of type part nested directly within body, each containing two div elements of type chapter.
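A minimal sketch of such a body in the un-numbered style follows. It assumes well-formed XML syntax; the type values part and chapter come from the description above, while the comments are placeholders for the text of each chapter.

```xml
<body>
  <div type="part">
    <div type="chapter">
      <!-- text of part 1, chapter 1 -->
    </div>
    <div type="chapter">
      <!-- text of part 1, chapter 2 -->
    </div>
  </div>
  <div type="part">
    <div type="chapter">
      <!-- text of part 2, chapter 1 -->
    </div>
    <div type="chapter">
      <!-- text of part 2, chapter 2 -->
    </div>
  </div>
</body>
```

The hierarchic depth of each division is carried by the nesting alone, which is why every division needs an explicit end-tag.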

Note that end-tags are mandatory for un-numbered divisions, to avoid ambiguity. Note also that the type attribute must be specified each time its value changes, for reasons discussed in section below.

The div element has the following formal definition:

Numbered Divisions

The following elements are used to identify textual subdivisions in the numbered style:

div0: contains the largest possible subdivision of the body of a text.
div1: contains a first-level subdivision of the front, body, or back of a text (the largest, if div0 is not used, the second largest if it is).
div2: contains a second-level subdivision of the front, body, or back of a text.
div3: contains a third-level subdivision of the front, body, or back of a text.
div4: contains a fourth-level subdivision of the front, body, or back of a text.
div5: contains a fifth-level subdivision of the front, body, or back of a text.
div6: contains a sixth-level subdivision of the front, body, or back of a text.
div7: contains the smallest possible subdivision of the front, body or back of a text, larger than a paragraph.

As members of the class divn these elements all bear the following additional attribute:

type: specifies a name conventionally used for this level of subdivision, e.g. act, volume, book, section, canto, etc.

The largest possible subdivision of the body may be regarded either as a div0 or as a div1 element, and the smallest possible is a div7. This convention (corresponding with the idea that a type-set document may begin either with a level 0 or a level 1 heading) is provided for convenience and compatibility with some widely used formatting systems. If numbered divisions are in use, a division at any one level (say, div3) may contain only numbered divisions at the next lower level (in this case, div4).

Using this style, the body of a text containing two parts, each composed of two chapters, might be represented by two div1 elements of type part, each containing two div2 elements of type chapter.
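The same two-part, four-chapter body might be sketched in the numbered style as below. The choice of div1 rather than div0 for the outermost level is an assumption made for illustration; either is permitted. The syntax is well-formed XML and the comments are placeholders.

```xml
<body>
  <div1 type="part">
    <div2 type="chapter">
      <!-- text of part 1, chapter 1 -->
    </div2>
    <div2 type="chapter">
      <!-- text of part 1, chapter 2 -->
    </div2>
  </div1>
  <div1 type="part">
    <div2 type="chapter">
      <!-- text of part 2, chapter 1 -->
    </div2>
    <div2 type="chapter">
      <!-- text of part 2, chapter 2 -->
    </div2>
  </div1>
</body>
```

Here the level of each division can be read directly from its element name, without reference to its ancestors.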

Formal definitions for these elements are as follows:

Numbered or Un-numbered?

The choice between numbered and un-numbered divisions will depend to some extent on the complexity of the material: un-numbered divisions allow for an arbitrary depth of nesting, while numbered divisions limit the depth of the tree which can be constructed. Where divisions at different levels should be processed differently (chapters, but not sections, for example, beginning on new pages), numbered divisions slightly simplify the task of defining the desired processing for each level. Some software may find numbered divisions easier to process, as there is no need to maintain knowledge of the whole document structure in order to know the level at which a division occurs; such software may however find it difficult to cope with some other aspects of the TEI scheme. On the other hand, in a collection of many works it may prove difficult or impossible to ensure that the same numbered division always corresponds with the same type of textual feature: a chapter may be at level 1 in one work and level 3 in another. The two styles may not be mixed within the same front, body or back element.

Whichever style is used, the global n and id attributes (section ) should be used where appropriate to provide reference strings for each division of a text which is regarded as significant for referencing purposes (on reference systems, see further section ). As indicated above, the type attribute is used to provide a name or description for the division. Typical values might be book, chapter, section, part, or (for verse texts) book, canto, stanza, or (for dramatic texts) act, scene. This attribute has a declared value of #CURRENT, which implies that if defaulted, the value used will be that most recently specified on any element of the same kind, scanning the text left to right. Hence, if un-numbered divisions are used, the appropriate value must be specified each time a change of level occurs, both down and up the document hierarchy.
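The effect of the #CURRENT default on un-numbered divisions can be illustrated with the following sketch. It is written as well-formed XML, which has no #CURRENT default, so every type value is spelled out; the comments mark where an SGML document could or could not rely on the default. The n and id values are hypothetical.

```xml
<body>
  <div type="part" n="1" id="P1">
    <div type="chapter" n="1" id="P1C1">
      <!-- text of the chapter -->
    </div>
    <!-- With a #CURRENT default, type could be omitted on the next
         start-tag: the most recently specified value is "chapter". -->
    <div type="chapter" n="2" id="P1C2">
      <!-- text of the chapter -->
    </div>
  </div>
  <!-- Moving back up a level, type must be specified again: if it
       were omitted, the most recent value, "chapter", would apply. -->
  <div type="part" n="2" id="P2">
    <!-- Moving down a level, type must likewise be specified:
         the most recent value is now "part". -->
    <div type="chapter" n="1" id="P2C1">
      <!-- text of the chapter -->
    </div>
  </div>
</body>
```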

The following extended example uses numbered divisions to indicate the structure of a novel, and illustrates the use of the attributes discussed above. It also uses some elements discussed in the next section () and the p element discussed in section .

Book I.

Of writing lives in general, and particularly of Pamela, with a word by the bye of Colley Cibber and others.

It is a trite but true observation, that examples work more forcibly on the mind than precepts: ...

Of Mr. Joseph Andrews, his birth, parentage, education, and great endowments; with a word or two concerning ancestors.

Mr. Joseph Andrews, the hero of our ensuing history, was esteemed to be the only son of Gaffar and Gammar Andrews, and brother to the illustrious Pamela, whose virtue is at present so famous ...

The end of the first Book

Book II

Of divisions in authors

There are certain mysteries or secrets in all trades, from the highest to the lowest, from that of prime-ministering, to this of authoring, which are seldom discovered unless to members of the same calling ...

I will dismiss this chapter with the following observation: that it becomes an author generally to divide a book, as it does a butcher to joint his meat, for such assistance is of great help to both the reader and the carver. And now having indulged myself a little I will endeavour to indulge the curiosity of my reader, who is no doubt impatient to know what he will find in the subsequent chapters of this book.

A surprising instance of Mr. Adams's short memory, with the unfortunate consequences which it brought on Joseph.

Mr. Adams and Joseph were now ready to depart different ways ...

Partial and Composite Divisions

In most situations, the textual subdivisions marked by div elements will be both complete and identically organized with reference to the original source. For some purposes however, in particular where dealing with unusually large or unusually small texts, encoders may find it convenient to present as textual divisions sequences of text which are incomplete with reference to the original text, or which are in fact an ad hoc agglomeration of tiny texts. In some kinds of texts it is in any case difficult or impossible to determine the order in which the individual subdivisions of text should be combined to form the next higher level of subdivision, as noted below.

To overcome these problems, the following additional attributes are defined for all elements in the divn class.

org: specifies how the content of the division is organized. Legal values are:
uniform: uniform content, i.e. the immediate contents of this element are regarded as forming a logical unit, to be processed in sequence.
composite: composite content, i.e. no claim is made about the sequence in which the immediate contents of this division are to be processed, or their inter-relationships.

A second attribute specifies whether or not this division is complete with respect to the original source. Its legal values indicate respectively that:
the full text of the original has been transcribed;
a sample of the original text has been transcribed.

A third attribute indicates from which part of the original source the division has been extracted in the case of a sampled division. Its legal values indicate respectively that:
the division lacks material at start and end;
the division lacks material at end;
the position of sampled material within the original is unknown;
the division is not a sample;
the division lacks material at start.

For example, an encoder might choose to transcribe only the first two thousand words of each chapter from a novel. In such a case, each chapter might conveniently be regarded as a partial division, and tagged with a div element in the following form: where xx represents a number for the chapter. The sampling element in the TEI Header should also be used to record the principles underlying the selection of incomplete samples, as further described in section .

The following example demonstrates how a newspaper column composed of very short unrelated snippets may be encoded using these attributes:

News in brief

Police deny losing bomb

Scotland Yard yesterday denied claims in the Sunday Express that anti-terrorist officers trailing an IRA van loaded with explosives in north London had lost track of it 10 days ago.

Hotel blaze

Nearly 200 guests were evacuated before dawn yesterday after fire broke out at the Scandic Crown hotel in the Royal Mile, Edinburgh.

Test match split

Test Match Special next summer will be split between Radio 5 and Radio 3, after protests this year that it disrupted Radio 3's music schedule.
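The markup of this column might be sketched as follows. The use of div1 for the column and div2 for each story follows the discussion of this example; the org value composite is assumed from the description of the attribute, and the type values column and story are hypothetical labels chosen for illustration.

```xml
<div1 type="column" org="composite">
  <head>News in brief</head>
  <div2 type="story">
    <head>Police deny losing bomb</head>
    <p>Scotland Yard yesterday denied claims in the Sunday Express
       that anti-terrorist officers trailing an IRA van loaded with
       explosives in north London had lost track of it 10 days ago.</p>
  </div2>
  <div2 type="story">
    <head>Hotel blaze</head>
    <p>Nearly 200 guests were evacuated before dawn yesterday after
       fire broke out at the Scandic Crown hotel in the Royal Mile,
       Edinburgh.</p>
  </div2>
  <div2 type="story">
    <head>Test match split</head>
    <p>Test Match Special next summer will be split between Radio 5
       and Radio 3, after protests this year that it disrupted
       Radio 3's music schedule.</p>
  </div2>
</div1>
```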

The org attribute on the div1 element is used here to indicate that individual stories in this group, marked here as div2, are really quite independent of each other, although they are all marked as subdivisions of the whole group. They can be read in any order without affecting the sense of the piece; indeed, in some cases, divisions of this nature are printed in such a way as to make it impossible to determine the order in which they are intended to be read. Individual stories can be added or removed without affecting the existing components.

This method of encoding composite texts as composite divisions has some limitations compared with the more general and powerful mechanisms discussed in section ; it may however be preferable in some circumstances, notably where the individual texts are very small.

Elements Common to All Divisions

The divisions of any kind of text may sometimes begin with a brief heading or descriptive title, with or without a byline, an epigraph or brief quotation, or a salutation such as one finds at the start of a letter. They may also conclude with a brief trailer, byline, or signature. Elements which may appear in this way, either at the start or at the end of a text division proper, are regarded as forming a class, known as divtop or divbot respectively.

The following special-purpose elements are provided to mark features which may appear only at the start of a division:

head: contains any heading, for example, the title of a section, or the heading of a list or glossary. Attributes include:
    type: categorizes the heading in some way meaningful to the encoder.
epigraph: contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page.
argument: a formal list or prose description of the topics addressed by a subdivision of a text.
opener: groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter.

For further details of the head element, see section ; for epigraph and argument, see section ; for opener, see section .

The following special-purpose elements are provided to mark features which may appear only at the end of a division:

trailer: contains a closing title or footer appearing at the end of a division of a text.
closer: groups together dateline, byline, salutation, and similar phrases appearing as a final group at the end of a division, especially of a letter.

For further details of the trailer element, see section ; for the closer element, section .

Headings and Trailers

The head element is used to identify a heading prefixed to the start of any textual division, at any level. A given division may of course contain more than one such element, as in the following example:

Etymology

(Supplied by a late consumptive usher to a grammar school)

The pale Usher ‐ threadbare in coat, heart, body and brain; I see him now. He was ever dusting his old lexicons and grammars....
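A division carrying two headings might be marked up along the following lines. The un-numbered div and its type value are assumptions made for illustration, and the paragraph text is abbreviated.

```xml
<div type="section">
  <head>Etymology</head>
  <head>(Supplied by a late consumptive usher to a grammar school)</head>
  <p>The pale Usher ... I see him now. He was ever dusting his old
     lexicons and grammars....</p>
</div>
```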

Unlike some other markup schemes, the TEI scheme does not require that headings attached to textual subdivisions at different hierarchic levels have different identifiers. All kinds of heading are marked identically using the head tag; the type or level of heading intended is implied by the immediate parent of the head element, which may for example be a div1, div2, etc., an un-numbered div, or a list.

In certain kinds of text (notably newspapers), there may be a need to categorize individual headings within the sequence at the start of a division, for example as main headings, or detail headings. Specific elements are provided for certain kinds of heading-like features (notably byline, dateline and salute; see further section ), but the type attribute must be used to discriminate among other forms of heading.

In the following example, taken from a British newspaper, the lead story and its associated headlines have been encoded as a div element, with appropriate divtop elements attached:

President pledges safeguards for 2,400 British troops in Bosnia

Major agrees to enforced no-fly zone

By George Jones, Political Editor, in Washington

Greater Western intervention in the conflict in former Yugoslavia was pledged by President Bush yesterday....
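The story might be marked up as in the sketch below. The type values story, main and sub are hypothetical; the text requires only that the type attribute be used to discriminate among the headings, not that these particular values be used.

```xml
<div type="story">
  <head type="main">President pledges safeguards for 2,400 British
    troops in Bosnia</head>
  <head type="sub">Major agrees to enforced no-fly zone</head>
  <byline>By George Jones, Political Editor, in Washington</byline>
  <p>Greater Western intervention in the conflict in former
     Yugoslavia was pledged by President Bush yesterday....</p>
</div>
```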

In older writings, the headings or incipits may be rather longer than usual in modern works. When heading-like material appears in the middle of a text, the encoder must decide whether or not to treat it as the start of a new division. If the phrase in question appears to be more closely connected with what follows than with what precedes it, then it may be regarded as a heading and tagged as the head of a new div element. If it appears to be simply inserted or superimposed, as for example the kind of pull quotes often found in newspapers or magazines, then the quote, q, or cit element may be more appropriate.

The trailer element, which can appear at the end of a division only, is used to mark any heading-like feature appearing in this position, as in this example:

In the name of Christ here begins the first book of the ecclesiastical history of Georgius Florentinus, known as Gregory, Bishop of Tours.

Chapter-Headings

In the name of Christ here begins Book I of the history.

Proposing as I do ...

From the Passion of our Lord until the death of Saint Martin four hundred and twelve years passed.

Here ends the first Book, which covers five thousand, five hundred and ninety-six years from the beginning of the world down to the death of Saint Martin.

Openers and Closers
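A simplified sketch of a division closed by a trailer is given below. It shows a single un-numbered division only; the division style, the type and n values, and the omission of the chapter-headings section are all simplifications made for illustration.

```xml
<div type="book" n="1">
  <head>In the name of Christ here begins Book I of the history.</head>
  <p>Proposing as I do ...</p>
  <p>From the Passion of our Lord until the death of Saint Martin
     four hundred and twelve years passed.</p>
  <trailer>Here ends the first Book, which covers five thousand,
    five hundred and ninety-six years from the beginning of the
    world down to the death of Saint Martin.</trailer>
</div>
```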

In addition to headings of various kinds, divisions sometimes include more or less formulaic opening or closing passages, typically conveying such information as the name and address of the person to whom the division is addressed, the place or time of its production, a salutation or exhortation to the reader, and so on. Divisions in epistolary form are particularly liable to include such features. To cater for the full variety of such features, the elements described in chapter should be used, but for many simple cases, the following elements should be adequate:

byline: contains the primary statement of responsibility given for a work on its title page or at the head or end of the work.
dateline: contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer.
salute: contains a salutation or greeting prefixed to a foreword, dedicatory epistle or other division of a text, or the salutation in the closing of a letter, preface, etc.
signature: contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text.

The opener element may be used to group together any mixture of the above elements, appearing as a unit at the start of a division. The closer element is used to group together any mixture of the same elements appearing at the end of a division, as illustrated by the examples later in this section.

The byline and dateline elements are used to encode headings which identify the authorship and provenance of a division. Although the terminology derives from newspaper usage, there is no implication that dateline or byline elements apply only to newspaper texts. The following example illustrates use of the dateline and signature elements at the end of the preface to a novel:

To Henry Hope.

It is not because this volume was conceived and partly executed amid the glades and galleries of the Deepdene, that I have inscribed it with your name.... I shall find a reflex to their efforts in your own generous spirit and enlightened mind.

D.

Grosvenor Gate, May-Day, 1844

In the following examples, both opener and closer grouping elements are used:

Sixth Narrative contributed by Sergeant Cuff

Dorking, Surrey, July 30th, 1849

To Franklin Blake, Esq.

Sir, ‐

I beg to apologize for the delay that has occurred in the production of the Report, with which I engaged to furnish you. I have waited to make it a complete Report....

I have the honour to remain, dear sir, your obedient servant

RICHARD CUFF (late sergeant in the Detective Force, Scotland Yard, London).

Letter XIV: Miss Clarissa Harlowe to Miss Howe

Thursday evening, March 2.

On Hannah's depositing my long letter ...

An interruption obliges me to conclude myself in some hurry, as well as fright, what I must ever be,

Yours more than my own,

Clarissa Harlowe
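The grouping role of opener and closer can be sketched with the second letter. The element name signature follows the wording of this chapter and is an assumption, as are the division style and the type and n values; the split of the closing formula between a paragraph, salute and signature is one possible reading.

```xml
<div type="letter" n="14">
  <head>Letter XIV: Miss Clarissa Harlowe to Miss Howe</head>
  <opener>
    <dateline>Thursday evening, March 2.</dateline>
  </opener>
  <p>On Hannah's depositing my long letter ...</p>
  <p>An interruption obliges me to conclude myself in some hurry,
     as well as fright, what I must ever be,</p>
  <closer>
    <salute>Yours more than my own,</salute>
    <signature>Clarissa Harlowe</signature>
  </closer>
</div>
```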

For further discussion of the encoding of names of persons and places and of dates, see section and chapter .

Arguments and Epigraphs

The argument element may be used to encode the prefatory list of topics covered sometimes found at the start of a chapter or other division. It is most conveniently encoded as a list, since this allows each item to be distinguished, but may also simply be presented as a paragraph. The following are thus both equally valid ways of encoding the same argument: Kingston — Instructive remarks on early English history — Instructive observations on carved oak and life in general — Sad case of Stivvings, junior — Musings on antiquity — I forget that I am steering — Interesting result — Hampton Court Maze — Harris as a guide.

It was a glorious morning, late spring or early summer, as you care to take it...

Kingston
Instructive remarks on early English history
Instructive observations on carved oak and life in general
Sad case of Stivvings, junior
Musings on antiquity
I forget that I am steering
Interesting result
Hampton Court Maze
Harris as a guide.

It was a glorious morning, late spring or early summer, as you care to take it...
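The list form of the argument might be marked up as sketched below. The list and item element names and the enclosing div are assumptions made for illustration; the paragraph form would simply replace the list with a single p element holding the same topics as running text.

```xml
<div type="chapter">
  <argument>
    <list>
      <item>Kingston</item>
      <item>Instructive remarks on early English history</item>
      <item>Instructive observations on carved oak and life in general</item>
      <item>Sad case of Stivvings, junior</item>
      <item>Musings on antiquity</item>
      <item>I forget that I am steering</item>
      <item>Interesting result</item>
      <item>Hampton Court Maze</item>
      <item>Harris as a guide.</item>
    </list>
  </argument>
  <p>It was a glorious morning, late spring or early summer, as you
     care to take it...</p>
</div>
```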

An epigraph is a quotation from some other work appearing on a title page, or at the start of a division. It may be encoded using the special-purpose epigraph element. Its content will generally be a q or quote element, often associated with a bibliographic reference, as in the following example:

Chapter 19

I pity the man who can travel from Dan to Beersheba, and say 'Tis all barren; and so is all the world to him who will not cultivate the fruits it offers.

Sterne: Sentimental Journey.

To say that Deronda was romantic would be to misrepresent him: but under his calm and somewhat self-repressed exterior ...
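One way of marking up this epigraph is sketched below. Wrapping the quotation and its reference in a cit element is an assumption; the text requires only a q or quote element, optionally accompanied by a bibliographic reference. The division style and attribute values are likewise illustrative.

```xml
<div type="chapter" n="19">
  <head>Chapter 19</head>
  <epigraph>
    <cit>
      <quote>I pity the man who can travel from Dan to Beersheba,
        and say 'Tis all barren; and so is all the world to him who
        will not cultivate the fruits it offers.</quote>
      <bibl>Sterne: Sentimental Journey.</bibl>
    </cit>
  </epigraph>
  <p>To say that Deronda was romantic would be to misrepresent him:
     but under his calm and somewhat self-repressed exterior ...</p>
</div>
```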

For discussion of quotations appearing other than as epigraphs refer to section .

Content of Textual Divisions

Other than its initial sequence of divtop elements, and its closing sequence of divbot elements, every textual division (numbered or un-numbered) consists of a sequence of ungrouped lower-level structural elements, that is, a sequence of component elements (see ). The actual elements available will depend on the base tag set in use; in all cases, at least the component-level structural elements defined in the core will be available (paragraphs, lists, dramatic speeches, verse lines and line groups etc.). If the drama base has been selected, then additionally the low level dramatic structural elements (speeches or stage directions, as defined in chapter ) will be available. If the dictionary base is in use, then dictionary entries, related entries, etc. (as defined in chapter ) will also be available; if the tag set for transcribed speech is in use, then utterances, pauses, vocals, kinesics, etc., as defined in chapter ; and so on.

Where a text contains low level elements from more than one base, two options are available. The first option, selected by the mixed base, allows for low level structural elements from any or all of the selected bases to appear at any point. The second option, selected by the general base, allows for low level structural elements from different bases to appear in different textual divisions of the same text, but requires that any one division use elements from only one base. For further information, refer to chapter .

The elements discussed in this section are formally defined as follows:

Groups of Texts

The group element should be used to represent a collection of independent texts which is to be regarded as a single unit for processing or other purposes. Examples of such composite texts include anthologies and other collections; the presence of common front matter referring to the whole collection, possibly in addition to front matter relating to each individual text, is a good indication that a given text might usefully be encoded as a group, though encoders may choose to use this structure to represent other kinds of composite as well.

group: contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc.

For example, the overall structure of a collection of short stories might be encoded as follows:

The Adventures of Sherlock Holmes

First published in The Strand between July 1891 and December 1892

Adventures of Sherlock Holmes

Adventure I. - A Scandal in Bohemia

By A. Conan Doyle.

To Sherlock Holmes she is always the woman. ...

Adventures of Sherlock Holmes

Adventure II. - The Red-Headed League

By A. Conan Doyle.

Adventures of Sherlock Holmes

Adventure XII. - The Adventure of the Copper Beeches

By A. Conan Doyle.

"To the man who loves art for its own sake," remarked Sherlock Holmes ...

... she is now the head of a private school at Walsall, where I believe that she has met with considerable success.
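The overall shape of such a collection might be sketched as follows. Only the major structural elements are shown, with comments standing in for the titles, bylines and story text; how the title-page material inside each front is tagged is deliberately left open.

```xml
<text>
  <front>
    <!-- title and publication note for the whole collection -->
  </front>
  <group>
    <text>
      <front><!-- title and byline of Adventure I --></front>
      <body>
        <p>To Sherlock Holmes she is always the woman. ...</p>
      </body>
    </text>
    <text>
      <front><!-- title and byline of Adventure II --></front>
      <body><!-- text of the story --></body>
    </text>
    <!-- further text elements, one per story -->
    <text>
      <front><!-- title and byline of Adventure XII --></front>
      <body><!-- text of the story --></body>
    </text>
  </group>
</text>
```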

A text which is a member of a group may itself contain groups. This is quite common in collections of verse, but may happen in any kind of text. As an example, consider the overall structure of a typical collection, such as the Muses Library edition of Crashaw's poetry (ed. J.R. Tutin, [ca. 1900]). Following a critical introduction and table of contents, this work contains the following major sections:

Steps to the Temple (a collection of verse first published in 1648)
Carmen deo Nostro (a second collection, published in 1652)
The Delights of the Muses (a third collection, published in 1648)
Posthumous Poems, I (a collection of fragments all taken from a single manuscript)
Posthumous Poems, II (a further collection of fragments, taken from a different manuscript)

Each of the three collections published in Crashaw's lifetime has a reasonable claim to be considered as a text in its own right, and may therefore be encoded as such. It is rather more arbitrary as to whether the two posthumous collections should be treated as two groups, following the practice of the Muses Library edition. An encoder might elect to combine the two into a single group, or simply to treat each fragment as an ungrouped unitary text.

The Muses Library edition reprints the whole of each of the three original collections, including their original front matter (title pages, dedications etc.). These should be encoded using the front element and its constituents (on which see further section ), while the body of each collection should be encoded as a single group element. Each individual poem within the collections should be encoded as a distinct text element. The beginning of the whole collection would thus appear as follows (for further discussion of the use of the elements div and lg for textual subdivision of verse, see section and chapter ):

The poems of Richard Crashaw

Edited by J.R. Tutin

...

Editor's Note

A few words are necessary... ...

Steps to the Temple, Sacred Poems

...

The Preface to the Reader

Learned Reader, The Author's friend will not usurp much upon thy eye...

Sospetto D'Herode

Libro Primo

Casting the times with their strong signs ...

Muse! now the servant of soft loves no more
Hate is thy theme and Herod whose unblest
Hand (O, what dares not jealous greatness?) tore
A thousand sweet babes from their mothers' breast,
The blooms of martyrdom...

...

The Tear

What bright soft thing is this
Sweet Mary, thy fair eyes' expense?
...

Following the remaining poems of the Steps to the Temple, each one within its own text element, the remainder of the work might be represented in the same way, with one text element for each of the other two original collections and a group for each of the two sets of posthumous poems.
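The nesting of groups within texts for this edition might be sketched as below. The outline follows the treatment described above, with the two posthumous collections kept as separate groups; comments stand in for all content, and the internal markup of each poem is omitted.

```xml
<text>
  <front>
    <!-- title page, editor's note and contents of the edition -->
  </front>
  <group>
    <text>
      <front><!-- original front matter of Steps to the Temple --></front>
      <group>
        <text>
          <body><!-- Sospetto D'Herode --></body>
        </text>
        <text>
          <body><!-- The Tear --></body>
        </text>
        <!-- further text elements, one per poem -->
      </group>
    </text>
    <text>
      <front><!-- original front matter of Carmen deo Nostro --></front>
      <group>
        <text><body><!-- one poem --></body></text>
      </group>
    </text>
    <text>
      <front><!-- original front matter of The Delights of the Muses --></front>
      <group>
        <text><body><!-- one poem --></body></text>
      </group>
    </text>
    <group>
      <!-- Posthumous Poems, I -->
      <text><body><!-- one fragment --></body></text>
    </group>
    <group>
      <!-- Posthumous Poems, II -->
      <text><body><!-- one fragment --></body></text>
    </group>
  </group>
</text>
```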

The group element may be used in this way to encode any kind of collection of which the constituents are regarded by the encoder as texts in their own right. Examples include anthologies of verse or prose by multiple authors, collections, florilegia or commonplace books, journals, day books, etc. As a fairly typical example, we consider The Norton Book of Travel, an anthology edited by Paul Fussell and published in 1987 by W.W. Norton. This work comprises the following major sections:

Front matter (title page, acknowledgments, introductory essay)
The Beginnings
The Eighteenth Century and the Grand Tour
The Heyday
Touristic Tendencies
Post Tourism
Back matter (permissions list, index)

Each titled section listed above comprises a group of extracts or complete texts from writers of a given historical period, preceded by an introductory essay. For example, the second group listed above contains, inter alia, the following:

Prefatory essay
Five letters by Lady Mary Wortley Montagu
An extract from Swift's Gulliver's Travels
Two poems by Alexander Pope
Two extracts from Boswell's Journal
A poem by William Blake

Each group of writings by a single author is preceded by a brief biographical notice. Some of the extracts are quite lengthy, containing several chapters or other divisions; others are quite short. As the above list indicates, the texts included range across all kinds of material: verse, prose, journals and letters.

The easiest way of encoding such an anthology is to treat each individual extract as a text in its own right. A sequence of texts by a single author, together with the biographical note preceding it, can then be treated as a single group element within the larger group formed by the section. The sequence of single or composite texts making up a single section of the work is likewise treated, together with its prefatory essay, as a single group within the work.
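Schematically, and as a sketch only, the anthology might be outlined as follows. The placement of head inside group is an assumption, and comments stand in for all content. The Montagu and Pope sections are shown as groups whose first text is the editor's notice, while the Swift section is shown as a single text with the notice as front matter.

```xml
<text>
  <front><!-- title page, acknowledgments, introductory essay --></front>
  <group>
    <group>
      <head>The Beginnings</head>
      <text><body><!-- prefatory essay --></body></text>
      <!-- texts and groups for this period -->
    </group>
    <group>
      <head>The Eighteenth Century and the Grand Tour</head>
      <text><body><!-- prefatory essay --></body></text>
      <group>
        <text><body><!-- editor's notice on Montagu --></body></text>
        <text><body><!-- first letter --></body></text>
        <!-- four further letters, each a text -->
      </group>
      <text>
        <front><!-- editor's notice on Swift --></front>
        <body><!-- extract from Gulliver's Travels --></body>
      </text>
      <group>
        <text><body><!-- editor's notice on Pope --></body></text>
        <text><body><!-- first poem --></body></text>
        <text><body><!-- second poem --></body></text>
      </group>
      <!-- Boswell and Blake sections -->
    </group>
    <group>
      <head>The Heyday</head>
      <text><body><!-- prefatory essay --></body></text>
      <!-- texts and groups for this period -->
    </group>
    <!-- remaining sections -->
  </group>
  <back><!-- permissions list, index --></back>
</text>
```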

Note that the editor's introductory essays on each author may be treated as texts in their own right, as here the essays on Lady Mary Wortley Montagu and Alexander Pope, or as front matter to the embedded text, as here the essay on Swift. The treatment in the example is intentionally inconsistent, to allow comparison of the two approaches. Consistency can be imposed either by treating the Swift section as a group containing one text by Swift and one by the editor, or by treating the Montagu and Pope sections as text elements containing the editor's essays as front matter. Marked in the second way, the Pope section of the book would be a single text element with the editor's essay as its front matter, followed by the two poems.
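The second treatment of the Pope section might be sketched as below. Placing the two poems in a group inside the enclosing text is an assumption about how the poems are best held together; comments stand in for the content.

```xml
<text>
  <front>
    <!-- editor's essay on Pope -->
  </front>
  <group>
    <text>
      <body><!-- first poem --></body>
    </text>
    <text>
      <body><!-- second poem --></body>
    </text>
  </group>
</text>
```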

The essays on The Eighteenth Century and the Grand Tour and other larger sections could also be tagged as front matter in the same way, by treating the larger sections as text elements rather than group elements.

Where, as in this case, an anthology contains different kinds of text (for example, mixtures of prose and drama, or transcribed speech and dictionary entries, or letters and verse), the elements to be encoded may well need to be drawn from more than one of the base tag sets described in part II. In such a situation, either the mixed or the general base should be specified, as further described in chapter . The elements provided by the core tag set described in chapter should however prove adequate for most simple purposes, where prose, drama, and verse are combined in a single collection.

For anthologies of short extracts such as commonplace books, it may often be preferable to regard each extract not as a text in its own right but simply as a quotation or cit element. The following component-level elements may be used to encode quotations of this kind: cit, a quotation from some other document, together with a bibliographic reference to its source; and quote, which contains a phrase or passage attributed by the narrator or author to some agency external to the text. For example, the chapter of extracts which appears in the front matter of Melville's Moby Dick might be encoded as follows:

Extracts (Supplied by a sub-sub-Librarian)

It will be seen that this mere painstaking burrower and grubworm of a poor devil of a Sub-Sub appears to have gone through the long Vaticans and street-stalls of the earth, picking up whatever random allusions to whales he could anyways find... Here ye strike but splintered hearts together ‐ there, ye shall strike unsplinterable glasses!

And God created great whales. Genesis Leviathan maketh a path to shine after him; One would think the deep to be hoary. Job ... By art is created that great Leviathan, called a Commonwealth or State &dash; (in Latin, civitas), which is but an artificial man. Opening sentence of Hobbes's Leviathan

For more information on the use of the quote and bibl elements, see sections and respectively.
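The markup for such a chapter might be sketched as below. This is an assumption-laden illustration rather than the original example: the type value on div is hypothetical, the introductory paragraph is elided, and only the first two extracts are shown.

```sgml
<div type="extracts"> <!-- type value is hypothetical -->
 <head>Extracts (Supplied by a sub-sub-Librarian)</head>
 <p> <!-- introductory paragraph on the Sub-Sub --> </p>
 <cit>
  <quote>And God created great whales.</quote>
  <bibl>Genesis</bibl>
 </cit>
 <cit>
  <quote>Leviathan maketh a path to shine after him;
   One would think the deep to be hoary.</quote>
  <bibl>Job</bibl>
 </cit>
 <!-- further cit elements, one per extract -->
</div>
```

Each cit pairs the quoted passage with its source, so the attribution is not lost when the extract is treated as a quotation rather than as a text.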

Where one or more whole texts are embedded within other texts, without necessarily forming a composite, the encoder may also choose to represent the nested structure directly. The text element is itself a component-level element, and thus can appear within any division level element in the same way as a paragraph. For example, texts such as the Decameron or the Arabian Nights might be regarded as sequences of discrete texts embedded within another single text, the framing narrative, rather than as groups of discrete texts in which the fragments of framing narrative are regarded as front matter.

Front Matter

By front matter we mean distinct sections of a text (usually, but not exclusively, a printed one), prefixed to it by way of introduction or identification as a part of its production. Features such as title pages or prefaces are clear examples; a less definite case might be the prologue attached to a play. The front matter of an encoded text should not be confused with the TEI header described in chapter , which serves as a kind of front matter for the computer file itself, not the text it encodes.

An encoder may choose simply to ignore the front matter in a text, if the original presentation of the work is of no interest, or for other reasons; alternatively some or all components of the front matter may be thought worth including with the text as components of the front element. This decision should be recorded in the sampling element of the header. With the exception of the title page (on which see section ), front matter should be encoded using the same elements as the rest of a text. As with the divisions of the text body, no other specific tags are proposed here for the various kinds of subdivision which may appear within front matter: instead either numbered or un-numbered div elements may be used. The following suggested values for the type attribute may be used to distinguish various kinds of division characteristic of front matter (as with all lists of suggested values for attributes, it is recommended that software written to handle TEI-conformant texts be prepared to recognize and handle these values when they occur, without limiting the user to the values in this list):

The following extended example demonstrates how various parts of the front matter of a text may be encoded. The front part begins with a title page, which is presented in section below. This is followed by a dedication and a preface, each of which is encoded as a distinct div:

To my parents, Ida and Max Fish

Preface

The answer this book gives to its title question is there is and there isn't. ...

Chapters 1-12 have been previously published in the following journals and collections: chapters 1 and 3 in New literary History ... chapter 10 in Boundary II (1980) . I am grateful for permission to reprint. S.F.
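A sketch of how the dedication and preface above might be marked up. The type values and the use of signed for the initials are assumptions made for illustration, and the title page is omitted.

```sgml
<front>
 <!-- title page omitted here -->
 <div type="dedication">
  <p>To my parents, Ida and Max Fish</p>
 </div>
 <div type="preface">
  <head>Preface</head>
  <p>The answer this book gives to its title question is
   there is and there isn't. ...</p>
  <p>Chapters 1-12 have been previously published in the
   following journals and collections: ...
   I am grateful for permission to reprint.</p>
  <signed>S.F.</signed>
 </div>
</front>
```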


The front matter concludes with another div element, shown in the next example, this time containing a table of contents, which contains a list element (as described in section ). Note the use of the ptr element to provide page-references: the implication here is that the target identifiers supplied (P1, P68 etc.) may correspond with identifiers used either for div elements representing chapters of the text, or for pb elements marking page divisions of the text. (For the ptr element, see .) Alternatively, the literal page numbers present in the source text might be transcribed, but they are likely to be of little direct use in work with the electronic text. Contents Introduction, or How I stopped Worrying and Learned to Love Interpretation Part One: Literature in the Reader Literature in the Reader: Affective Stylistics What is Stylistics and Why Are They Saying Such Terrible Things About It? ...
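A sketch of such a table of contents follows. The nesting of the list, the type value, and the pairing of P68 with a particular entry are assumptions; the text supplies only the identifiers P1 and P68 as examples. The ptr element is empty and is written here in SGML style, without an end-tag.

```sgml
<div type="contents">
 <head>Contents</head>
 <list>
  <item>Introduction, or How I stopped Worrying and Learned to
   Love Interpretation <ptr target="P1"></item>
  <item>Part One: Literature in the Reader
   <list>
    <item>Literature in the Reader: Affective Stylistics
     <!-- ptr omitted: identifier not given in the text --></item>
    <item>What is Stylistics and Why Are They Saying Such
     Terrible Things About It? <ptr target="P68"></item>
    <!-- further entries -->
   </list>
  </item>
 </list>
</div>
```

Whether P1 resolves to a div or to a pb element is left open, as in the discussion above; the pointer only requires that some element in the text bear that identifier.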


The following example uses numbered divisions to mark up the front matter of a medieval text. (Entity references are used to represent the characters thorn, yogh, and ampersand, as discussed in section .) Note that in this case no title page in the modern sense occurs; the title is simply given as a heading at the start of the front matter. Note also the use of the type attribute on the div elements to indicate document elements comparatively unusual in modern books such as the initial prayer: Here bygynni&th; a book of contemplacyon, &th;e whiche is clepyd &Th;E CLOWDE OF VNKNOWYNG, in &th;e whiche a soule is onyd wi&th; GOD. Here biginne&th; &th;e preyer on &th;e prologe.

God, unto whom alle hertes ben open, & unto whome alle wille speki&th;, & unto whom no priue &th;ing is hid: I beseche &th;ee so for to clense &th;e entent of myn hert wi&th; &th;e unspekable &yog;ift of &th;i grace, &th;at I may parfiteliche loue &th;ee & wor&th;ilich preise &th;ee. Amen. Here biginne&th; &th;e prolog.

In &th;e name of &th;e Fader & of &th;e Sone & of &th;e Holy Goost.

I charge &th;ee & I beseeche &th;ee, wi&th; as moche power & vertewe as &th;e bonde of charite is sufficient to suffre, what-so-euer &th;ou be &th;at &th;is book schalt haue in possession ... Here biginne&th; a table of &th;e chapitres. & here eende&th; &th;e table of &th;e chapitres.

Title Pages

Detailed analysis of the title page and other preliminaries of older printed books and manuscripts is of major importance in descriptive bibliography and the cataloguing of printed books: such analysis may require a rather more detailed tag set than that proposed here. Definition of such a tag set remains a work item for the TEI; such tag sets for contemporary printed matter already exist or are being created within the publishing industry, for example the Majour (Modular Application for Journals) Project of the European Workgroup on SGML. (See for example MAJOUR: Modular Application for Journals: DTD for Article Headers ([n.p.]: EWS, 1991).) The following elements are therefore proposed as an interim measure; they constitute a useful descriptive tag set for the major features of most title pages:

titlePage contains the title page of a text, appearing within the front or back matter.
docTitle contains the title of a document, including all its constituents, as given on a title page.
titlePart contains a subsection or division of the title of a work, as indicated on a title page. Attributes include type, which specifies the role of this subdivision of the title. Suggested values include main (main title of the work), desc (descriptive paraphrase of the work included in title), sub (subtitle of the work), and alt (alternative title of the work).
byline contains the primary statement of responsibility given for a work on its title page or at the head or end of the work.
docAuthor contains the name of the author of the document, as given on the title page (often but not always contained in a byline).
epigraph contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page.
imprimatur contains a formal statement authorizing the publication of a work, sometimes required to appear on a title page or its verso.
docEdition contains an edition statement as presented on a title page of a document.
docImprint contains the imprint statement (place and date of publication, publisher name), as given (usually) at the foot of a title page.
docDate contains the date of a document, as given (usually) on a title page.
A further element marks the position of a printer's device, ornament or figure, for example on a title page or elsewhere in a printed text.

These elements constitute the element class tpParts, which is defined by the parameter entity m.tpParts. Encoders wishing to add new elements to this class may do so by modifying or redefining this parameter entity, as further described in chapter . Two examples of the use of these elements follow. First, the title page of the work discussed earlier in this section: Is There a Text in This Class? The Authority of Interpretive Communities Stanley Fish Harvard University Press Cambridge, Massachusetts London, England
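This first title page might be marked up along the following lines. The sketch assumes the TEI element names titlePage, docTitle, titlePart, docAuthor and docImprint for the descriptions given above, and the assignment of each line of the title page to an element is an editorial guess.

```sgml
<titlePage>
 <docTitle>
  <titlePart type="main">Is There a Text in This Class?</titlePart>
  <titlePart type="sub">The Authority of Interpretive
   Communities</titlePart>
 </docTitle>
 <docAuthor>Stanley Fish</docAuthor>
 <docImprint>Harvard University Press
  Cambridge, Massachusetts
  London, England</docImprint>
</titlePage>
```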

Second, a characteristically verbose 17th century example: THE Pilgrim's Progress FROM THIS WORLD, TO That which is to come: Delivered under the Similitude of a DREAM Wherein is Discovered, The manner of his setting out, His Dangerous Journey; And safe Arrival at the Desired Countrey. I have used Similitudes, Hos. 12.10 By John Bunyan. Licensed and Entred according to Order. LONDON, Printed for Nath. Ponder at the Peacock in the Poultrey near Cornhil, 1678.
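A sketch of how this second title page might be divided among the title-page elements. The element names are the TEI names assumed for the descriptions given earlier, and the boundaries between the main title, subtitle and descriptive paraphrase are an interpretation, not something the source text settles.

```sgml
<titlePage>
 <docTitle>
  <titlePart type="main">THE Pilgrim's Progress FROM THIS WORLD,
   TO That which is to come:</titlePart>
  <titlePart type="sub">Delivered under the Similitude of a
   DREAM</titlePart>
  <titlePart type="desc">Wherein is Discovered, The manner of his
   setting out, His Dangerous Journey; And safe Arrival at the
   Desired Countrey.</titlePart>
 </docTitle>
 <epigraph>
  <cit>
   <quote>I have used Similitudes,</quote>
   <bibl>Hos. 12.10</bibl>
  </cit>
 </epigraph>
 <byline>By <docAuthor>John Bunyan</docAuthor>.</byline>
 <imprimatur>Licensed and Entred according to Order.</imprimatur>
 <docImprint>LONDON, Printed for Nath. Ponder at the Peacock in
  the Poultrey near Cornhil, <docDate>1678</docDate>.</docImprint>
</titlePage>
```

Note that docAuthor sits inside byline here, whereas on a title page with no explicit statement of responsibility it may stand alone.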

Those elements in the above list which are not defined elsewhere have the following formal declarations:

Where title pages are encoded, their physical rendition is often of considerable importance. One approach to this requirement would be to use the s tag, described in section , to segment the typographic content of each part of the title page, and then use the global rend attribute to specify its rendition. Another would be to use a tag set specialized for the description of typographic entities such as pages, lines, rules, etc., bearing special-purpose attributes to describe line height, leading, degree of kerning, font, etc.

Front matter elements are defined in a distinct DTD file called TEIfron2.dtd.

Back Matter

Conventions vary as to which elements are grouped as back matter and which as front. For example, some books place the table of contents at the front, and others at the back. Even title pages may appear at the back of a book as well as at the front. The content models for the back and front elements are therefore identical.

The following suggested values may be used for the type attribute on all division elements, in order to distinguish various kinds of division characteristic of back matter:

No additional elements are proposed for the encoding of back matter at present. Some characteristic examples follow; first, an index (for the case in which a printed index is of sufficient interest to merit transcription):

Index Actors, public, paid for the contempt attending their profession, Africa, cause assigned for the barbarous state of the interior parts of that continent, Agriculture ancient policy of Europe unfavourable to, artificers necessary to carry it on, cattle and tillage mutually improve each other, ... wealth arising from more solid than that which proceeds from commerce Alehouses, not the efficient cause of drunkenness, ...
...
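A sketch of how such an index might be structured, with sub-entries as nested lists. The type values are assumptions, and the page references, which have not survived in the entries above, are shown only as comment placeholders.

```sgml
<back>
 <div type="index">
  <head>Index</head>
  <list type="index">
   <item>Actors, public, paid for the contempt attending their
    profession, <!-- page reference --></item>
   <item>Africa, cause assigned for the barbarous state of the
    interior parts of that continent, <!-- page reference --></item>
   <item>Agriculture
    <list type="index">
     <item>ancient policy of Europe unfavourable to,
      <!-- page reference --></item>
     <item>artificers necessary to carry it on,
      <!-- page reference --></item>
     <!-- further sub-entries -->
    </list>
   </item>
   <!-- further entries -->
  </list>
 </div>
</back>
```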

Next, a back-matter division in epistolary form:

A letter written to his wife, founde with this booke after his death.

The remembrance of the many wrongs offred thee, and thy unreproued vertues, adde greater sorrow to my miserable state, than I can utter or thou conceiue.... ... yet trust I in the world to come to find mercie, by the merites of my Sauiour to whom I commend thee, and commit my soule. Thy repentant husband for his disloyaltie, Robert Greene. Faelicem fuisse infaustum FINIS

...
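A sketch of how this epistolary division might be marked up. The type value, the paragraph divisions, and the use of closer, salute, signed and trailer for the closing lines are all assumptions made for illustration; in particular, treating the Latin motto as a trailer is only one possible reading.

```sgml
<back>
 <div type="letter">
  <head>A letter written to his wife, founde with this booke
   after his death.</head>
  <p>The remembrance of the many wrongs offred thee, and thy
   unreproued vertues, adde greater sorrow to my miserable
   state, than I can utter or thou conceiue....</p>
  <p>... yet trust I in the world to come to find mercie ...</p>
  <closer>
   <salute>Thy repentant husband for his disloyaltie,</salute>
   <signed>Robert Greene.</signed>
  </closer>
  <trailer>Faelicem fuisse infaustum</trailer>
  <trailer>FINIS</trailer>
 </div>
</back>
```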

And finally, a list of corrigenda and addenda with pseudo-epistolary features:

Addenda M. Scriblerus Lectori

Once more, gentle reader I appeal unto thee, from the shameful ignorance of the Editor, by whom Our own Specimen of Virgil hath been mangled in such miserable manner, that scarce without tears can we behold it. At the very entrance, Instead of prolego/mena, lo! prolegw/mena with an Omega! and in the same line consulâs with a circumflex! In the next page thou findest leviter perlabere, which his ignorance took to be the infinitive mood of perlabor but ought to be perlabi ... Wipe away all these monsters, Reader, with thy quill.


The back element is defined in file TEIback2.dtd; since there are no other specialized back-matter tags, nothing else is defined there.

DTD Fragment for Default Text Structure

The DTD fragment described by the present chapter is found in file teistr2.dtd; it has the following overall structure: %TEI.front; %TEI.back;