Archaeology and XML Newsletter 2

JANUARY 2004

CONTENTS

INTRODUCTION

NEWS

CATALOGUING THE LATIN INSCRIPTIONS IN THE

BRITISH MUSEUM: IN XML? BY JONATHAN PRAG

THE MARK-UP OF ARCHAEOLOGICAL EXCAVATION REPORTS

USING THE DTD OF THE TEXT ENCODING INITIATIVE (TEI) BY CHRISTIANE

MECKSEPER

GETTING YOUR DATA BACK; AN INTRODUCTION TO

XQUERY BY MARK BELL

INTRODUCTION

Mark Bell

Welcome to the second edition of the newsletter. I am glad to

say there was positive feedback on the first issue. I am also glad

to say this issue has articles by authors other than me.

Submissions however long or short are always welcome. Please keep

the comments coming. I hope to produce another newsletter soon

after Easter, when there should be some reports from the CAA

conference in Prato – see below.

NEWS

In the last newsletter I said there was going to be a specific

session on XML at CAA 2004 in Prato. James Landrum emailed me to

say “There will not be a technology-specific session on XML

and Databases at CAA2004, however there will be a number of

sessions in which XML and databases play roles, and therefore there

may be papers relevant to XML and related markup issues in a

variety of sessions”. Further details of papers should be

posted on the CAA2004 website at

http://www.caa2004.org. The

conference will run from 13-17 April 2004.

CATALOGUING THE LATIN

INSCRIPTIONS IN THE BRITISH MUSEUM: IN XML?

Jonathan Prag

School of Archaeology and Ancient History

University of Leicester

Although this project is temporarily stalled through lack of

time, I hope to return to cataloguing the Latin inscriptions of the

British Museum's Greek and Roman department in the near future. At

present the project is dependent on the research time I personally

have available while holding a teaching post - so not very

much…

The project presents a number of difficulties and in an attempt

to solve these, XML seems to offer the potential for the best

solution(s). Firstly there is a wide range of data which needs

integrating, ranging from images to acquisition records to object

data.

Secondly epigraphic texts present a particular problem when it

comes to database storage. Ideally one wants a wholly searchable

text; however epigraphic conventions, sigla, etc. are not conducive

to searchable texts; furthermore they are not readily entered into

many standard database packages. XML offers probably the best

solution to this range of problems.

The potential of XML for epigraphic texts is currently being

explored through the EPAPP

(http://www.kcl.ac.uk/humanities/cch/epapp/)

and EpiDoc (http://epidoc.sourceforge.net/)

projects. Those interested in the details of the problems are

directed to the EpiDoc website.

In essence a word such as "servus" ('slave') could appear in an

epigraphic text, by way of example, as oddly as: s]er<v>u[s-,

where additionally the letters e and u have subscript dots beneath

them (and you try including those in a plain text or HTML e-mail,

let alone in a database field); including all that information

while still permitting a text search of the field which would find

the word 'servus' is not easy. It is intended that this specific

project will be an adjunct to the EPAPP project with the latter

acting in an advisory and support role. Thirdly, the Museum's own

cataloguing system presents a range of difficulties when it comes

to integration of databases. The existing digital catalogue (which

does not include the Latin epigraphic material) is a much more

minimal catalogue than the specific requirements of this material

demand. However, the medium term plans for the general catalogue

envisage the 'bolting on' of local specific catalogues such as the

one under discussion here. In this case the choice of base database

format is perhaps both more problematic and less immediately

obvious. Indeed it may be that the core data is collated first in

an Access database, from where it can then be exported in minimal

form to the existing BM catalogue and used as the base for the

fuller XML catalogue which in turn will perhaps be more readily

appended to the BM catalogue (the XML interface of which is

undergoing gradual development). It should be noted however that

this aspect of the project is still very much at the discussion

stage. Additionally, a further project at the discussion stage

involves the integration of details of all classical inscriptions

currently held in the UK; this too will raise further problems of

data-type integration. Fourthly, there is the preferred form of publication: again, the

EPAPP project presents the possible 'ideal' format for such a

database, since a paper publication is unlikely. But here too, as

EPAPP has been demonstrating, one advantage of XML is the potential

for very varied output with considerable ease. Whether the

catalogue will ultimately be hosted online, made available in

digital form through other means, or not made generally available

in digital form but used simply to generate a hard copy end-product

is also still under discussion. Here too XML would seem to possess

greater flexibility than 'traditional' database packages (granted

however this is a more debatable point). The initial cataloguing

is envisaged to take in the region of 2-3 years (depending upon

available time), and the final finished product will probably be a

year or two longer in the making – although it is hoped to

have the work essentially complete in time for the International

Epigraphic Congress being hosted by London and Oxford in 2007.
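As a rough illustration of the searchability problem raised above with s]er<v>u[s-, the following Python sketch keeps the transcription for display and derives a plain search key from it. It is not the EPAPP or EpiDoc approach, which rely on XML mark-up; the function name search_key and the small set of sigla it strips are assumptions made for this example, and real epigraphic conventions are considerably richer.

```python
import re
import unicodedata

# Assumed, simplified set of editorial sigla: brackets, braces and hyphens.
SIGLA = re.compile(r"[\[\]<>(){}\-]")


def search_key(transcription: str) -> str:
    """Return a plain, lower-case form of an epigraphic transcription."""
    # Decompose so that subscript dots become separate combining characters.
    decomposed = unicodedata.normalize("NFD", transcription)
    # Drop all combining marks (including U+0323, the combining dot below).
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Remove the sigla themselves but keep the letters they enclose.
    return SIGLA.sub("", without_marks).lower()


# The example from the text, with dots beneath the letters e and u.
DISPLAY = "s]e\u0323r<v>u\u0323[s-"

# Store both forms: one for faithful display, one for text searching.
RECORD = {"display": DISPLAY, "search": search_key(DISPLAY)}
```

A search for 'servus' can then be run against the search field while the display field preserves the editorial information. The obvious weakness is that the two fields are maintained separately and that genuine hyphens are lost, which is part of why structured mark-up is attractive here.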

THE MARK-UP OF ARCHAEOLOGICAL EXCAVATION

REPORTS USING THE DTD OF THE TEXT ENCODING INITIATIVE

(TEI)

Christiane Meckseper

In September 2000 I undertook research for a Masters degree in

Information Systems at the University of Sheffield into the

feasibility of using XML as a mark-up language for the publication

of archaeological field reports produced by commercial units,

so-called 'grey literature' (Meckseper & Warwick 2003). The basis for the dissertation was an article by Gray and Walford

(1999) published in Internet Archaeology, who had first recommended

the use of XML for the publication of archaeological data.

The main purpose of my research was to find a suitable DTD, or

XML schema, that could be used for the description of

archaeological data. Gray and Walford had suggested a rudimentary

custom archaeological DTD but I felt that it would be beyond the

scope of a Masters dissertation to build an archaeological DTD from

scratch. I surveyed several DTDs with an archaeological background

or connection, most notably David Schloen's XSTAR project

(http://www.oi.uchicago.edu/OI/PROJ/XSTAR/XSTAR.html).

Other DTDs that could be used to mark up aspects of an

archaeological report are the Geography Mark-up Language (GML) for

geographical information (http://www.opengis.org/techno/specs/00-029/GML.html);

the Historical Event Mark-up and Linking (HEML) project (http://heml.mta.ca/heml-cocoon/);

and the DTD based on SPECTRUM, an established museum process and

documentation standard (Degenhart Drenth 2001) developed by the

Museum Documentation Association (MDA), working in collaboration

with CIMI and other organizations.

However, as no single DTD at the time seemed suitable for the

task, and as I was almost exclusively dealing with written textual

material I decided to use the DTD of the Text Encoding Initiative

(TEI) for the mark-up of the archaeological reports (http://www.tei-c.org/P4X/).

The TEI is an international project to develop guidelines for the

encoding of textual material in electronic form for research

purposes (http://www.tei-c.org/). The TEI DTD is often used to produce electronic facsimiles of original paper publications and the elements defined in the TEI DTD therefore concentrate largely on the structural rendition of a text. However, the TEI DTD also provides extensive means of marking up names, place names and locations, dates and even levels of certainty, all of which are useful elements to mark up aspects of archaeological data and reports.

It is also possible to extend the TEI DTD with custom-made

elements, which could be used to add a further level of

archaeological detail to the mark-up of a report. Using XML's

extensive possibilities of defining elements with attributes, it

would also be very easy to introduce a controlled vocabulary into

the archaeological descriptions based on the English Heritage

Thesauri, MIDAS or other wordlists. For example an element could

take the form of:

"The <monument schema="English Heritage Thesaurus of

Monument Types" type="inhumation">articulated

skeletons</monument> were all aligned E-W and laid out with

arms across the pelvis."
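To show how such an element might be read back out, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The monument element and its schema and type attributes come from the example above; the wrapping p element and the function name monument_terms are assumptions added so that the fragment is a well-formed document.

```python
import xml.etree.ElementTree as ET

# The sentence from the example, wrapped in an assumed <p> element.
SAMPLE = (
    '<p>The <monument schema="English Heritage Thesaurus of Monument Types" '
    'type="inhumation">articulated skeletons</monument> were all aligned E-W '
    'and laid out with arms across the pelvis.</p>'
)


def monument_terms(xml_text):
    """List (schema, type, marked-up text) for every monument element."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("schema"), el.get("type"), el.text)
        for el in root.iter("monument")
    ]


TERMS = monument_terms(SAMPLE)
```

Because the controlled term sits in the type attribute, a search can match on 'inhumation' regardless of the wording the report author used in the running text.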

Another method of including a richer level of archaeological

mark-up could be the use of elements from another, purely

archaeological DTD, or DTDs like the GML mentioned above, in the

form of namespaces (

http://www.w3.org/TR/REC-xml-names/).

Using the TEI’s structural text mark-up, separate sections

of reports, like specialist reports, abstracts or structural

descriptions, could also be extracted from a body of archaeological

reports and re-compiled. For example, a specialist would be able to

extract all human bone reports from a corpus of archaeological

texts. This is one of the recommendations for archaeological

publication by the PUNS survey, a survey into the user needs of

archaeological publication, published by the Council for British

Archaeology (Jones, S., MacSween, A., Jeffrey, S., Morris, R.,

Heyworth, M. 2001).
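The kind of extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each report section is taken to be a div element carrying a type attribute, the type values "abstract" and "humanBone" are hypothetical, and the two tiny reports are invented sample data rather than real ARCUS texts.

```python
import xml.etree.ElementTree as ET

# Invented miniature corpus: file name -> marked-up report.
CORPUS = {
    "site_a.xml": (
        '<TEI.2><text><body>'
        '<div type="abstract"><p>Summary of site A.</p></div>'
        '<div type="humanBone"><p>Two adult inhumations.</p></div>'
        '</body></text></TEI.2>'
    ),
    "site_b.xml": (
        '<TEI.2><text><body>'
        '<div type="humanBone"><p>One juvenile burial.</p></div>'
        '</body></text></TEI.2>'
    ),
}


def extract_sections(corpus, section_type):
    """Collect (file name, text) for every section of the given type."""
    found = []
    for name, xml_text in corpus.items():
        root = ET.fromstring(xml_text)
        for div in root.iter("div"):
            if div.get("type") == section_type:
                # itertext() gathers the text of the div and its children.
                found.append((name, "".join(div.itertext()).strip()))
    return found


BONE_REPORTS = extract_sections(CORPUS, "humanBone")
```

The same function would pull out abstracts or structural descriptions by changing the section type, which is the re-compilation the paragraph above has in mind.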

As part of my research I marked up a very small sample of field

reports produced by ARCUS, the commercial unit based at the

University of Sheffield. However, I think what is needed in

relation to XML and archaeological excavation reports is a pilot

project that would mark up a body of texts using the TEI or another

suitable DTD, and then carry out extensive testing into their

usability and searchability.

Further postgraduate research is currently being undertaken at

the University of York (Falkingham, pers. comm.) into the potential

of XML for archaeological grey literature reports and a

'multi-layered' presentation once a document has been marked up.

This research is also considering the development of XML mark-up for reports that

would allow a degree of interoperability. For example a suitable

mark-up could allow data to be extracted from reports in a useful

format for input into SMR database fields and OASIS records. The

development of the FISH interoperability toolkit will also be an

interesting project to follow in this context.

A small non-representative survey of commercial archaeological

units, undertaken as part of the dissertation, showed that even

though there is considerable willingness to publish material in an

electronic format, it is still beyond the financial capabilities of

many units to implement a system of electronic publication, let

alone to incorporate extensive XML mark-up. However, hopefully with

the increasing acceptance of electronic publication and the

development of necessary tools to easily integrate different kinds

of archaeological electronic data and to more cost-effectively

process existing electronic versions of texts (for example in the

form of inserting semi-automated mark-up), funding methods can be

found that will allow the integration of richer electronic

publication into a commercial environment.

Bibliography

Gray, J., Walford, K. (1999). “One Good Site Deserves

Another: Electronic Publishing in Field Archaeology.” In:

Internet Archaeology 7. [Online]. Available at:

http://intarch.ac.uk/journal/issue7/gray_toc.html.

[Accessed: 07.01.2004].

Degenhart Drenth, B. (2001). Building on the mda SPECTRUM-XML

DTD for Collections Management Data Interchange. [Online].

Available at:

http://www.archimuse.com/mw2001/papers/degenhart/degenhart.html.

[Accessed: 07.01.2004].

Jones, S., MacSween, A., Jeffrey, S., Morris, R., Heyworth, M.

(2001). From the Ground Up. The Publication of Archaeological

Projects: a user needs survey. (CBA Publications).

Meckseper, C. & Warwick, C. (2003). "The Publication of

Archaeological Excavation Reports using XML." In: Journal for

Literary and Linguistic Computing, Volume 18, Issue 1. pp. 63

– 75. Available at:

http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/pdf/180063.pdf

[Accessed: 07.01.2004].

GETTING YOUR DATA BACK; AN

INTRODUCTION TO XQUERY

Mark Bell

There are two parts to data management. The first part is

structuring and storing your data. XML as a data format is ideal

for this. The second and more important half of the process is to

retrieve information. Until very recently there were few tools

around for data retrieval, but the publication of a number of

standards has led to the creation of some useful tools.

The World Wide Web Consortium (http://www.w3.org/) has published

drafts for XPath version 2.0, XSLT version 2.0, and XQuery version

1.0, all on 12 November 2003.

XPath is a way of describing where nodes are in an XML document

and is used by the other standards to navigate through XML

documents.
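As a small illustration of path expressions, the Python sketch below uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath 1.0 rather than the XPath 2.0 draft. The bib and book element names follow the XQuery example in this article; the year attribute and the sample records are invented for the purpose.

```python
import xml.etree.ElementTree as ET

# Invented sample data in the shape of a simple book list.
BIB_XML = """<bib>
  <book year="1998"><title>Report A</title><author>Author One</author></book>
  <book year="2003"><title>Report B</title><author>Author Two</author></book>
</bib>"""

root = ET.fromstring(BIB_XML)

# Path: every title element that is a child of a book element.
ALL_TITLES = [t.text for t in root.findall("./book/title")]

# Path with a predicate: only books whose year attribute equals 2003.
TITLES_2003 = [b.findtext("title") for b in root.findall("./book[@year='2003']")]
```

Each path reads much like a file system path, with the predicate in square brackets narrowing the nodes that are selected.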

XSLT (Extensible Stylesheet Language Transformations) is used

for transforming XML data into other formats such as HTML or

PDF.

XQuery is to XML documents what SQL is to a database – a

standard way to query a document.

The heart of XQuery is FLWOR (pronounced "flower"). FLWOR comes

from For, Let, Where, Order by, and Return.

The for and let clauses specify a sequence of values, which can

then be filtered with a where clause and ordered using an order by

clause. The return clause shows what should be returned. An XQuery

expression to return title and author tags from an XML file of book

information will look something like this:

<results>

{

for $b in doc("http://bstore1.example.com/bib.xml")/bib/book

return

<result>

{ $b/title }

{ $b/author }

</result>

}

</results>

(taken from the W3C use cases – see below).
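For readers more used to a general programming language, the steps of a FLWOR expression can be imitated in Python. This is only an analogy, not an XQuery implementation: the sample records, the year attribute, and the function name flwor are assumptions, and the data is held in a string rather than fetched from the example URL above.

```python
import copy
import xml.etree.ElementTree as ET

# Invented sample data standing in for bib.xml.
BIB_XML = """<bib>
  <book year="2001"><title>Report B</title><author>Author Two</author></book>
  <book year="1998"><title>Report A</title><author>Author One</author></book>
  <book year="2003"><title>Report C</title><author>Author Three</author></book>
</bib>"""


def flwor(bib_xml, min_year):
    """Mimic for / where / order by / return over the book elements."""
    bib = ET.fromstring(bib_xml)
    results = ET.Element("results")
    # for: iterate over every book that is a child of bib.
    books = bib.findall("book")
    # where: keep only books published in or after min_year.
    selected = [b for b in books if int(b.get("year")) >= min_year]
    # order by: sort the remaining books by title.
    selected.sort(key=lambda b: b.findtext("title"))
    # return: build a result element holding copies of title and author.
    for b in selected:
        result = ET.SubElement(results, "result")
        for child in b.findall("title") + b.findall("author"):
            result.append(copy.deepcopy(child))
    return results


RESULTS = flwor(BIB_XML, 2000)
RESULT_TITLES = [r.findtext("title") for r in RESULTS.findall("result")]
OUTPUT = ET.tostring(RESULTS, encoding="unicode")
```

The XQuery version expresses the same idea in a handful of lines and lets the processor decide how to evaluate it, which is the main attraction of a dedicated query language.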

An important point is that tags can be embedded in the return

expression so the output can be valid XML or HTML. XQuery comes

with a huge range of predefined functions for text and mathematical

operations as well as the option for users to create their own

functions.

There are now several good introductions to XQuery on the web.

For example: X Is for XQuery by Jason Hunter (http://otn.oracle.com/oramag/oracle/03-may/o33devxml.html)

or What is XQuery? (http://www.xml.com/pub/a/2002/10/16/xquery.html).

The W3C set of XQuery use cases (http://www.w3.org/TR/xquery-use-cases)

are easy to follow and show what can be done with XQuery using

examples.

Conclusion

As XML becomes a more widely used standard, users are going to

need better ways to find information in their XML documents. XQuery

is already a powerful tool for data retrieval and no doubt is

going to become as popular as SQL.

A note on programming tools for XQuery

XQuery is only a standard, and must be implemented by

programming tools. One such tool set is SAXON. It is an XSLT and

XQuery Processor, and an open source product. It can be downloaded

from http://saxon.sourceforge.net/.

It is written in Java and can be called from a Java program or run

from a command line. Only the latest version 7.8 includes support

for XQuery 1.0 as well as XSLT 2.0. Version 7.8 updates the product

to align it with the 12 November 2003 working drafts.

Microsoft is giving away for free a set of class libraries for

use with the 1.0 and 1.1 releases of the Microsoft .NET framework

software development kit. This can be downloaded from http://xqueryservices.com/.

Also the .Net development kit is available for free from

http://msdn.microsoft.com/library/default.asp?url=/downloads/list/netdevframework.asp.

This is the command line version.

Using the class libraries an XQuery file can be run against an

XML document and the results either sent to a file or to the

command line.

Links

XML Path Language (XPath) Version 2.0. Available at: http://www.w3.org/TR/xpath20/

XSLT 2.0 Available at: http://www.w3.org/TR/xslt20/

XQuery 1.0: An XML Query Language Available at: http://www.w3.org/TR/xquery/

END OF NEWSLETTER NUMBER 2

This newsletter is copyright © Mark Bell and the individual authors, 2004.

Please contact the editor before reproducing material from this newsletter.
