Zthes: a Z39.50 profile for thesaurus navigation, version 1.0

1st May 2006

Mike Taylor
$Id: zthes-z3950-1.0.html,v 1.3 2006-05-02 10:29:27 mike Exp $

1. Introduction
2. Z39.50 specification
        2.1. Overview
        2.2. tagSet-Zthes
        2.3. Schema
        2.4. Element sets
        2.5. The Zthes-1 attribute set
        2.6. Searching
        2.7. Explain
                2.7.1. Overview
                2.7.2. Identifying a Zthes database
                2.7.3. Locating a relevant Zthes database
3. Future directions

1. Introduction

This document provides a set of specifications prescribing the use of the ANSI/NISO Z39.50 protocol to navigate remote thesauri. A Z39.50 server conforming to this profile can expose its thesaurus to any conforming client, enabling its use in many applications.

This document is to be read in conjunction with version 1.0 of the Zthes abstract model.

This version of the Zthes Z39.50 profile, 1.0, corresponds to version 1.0 of the Zthes SRU profile.

2. Z39.50 specification

2.1. Overview

This profile builds on the work of others by using the Z39.50 Attribute Architecture together with the Utility Attribute Set and the Cross-Domain Attribute Set developed for it. As such, it requires servers and clients to support version 3 of the Z39.50 protocol.

This profile is intended to be used in conjunction with other profiles. It is envisaged that an application will use the Zthes profile to navigate a thesaurus and thereby obtain terms suitable for searching in a target database, and then use a second, domain-specific profile such as GILS or CIMI to search in and retrieve from that database.

The Z39.50 objects defined by this profile have the following OIDs.

Object OID
tagSet-Zthes 1.2.840.10003.14.8
The Zthes Schema 1.2.840.10003.13.8
The Zthes-1 Attribute Set 1.2.840.10003.3.13
The AuthorityFileInfo External 1.2.840.10003.10.11
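
Implementations typically need these identifiers in code. The following is a minimal Python sketch that captures the table above as constants; the names ZTHES_OIDS and oid_to_tuple are illustrative assumptions and are not defined by this profile.

```python
from typing import Tuple

# OID values are copied from the table above; the dictionary keys and
# the helper name are illustrative assumptions, not part of the profile.
ZTHES_OIDS = {
    "tagSet-Zthes": "1.2.840.10003.14.8",
    "Zthes schema": "1.2.840.10003.13.8",
    "Zthes-1 attribute set": "1.2.840.10003.3.13",
    "AuthorityFileInfo External": "1.2.840.10003.10.11",
}


def oid_to_tuple(oid: str) -> Tuple[int, ...]:
    """Split a dotted OID string into a tuple of integer arcs."""
    return tuple(int(arc) for arc in oid.split("."))


schema_arcs = oid_to_tuple(ZTHES_OIDS["Zthes schema"])
```

A tuple of integer arcs is often more convenient than the dotted string when comparing against OIDs decoded from the wire.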

2.2. tagSet-Zthes

This profile defines a tag set called tagSet-Zthes, which describes the additional tags needed in the schema beyond those found in the standard tagSet-M and tagSet-G. It contains the following elements, corresponding to the same-named elements in the abstract model.

Tag Name ASN.1 Datatype
1 termQualifier InternationalString
2 termType InternationalString
3 relationType InternationalString
4 postings structured
5 fieldName InternationalString
6 hitCount INTEGER
7 termUpdate InternationalString
8 termVocabulary InternationalString
9 termCategory InternationalString
10 termStatus InternationalString
11 termApproval InternationalString
12 termSortkey InternationalString
13 termNoteLabel InternationalString
14 termNoteVocab InternationalString
15 relationWeight InternationalString

2.3. Schema

In the Zthes schema, tag types indicate elements from the following tag sets:

Type Meaning
1 tagSet-M, defined in appendix TAG.2.1 of the Z39.50 standard
2 tagSet-G, defined in appendix TAG.2.2 of the Z39.50 standard
3 application-defined string tags
4 tagSet-Zthes, defined above

The abstract model is represented in Z39.50 by a GRS-1 record encoded with the tag-paths specified in the following table. Where possible, standard tags from tagSet-M and tagSet-G are re-used; in these cases, the generic names of the tags are listed in the right-hand column.

Tag Path Occurrence Element Generic Name
(1,14) 1 termId localControlNumber
(4,7) [0,1] termUpdate  
(2,1) 1 termName title
(4,1) [0,1] termQualifier  
(4,2) [0,1] termType  
(2,20) [0,1] termLanguage language
(4,8) [0,1] termVocabulary  
(4,9) 0+ termCategory  
(4,10) [0,1] termStatus  
(4,11) [0,1] termApproval  
(4,12) [0,1] termSortkey  
(2,17) 0+ termNote description
(2,17)(4,13) [0,1] termNoteLabel  
(2,17)(4,14) [0,1] termNoteVocab  
(1,15) [0,1] termCreatedDate creation date
(1,27) [0,1] termCreatedBy record created by
(1,16) [0,1] termModifiedDate dateOfLastModification
(1,28) [0,1] termModifiedBy record modified by
(4,4) 0+ postings  
(4,4)(2,36) 1 sourceDb databaseName
(4,4)(4,5) [0,1] fieldName  
(4,4)(4,6) 1 hitCount  
(2,30) 0+ relation relation
(2,30)(4,3) 1 relationType  
(2,30)(4,15) [0,1] relationWeight  
(2,30)(2,36) [0,1] sourceDb databaseName
(2,30)(1,14) 1 termId localControlNumber
(2,30)(2,1) 1 termName title
(2,30)(4,1) [0,1] termQualifier  
(2,30)(4,2) [0,1] termType  
(2,30)(2,20) [0,1] termLanguage language
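
To illustrate how the tag paths in the table above identify elements, here is a minimal Python sketch. It assumes a simplified in-memory model of a GRS-1 record as a list of (tagType, tagValue, content) tuples, where content is either a leaf value or a nested list; this model, the names ELEMENT_NAMES and walk, and the sample term values are all assumptions made for illustration and do not correspond to any particular Z39.50 toolkit. Only a subset of the table is included.

```python
# A subset of the tag-path table, keyed by tuples of (tagType, tagValue).
ELEMENT_NAMES = {
    ((1, 14),): "termId",
    ((2, 1),): "termName",
    ((4, 1),): "termQualifier",
    ((4, 2),): "termType",
    ((2, 20),): "termLanguage",
    ((2, 30), (4, 3)): "relationType",
    ((2, 30), (1, 14)): "termId",
    ((2, 30), (2, 1)): "termName",
}


def walk(elements, prefix=()):
    """Yield (tag_path, value) for every leaf element, depth first."""
    for tag_type, tag_value, content in elements:
        path = prefix + ((tag_type, tag_value),)
        if isinstance(content, list):
            # A nested list is a sub-record such as a relation.
            yield from walk(content, path)
        else:
            yield path, content


# Hypothetical record: one term with a single broader-term relation.
record = [
    (1, 14, "abc123"),
    (2, 1, "coal mining"),
    (2, 30, [
        (4, 3, "BT"),
        (1, 14, "abc100"),
        (2, 1, "mining"),
    ]),
]

named = [(ELEMENT_NAMES.get(path, "unknown"), value)
         for path, value in walk(record)]
```

Because termId and termName occur both at the top level and inside relation sub-records, the full path, not just the final tag, is needed to tell them apart.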

The termLanguage element is expressed as one of the standard codes described in RFC 1766 (Tags for the Identification of Languages) and ISO 639 (Code for the representation of names of languages) - for example, en for English, fr for French and de for German.

The administrative date fields should be returned in the ASN.1 GeneralizedTime format. (The working group considered the Z39.50 ASN.1 date/time definition but reached the conclusion that the benefits would be outweighed by the barrier raised to implementation.)
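
As a hedged illustration, a server written in Python might produce such a value as follows. The sketch emits only the UTC form of GeneralizedTime (YYYYMMDDHHMMSS followed by Z); the function name is an assumption, and other permitted GeneralizedTime forms are not covered.

```python
from datetime import datetime, timezone


def to_generalized_time(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC GeneralizedTime string."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


stamp = to_generalized_time(
    datetime(2006, 5, 1, 10, 29, 27, tzinfo=timezone.utc)
)
```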

The person-name elements, termCreatedBy and termModifiedBy, may be returned in whatever format is convenient for the server: this profile does not attempt to address the interpretation of such administrative information across multiple databases.

The sourceDb element should be returned in the form of a z39.50s URL as described in RFC 2056 (Uniform Resource Locators for Z39.50). For example, if the related term is in the database called aat on the server running on port 3950 on the host foo.bar.org, then the sourceDb element should have the value z39.50s://foo.bar.org:3950/aat.
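
The following Python sketch shows one way to build and take apart a sourceDb value of the simple host, port and database form used in the example above. The function names are assumptions, and the other URL forms described in RFC 2056 are not handled.

```python
from urllib.parse import urlsplit


def make_source_db(host: str, port: int, database: str) -> str:
    """Build a z39.50s URL naming a database on a server."""
    return f"z39.50s://{host}:{port}/{database}"


def split_source_db(url: str):
    """Return (host, port, database) from a simple z39.50s URL."""
    parts = urlsplit(url)
    if parts.scheme != "z39.50s":
        raise ValueError(f"not a z39.50s URL: {url!r}")
    return parts.hostname, parts.port, parts.path.lstrip("/")


source_db = make_source_db("foo.bar.org", 3950, "aat")
```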

Servers may, at their discretion, include additional tagSet-M, tagSet-G and string-tagged (type 3) elements in the records they return; they may include such additional elements at the top level, within postings or relation sub-records, or both. In particular, records may begin with a schemaIdentifier (1,1) element, in accordance with standard ``good practice''. Clients may display any such additional elements as they see fit, or may ignore them.

2.4. Element sets

Use of the element set f requests a full record, and so servers should respond by returning a record containing as many as possible of the elements listed above in section 2.3.

Use of the element set b requests a brief record. Servers should respond by returning a record omitting the administrative fields (termCreatedDate, termCreatedBy, termModifiedDate and termModifiedBy), any postings sub-records and all the relation sub-records.

This element set may be useful when constructing a summary of several records found by a search for initial entry points to a thesaurus; it is unlikely to be useful when navigating from term to term.
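
A server might derive the brief record from the full one along the following lines. This is a sketch only: it assumes the record is held as a Python dictionary keyed by element name, which is a simplification of the GRS-1 structure, and the function name and sample values are illustrative.

```python
# Elements that the b element set omits, per the description above.
ADMIN_FIELDS = {
    "termCreatedDate", "termCreatedBy",
    "termModifiedDate", "termModifiedBy",
}
SUBRECORD_FIELDS = {"postings", "relation"}


def brief_record(full: dict) -> dict:
    """Return a copy of the record without admin fields and sub-records."""
    omitted = ADMIN_FIELDS | SUBRECORD_FIELDS
    return {name: value for name, value in full.items()
            if name not in omitted}


# Hypothetical full record.
full_record = {
    "termId": "abc123",
    "termName": "coal mining",
    "termCreatedBy": "mike",
    "postings": [{"sourceDb": "z39.50s://foo.bar.org:3950/aat",
                  "hitCount": 7}],
    "relation": [{"relationType": "BT", "termId": "abc100",
                  "termName": "mining"}],
}
brief = brief_record(full_record)
```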

2.5. The Zthes-1 attribute set

This profile defines an attribute set called Zthes-1, which describes the additional access points needed for searching beyond those found in the standard utility and cross-domain attribute sets. It contains the following attributes, all of type Access Point:

Type Value Name Description
1 1 termQualifier searches in the termQualifier element of the top-level term record
1 2 termType searches in the termType element of the top-level term record
1 3 thesAdmin used for a variety of searches related to administrative details of thesaurus structure - see below
1 4 relatedTermID used in conjunction with a semantic qualifier (attribute type 2) with value equal to one of the relationTypes described in section 2.2 of the abstract model; searches for all records in the specified relation to the record whose termID is equal to the search term
1 5 termVocabulary searches in the termVocabulary element of the top-level term record
1 6 termCategory searches in the termCategory element of the top-level term record
1 7 termStatus searches in the termStatus element of the top-level term record
1 8 termApproval searches in the termApproval element of the top-level term record
1 9 termSortKey searches in the termSortKey element of the top-level term record. This is not a useful thing to search for, but the index may also more reasonably be used in sorting.

As an example of relatedTermID, a search for abc123 with access point relatedTermID and semantic qualifier NT finds all the narrower terms of the record whose termID is abc123.
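
The relatedTermID example above (term abc123 with semantic qualifier NT) can be sketched as a simple data structure. The Attribute tuple and the dictionary layout below are assumptions for illustration rather than the encoding of a type-1 query, and attributing the semantic qualifier to the zthes-1 set is also an assumption, since the text above does not say which set supplies it.

```python
from typing import NamedTuple, Union


class Attribute(NamedTuple):
    attribute_set: str        # e.g. "zthes-1"
    attr_type: int            # 1 = access point, 2 = semantic qualifier
    value: Union[int, str]


def related_terms_query(term_id: str, relation_type: str) -> dict:
    """Describe a search for records related to term_id by relation_type."""
    return {
        "attributes": [
            Attribute("zthes-1", 1, 4),              # relatedTermID
            Attribute("zthes-1", 2, relation_type),  # assumed set; see lead-in
        ],
        "term": term_id,
    }


narrower = related_terms_query("abc123", "NT")
```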

The thesAdmin access point must be used with one of a small set of well-known strings as the search term. Servers may support the following values:

top: Searches for all records considered suitable as starting points for browsing.
whole: Searches for a special record describing the thesaurus as a whole, and containing material such as introductory text and revision history that might be front matter in a printed thesaurus. This record, when it exists at all, may not be found in any other search.

This profile does not currently specify the format of this special record. If there is a need for a GRS-1 format, one will be defined; otherwise an XML record can be used, corresponding to the format defined for this record in the Zthes XML schema.

The Zthes-1 attribute set conforms to attribute set class 1 as described in the Z39.50 Attribute Architecture. However, it prescribes no rules to resolve conflict between its own semantics and those of another attribute set in the case where attributes for both are used in a single search term of a type-1 query and the top-level attribute-set of that query is Zthes-1.

2.6. Searching

Servers must support type-1 queries which use the following access points in a manner conformant to the definitions of the attribute sets which define them. Where possible, standard attributes from utility and cross-domain sets are re-used; in these cases, the generic names of the attributes are listed in the right-hand column.

Attribute Set Type Value Search For Generic Name
utility 1 4 termID local control number
cross-domain 1 1 termName title
zthes-1 1 1 termQualifier  
utility 1 11 all elements all access points

Note: Since the publication of version 0.2b of this profile, the numbering of the utility attribute set has changed, so that the all elements search now uses access point 11 rather than 10. Servers are encouraged to support searches from old clients which use the old attribute value 10 as well as the new one.
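
A server that wishes to accept the old value alongside the new one might normalise incoming access points as in the following sketch. The table of required access points is taken from above; the names REQUIRED_ACCESS_POINTS, LEGACY_ALL_ELEMENTS and resolve_access_point are assumptions.

```python
# Required access points, keyed by (attribute set, type, value).
REQUIRED_ACCESS_POINTS = {
    ("utility", 1, 4): "termID",
    ("cross-domain", 1, 1): "termName",
    ("zthes-1", 1, 1): "termQualifier",
    ("utility", 1, 11): "all elements",
}

# Value used for "all elements" by clients of profile version 0.2b.
LEGACY_ALL_ELEMENTS = ("utility", 1, 10)


def resolve_access_point(attribute_set: str, attr_type: int, value: int):
    """Return the search target for an access point, or None if unknown."""
    key = (attribute_set, attr_type, value)
    if key == LEGACY_ALL_ELEMENTS:
        key = ("utility", 1, 11)  # treat the old value as the new one
    return REQUIRED_ACCESS_POINTS.get(key)
```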

For the purpose of searches on the local control number access point, values of the termID function as opaque ``magic cookies''. Therefore, such search terms should not include any contentAuthority attribute, even if it happens that for the specific thesaurus in question, the termID identifiers are taken from a well-known source.

The following additional access points may optionally be supported:

Attribute Set Type Value Search For Generic Name
zthes-1 1 3 thesAdmin  
zthes-1 1 2 termType  
utility 1 3 termLanguage language
zthes-1 1 5 termVocabulary  
zthes-1 1 6 termCategory  
zthes-1 1 7 termStatus  
zthes-1 1 8 termApproval  
cross-domain 1 4 termNote description
utility 1 1 termCreatedDate record date (with functional qualifier ``creation'')
utility 1 2 termCreatedBy record creator (with functional qualifier ``creation'')
utility 1 1 termModifiedDate record date (with functional qualifier ``modification'')
utility 1 2 termModifiedBy record creator (with functional qualifier ``modification'')
utility 1 2 either termCreatedBy or termModifiedBy record creator (with no functional qualifier)
zthes-1 1 4 relatedTermID (with a semantic qualifier from the relationType controlled vocabulary)

Note: Since the publication of version 0.2b of this profile, functional qualifiers for use with the Record Date/Time and Record Agent access points have been added to the utility attribute set. Accordingly, we now use the official functional qualifiers ``creation'' and ``modification''. Servers are encouraged to support searches from old clients which use the old functional qualifiers ``date/time created'', ``creator'', ``date/time last modified'' and ``last modifier'' as well as the new ones.
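
Support for old clients could be provided by translating the old qualifiers before a search is processed, as in this sketch. The pairing of each old qualifier with a new one is inferred from the wording of the qualifiers and should be treated as an assumption; the names used are illustrative.

```python
# Assumed mapping from the pre-1.0 functional qualifiers to the
# official ones; the pairing is inferred, not stated by the profile.
LEGACY_FUNCTIONAL_QUALIFIERS = {
    "date/time created": "creation",
    "creator": "creation",
    "date/time last modified": "modification",
    "last modifier": "modification",
}


def normalize_functional_qualifier(qualifier: str) -> str:
    """Map an old functional qualifier to its new form; pass others through."""
    return LEGACY_FUNCTIONAL_QUALIFIERS.get(qualifier, qualifier)
```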

2.7. Explain

2.7.1. Overview

(The specification for the use of Explain with Zthes databases was contributed by Denis Lynch.)

A client can gain two kinds of information about Zthes databases from Explain: the fact that a particular database is a Zthes database, and the fact that a Zthes database is relevant to a particular TermList. These two uses are specified in the next two sections.

2.7.2. Identifying a Zthes database

Among the many features that a client could use to deduce that a particular database follows the Zthes profile, the profile distinguishes one required indicator. In the DatabaseInfo record for the database, the AccessInfo element must contain a schemas OID specifying the Zthes schema.

Once a client has observed the Zthes schema in the schemas for a database, it may presume that the server observes the behaviour described in this profile. (The client may still need other information from Explain, for example additional record syntaxes that may be available.)
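
The check described above amounts to a membership test on the schemas listed for the database. The sketch below assumes the client has already decoded the DatabaseInfo record into a nested Python dictionary with accessInfo and schemas keys; that layout and the function name are hypothetical.

```python
ZTHES_SCHEMA_OID = "1.2.840.10003.13.8"


def is_zthes_database(database_info: dict) -> bool:
    """Report whether a decoded DatabaseInfo record lists the Zthes schema."""
    access_info = database_info.get("accessInfo") or {}
    return ZTHES_SCHEMA_OID in access_info.get("schemas", [])


# Hypothetical decoded Explain records.
thesaurus_db = {"name": "aat",
                "accessInfo": {"schemas": ["1.2.840.10003.13.8"]}}
other_db = {"name": "books", "accessInfo": {"schemas": []}}
```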

2.7.3. Locating a relevant Zthes database

Any database may use a Zthes thesaurus or other type of authority file as the basis of the vocabulary used for an access point. This is described in Explain as follows: in the access point's TermListDetails record, the commonInfo element must contain an OtherInformation item encoded as an AuthorityFileInfo External. The AuthorityFileInfo External is defined as follows:

     AuthorityFileInfo ::= SEQUENCE {
         name        [1] IMPLICIT HumanString,  -- for display
         database    [2] IMPLICIT InternationalString,
                          -- z39.50s URL to the authority database.
                          -- Simplifies to a database name if on the same server.
         exclusive   [3] IMPLICIT NULL OPTIONAL
                          -- If present, all terms in the term
                          -- list come from this authority file.
                          -- If absent, other terms may or may not
                          -- be present in the term list.
     }

Note: It may be desirable to include an additional item to indicate the kind of authority file being referenced. If it is desirable, an appropriate identification scheme will be required.
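
On the client side, a decoded AuthorityFileInfo might be held in a small value type such as the following. This is a sketch under stated assumptions: the class mirrors the ASN.1 fields, models the optional NULL as a boolean, and adds a hypothetical is_remote helper that is not part of the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityFileInfo:
    name: str                 # human-readable name, for display
    database: str             # z39.50s URL, or a bare name if on the same server
    exclusive: bool = False   # True when the optional NULL is present

    def is_remote(self) -> bool:
        """Hypothetical helper: True if database is given as a z39.50s URL."""
        return self.database.startswith("z39.50s://")


local = AuthorityFileInfo(name="Local subjects", database="subjects")
remote = AuthorityFileInfo(
    name="AAT",
    database="z39.50s://foo.bar.org:3950/aat",
    exclusive=True,
)
```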

3. Future directions

This document has already discussed several possible directions for subsequent versions of this profile, or perhaps future companion profiles. Areas for consideration include, but may not be limited to, the following:

  • A specification for the schema of the optional whole-thesaurus descriptive record that can be found by searching for thesAdmin=whole.
  • A way for clients to request servers to include postings sub-records only for a specified set of ``interesting'' target databases.
  • Use of the Scan service on termName to derive alphabetic displays.
  • Support for version numbers of thesauri and/or terms.
  • Post-coordination, for example: coal mining USE COAL + MINING