Zthes: a Z39.50 Profile for Thesaurus Navigation, version 0.5
6th November 2001
Mike Taylor
Version 0.5 $Header: /home/mike/cvs/web/zthes2/comp/profile/zthes-05.html,v 1.1 2006-02-20 09:54:06 mike Exp $
1. Introduction
    1.1. Overview
    1.2. Status
    1.3. Scope
    1.4. Acknowledgements
2. Abstract Model
    2.1. Overview
    2.2. Schema
    2.3. Searching
3. Z39.50 Specification
    3.1. Overview
    3.2. tagSet-Zthes
    3.3. Schema
    3.4. Element Sets
        3.4.1. Element Set ``f''
        3.4.2. Element Set ``b''
    3.5. The Zthes-1 Attribute Set
    3.6. Searching
    3.7. Explain
        3.7.1. Overview
        3.7.2. Identifying a Zthes Database
        3.7.3. Locating a Relevant Zthes Database
4. Future Directions
5. Appendix A. References
6. Appendix B. The Zthes Abstract Model in XML
    6.1. The Zthes DTD for XML
    6.2. Sample Zthes-in-XML Document
7. Appendix C. Implementations
8. Appendix D. Applications
Version history is now available at
zthes.z3950.org/profile
1. Introduction

1.1. Overview

This document describes an abstract model for representing and searching thesauri - semantic hierarchies of terms as described in ISO 2788 [2] - and specifies how this model may be implemented using the Z39.50 [1] protocol. It also suggests how the model may be implemented using other protocols and formats.

1.2. Status

Although technically this document is open for review, and may be altered before being re-released as version 1.0, the reality is that its core specification has remained constant for eighteen months. We expect that any further changes will be additions rather than modifications to the existing functionality.

All feedback is very welcome, and should be emailed to the author at <mike@zthes.z3950.org>

1.3. Scope

This profile is laid out in two main sections. The first is concerned solely with the abstract representation of thesaurus terms and how they may be searched; and the second with the implementation of these abstract concepts in Z39.50: how thesaurus terms are encoded in the GRS-1 record structure, how searches are encoded in the type-1 query, etc.

It is intended that the abstract model described here is sufficiently general that it can also be implemented by protocols and data formats other than Z39.50. As an example, an appendix defines an XML DTD for thesaurus terms based on the model, and includes an example XML document using that DTD.

Because the model is abstract, it will generally have little or no effect on the data model used by a server's local representation of its thesaurus. In particular, pre-existing thesaurus databases may well represent a more complete model of a term than is required by this profile's abstract model. That's fine: the additional information is not representable in the Zthes model, but may be used in any number of other ways beyond the scope of this profile.

This profile does not mandate any relationship between a thesaurus and any other database.
The model is that terms from any thesaurus database may be used to search any other database (called a target database).

1.4. Acknowledgements

This document represents the consensual outcome of extensive discussions between the members of the informally convened Zthes working group:
2. Abstract Model

2.1. Overview

This profile represents a thesaurus as a database of inter-linked terms. If multiple thesauri are to be supported by a single server, then they must be presented as separate databases.

Each individual term in a thesaurus is represented by a record in the database. In the interests of simplicity and orthogonality, even non-preferred terms must be represented by their own records.

Term records consist of an initial part describing the term itself (with information such as its unique identifier, scope note, etc.), together with sub-records (that is, named sections within the main record) briefly describing related terms. The primary means of navigation from one term to another is by searching for the unique identifiers of the terms related to the first one.

2.2. Schema

In the element tables in this profile, the occurrence columns describe whether the elements are mandatory and/or repeatable as follows:
The top level term record is composed of the following elements:
It is recognised that in many thesauri there is no explicit unique identifier field, and the term itself, perhaps in combination with the qualifier, uniquely identifies a record. Thesauri such as these must nevertheless provide a termID field, which may be automatically generated simply by combining the term and qualifier. The termType element may take the following values:
Servers may return other values of termType at their discretion. It is recommended that such extension values begin with the string ``X-''. Each postings sub-record is composed of the following elements:
If a server wishes to communicate separate postings counts for a term in more than one field, then multiple postings sub-records with the same value of sourceDb should be used. Each relation sub-record is composed of the following elements:
The relationType element may take the following values:
Servers may return other values of relationType at their discretion. It is recommended that such extension values begin with the string ``X-''.

With a single exception, this profile deliberately restricts its set of supported relations to those discussed in ISO 2788 [2], in the belief that it is better for a small set of relations to be used interoperably than for a larger set to be specified, with different servers and clients in practice using different subsets. That sole exception is the addition to the standard relation types of ``LE'', introduced to model the multilingual links described in ISO 5964 [7].

The ``NT'' and ``BT'' relationships are reciprocal; so are ``USE'' and ``UF''; and ``RT'' and ``LE'' are reflexive. That is, when any term T1 points to another T2 using the relation ``NT'', T2 should point back to T1 using ``BT'' and vice versa; when T1 points to T2 using the relation ``USE'', T2 should point back to T1 using ``UF'' and vice versa; and when T1 points to T2 using the relation ``RT'' or ``LE'', T2 should point back to T1 using the same relation.

The termType element in a relation sub-record may take the same values as in the top-level record.

2.3. Searching

The following searches must be supported:
Support for additional searches, including the following, may be useful.
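The reciprocity rules for relation sub-records given in section 2.2 can be illustrated with a small consistency check. The following is only a sketch under stated assumptions, not part of the profile: the class names TermRecord and Relation, the field names term_id and relation_type, and the function missing_back_links are all hypothetical, and records are assumed to be held in memory rather than fetched by search.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Inverse of each relation type named in section 2.2.
# Extension values (e.g. beginning with "X-") are deliberately absent.
INVERSE = {"NT": "BT", "BT": "NT", "USE": "UF", "UF": "USE", "RT": "RT", "LE": "LE"}


@dataclass
class Relation:
    """Hypothetical stand-in for a relation sub-record."""
    relation_type: str  # "NT", "BT", "USE", "UF", "RT", "LE" or an extension value
    term_id: str        # unique identifier of the related term


@dataclass
class TermRecord:
    """Hypothetical stand-in for a top-level term record."""
    term_id: str
    relations: List[Relation] = field(default_factory=list)


def missing_back_links(records: List[TermRecord]) -> List[Tuple[str, str, str]]:
    """Return (from_id, relation_type, to_id) for links whose inverse is absent."""
    by_id = {rec.term_id: rec for rec in records}
    problems = []
    for rec in records:
        for rel in rec.relations:
            inverse = INVERSE.get(rel.relation_type)
            target = by_id.get(rel.term_id)
            if inverse is None or target is None:
                # Extension relation, or related term not in this set of records.
                continue
            has_inverse = any(
                back.relation_type == inverse and back.term_id == rec.term_id
                for back in target.relations
            )
            if not has_inverse:
                problems.append((rec.term_id, rel.relation_type, rel.term_id))
    return problems


# Usage: T1 points to T2 with "NT", but T2 has no "BT" pointing back.
t1 = TermRecord("T1", [Relation("NT", "T2")])
t2 = TermRecord("T2")
print(missing_back_links([t1, t2]))
```

Relations with extension types, or pointing at terms outside the supplied records, are skipped rather than reported, since the profile states no reciprocity rule for them.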
3. Z39.50 Specification

3.1. Overview

This profile builds on the work of others by using the Z39.50 Attribute Architecture [3] together with the Utility Attribute Set [4] and the Cross-Domain Attribute Set [5] developed for it.

As such, it requires servers and clients to support version 3 of the Z39.50 protocol.

The intention of this profile is that it is used in conjunction with other profiles. It is envisaged that an application will use the Zthes profile to navigate a thesaurus and thereby obtain terms suitable for searching in a target database; and use a second, domain-specific profile such as GILS or CIMI to search in and retrieve from that database.

The Z39.50 objects defined by this profile have the following OIDs.
3.2. tagSet-Zthes

This profile defines a tag set called tagSet-Zthes, which describes the additional tags needed in the schema beyond those found in the standard tagSet-M and tagSet-G. It contains the following elements, corresponding to the same-named elements in the abstract model schema described above:
3.3. Schema

In the Zthes schema, tag types indicate elements from the following tag sets:
The abstract schema described in section 2.2 is represented in Z39.50 by a GRS-1 record encoded with the tag-paths specified in the following table. Where possible, standard tags from tagSet-M and tagSet-G are re-used; in these cases, the generic names of the tags are listed in the right-hand column.
The termLanguage element is expressed as one of the standard codes described in RFC 1766 [8] and ISO 639 [9] - for example, ``en'' for English, ``fr'' for French and ``de'' for German.

The administrative date fields should be returned in the ASN.1 GeneralizedTime format. (The working group considered the Z39.50 ASN.1 date/time definition [11], but reached the conclusion that the benefits would be outweighed by the barrier raised to implementation.)

The person-name elements, termCreatedBy and termModifiedBy, may be returned in whatever format is convenient for the server: this profile does not attempt to address the interpretation of such administrative information across multiple databases.

The sourceDb element should be returned in the form of a z39.50s URL as described in RFC 2056 [10]. For example, if the related term is in the database called ``aat'' on the server running on port 3950 on the host foo.bar.org, then the sourceDb element should have the value z39.50s://foo.bar.org:3950/aat.

Servers may, at their discretion, include additional tagSet-M, tagSet-G and string-tagged (type 3) elements in the records they return; they may include such additional elements at the top level, within relation sub-records, or both. In particular, records may begin with a schemaIdentifier (1,1) element, in accordance with standard ``good practice''. Clients may display any such additional elements as they see fit, or may ignore them.

3.4. Element Sets

3.4.1. Element Set ``f''

Use of the element set ``f'' requests a full record, and so servers should respond by returning a record containing as many as possible of the elements listed above in the table in section 3.3.

3.4.2. Element Set ``b''

Use of the element set ``b'' requests a brief record.
Servers should respond by returning a record omitting the administrative fields (termCreatedDate, termCreatedBy, termModifiedDate and termModifiedBy), and all the relation sub-records.

This element set may be useful when constructing a summary of several records found by a search for initial entry points to a thesaurus; it is unlikely to be useful when navigating from term to term.

3.5. The Zthes-1 Attribute Set

This profile defines an attribute set called Zthes-1, which describes the additional access points needed for searching beyond those found in the standard utility and cross-domain attribute sets. It contains the following attributes, all of type Access Point:
The thesAdmin access point must be used with one of a small set of well-known strings as the search term. Servers may support the following values:
The Zthes-1 attribute set conforms to attribute set class 1 as described in the Z39.50 Attribute Architecture. However, it prescribes no rules to resolve conflict between its own semantics and those of another attribute set in the case where attributes for both are used in a single search term of a type-1 query and the top-level attribute-set of that query is Zthes-1.

3.6. Searching

Servers must support type-1 queries which use the following access points in a manner conformant to the definitions of the attribute sets which define them. Where possible, standard attributes from utility and cross-domain sets are re-used; in these cases, the generic names of the attributes are listed in the right-hand column.
Note: Since the publication of version 0.2b of this profile, the numbering of the utility attribute set has changed, so that the all elements search now uses access point 11 rather than 10. Servers are encouraged to support searches from old clients which use the old attribute value 10 as well as the new one. For the purpose of searches on the local control number access point, values of the termID function as opaque ``magic cookies''. Therefore, such search terms should not include any contentAuthority attribute, even if it happens that for the specific thesaurus in question, the termID identifiers are taken from a well-known source. The following additional access points may optionally be supported:
Note: Since the publication of version 0.2b of this profile, functional qualifiers for use with the Record Date/Time and Record Agent access points have been added to the utility attribute set. Accordingly, we now use the official functional qualifiers ``creation'' and ``modification''. Servers are encouraged to support searches from old clients which use the old functional qualifiers ``date/time created'', ``creator'', ``date/time last modified'' and ``last modifier'' as well as the new ones.

3.7. Explain

3.7.1. Overview

(The specification for the use of Explain with Zthes databases was contributed by Denis Lynch.)

A client can gain two kinds of information about Zthes databases from Explain: the fact that a particular database is a Zthes database, and the fact that a Zthes database is relevant to a particular TermList. These two uses are specified in the next two sections.

3.7.2. Identifying a Zthes Database

Among the many features that a client could use to deduce that a particular database follows the Zthes profile, the profile distinguishes one required indicator. In the DatabaseInfo record for the database, the AccessInfo element must contain a schemas OID specifying the Zthes schema.

Once a client has observed the Zthes schema in the schemas for a database, it may presume that the server observes the behaviour described in this profile. (The client may still need other information from Explain, for example additional record syntaxes that may be available.)

3.7.3. Locating a Relevant Zthes Database

Any database may use a Zthes thesaurus or other type of authority file for the basis of the vocabulary used for an access point. This is described in Explain as follows: in the access point's TermListDetails record, the commonInfo element must contain an OtherInformation item encoded as an AuthorityFileInfo External.
The AuthorityFileInfo External is defined as follows:

    AuthorityFileInfo ::= SEQUENCE {
        name      [1] IMPLICIT HumanString,          -- for display
        database  [2] IMPLICIT InternationalString,  -- z39.50s URL to the authority database.
                                                     -- Simplifies to a database name if on the same server.
        exclusive [3] IMPLICIT NULL OPTIONAL         -- If present, all terms in the term
                                                     -- list come from this authority file.
                                                     -- If absent, other terms may or may not
                                                     -- be present in the term list.
    }

Note: It may be desirable to include an additional item to indicate the kind of authority file being referenced. If it is desirable, an appropriate identification scheme will be required.

4. Future Directions

This document has already discussed several possible directions for subsequent versions of this profile, or perhaps future companion profiles. Areas for consideration include, but may not be limited to, the following:
5. Appendix A. References
6. Appendix B. The Zthes Abstract Model in XML

This appendix has been removed, since more up to date information about the XML representation of the Zthes data model can now be found at zthes.z3950.org/xml

6.1. The Zthes DTD for XML

6.2. Sample Zthes-in-XML Document

7. Appendix C. Implementations

This appendix has been removed, since more up to date information about implementations can now be found at zthes.z3950.org/impl

8. Appendix D. Applications

This appendix has been removed, since more up to date information about applications can now be found at zthes.z3950.org/apps
Feedback to <mike@zthes.z3950.org> is welcome!