Proposed XML Schemas for Dublin Core Metadata
NOTE: These schemas have no endorsement beyond that of the authors.
These schemas (developed around 2002-07 by Tom Habing and Tim Cole) represent early explorations and prototypes for the schemas developed for
Notes on the W3C XML Schemas for Qualified Dublin Core, which should be
considered the more definitive source.
Additional pages related to this work are:
Examples
Revisions
DTD Proposal
Some points about these schemas, in no particular order:
- One of the goals was to provide maximum flexibility in terms
of schema reuse. It should be easy to import the schemas into other schemas and redefine certain elements or types in order to create custom
schemas for specific applications.
- Another goal of these schemas was 'self-containedness' in that interpretation of the schemas should not require significant access to
external resources. Specifically, the information to simplify a refined element into its corresponding simple element should be contained in
the schema itself. This idea also includes liberal use of annotation, documentation, and appinfo elements inside the schemas to facilitate either
human or machine interpretation of the schemas. See the dc1.1.xsd for the best example of annotation and appinfo usage. Taken to the extreme,
a complete specification for DC could be embedded in annotations in the XML Schema. [Has anyone seen or thought of embedding RDF or RDFS inside
of appinfo elements -- might be an interesting way to combine XML and RDF Schemas?]
- Some of the concepts employed in the XML schemas are based on RDF concepts; however, the schemas do not represent RDF.
- Because of the way qualified encodings are specified, we know of no way currently to restrict their usage to only the elements or
refinements to which they are currently limited by the DC specifications. For example, the xsi:type="dcterms:MESH" type could be applied
to the dc:creator element, if desired, and the metadata would still validate using these schemas. This is an area we are still investigating.
The XML Schema attributes final, block, or abstract may be useful for this type of enforcement, but we are not sure if this can be done yet, or
even if this level of restriction is needed or desired.
- The xxx_container.xsd schemas are not intended to be part of the canonical schema set, but are intended as examples of how the canonical schemas
can be imported and reused inside other schemas.
- For what it's worth, an attempt was made at some definitions.
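The point above about xsi:type can be illustrated outside the schemas. The following is a minimal Python sketch, not part of the schema set, of an application-level check that flags an encoding used on an element it was not meant for. The names ALLOWED and misplaced_encodings are hypothetical, the table lists only the MESH/subject pairing as an illustration, and the sketch assumes the dcterms prefix is bound to the usual DC terms namespace.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Illustrative table only: encoding scheme -> local names of the DC elements
# it is meant to qualify. A real table would have to cover every encoding.
ALLOWED = {"dcterms:MESH": {"subject"}}


def misplaced_encodings(xml_text):
    """Return (element local name, xsi:type value) pairs that violate ALLOWED.

    Assumption: the xsi:type value is compared as a literal string, so the
    'dcterms' prefix must be bound to the conventional namespace.
    Only elements in the simple DC namespace are checked.
    """
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        encoding = elem.get(XSI_TYPE)
        if encoding in ALLOWED and elem.tag.startswith(f"{{{DC}}}"):
            local = elem.tag.split("}", 1)[1]
            if local not in ALLOWED[encoding]:
                problems.append((local, encoding))
    return problems


# Hypothetical instance: the second element would still validate against the
# schemas described here, but it is flagged by the check above.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:subject xsi:type="dcterms:MESH">Example heading</dc:subject>
  <dc:creator xsi:type="dcterms:MESH">Example heading</dc:creator>
</record>"""

print(misplaced_encodings(SAMPLE))
```

A check like this lives in the application rather than in the schema, so it does not answer the open question of whether the restriction can be expressed in XML Schema itself.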
Without further ado, the schemas and sample document instances:
- dc1.1.xsd
- This is the basic/simple root schema. It defines the core 15 elements, allowing only text values, with an optional xml:lang attribute.
However, the complexType used for the elements (SimpleLiteral) is defined in such a way as to be maximally extensible.
This schema also includes a convenience type which can be used to define a container element for the basic
15 elements. For example, an importing schema could use this type in a single statement to easily define a container element in which all the 1.1 elements are included.
- simple_dc_container.xsd
- This schema imports the dc1.1.xsd and defines a root element which may contain any of the simple dc elements. The OAI namespace is
used only for illustration. A sample document instance which uses this schema is also included.
- dc1.1complex.xsd
- This schema includes the dc1.1.xsd, but it adds several new complexTypes based on the dc1.1.xsd SimpleLiteral type. These types must be
indicated in an instance document by using the xsi:type attribute. Several of them roughly correspond to the RDF parseTypes of
'Literal' and 'Resource'. A type is also defined which uses XLink attributes to create a type that corresponds to an RDF property with an rdf:resource
attribute. It can be used to link a DC element to a resource defined elsewhere, either internally or externally. The xlink.xsd
schema was borrowed from the METS project.
- complex_dc_container.xsd
- This schema imports the dc1.1complex.xsd and defines a root element which may contain any of the simple dc elements. The OAI namespace is
used only for illustration. A sample document instance which uses this schema is also included.
- dc1.1complex_vcard.xsd
- This schema includes the dc1.1complex.xsd, but it defines a vCard complexType based on the ComplexResource. If this type is specified,
via xsi:type='dc:vCard' in an instance document, the DC element must contain only valid vCard subelements. The vCard XML schema we developed for this
purpose is also available.
- vcard_dc_container.xsd
- This schema imports the dc1.1complex_vcard.xsd and defines a root element which may contain any of the simple dc elements. The OAI namespace is
used only for illustration. A sample document instance which uses this schema is also included.
- dcterms.xsd
- This schema defines the qualified Dublin Core element refinements and encodings. It imports the dc1.1.xsd schema and defines the
various refinements as members of substitutionGroups headed by the appropriate imported base 15 DC elements. By using substitutionGroups, the schema is self-documenting,
but it also allows for enforcement of any constraints that might have been applied to the base 15 elements by an importing schema.
For example, a schema that imports the dc1.1.xsd and the dcterms.xsd, but then goes on to disallow certain of the core 15 elements will also automatically
disallow any of the corresponding refinement elements (see restricted_dc_container.xsd and test_restricted.xml).
The various qualified DC encodings are defined as complexTypes derived from the SimpleLiteral type. This means that they must be specified in instance
documents by use of the xsi:type attribute. Otherwise, they are identical to the SimpleLiteral type: they allow only text content with an optional xml:lang
attribute. However, as will be shown later, it is possible to easily restrict these encodings using XML data typing, regular expressions, or
enumerations.
- qualified_dc_container.xsd
- This schema imports the dc1.1.xsd and dcterms.xsd and defines a root element which may contain any of the simple dc elements, or their refinements.
The OAI namespace is used only for illustration. A sample document instance which uses this schema is also included.
- dcterms_tight.xsd
- This schema is identical to the dcterms.xsd except that (for illustration purposes) it restricts some of the encodings to only certain data types, specifically:
W3CDTF is restricted to only valid XML dateTime values. DCMIType is restricted to only values enumerated in the dcmitype.xsd.
URI values are restricted to the XML anyURI type, the RFC1766 encoding is limited to the XML language type, the ISO639-2 encoding is restricted to values defined
in iso639-2.xsd, and the Period encoding is restricted to only the strings which conform to the pattern given in
dcmi-period.xsd. Given additional time and motivation, it would
be possible to restrict others of the encodings so that they allow only valid values. This could be done via XML regular expressions,
enumerations, or possibly other means. NOTE: The XSV validator currently will not validate all possible XML data types, specifically the dateTime type
among others.
- tightly_qualified_dc_container.xsd
- This schema imports the dc1.1.xsd and dcterms_tight.xsd and defines a root element which may contain any of the simple dc elements, or their refinements.
The OAI namespace is used only for illustration. A sample document instance which uses this schema is also included.
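The kind of tightening that dcterms_tight.xsd describes (a data type for one encoding, an enumeration for another) can be approximated outside a validator. The following is a rough Python sketch under stated assumptions: the regular expression is only a lexical approximation of the XML dateTime type, DCMI_TYPES is an illustrative subset of the DCMI Type Vocabulary rather than the contents of dcmitype.xsd, and the names CHECKS and is_valid are hypothetical.

```python
import re

# Lexical approximation of xs:dateTime. It does not range-check months,
# days, hours, and so on, which a real schema validator would do.
DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)

# Illustrative subset only; not copied from dcmitype.xsd.
DCMI_TYPES = {"Dataset", "Image", "Sound", "Text"}

# Encoding name (as written in xsi:type) -> predicate on the element text.
CHECKS = {
    "dcterms:W3CDTF": lambda value: DATETIME.fullmatch(value) is not None,
    "dcterms:DCMIType": lambda value: value in DCMI_TYPES,
}


def is_valid(encoding, value):
    """Apply the check for an encoding; encodings without a check pass,
    mirroring the unrestricted behavior of dcterms.xsd."""
    check = CHECKS.get(encoding)
    return True if check is None else check(value)


print(is_valid("dcterms:W3CDTF", "2002-07-01T12:00:00Z"))
print(is_valid("dcterms:W3CDTF", "2002-07"))
print(is_valid("dcterms:DCMIType", "Text"))
```

Note that a value such as 2002-07 is a legal W3CDTF date but not a legal dateTime, so restricting W3CDTF to dateTime values, as described above, is stricter than the W3CDTF profile itself.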
Dumb-down
Below are two XSLT stylesheets that can be used to
dumb-down qualified DC refinement elements, as defined in our XML schema,
into their equivalent simple DC elements. All other elements in the source
XML are left unchanged.
The first, DCDD.xsl, isn't that interesting in that it just hardcodes the
mappings from DCQ to DC elements in the stylesheet.
The second, DCDD2.xsl, is more interesting in that it actually uses the
substitutionGroup attributes in the dcterms XML Schema to determine how to
dumb-down the qualified elements. Theoretically, if additional refinements
are added to the dcterms namespace, the dumb-down stylesheet would not
require changes. Right now I am using the schema at
http://homes.ukoln.ac.uk/~lispj/boston/boston3/dcterms.xsd, but this would
be changed to wherever the final home for the stylesheet is located.
Both stylesheets currently throw out any unknown elements from the dcterms
namespace. They also both leave the xsi:type attributes in place, even on the
dumbed-down elements. Both of these behaviors could be easily changed.
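The schema-driven approach described for DCDD2.xsl can be sketched in Python as well. This is not the stylesheet itself, only a minimal illustration of the same idea: read the substitutionGroup attributes from a schema, then rename each refinement to its base element. The SCHEMA string is a hypothetical, heavily abbreviated stand-in for dcterms.xsd, and the names load_mapping, dumb_down, and RECORD are assumptions introduced here.

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical two-element stand-in for the real dcterms.xsd.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    targetNamespace="http://purl.org/dc/terms/">
  <xs:element name="alternative" substitutionGroup="dc:title"/>
  <xs:element name="created" substitutionGroup="dc:date"/>
</xs:schema>"""


def load_mapping(schema_text):
    """Map each refinement's qualified tag to its base element's qualified tag."""
    prefixes, mapping, target = {}, {}, None
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # Needed because substitutionGroup values are prefixed QNames.
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == f"{{{XS}}}schema":
            target = payload.get("targetNamespace")
        elif payload.tag == f"{{{XS}}}element" and payload.get("substitutionGroup"):
            prefix, _, local = payload.get("substitutionGroup").rpartition(":")
            mapping[f"{{{target}}}{payload.get('name')}"] = f"{{{prefixes[prefix]}}}{local}"
    return mapping


def dumb_down(root, mapping):
    """Rename refinements in place, drop unknown dcterms elements, and leave
    everything else (including xsi:type attributes) untouched.
    The root element itself is never renamed or removed."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in mapping:
                child.tag = mapping[child.tag]
            elif child.tag.startswith(f"{{{DCTERMS}}}"):
                parent.remove(child)
    return root


# Hypothetical record; the root element name is arbitrary.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>Example Author</dc:creator>
  <dcterms:alternative>An alternative title</dcterms:alternative>
  <dcterms:created xsi:type="dcterms:W3CDTF">2002-07-01</dcterms:created>
  <dcterms:notARealTerm>dropped</dcterms:notARealTerm>
</record>"""

result = dumb_down(ET.fromstring(RECORD), load_mapping(SCHEMA))
print([child.tag for child in result])
```

Because the mapping is built from the schema, adding another refinement to the schema text changes the output without any change to dumb_down, which is the property claimed for the second stylesheet.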
Tom Habing, 2002-07