Home
| Resource Description, Discovery
and Retrieval (Summary)
Full Text: Page 1 | Page 2 | Page 3 | Page 4
Clumps
Clumps are groups of organisations, typically public libraries,
academic libraries and other organisations (museums and archives,
for example), which agree to co-operate in areas such as
consortium purchase of resources (particularly electronic
resources), inter-lending and the linking of their catalogues.
An important element of most clumps is a gateway or portal that
lets users carry out simultaneous searches across the catalogues
and/or databases of all the participating organisations. To
achieve this, the clump has to adopt a common metadata profile to
which the various profiles in use amongst consortium members can
be mapped. A number of profiles have been developed specifically
to support this kind of activity, for example the Bath Profile
and the ONE-2 Profile. Z39.50 or XML is then used to support
searching across the various catalogues and/or databases.
Licensing and copyright of the electronic resources made
available via the clump can be problematic.
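The mapping step can be illustrated with a minimal Python sketch. The local field names in `CROSSWALKS` are hypothetical (they are not taken from the Bath or ONE-2 profiles), and in-memory lists stand in for the remote catalogues that a real clump would query over Z39.50. Each local record is mapped onto a shared set of fields, so that one query can be run against every member and the hits merged.

```python
# Hypothetical local field names for two member catalogues, mapped onto a
# shared set of common fields. These are NOT the Bath or ONE-2 definitions.
CROSSWALKS: dict[str, dict[str, str]] = {
    "public_library": {"ti": "title", "au": "creator", "su": "subject"},
    "university": {"245a": "title", "100a": "creator", "650a": "subject"},
}

Record = dict[str, str]


def to_common(source: str, record: Record) -> Record:
    """Map one local record onto the common profile; unmapped fields are dropped."""
    mapping = CROSSWALKS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def search_clump(catalogues: dict[str, list[Record]], field: str, term: str) -> list[tuple[str, Record]]:
    """Run one query on every member catalogue via the common profile and merge the hits."""
    hits = []
    for source, records in catalogues.items():
        for record in records:
            common = to_common(source, record)
            # Simple case-insensitive substring match on the chosen common field.
            if term.lower() in common.get(field, "").lower():
                hits.append((source, common))
    return hits
```

Fields with no mapping (such as a purely local note) are dropped, which is one reason why agreeing the common profile matters: anything outside it cannot be searched across the clump.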
Community information (see
also Diverse cultural
content, Information
services, E-government)
Essex County Council in the UK is pioneering the development of
a distributed community information system. Instead of the
traditional centralised database of community information, Essex
Online (which grew out of the SEAMLESS project) uses Z39.50 and
Harvest to search the databases and websites of participating
organisations in a single integrated search, an approach also
known as deep integration. Over 30 local organisations (councils,
universities and colleges, business agencies, health
authorities, utility companies, the local newspaper group,
voluntary agencies etc) currently make their data available to
the system, and the number is growing fast.
In order for the system to search across these distributed
datasets, the partners have adopted a common metadata profile
based on the e-GMS (e-Government Metadata Standard), which is
itself based on Dublin Core, and a common thesaurus for assigning
subject descriptors. Essex has recently been awarded funding by
the New Opportunities Fund to develop the system nationally,
bringing in data from large national information providers such
as nhsDirect Online and making the search facilities available in
a further 8 local authority areas, covering 6 million people. For
more information see the seamlessUK website.
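A shared thesaurus lets partners index the same concept with the same descriptor even when their own wording differs. The sketch below assumes a tiny hypothetical thesaurus (it is not the one used by Essex Online): `USE` maps non-preferred terms to preferred descriptors, and `NARROWER` lists narrower descriptors that a search can optionally include.

```python
# Hypothetical thesaurus fragment, for illustration only.
USE = {
    "doctors": "health services",
    "gp surgeries": "health services",
    "buses": "public transport",
    "trains": "public transport",
}
NARROWER = {
    "transport": ["public transport", "roads"],
}


def descriptor(term: str) -> str:
    """Return the preferred descriptor for a term (the term itself if it has no USE entry)."""
    key = term.strip().lower()
    return USE.get(key, key)


def expand(term: str) -> set[str]:
    """Preferred descriptor plus every narrower descriptor beneath it."""
    found: set[str] = set()
    stack = [descriptor(term)]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(NARROWER.get(current, []))
    return found
```

A partner that tags a record with "buses" and another that tags one with "trains" both end up with the descriptor "public transport", so a single search on that descriptor finds both records.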
Learning Support
Some public libraries are beginning to work in partnership with
other organisations, such as colleges and universities, to
provide better support to learners. Typically the partners agree
to open their libraries (and their services) to each other’s
customers and may introduce a common library ticket. They often
develop systems that allow all of their catalogues and/or
databases to be searched in a single search (see Clumps above
for an explanation of what this involves). Examples include the
Libraries Access Sunderland scHeme and the Glasgow Digital
Library Project.
In the UK the Joint Information Systems Committee (JISC) is
developing the Distributed National Electronic Resource (DNER)
to provide information and resources to support research and
learning. This is a very large and innovative project which
builds on the Resource Discovery Network (RDN) model of subject
portals and is leading the way in the provision of resources in
the networked information environment.
Many academic libraries are also involved in developing
Managed or Virtual Learning Environments (MLEs, VLEs). In the
UK, UKOLN is leading the Metadata for Education Group (MEG),
which is considering what metadata is needed to support such
systems. Public libraries too may have a role to play here,
particularly given their role in supporting formal learning and
lifelong learning.
GOOD PRACTICE GUIDELINES: UNDERLYING TECHNOLOGIES
XML
Extensible Markup Language (XML) is often described as the
successor to HTML; more precisely, it is a general-purpose markup
language from which other languages, including XHTML, can be
defined. It offers a number of advantages over HTML, and a key
distinction is that it allows content to be separated from
presentation. The W3C has produced a 10-point summary which
covers the main points about XML. For more detailed technical
information see the XML section of the W3C website or the
xml.com site. There is a great deal of research and development
activity focussed around XML; for an update on XML query tools
such as XPath, see the webpage of the W3C XML Query Group.
Libraries may consider moving to XHTML. A UKOLN report briefly
describes metadata sharing and XML, including XML’s limitations
and how it supports the effective sharing of information in
different contexts.
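A short Python sketch shows two of the ideas above: the XML holds only content, and presentation is produced separately from it. The standard library's `xml.etree.ElementTree` supports a limited subset of XPath, enough for simple attribute predicates. The catalogue data and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Content only: no fonts, layout or other presentational markup.
CATALOGUE = """<catalogue>
  <item type="book"><title>Local History of Essex</title><year>1998</year></item>
  <item type="video"><title>Essex Coastline</title><year>2000</year></item>
  <item type="book"><title>Maritime Essex</title><year>2001</year></item>
</catalogue>"""


def titles_of_type(xml_text: str, item_type: str) -> list[str]:
    """Select titles with an XPath-style path; item_type is assumed to contain no quotes."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(f"./item[@type='{item_type}']/title")]


def to_xhtml_list(titles: list[str]) -> str:
    """One possible presentation of the same content: an XHTML bulleted list."""
    ul = ET.Element("ul")
    for title in titles:
        ET.SubElement(ul, "li").text = title
    return ET.tostring(ul, encoding="unicode")
```

Replacing `to_xhtml_list` with a function that produces a table or a news feed changes the presentation without touching the catalogue data.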
RDF
The Resource Description Framework (RDF) enables the encoding,
exchange and re-use of structured metadata, using XML as an
interchange syntax. In this way it supports the integration of a
wide variety of applications, from library catalogues and
world-wide directories, to syndication and aggregation of news,
software and content, to personal collections of music, photos
and events.
RDF allows statements to be made about a resource as a set of
properties that conform to a named schema. In RDF’s XML syntax,
these statements are typically recorded in rdf:Description
elements.
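As a minimal sketch, the RDF/XML below describes one hypothetical web page with three Dublin Core properties inside an `rdf:Description` element, and a small Python function reads it back as (subject, property, value) statements. It handles only this simple form using the standard library; production code would use a full RDF parser, because RDF/XML has many abbreviated and nested forms.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# A hypothetical description of one web page using Dublin Core properties.
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://www.example.org/library/opening-hours">
    <dc:title>Library opening hours</dc:title>
    <dc:creator>Example County Council</dc:creator>
    <dc:subject>Libraries</dc:subject>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml: str) -> list[tuple[str, str, str]]:
    """Read literal-valued statements from top-level rdf:Description elements.

    Only this simple form is handled; nested descriptions, rdf:resource
    objects, typed nodes and other RDF/XML abbreviations are ignored.
    """
    root = ET.fromstring(rdf_xml)
    statements = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like '{namespace}local'; rejoin them into a property URI.
            namespace, local = prop.tag[1:].split("}")
            statements.append((subject, namespace + local, (prop.text or "").strip()))
    return statements
```

Each statement pairs the resource URI with a full property URI, which is what allows metadata from different communities to be combined without ambiguity.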
RDF is powerful because it imposes structural constraints that
support the consistent and unambiguous encoding and exchange of
standardised metadata, which makes separate packages of metadata
defined by different resource description communities
interchangeable. In addition, RDF provides a means of publishing
both human-readable and machine-processable vocabularies,
designed to encourage the reuse and extension of metadata
semantics among disparate information communities; see an
introduction to RDF for more information. Descriptions of RDF
sometimes use libraries as an analogy. For technical information
see the W3C RDF webpages. seamlessUK is an example of a community
information system that uses RDF to encode its metadata.
Z39.50
Z39.50 is an international search and retrieve protocol (ISO
23950:1998) which allows (usually remote) heterogeneous
databases to be searched, and data retrieved, through a single
user interface. Z39.50 defines a standard way for two computers
to communicate and share information. Designed to support the
searching and retrieval of full-text documents, bibliographic
data, images and multimedia, it is based on a client-server
architecture and operates over the Internet. It allows users to
search several catalogues, or other databases, in a single
integrated search. Note: until XML query languages evolve
further, Z39.50 may remain the preferred search and retrieve
protocol for systems offering complex distributed searching.
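To give a flavour of the fielded searching Z39.50 supports, the sketch below builds a query in Prefix Query Format (PQF), the text notation the YAZ toolkit uses for Z39.50 Type-1 queries. It only constructs the query string; connecting to a server and retrieving records would need a Z39.50 client library. The Bib-1 use attribute values shown (4 title, 1003 author, 21 subject heading, 1016 any) are a small selection, and the helper names are hypothetical.

```python
# A few Bib-1 "use" attribute values (attribute type 1).
BIB1_USE = {"title": 4, "author": 1003, "subject": 21, "any": 1016}


def pqf_term(field: str, term: str) -> str:
    """One attribute-qualified term; multi-word terms are quoted."""
    quoted = f'"{term}"' if " " in term else term
    return f"@attr 1={BIB1_USE[field]} {quoted}"


def pqf_and(criteria: list[tuple[str, str]]) -> str:
    """AND together (field, term) pairs using prefix (Polish) notation."""
    if not criteria:
        raise ValueError("at least one search criterion is required")
    query = pqf_term(*criteria[-1])
    # Wrap from the right so that [a, b, c] becomes "@and a @and b c".
    for field, term in reversed(criteria[:-1]):
        query = f"@and {pqf_term(field, term)} {query}"
    return query
```

For example, `pqf_and([("title", "local history"), ("author", "smith")])` yields a query asking for records whose title contains "local history" and whose author is "smith"; the same query string can be sent to every target in a distributed search.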
Harvest
Harvesting software gathers metadata from a predetermined list
of web sources, for example the webpages of participating
organisations. In the SEAMLESS project (see above), Harvest
creates an index file which the system searches in response to a
user query. The results are then merged with the results of the
Z39.50 searches and presented to the user as a single ‘hit
list’. Clicking on a result opens either the webpage (for
harvested records) or the database record (for Z39.50 records).
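The merged ‘hit list’ can be sketched as follows. The `Hit` class, the `source` labels and the assumption that relevance scores from the harvested index and from the Z39.50 targets are directly comparable are all simplifications for illustration; this is not the SEAMLESS implementation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    source: str   # "harvest" or "z3950" (hypothetical labels)
    target: str   # URL for harvested pages, record identifier for database hits
    score: float  # relevance score, assumed comparable across sources


def merge_hit_lists(harvested: list[Hit], z3950: list[Hit]) -> list[Hit]:
    """Merge both result sets into one list ranked by score, dropping duplicate targets."""
    seen: set[str] = set()
    merged = []
    for hit in sorted(harvested + z3950, key=lambda h: h.score, reverse=True):
        if hit.target not in seen:
            seen.add(hit.target)
            merged.append(hit)
    return merged


def open_action(hit: Hit) -> str:
    """What clicking a hit does: open the web page, or fetch the full database record."""
    if hit.source == "harvest":
        return f"open page {hit.target}"
    return f"fetch record {hit.target}"
```

When the same target appears in both result sets, only the higher-scoring copy is kept, so the user sees one entry per resource.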
Harvesting also forms the basis of the Open Archives Initiative
(OAI), a co-operative venture that aims to make information
easier to find on the web through the development and
application of interoperability standards. It also has potential
for the museums community.
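The OAI’s Protocol for Metadata Harvesting (OAI-PMH) works over plain HTTP: a harvester sends a `ListRecords` request, receives XML records (simple Dublin Core in the `oai_dc` format is the required baseline), and follows any `resumptionToken` until the list is exhausted. The sketch below builds the request URLs and extracts identifiers and titles from a response. The base URL is whatever a repository publishes; network access and error handling are deliberately omitted, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """First request names the metadata format; follow-ups send only the resumptionToken."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[tuple[str, str]], Optional[str]]:
    """Return (identifier, title) pairs and the resumptionToken, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI}record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier", default="")
        title = rec.findtext(f".//{DC}title", default="")
        records.append((identifier, title))
    # An empty or missing token means the harvest is complete.
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, (token or None)
```

A harvester loops: fetch `list_records_url(base)`, parse it, and while a token comes back, fetch `list_records_url(base, token)`.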
Image Retrieval
Until recently, images had to be searched for by text
descriptors or classification codes, supported in some cases by
text retrieval packages designed or adapted specially to handle
images. The Getty’s AAT (Art and Architecture Thesaurus)
consists of 120,000 terms for describing objects, textual
materials, images, architecture and cultural heritage material.
Images can also be classified using systems such as ICONCLASS
for works of art and museum exhibits, TELCLASS for television
and video, and the Social History and Industrial Classification
for museum objects.
More modern systems can now retrieve images which have not been
verbally described.
CBIR (Content Based Image Retrieval) does not use keyword
indexing. Instead, images are retrieved using inherent
characteristics of the image itself, such as colour, texture or
shape; for example, a beach scene would be blue at the top and
yellow at the bottom. There is a technical discussion of the
different types of CBIR. Commercial CBIR systems include IBM’s
QBIC (Query by Image Content), in which the required image is
described in terms of areas of colour and shape and the
retrieval software searches for images matching that
description; it is not necessary to say what the subject of the
image is (for a demonstration, see the website of the Hermitage
Museum). Another commercial system is Excalibur’s Image
RetrievalWare.
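The beach example can be turned into a toy colour-layout search. The sketch below is a deliberately crude stand-in for CBIR, not how QBIC or RetrievalWare actually work: each image (a grid of RGB tuples, at least two rows high) is reduced to the average colour of its top and bottom halves, and images are ranked by their distance from a query layout such as "blue above yellow". All names are hypothetical.

```python
import math

Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255
Signature = list[tuple[float, float, float]]

# Query layout for the beach example: blue above yellow.
BEACH_QUERY: Signature = [(0, 0, 255), (255, 255, 0)]


def mean_colour(rows: list[list[Pixel]]) -> tuple[float, float, float]:
    """Average red, green and blue values over all pixels in the given rows."""
    pixels = [p for row in rows for p in row]
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def layout_signature(image: list[list[Pixel]]) -> Signature:
    """Average colour of the top half and of the bottom half of the image."""
    mid = len(image) // 2
    return [mean_colour(image[:mid]), mean_colour(image[mid:])]


def distance(a: Signature, b: Signature) -> float:
    """Sum of Euclidean colour distances, region by region."""
    return sum(math.dist(x, y) for x, y in zip(a, b))


def rank_by_layout(query: Signature, images: dict[str, list[list[Pixel]]]) -> list[str]:
    """Image names, closest colour layout first; no keywords are involved."""
    return sorted(images, key=lambda name: distance(query, layout_signature(images[name])))
```

Note that an image with the same colours upside down ranks poorly, because the feature records where the colours are as well as which colours are present.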
There is a discussion (in French) of image description and
classification in Cursus. The Diffuse website also has a
technical discussion of the problems raised by different image
formats, covering ‘lossy’ versus ‘non-lossy’ (lossless)
compression: files that transfer electronic images without loss
of definition are bulkier and slower than those that accept some
kind of loss.
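The trade-off between lossless and lossy storage can be demonstrated in a few lines. In this sketch, `zlib` provides lossless compression, and a simple quantisation step (keeping only a few levels per 8-bit sample) stands in for the far more sophisticated lossy schemes used by real image formats such as JPEG. Seeded random bytes stand in for noisy pixel data; the function names are hypothetical.

```python
import random
import zlib


def noisy_samples(n: int, seed: int = 1) -> bytes:
    """Hypothetical stand-in for noisy 8-bit pixel data (reproducible via the seed)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def compress_lossless(data: bytes) -> bytes:
    """zlib output decompresses to exactly the original bytes."""
    return zlib.compress(data, 9)


def compress_lossy(data: bytes, levels: int = 4) -> bytes:
    """Discard detail by quantising each sample to `levels` values, then compress."""
    step = 256 // levels
    quantised = bytes((b // step) * step for b in data)
    return zlib.compress(quantised, 9)
```

With noisy input such as `noisy_samples(10_000)`, the lossless output is roughly as large as the original, while the quantised version compresses to a fraction of that size; the price is that the decompressed bytes no longer match the original.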