Home | Resource Description, Discovery 
and Retrieval
(Summary)
Full Text: Page 1 | Page 2 | Page 3 | Page 4

Clumps
Clumps are groups of organisations, typically public libraries, academic libraries and other bodies (museums and archives, for example), which agree to co-operate in areas such as consortium purchase of resources (particularly electronic resources), inter-lending and the linking of their catalogues. An important element of most clumps is a gateway or portal which allows users to carry out simultaneous searches across the catalogues and/or databases of all the participating organisations. To achieve this the clump has to adopt a common metadata profile, to which the various profiles in use amongst consortium members can be mapped. A number of profiles have been developed specifically to support this kind of activity, for example the Bath Profile and the ONE-2 Profile. Z39.50 or XML is used to carry searches across the various catalogues and/or databases. Licensing and copyright of the electronic resources made available via the clump may be problematic.
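The mapping of members' local profiles onto a common profile can be illustrated with a minimal sketch. The field names, the two local profiles and the common profile below are hypothetical, invented purely for illustration; they are not taken from the Bath Profile or the ONE-2 Profile.

```python
# Hypothetical crosswalks: local field name -> common profile field name.
CROSSWALKS = {
    "public_library": {"ti": "title", "au": "creator", "subj": "subject"},
    "museum": {"object_name": "title", "maker": "creator", "keywords": "subject"},
}


def to_common_profile(record, source):
    """Map one local record onto the common profile; unmapped fields are dropped."""
    mapping = CROSSWALKS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


# Two records described under different (invented) local profiles.
library_record = {"ti": "A History of Essex", "au": "Smith, J.", "shelf": "942.67"}
museum_record = {"object_name": "Roman coin", "maker": "Unknown", "keywords": "numismatics"}

common = [
    to_common_profile(library_record, "public_library"),
    to_common_profile(museum_record, "museum"),
]
```

Once every record is expressed in the same fields, a single search on, say, "title" can be run across all members' data. A real crosswalk would also need to reconcile controlled vocabularies and cataloguing rules, which this sketch ignores.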

Community information (see also Diverse cultural content, Information services, E-government)
Essex County Council in the UK is pioneering the development of a distributed community information system. Instead of the traditional centralised database of community information, Essex Online (which grew out of the SEAMLESS project) uses Z39.50 and Harvest to search the databases and websites of participating organisations in a single integrated search. This approach is also known as deep integration. Over 30 local organisations (councils, universities and colleges, business agencies, health authorities, utility companies, the local newspaper group, voluntary agencies etc) currently make their data available to the system and the number is growing fast.

In order for the system to search across these distributed datasets the partners have adopted a common metadata profile based on the e-GMS, which is itself based on Dublin Core, and a common thesaurus for assigning subject descriptors. Essex has recently been awarded funding by the New Opportunities Fund to develop the system nationally by bringing in data from big national information providers such as nhsDirect Online and making the search facilities available in a further 8 Local Authority areas, covering 6 million people. For more information see the seamlessUK website.

Learning Support
Some public libraries are beginning to work in partnership with other organisations (colleges, universities etc) to provide better support to learners. Typically the partners agree to open their libraries (and their services) to each other’s customers and may introduce a common library ticket. They often develop systems which allow all of their catalogues and/or databases to be searched in a single search (see Clumps above for an explanation of what this involves). Examples include the Libraries Access Sunderland scHeme and the Glasgow Digital Library Project.

In the UK the Joint Information Systems Committee (JISC) is developing the Distributed National Electronic Resource (DNER) to provide information and resources to support research and learning. This is a very large and innovative project which builds on the RDN idea of subject portals and is leading the way in the provision of resources in the networked information environment.

Many academic libraries are also involved in developing Managed or Virtual Learning Environments (MLEs, VLEs). In the UK, UKOLN is leading the Metadata for Education Group (MEG) to consider the implications for metadata to support such systems. Public libraries too may have a part to play here, particularly with regard to their role in supporting formal and lifelong learning.

GOOD PRACTICE GUIDELINES: UNDERLYING TECHNOLOGIES

Underlying technologies

XML
Extensible Mark-up Language (XML) is often described as the successor to HTML. It offers a number of advantages over HTML, a key one being that it allows content to be separated from presentation. The W3C has produced a 10 point summary which covers the main points about XML. For more detailed technical information see the XML section of the W3C website or the xml.com site. There is a great deal of research and development activity focussed on XML: for an update on XML query tools such as XPath see the webpage for the W3C XML Query Group. Libraries may consider moving to XHTML. A UKOLN report briefly describes metadata sharing and XML, including its limitations and how it supports the effective sharing of information in different contexts.
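The separation of content and presentation can be shown with a small sketch using Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The catalogue structure, element names and records are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue data: the XML holds content only, no layout.
CATALOGUE_XML = """
<catalogue>
  <record id="r1"><title>Local History of Essex</title><type>book</type></record>
  <record id="r2"><title>Essex Parish Maps</title><type>map</type></record>
</catalogue>
"""

root = ET.fromstring(CATALOGUE_XML)

# XPath-style query: titles of records whose <type> child is "book".
book_titles = [rec.findtext("title") for rec in root.findall("./record[type='book']")]


def as_plain_text(root):
    """One presentation of the content: a plain-text listing."""
    return "\n".join(f"{rec.get('id')}: {rec.findtext('title')}" for rec in root.findall("record"))


def as_html_list(root):
    """Another presentation of the same content: an HTML list."""
    items = "".join(f"<li>{rec.findtext('title')}</li>" for rec in root.findall("record"))
    return f"<ul>{items}</ul>"
```

The same XML source feeds both functions, so the display can change (or multiply) without the underlying records being touched.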

RDF
The Resource Description Framework (RDF) enables the encoding, exchange and re-use of structured metadata, using XML as an interchange syntax. In this way it supports the integration of a variety of applications from library catalogues and world-wide directories; to syndication and aggregation of news, software, and content; to personal collections of music, photos, and events.

RDF allows statements to be made about a resource as a set of properties that conform to a named schema. Statements are recorded in rdf:Description XML elements.

RDF is powerful because it imposes structural constraints which support the consistent and unambiguous encoding and exchange of standardized metadata. This makes it possible to interchange separate packages of metadata defined by different resource description communities. In addition RDF provides a means for publishing both human-readable and machine-processable vocabularies designed to encourage the reuse and extension of metadata semantics among disparate information communities (see an introduction to RDF for more information). Descriptions of RDF sometimes use libraries as an analogy. For technical information see the W3C RDF webpages. seamlessUK is an example of a community information system that uses RDF to encode its metadata.
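As an illustration of statements recorded in rdf:Description elements, the sketch below reads a small RDF/XML fragment with Python's standard library and lists each statement as a (resource, property, value) triple. The resource URL and its values are invented, and the parser handles only simple literal properties; a real application would use a full RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set

# Hypothetical description of one resource using Dublin Core properties.
RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/leaflets/recycling">
    <dc:title>Recycling in your area</dc:title>
    <dc:creator>Example Borough Council</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def statements(rdf_xml):
    """Return (resource, property, value) triples for simple literal properties."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # prop.tag is the property's namespace-qualified name.
            triples.append((resource, prop.tag, prop.text))
    return triples


triples = statements(RDF_XML)
```

Because each property name is qualified by the namespace of its schema, metadata from different communities can sit side by side in one description without ambiguity.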

Z39.50
Z39.50 is an international search and retrieve protocol (ISO 23950, 1998) which allows (usually remote) heterogeneous databases to be searched, and data retrieved from them, via one user interface. It defines a standard way for two computers to communicate and share information. Designed to support the searching and retrieval of full-text documents, bibliographic data, images and multimedia, it is based on client-server architecture and is fully operational over the Internet. It allows users to search several catalogues, or other databases, in a single integrated search. Note: Until XML query languages evolve further Z39.50 may still be the preferred search and retrieve protocol for systems offering complex distributed searching.
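The idea of a single integrated search can be sketched as follows. This does not implement the Z39.50 protocol: the in-memory "catalogues" and the function names are hypothetical stand-ins for remote targets, used only to show one query being sent to several sources and the answers merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues held by different organisations.
CATALOGUES = {
    "County Library": ["Essex Walks", "Birds of Britain"],
    "University Library": ["Essex Dialect Studies", "Organic Chemistry"],
    "Museum Archive": ["Roman Colchester", "Essex Oral Histories"],
}


def search_target(name, query):
    """Search one target; a real client would open a network session here."""
    hits = [title for title in CATALOGUES[name] if query.lower() in title.lower()]
    return [(name, title) for title in hits]


def federated_search(query):
    """Send the same query to every target in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(lambda name: search_target(name, query), CATALOGUES))
    return sorted(hit for hits in result_sets for hit in hits)


results = federated_search("essex")
```

The user sees one merged list, each hit labelled with the catalogue it came from, rather than having to query each target separately.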

Harvest
Harvesting software is a means of harvesting, or gathering, metadata information from a list of pre-determined web sources, for example the webpages of participating organisations. In the Seamless project (see above), Harvest creates an index file which the system searches in response to a user query. The results are then integrated with the results of the Z39.50 searches and are presented to the user as a single ‘hit list.’ Clicking on any of the results either opens the webpage (for harvested records) or the database record (for Z39.50 records).
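A minimal sketch of this process is given below. It is not the Harvest software itself: it uses Python's standard html.parser to gather metadata from meta tags, and the page contents, URLs and database record are all hypothetical.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect <meta name=... content=...> pairs from one page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.metadata[attrs["name"].lower()] = attrs["content"]


# Hypothetical pages; a real harvester would fetch these from a list of URLs.
PAGES = {
    "http://example.org/bins": '<html><head><meta name="DC.title" content="Bin collection days">'
    '<meta name="DC.subject" content="waste; recycling"></head></html>',
    "http://example.org/clinics": '<html><head><meta name="DC.title" content="Clinic opening hours">'
    '<meta name="DC.subject" content="health"></head></html>',
}


def build_index(pages):
    """Harvest each page once and keep its metadata in an index keyed by URL."""
    index = {}
    for url, html in pages.items():
        harvester = MetaHarvester()
        harvester.feed(html)
        index[url] = harvester.metadata
    return index


def search_index(index, query):
    """Return harvested records whose metadata mentions the query."""
    query = query.lower()
    return [
        {"source": "harvest", "link": url, "title": meta.get("dc.title", url)}
        for url, meta in index.items()
        if any(query in value.lower() for value in meta.values())
    ]


index = build_index(PAGES)
# Hypothetical database hit, shaped as if it had come back from a Z39.50 search.
database_hits = [{"source": "z39.50", "link": "record:1042", "title": "Recycling centres"}]
hit_list = sorted(search_index(index, "recycling") + database_hits, key=lambda hit: hit["title"])
```

Each entry in the merged list keeps its own link, so following it leads either to the harvested webpage or to the database record.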

Harvesting forms the basis of the Open Archives Initiative (OAI) which is a co-operative venture aimed at making it easier to find information over the web through the development and application of interoperability standards. It also has potential for the Museums community.
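The OAI publishes a Protocol for Metadata Harvesting (OAI-PMH), in which a harvester sends HTTP requests to a repository and receives XML records in reply. The sketch below builds a ListRecords request and reads titles from a response. The repository address and the sample response are invented, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request asking for unqualified Dublin Core records."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


# Hypothetical, heavily trimmed response from a repository.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Village photographs, 1900-1950</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvested_titles(response_xml):
    """Return (identifier, title) pairs for every record in a response."""
    root = ET.fromstring(response_xml)
    return [
        (
            rec.findtext("oai:header/oai:identifier", namespaces=NS),
            rec.findtext(".//dc:title", namespaces=NS),
        )
        for rec in root.iter("{http://www.openarchives.org/OAI/2.0/}record")
    ]


url = list_records_url("http://example.org/oai")
titles = harvested_titles(SAMPLE_RESPONSE)
```

A full harvester would also follow resumption tokens for long result sets and handle error responses, both of which are omitted here.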

Image Retrieval
Until recently images had to be searched for by text descriptors or classification codes, supported in some cases by text retrieval packages designed or specially adapted to handle images. The Getty’s AAT (Art and Architecture Thesaurus) consists of 120,000 terms for describing objects, textual materials, images, architecture and cultural heritage material. Images can also be classified using systems such as ICONCLASS for works of art and museum exhibits, TELCLASS for television and video, and the Social History and Industrial Classification for museum objects.

More modern systems can now retrieve images which have not been verbally described.

CBIR (Content Based Image Retrieval) does not use keyword indexing. Instead the image is retrieved using inherent characteristics of the image itself such as colour, texture or shape, e.g. a beach scene would be blue at the top and yellow at the bottom. There is a technical discussion of the different types of CBIR. Commercial CBIR systems include IBM’s QBIC (Query by Image Content) and Excalibur’s Image retrieval Ware. In QBIC the image is described in terms of areas of colour and shape, and the retrieval software executes the search for images matching the description. It is not necessary to say what the subject of the image is. For a demonstration look on the website of the Hermitage Museum.
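The beach example can be turned into a small sketch of colour-based retrieval. This is not how QBIC or any commercial system works internally: the tiny "images", the colour values and the choice of signature (mean colour of the top and bottom halves) are assumptions made for illustration. It requires Python 3.8 or later for math.dist.

```python
import math


def solid(colour, rows, cols=4):
    """Build a block of identical pixels, as a list of pixel rows."""
    return [[colour] * cols for _ in range(rows)]


# Illustrative RGB values.
BLUE, YELLOW, GREEN, DARK_GREY = (40, 110, 220), (230, 200, 90), (40, 150, 60), (60, 60, 60)

# Hypothetical tiny "images": lists of pixel rows, top row first.
IMAGES = {
    "beach": solid(BLUE, 2) + solid(YELLOW, 2),
    "meadow": solid(BLUE, 2) + solid(GREEN, 2),
    "street": solid(DARK_GREY, 4),
}


def mean_colour(rows):
    """Average the red, green and blue values over a block of pixel rows."""
    pixels = [pixel for row in rows for pixel in row]
    return tuple(sum(pixel[i] for pixel in pixels) / len(pixels) for i in range(3))


def signature(image):
    """Describe an image by the mean colour of its top and bottom halves."""
    half = len(image) // 2
    return mean_colour(image[:half]) + mean_colour(image[half:])


def rank(query_signature, images):
    """Order image names by distance between signatures, closest first."""
    return sorted(images, key=lambda name: math.dist(query_signature, signature(images[name])))


# Query by content: "blue at the top, yellow at the bottom", with no keywords involved.
ranking = rank(BLUE + YELLOW, IMAGES)
```

The query never states what the picture shows; the beach image ranks first simply because its colour layout is closest to the one requested.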


There is a discussion (in French) of image description and classification in Cursus. There is also a technical discussion of the problems of image retrieval on the Diffuse website, which covers ‘lossy’ versus ‘non-lossy’ image formats: files which carry electronic images without loss of definition are bulkier and slower to transfer than those which involve some kind of loss.

Last updated 11/05/2004