ProQuest

Deep Indexing added to selected databases

 
Quick Links
> Illustrata: Natural Sciences
> Illustrata: Technology
> ProQuest Deep Indexing: Agriculture
> ProQuest Deep Indexing: Aquatic Sciences
> ProQuest Deep Indexing: Atmospheric Sciences
> ProQuest Deep Indexing: Biological Sciences
> ProQuest Deep Indexing: Earth Science
> ProQuest Deep Indexing: Engineering
> ProQuest Deep Indexing: Environmental Sciences
> ProQuest Deep Indexing: High Technology and Aerospace
> ProQuest Deep Indexing: Materials Science
> ProQuest Deep Indexing: Polymer Science
> ProQuest Deep Indexing: Technology
 
 

Deep Indexing: FAQs

 
 
Indexing and Record Format

    Q: What do we mean by "deep indexing"?

    A: Deep indexing is the process of extracting and interpreting data about tables, figures, or other media objects from the full text of an electronic document, such as an article from a scholarly journal. During indexing, a record describing each object is created and associated with the abstract record of the original article.

    Q: How are Deep Indexing terms created?

    A: Object descriptor terms and/or phrases are added to object records to improve retrievability. These descriptors fall into four groups:

    1. Subject descriptors – natural language terms describing an object, e.g., "oxygen consumption," "growth rate," "water temperature," or "mercury concentration."
    2. Geographic descriptors - controlled geographic terms describing an object. These terms are not always hierarchical. For example, if a rule exists, we automatically assign the index terms "Canada, Ontario, Toronto, Don R." when "Don River" appears in the caption. If no rule exists, only "Don R." is assigned.
    3. Taxonomic descriptors – Latin taxonomic terms describing an object, often accompanied by a common name. The terms are controlled, but may not be hierarchical at this time. An example of a taxonomic term is "Esox lucius."
    4. Statistical descriptors - controlled terms for the statistical analyses describing an object, e.g., ANOVA - Analysis of Variance. View the full list of object statistical terms.
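    The rule-based geographic assignment in item 2 can be sketched as a lookup with a fallback. This is a minimal, hypothetical sketch rather than ProQuest's implementation: the GEO_RULES table, the abbreviate helper, and the River-to-"R." normalization are assumptions; only the "Don River" example comes from this FAQ.

```python
# Hypothetical rule table mapping a caption phrase to a full geographic hierarchy.
# Only the "Don River" entry comes from the FAQ; the table structure is assumed.
GEO_RULES: dict[str, list[str]] = {
    "Don River": ["Canada", "Ontario", "Toronto", "Don R."],
}


def abbreviate(place: str) -> str:
    """Hypothetical normalization: shorten a trailing 'River' to 'R.'."""
    if place.endswith(" River"):
        return place[: -len(" River")] + " R."
    return place


def geographic_terms(caption_place: str, rules: dict[str, list[str]] = GEO_RULES) -> list[str]:
    """Return the full hierarchy if a rule exists, otherwise only the normalized term."""
    if caption_place in rules:
        return list(rules[caption_place])  # copy so callers cannot alter the rule table
    return [abbreviate(caption_place)]


print(geographic_terms("Don River"))            # ['Canada', 'Ontario', 'Toronto', 'Don R.']
print(geographic_terms("Don River", rules={}))  # ['Don R.']
```

    Passing an empty rule table mimics the "no rule exists" case described above, where only the short form is assigned.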

    Indexing also adds Classifications (or Categories) to object records, which describe each object by format. "Table" is self-explanatory, but "Figure" is split into dozens of subcategories, such as "Line Graph" and "Pie Chart". View the full list of object categories.

    The Indexing Process

    Indexers identify the key variables (or data) that best describe what the images illustrate. For example, the terms along the axes of a graph and the column and row headers of a table are important. Indexers also capture important terms from the caption (e.g., names of organisms, geographic terms, or other important subject terms) that may not appear in the figure or table itself.

    To assist the indexers, the entire caption, the table, and other relevant text are run through automated indexing routines that match terms (in the caption, for example) against terms in our controlled vocabularies. Any of these matches may be useful for the index, but indexers may remove them at their discretion; the result is a natural language index to the images.
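    The two steps above (automatic matching, then indexer review) can be sketched as a whole-word, case-insensitive search of the caption for each controlled-vocabulary term, followed by a filter that drops the terms an indexer rejects. This is a hypothetical illustration: the FAQ does not say how ProQuest's routines match terms, and suggest_terms, review, and the sample vocabulary are assumptions.

```python
import re


def suggest_terms(caption: str, vocabulary: set[str]) -> list[str]:
    """Return vocabulary terms found in the caption, ordered by first occurrence."""
    found = []
    for term in vocabulary:
        # Case-insensitive match that does not start or end inside another word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        match = re.search(pattern, caption, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), term))
    return [term for _, term in sorted(found)]


def review(suggestions: list[str], rejected: set[str]) -> list[str]:
    """Indexer step: keep every suggestion the indexer does not reject."""
    return [term for term in suggestions if term not in rejected]


caption = "Mercury concentration in Esox lucius from the Don River"
vocab = {"Esox lucius", "Don River", "growth rate"}  # hypothetical vocabulary excerpt
suggested = suggest_terms(caption, vocab)          # ['Esox lucius', 'Don River']
print(review(suggested, rejected={"Don River"}))   # ['Esox lucius']
```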

    Q: Is ProQuest using a controlled thesaurus for its Deep Indexing?

    A: For the most part, ProQuest uses Natural Language terms to index tables and figures. This gives researchers better recall when they search with the vocabulary of their own field. However, in specific cases, such as taxonomic and geographic indexing, a controlled vocabulary is applied.

    Q: What "pick lists" does ProQuest use for Deep Indexing?

    A: A "pick list" is a set of directories and subdirectories used to organize data. Pick lists have the advantage of pre-collecting a set of resources about a topic, and they are often a quick and reliable way to find a starting point for your research.

    ProQuest indexers choose from two pick lists of terms: one to classify Object Categories and one for Statistical Terms.

    Around 30% of objects in the database are tables. The remaining 70% fall under the broad heading of "figure". The five main Object Categories for figures are Graph, Illustration, Map, Photograph, and Transmission/Emission Image. Each of these categories except Transmission/Emission Image is subdivided further, so you can be very specific or quite broad about the type of figure you are searching for. View the full list of object categories. Each level of the hierarchy is indexed, so a single record can have all three levels represented in the Category field (M1).
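    Indexing every level of the hierarchy means a record tagged with a specific subcategory also matches a search on its broader parents. A minimal sketch, assuming a child-to-parent map: "Figure", "Graph" and "Map" come from this FAQ, but placing "Line Graph" under "Graph" is an assumption, and category_levels and category_matches are hypothetical helpers.

```python
# Hypothetical excerpt of the category hierarchy (child -> parent).
# "Figure", "Graph" and "Map" come from the FAQ; "Line Graph" under "Graph" is assumed.
PARENT = {
    "Line Graph": "Graph",
    "Graph": "Figure",
    "Map": "Figure",
}


def category_levels(category: str, parent: dict[str, str] = PARENT) -> list[str]:
    """Return the category plus all of its ancestors, broadest level first."""
    levels = [category]
    while levels[-1] in parent:
        levels.append(parent[levels[-1]])
    return levels[::-1]


def category_matches(levels: list[str], query: str) -> bool:
    """A broad or a specific category query matches if any indexed level equals it."""
    return query.casefold() in (level.casefold() for level in levels)


m1 = category_levels("Line Graph")    # ['Figure', 'Graph', 'Line Graph']
print(category_matches(m1, "graph"))  # True: the broader level is indexed too
```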

    There are over 140 different Statistical Terms in use within ProQuest Deep Indexing resources. View the full list of object statistical terms. To search for a specific technique, use Q8, the field code for Object Statistical Terms. This is easiest in the Advanced Search, under the Tables & Figures tab. Enter a term exactly as it appears on the list, or a unique word from a term.
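    Before searching Q8, it can help to check that a query is either a full term from the list or a word that occurs in only one term. A minimal sketch, assuming a local copy of the term list: only the two sample terms are quoted in this FAQ, resolve_q8_query is a hypothetical helper (not part of the platform), and treating the comparison as case-insensitive is an assumption.

```python
from typing import Optional


def resolve_q8_query(query: str, terms: list[str]) -> Optional[str]:
    """Return the single list term a query identifies, or None if it is missing or ambiguous."""
    q = query.strip().casefold()  # case-insensitive comparison is an assumption
    # Rule 1: the query is a full term as it appears on the list.
    for term in terms:
        if term.casefold() == q:
            return term
    # Rule 2: the query is one word that occurs in exactly one term.
    hits = [t for t in terms if q in t.casefold().replace("-", " ").split()]
    return hits[0] if len(hits) == 1 else None


# Only these two terms are quoted in the FAQ; the real list has over 140.
TERMS = ["ANOVA - Analysis of Variance", "Standard Deviation"]
print(resolve_q8_query("variance", TERMS))    # ANOVA - Analysis of Variance
print(resolve_q8_query("regression", TERMS))  # None
```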

    Q: What is the benefit of Natural Language Indexing?

    A: Natural Language indexing can assign an unlimited number of free text terms to a given table or figure. This allows researchers to locate objects using terms they use in day-to-day research.

    Q: What kind of metadata are we attaching to the object?

    A: Each object has several sets of metadata attached. The table below lists all possible fields, with their labels and examples:

    Field name                 Label   Examples*
    Accession Number           AN=     AN=301-0001107432
    Affiliation                AF=     AF=Roseland Observatory
    Author                     AU=     AU=Thelen Giles
    Caption                    C1=     C1=(Salmon River Basin) and (water temperature)
    Category                   M1=     M1=line graph
    DOI                        DO=     DO=10.1605/01.301-0001070647.2006
    ISSN                       IS=     IS=0036-8075
    Object DOI                 OI=     OI=10.1605/01.301-0000094637.2005
    Object Descriptors         OD=     OD=Absorption units
    Object Geographic Terms    Q7=     Q7=USA, Maryland
    Object Statistical Terms   Q8=     Q8=Standard Deviation
    Object Subject Terms       Q5=     Q5=climate sensitivity
    Object Taxonomic Terms     Q6=     Q6=Algae
    Publication Year           PY=     PY=2007
    Publisher                  PB=     PB=Blackwell Publishing
    Taxonomic Terms            TX=     TX=
    Title                      TI=     TI=Solar eclipse: Testing IR flux during solar eclipse

    *Examples are not all taken from the same record.
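    If object metadata is handled as plain "LABEL=value" lines like the examples in the table, the lines can be grouped by field label. This is a sketch under that assumption only: the FAQ does not specify an export format, parse_record is a hypothetical helper, and the sample lines mix example values, so they do not describe a real record. Because an object can carry several natural language terms, each label maps to a list.

```python
from collections import defaultdict


def parse_record(lines: list[str]) -> dict[str, list[str]]:
    """Group 'LABEL=value' lines by label; a label that repeats keeps every value."""
    record: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        label, sep, value = line.partition("=")  # split at the first '=' only
        if not sep or not value.strip():
            continue  # skip lines with no label or an empty value
        record[label.strip()].append(value.strip())
    return dict(record)


sample = [
    "AN=301-0001107432",
    "M1=line graph",
    "Q5=climate sensitivity",
    "Q5=water temperature",
    "Q7=USA, Maryland",
]
rec = parse_record(sample)
print(rec["Q5"])  # ['climate sensitivity', 'water temperature']
```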

    Q: What browsable indexes exist for this product?

    A: There are four browsable indexes for ProQuest Deep Indexing resources: Author, Journal Name, Category and Object Descriptors.

    The Author and Journal Name indexes can be used to identify authors or journals included in ProQuest Deep Indexing resources.

    The Category index lists around 60 categories you can use to limit your search to a particular type of image.

    The Object Descriptors index lets you search through the Natural Language index terms assigned to the objects in ProQuest Deep Indexing resources.

    This last index may be particularly useful because it is an alphabetical list of all the terms from each of the more specific fields:

    Object Geographic Terms, Q7=
    Object Statistical Terms, Q8=
    Object Subject Terms, Q5=
    Object Taxonomic Terms, Q6=
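    Conceptually, the Object Descriptors index is the union of the four fields listed above, sorted alphabetically. A minimal sketch, assuming each record is a mapping from field label to a list of terms (a hypothetical representation); the sample values are examples from the metadata table.

```python
def object_descriptors_index(records: list[dict[str, list[str]]]) -> list[str]:
    """Merge Q5-Q8 terms from all records into one de-duplicated alphabetical list."""
    fields = ("Q5", "Q6", "Q7", "Q8")  # subject, taxonomic, geographic, statistical
    terms = {term for rec in records for f in fields for term in rec.get(f, [])}
    # Sort case-insensitively; the second key keeps the order deterministic.
    return sorted(terms, key=lambda t: (t.casefold(), t))


records = [
    {"Q5": ["climate sensitivity"], "Q6": ["Algae"]},
    {"Q5": ["climate sensitivity"], "Q7": ["USA, Maryland"], "Q8": ["Standard Deviation"]},
]
print(object_descriptors_index(records))
# ['Algae', 'climate sensitivity', 'Standard Deviation', 'USA, Maryland']
```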

    Q: Do we include the page number of the article on which the object appears?

    A: Yes. The page number on which the object appears is given both in the caption attached to the object and as a separate entry in the Source field of the object record page.

    Because this information is stored in the caption within the object, you will always be able to locate the object's source, even if you copy the object itself from the ProQuest Deep Indexing resource.

Searching, Display, Linking & Rights

    Q: Do ProQuest Deep Indexing resources search the full text of the articles?

    A: ProQuest Deep Indexing resources do not search the full text of the articles. Instead, they enable precision searching by searching the text and data surrounding tables and figures. They are able to search:

    • The caption of the image
    • The image category (graph, satellite image, etc.)
    • Terms used in the deep indexing of the document: these include subject, taxonomic, geographic and statistical descriptor terms taken from the image caption, data variable labels and surrounding text
    • Units for subject variables.

    Databases that search the full text of an article often cannot search tables and figures, because the text in tables and figures is part of an image.
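    The difference from full-text searching can be sketched by building the searchable text only from the object-level data listed above. This is a simplified model: the ObjectRecord class, its field names, the sample record, and the plain substring matching are all hypothetical; the real search engine's matching rules are not described in this FAQ.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """Hypothetical model of the searchable parts of one table or figure."""
    caption: str
    categories: list[str]
    descriptors: list[str] = field(default_factory=list)  # subject, taxonomic, geographic, statistical
    units: list[str] = field(default_factory=list)


def searchable_text(rec: ObjectRecord) -> str:
    # The article's full text is deliberately absent: only object-level data is searched.
    return " ".join([rec.caption, *rec.categories, *rec.descriptors, *rec.units]).casefold()


def search(records: list[ObjectRecord], query: str) -> list[ObjectRecord]:
    """Return records whose searchable text contains every query word (simple substring match)."""
    words = query.casefold().split()
    return [r for r in records if all(w in searchable_text(r) for w in words)]


fig = ObjectRecord(
    caption="Water temperature in the Salmon River Basin",
    categories=["Figure", "Graph", "Line Graph"],
    descriptors=["water temperature", "Salmon River Basin"],
    units=["degrees Celsius"],
)
print(len(search([fig], "line graph temperature")))  # 1
print(len(search([fig], "mercury")))                 # 0
```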

    Q: Can we search ProQuest Deep Indexing resources both by themselves and combined with other content?

    A: Login links can be created to Illustrata: Natural Sciences or Illustrata: Technology, or either database can be selected from the databases page. In addition, these resources are invoked for subscribers whenever any database in the Natural Sciences or Technology area is searched. Because some databases span disciplines, Illustrata: Natural Sciences or Illustrata: Technology may also be invoked when searching another subject area, e.g., Social Sciences, if the institution subscribes to one of these 'spanning' databases. Other deep indexing resources that supplement specific databases (e.g., ProQuest Deep Indexing: Earth Sciences for GeoRef) are always invoked when the original file is searched and cannot be searched separately.

    Q: Can we search by coordinates of the X and Y axis for tables, charts and graphs?

    A: We do not extract coordinate data when indexing objects at this time. We do, however, extract and index the text describing the axes whenever possible.

    Q: Is retrieval different using Natural Language Index terms instead of controlled vocabulary terms?

    A: With Natural Language Index terms, searchers do not need to know the controlled vocabulary of a particular database, so they are more likely to retrieve relevant results with the search terms they use every day. Natural Language Index terms (also called free text terms) use the author's own words as index terms, rather than pre-existing controlled vocabulary terms that may ultimately be less precise. For Illustrata: Natural Sciences and Illustrata: Technology, searchable Natural Language Index terms are especially useful because they include terms from the X- and Y-axis titles and from the caption, captured through the deep indexing process.

    Q: What are those colored borders around the pinky and thumbnail images?

    A: The colored borders in the result display indicate tables or figures that match the user's search terms. This alerts users to the fact that images are included in the record and allows them to quickly determine the relevance of a result to their search.

    Q: On the results page under the Tables & Figures tab, what is the significance of the additional tables and figures tab breakdown?

    A: Approximately 30% of the objects in the database are tables, while the remaining 70% fall under the broad heading of "figure". The Figure category is divided into Graph, Illustration, Map, Photograph, and Transmission/Emission Image. View the full list of object categories.

    Q: Can we link to the full text or OpenURL from ProQuest Deep Indexing resources?

    A: Yes. The CSA Illumina Administrative Module includes a "Resource Options" tab for subscribers, which allows more than 880 titles to be selected for linking. If the library subscribes to other full-text resources, it can enable them in the "Full-Text" tab, or enable the OpenURL options in the Administrative Module, to provide linking to those resources. Existing full-text linking and OpenURL settings apply to ProQuest Deep Indexing resources just as they do to all other CSA databases subscribed to on the Illumina platform.
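    For reference, an OpenURL link passes citation metadata to a library's link resolver. The sketch below builds a generic OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value link for a journal article from object record fields (DO=, IS=, PY=, TI=). It is not the URL the Illumina platform actually generates, which this FAQ does not describe; the resolver address and build_openurl are hypothetical.

```python
from urllib.parse import urlencode


def build_openurl(resolver: str, doi: str, issn: str, year: str, title: str) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft_id": f"info:doi/{doi}",  # from the DO= field
        "rft.issn": issn,              # from the IS= field
        "rft.date": year,              # from the PY= field
        "rft.atitle": title,           # from the TI= field
    }
    return resolver + "?" + urlencode(params)  # percent-encodes ':' and '/' in values


# Hypothetical resolver address; field values are the examples from the metadata table.
url = build_openurl(
    "https://resolver.example.edu/openurl",
    doi="10.1605/01.301-0001070647.2006",
    issn="0036-8075",
    year="2007",
    title="Solar eclipse: Testing IR flux during solar eclipse",
)
print(url)
```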

    Q: Are COS: Scholar Universe records linked to ProQuest Deep Indexing resources?

    A: An author may have both a Scholar Profile and records in ProQuest Deep Indexing resources, but no icon is displayed and there is currently no direct connection between the two databases. In addition, records from ProQuest Deep Indexing resources are not included in the Scholar Profile Selected Publications list.

    Q: Save/Print/Email options provide links to images from Object records. Are these links persistent?

    A: Yes. Using any of the Save/Print/Email options, from either the abstract record or the object record, records a persistent link back to the object. Clicking this link takes you straight to a web page that displays only the image. The link works for anyone who has authentication rights to access the deep indexed record.

    Always use the link from the Save/Print/Email output itself. If you click that link and then copy the URL of the image's web page, the copied URL will contain a session ID (shown as sessid=xxx). That URL is not persistent: it will not work if you send it to someone else, because the session will have expired. Always use the URL in the original exported record.
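    A quick way to catch a session-bound link before sharing it is to check for the sessid parameter. A minimal sketch, assuming the session ID appears as a query-string parameter named sessid, as the note above describes; is_session_link and the example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def is_session_link(url: str) -> bool:
    """True if the URL carries a sessid query parameter, i.e. it will expire with the session."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return "sessid" in query


# Both URLs are hypothetical examples.
print(is_session_link("https://illumina.example.com/image?obj=123&sessid=abc"))  # True
print(is_session_link("https://illumina.example.com/image?obj=123"))             # False
```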

    Q: Can we save object image files from Deep Indexed records?

    A: Through the Save, Print, Email option on the CSA Illumina platform, records for individual objects (or references with multiple objects) can be saved, printed or emailed. Each record then provides a persistent link back to the image in the deep indexed record. Due to publisher restrictions, tables and figures may be available only as thumbnails.

    Q: Can we use the object image file even if we don't subscribe to the journal?

    A: Each Deep Indexed object belongs to the publisher of the original document and is subject to that publisher's permissions. Publisher details are available in the Publisher field of the object record, and each object is also marked with an attribution that includes the publisher name.

    Q: Can we export Tables and Figures to Excel or PowerPoint?

    A: Objects can be saved to your PC as image files and imported into production software such as MS PowerPoint, MS Word, or MS Excel. Some applications also let you drag and drop object image files from the web page into the application.

    Note: In all cases, the user is bound by copyright law as it applies to the article from which the object was extracted.

    Q: Do we have to subscribe to the electronic journal to see the full text?

    A: Yes, generally. Through the CSA Illumina Administration Module, each institution subscribing to ProQuest Illustrata has several options for linking to full text. If your library has turned on any of these options, you may be able to link out to the full text of the document. We also link to Open Access titles, which may not require a subscription to access the full-text article.

    Q: Will the object image file be exportable to RefWorks in the future?

    A: Using either the RefWorks button or the Save, Print, Email option on the CSA Illumina platform, records for individual objects (or references with multiple objects) can be exported to RefWorks. Each record provides a persistent link back to the image in the ProQuest file. We have begun exploring the technical issues of exporting the object image file along with the citation, but no release date has been set for that enhancement.