Publication of work in peer-reviewed journals is the critical
final step of the scientific research process. Abstract databases
are the primary means by which readers search the literature,
and the abstract is the author's best opportunity to interest
readers in the full article. However, many abstracts fall short
of the goal by not including enough of the right information or
by including irrelevant information. By understanding the basic
means by which literature databases are constructed and searched,
authors can write abstracts that will increase the number of times
their articles are cited in literature searches. In this Discovery
Guide, an abstracting and indexing editor explores the effectiveness
of two approaches to literature searching by comparing CSA Illumina
and Google Scholar and shares some abstract writing tips to ensure
published articles reach the broadest audience possible.
It is a common experience. A university professor recalls an
important paper, published a few years back, that applies directly
to the proposal he is writing. He remembers what the paper was
about but is utterly clueless as to the title, author, and
publication. It's late, and the proposal deadline is only hours
away, but this article really must be cited in the literature
survey. What to do? A quick literature search based on what the
professor remembers about the article should do the job. Among
the available options are the tried-and-true abstract databases
that he has been using since his undergraduate days or the all-new,
highly touted Google Scholar. Does it matter which choice he makes?
Publication is probably the most effective means by which researchers disseminate scientific data and establish their reputations in their fields. Motivated by a desire to disseminate scientific information, as well as by non-scientific considerations such as promotion and tenure, authors want the widest possible audience to find their papers.
Researchers tend to write articles with a specific journal's
readership in mind. Doing so, however, can lead to abstracts that
are narrower than a work's true scope. For example, an article
where the abstract describes casting of aluminum alloys might
really be about casting them for weight reduction in automobiles,
a topic that will appeal to a wider audience.
By sharing some basic tips about database construction and search
methods, this article shows authors how to maximize an abstract's
effectiveness and expose the article to the widest possible audience,
thereby extending its lifespan. A good abstract will facilitate
the literature search process. Let's take a look at how literature
searches are conducted.
SURVEY OR COLLECT?
It would be difficult to find anyone today who pines for the old-fashioned methods of searching the literature. In the old pre-computer, pre-internet days a good literature search involved many hours, and often days, in the library poring over thick volumes of abstract databases and scouring the library shelves to find articles in bound journal volumes. Nowadays an effective literature search can be done in a matter of several hours (or minutes) from a computer anywhere in the world with internet access. In a moment hundreds of papers can be found on any topic, sometimes with full text access. This must be a better system, right? The answer is yes, mostly. More than ever, it is important for those searching and those contributing to the literature to understand how modern search methods can affect their goals.
There are basically two vehicles for searching the literature: abstract databases and web-based search engines. The interfaces for both methods feel quite similar, so a user can easily believe that the results will be comparable; however, significant differences exist between the two methods.
A web-based search is just that, a search of the World Wide
Web based on terms that the user provides. Among the most popular
are Google and Yahoo!. The word "google" has even entered our
culture to such an extent that it has become a verb in the vernacular,
as in "I googled my doctor's name to find out what hospitals she
is affiliated with." Specialized search engines can improve the
quality of the results (hits) by searching a subset of all websites
and through access to normally restricted archives and abstract
databases. Examples include Google Scholar, a search engine for
scholarly literature, and Scirus, a science-oriented search engine.
These search engines crawl the sources (websites, journal archives,
abstract databases) and automatically index them. The search engine
approach is to survey information, hence the popular phrase "surfing the web."
An abstract database takes a philosophically different approach. Instead of information being surveyed, it is collected, organized and stored. Abstract databases are built around specific topics by abstracting and indexing (A & I) companies or by professional societies like the American Chemical Society (Chemical Abstracts) and the Institution of Electrical Engineers (Inspec). The leading A & I companies include CSA, Elsevier, and the Emerald Group Publishing Limited. CSA Illumina hosts over 100 databases, both proprietary and nonproprietary, with titles like Library and Information Sciences Abstracts, Engineering Material Abstracts, Criminal Justice Abstracts, Sociological Abstracts, Corrosion Abstracts, and many more.
Abstract database producers are secondary publishers in that they publish information that has been gathered from primary publications like scholarly journals, trade magazines, conference proceedings, etc. The collection of information is intended for digging deep into the literature.
In addition to collecting abstracts, A & I services add value by organizing the abstracts through the application of keyterms and classification codes. The abstract database is like a book; the classifications are like chapter headings and the keyterms are the index at the back of the book. Information can be found either by going directly to the chapter or by using the index.
When doing a literature search, the searcher assumes that the search method will cover the target information, that the information will be timely, and that records will be retrievable. The user benefits from a flexible and intuitive user interface, and from results that are presented in a useful, practical fashion. This article compares CSA Illumina and Google Scholar on each of these points to demonstrate the differences between abstract database searches and web searches.
More than anything, literature searchers rely on the underlying assumption that the information they seek is there to be found. A searcher can very quickly determine the content covered by any of CSA's abstract databases by checking the website (www.csa.com). All of this information is available to anyone; one need not be a subscriber.
As an example, CSA's Technology Research Database (TRD) alone
has over 4,000 sources. Journals are listed online by title and
ISSN along with other important information, such as scope of
coverage, update frequency, number of records in the database,
yearly increase in records, etc. The website also indicates the
extent to which a particular source is covered. There are two
coverage grades for sources in CSA's TRD: core and selective.
Core coverage means that every article of technical merit in the
journal is included in the database; basically, it is cover-to-cover
coverage regardless of topical relevance. Selective, however,
means that only articles relevant to TRD's scope are taken. Often
selective journals will be covered in their entirety, but CSA
Illumina's users know not to expect it.
At CSA (and probably other fee-based A & I services) Acquisitions Specialists aggressively identify and pursue new sources, adding a level of purposefulness which search engines lack. They are alert to new publications and to new opportunities to acquire existing publications. They ensure that source coverage is complete (no missing issues) and that new sources will improve the topical coverage of a database. Search engines are at the mercy of the material to which they have been given access and to what their crawlers can find on the web.
By comparison, Google Scholar (www.scholar.google.com)
is not forthcoming regarding its sources, making it difficult to evaluate the quality of any given search. Here is how Google Scholar describes its coverage:
Google Scholar covers peer-reviewed papers, theses, books, abstracts, and other scholarly literature from all broad areas of research. You'll find works from a wide variety of academic publishers and professional societies, as well as scholarly articles available across the web. Google Scholar may also include multiple versions of an article, possibly preliminary, which you may be able to access.
Google Scholar covers most important scholarly publishers, but there are substantive gaps. In particular, Elsevier, the largest publisher of scholarly literature, did not give Google Scholar's crawlers access to its resources; this is not surprising, since Scopus (Elsevier's abstract database product) is a competitor. Interestingly, Google Scholar has some access to the Elsevier literature through a back door: stripped-down CSA abstracts are available through Google Scholar. Google Scholar's crawlers were given access to some important databases, such as IEEE, ACM, Wiley, Macmillan, and the University of Chicago, as well as to archived information at professional societies and government agencies like the American Physical Society, the National Institutes of Health, and NOAA. However, this source information is not readily available. The Google Scholar approach seems to be "trust us." Is that trust earned?
In an independent study, Péter Jacsó at the University of Hawaii
found significant gaps in Google Scholar's coverage. For instance,
the 65 journals published by the Nature Publishing Group (NPG),
which include the prestigious journal Nature, were incompletely
indexed for the most recent eight years and for archives going back to 1987.
In Jacsó's simple test, a search of the NPG website (www.nature.com)
returned 87,000 records for articles published in Nature alone. A
Google Scholar query found only 13,700 records for the entire
nature.com website, which includes Nature plus 64 other publications.
Similarly, Google Scholar found only about one-fourth of the known
records for Science magazine (sciencemag.org = ~40,000 records,
Google Scholar = ~12,000 records). The pattern continued, with
Google Scholar finding only 268,600 of the 4.1 million records
in Harvard's well-regarded abstract database Astrophysics Data
System [Jacsó, 2005]. This discrepancy between
the number of records found by Google Scholar and the number of
records known to exist is disturbing. A searcher does not know
if information is nonexistent or just not found.
In a comparative study of the web vs. fee-based databases as sources of scientific information, Doldi and Bratengeyer at Danube University in Austria found that results from search engines tend to favor USA websites, perhaps because of the difficulty of writing indexing algorithms in multiple languages. They also found that fee-based databases cover sources from a wider geographic range [Doldi and Bratengeyer, 2005]. At CSA the overwhelming majority of records are in English, either because that is the source language or because foreign-language sources often include an English title and abstract. In the absence of English, the abstracting editor will translate the title and write a brief abstract; in some cases the abstract and title will be outsourced for translation. The goal is to give the user at least a basic idea of what the article is about and to provide enough information to make the article findable. The assumption is that users themselves will have the ability to read or thoroughly translate the article. Google Scholar offers an option to search exclusively in English, Chinese, or Portuguese, which excludes articles in many significant languages. Looking again at the CSA TRD database, non-English source languages include Chinese, Japanese, German, French, Romanian, Spanish, Italian, Portuguese, Polish, Russian, and Danish.
These days, coverage of print sources alone is not enough because websites include significant information. How do CSA Illumina and Google Scholar compare in their coverage of websites? CSA covers websites through its Web Resources Related to Technology Database, which runs in conjunction with searches in the other databases. All websites have been reviewed by an editor. Each URL is checked monthly and dead-links are removed from the database on a continuing basis. The database is updated daily and currently has over 300,000 records. Websites can be viewed by clicking on a tab that appears with the search results.
The value of human judgment is demonstrated by a very low dead-link rate of less than 1%. Google Scholar's coverage statement above does not quite say whether or not websites are covered; presumably they are.
In addition to comprehensive content, searchers expect databases to be updated frequently with records of recently published articles. CSA Illumina abstract databases are updated monthly with fully edited records. Details are provided online for each database regarding number of records per update, number of records in the database, oldest record, and how recent the latest 50% of records are. For METADEX (the metals database) there are over 1,425,000 records. Approximately 45,000 records are added per year in monthly increments. Coverage is from about 1966 to the present with the oldest record having a 1939 publication date. About 50% of the records have publication dates from 1986 or later. This depth of detail gives CSA Illumina its credibility as a high-quality abstracting and indexing provider.
CSA knows how important it is to be as current as possible and has an innovative supplemental database called Recent References that is updated weekly. This database contains unedited records, i.e., records that have not been proofread, edited for clarity, and indexed. However, the title, journal title, authors, and abstract are available for searching. It is a terrific tool since it allows access to the newest literature before records are assigned to a permanent database.
How about Google Scholar? It has not revealed how often it intends to update. The Jacsó study reports that between November 2004 and January 2005 there was no evidence of an update. As mentioned, Google Scholar has access to some well-regarded sources but there is no information forthcoming with regard to how often they are accessed. CSA provides monthly updates of skeleton records to Google Scholar so at least that much is known with confidence. The best option is to narrow the search by publication date, but there is no way to know if a particular record is absent because the journal is not covered at all or if Google Scholar just hasn't gotten around to it.
RETRIEVABILITY OF RECORDS
The ultimate goal of any search is to retrieve useful records as quickly as possible. Retrievability is a function of the search terms and the record's searchable fields.
Most fields in an abstract database record are searchable, the most notable being abstract, journal title, article title, author names, author affiliations, keyterms, classifications, and cited references. A word or set of words is designated as search criteria, and the database returns records in which the search terms are found in any of the fields. The most important fields are the title, abstract, and keyterms. Some fields are useful for finding a specific article, for example, the journal ISSN or the DOI (Digital Object Identifier). When a CSA record is viewed, the search terms found (in this case: aluminum, composite, automo*, welding) are highlighted with boldface and italics as shown. More important, the value of the keyterms is demonstrated; the terms themselves suggest possibilities for new searches. By clicking on them and choosing a search criterion (the Boolean AND, OR), better searches can be devised on the spot.
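The field-matching behavior described above can be sketched in a few lines of code. Everything below (the record, field names, and wildcard handling) is invented for illustration; real abstract databases use inverted indexes and far richer query parsers.

```python
# Toy sketch (invented record and field names) of how an abstract
# database matches search terms against a record's searchable fields.
RECORD = {
    "title": "Welding of aluminum matrix composites for automotive frames",
    "abstract": "Friction stir welding of aluminum composite panels ...",
    "keyterms": ["Aluminum base alloys", "Composite materials", "Welding"],
}

def matches(record, terms, mode="AND"):
    """True if the record satisfies the Boolean query; a trailing '*'
    acts as a truncation wildcard (automo* matches automotive)."""
    text = " ".join(
        v if isinstance(v, str) else " ".join(v) for v in record.values()
    ).lower()

    def hit(term):
        term = term.lower()
        return (term[:-1] if term.endswith("*") else term) in text

    hits = [hit(t) for t in terms]
    return all(hits) if mode == "AND" else any(hits)

print(matches(RECORD, ["aluminum", "composite", "automo*", "welding"]))  # True
```

The same helper handles both Boolean modes: an AND query requires every term to hit, while an OR query succeeds on any hit.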
The abstract editor's primary task is to make the record as retrievable as
possible, which means placing it in as many databases as appropriate
and making the record as searchable as possible by editing the
abstract, assigning keyterms, and assigning classification codes.
Keyterms are assigned from a controlled vocabulary, which ensures consistency within the databases, including historical data. Searches can be done using any combination of words, but controlled vocabulary terms from the thesaurus will generate more targeted results.
There are approximately 41,500 technology terms alone, so abstract
editors have plenty of flexibility. On the other hand, few abstract
editors will remember them all, so machine-aided indexing (a computer
program that compares words in the record to the controlled vocabulary)
is used to suggest terms. The abstract editor eliminates redundant
and irrelevant terms and adds necessary terms. Efforts are made
to assign keyterms that are non-redundant with other parts of
the record, especially the title and abstract. Machine-aided indexing
is invaluable to the abstract editor because it increases the
chances of all concepts being caught and suggests terms that might
otherwise be overlooked. The controlled vocabulary and machine-aided
indexing algorithms are regularly updated as the need for new
keyterms arises. A good keyterm list will have some general terms
and some specific ones; machine-aided indexing helps the editor
think more creatively and thoroughly about this list. The limitation
of machine-aided indexing is that only the "raw" record is searched,
so important information that comes later in the article is not captured.
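As a rough illustration of the machine-aided indexing idea, the program simply compares text in the raw record against a controlled vocabulary and suggests matching keyterms for the editor to review. The vocabulary, trigger phrases, and record below are all invented for this sketch:

```python
# Hypothetical sketch of machine-aided indexing: phrases in the raw
# record are compared against a controlled vocabulary, and matching
# keyterms are suggested. A human editor then prunes redundant or
# irrelevant suggestions and adds terms the program missed.
CONTROLLED_VOCABULARY = {
    "friction stir welding": "Friction stir welding",
    "aluminum": "Aluminum base alloys",
    "composite": "Composite materials",
    "corrosion": "Corrosion",
}

def suggest_keyterms(raw_record):
    """Return controlled-vocabulary keyterms whose trigger phrase
    appears anywhere in the raw record text."""
    text = raw_record.lower()
    return [term for phrase, term in CONTROLLED_VOCABULARY.items()
            if phrase in text]

raw = "Friction stir welding of an aluminum matrix composite was studied."
print(suggest_keyterms(raw))
# ['Friction stir welding', 'Aluminum base alloys', 'Composite materials']
```

Note that "Corrosion" is correctly not suggested: nothing in the record triggers it, which mirrors how the real system can only react to text actually present in the raw record.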
As helpful as machine-aided indexing is, the abstract editor
has the final responsibility for the keyterms. However, the article
must in some way suggest the keyterms; the editor cannot make
assumptions about the author's intent! The most common places
in an article where the abstract editor looks for information
are the abstract, first paragraph of the introduction, last paragraph
of the introduction, first paragraph of experiment description,
conclusions, and figures and captions. Google Scholar uses computer
algorithms, or crawlers, to index only the first 100-120 KB that
it finds, but it is not unusual for a scholarly article to be
nearly 1 MB. Good abstracting and indexing editors know that
critical information is often found later in the article.
If the abstract editors know to look for information in other
parts of the article, why are the abstract and title so important
for the purpose of indexing? Not all source material comes in journal
form; a significant portion (about 30% for TRD, more for other
databases) of all records comes to the abstract editors in electronic
form and the original journals are never seen. In those cases,
all the abstract editor has to work with are the abstract, article
title, and journal title. CSA hires only college graduates (many
with advanced degrees) to work on databases in their respective
fields. A history major, for instance, would never work on the
Entomology Abstracts database. This specialization means that
CSA abstract editors are skilled at "reading between the lines"
to look for clues to support keyterms for concepts that are not
explicitly mentioned, something even the very best computer indexing
algorithms cannot do.
Google Scholar's search results are not always sensible and consistent. In June 2005 Jacsó (http://www.gale.com) ran three simple searches using the terms protein, proteins, and "protein or proteins" and found that more records were returned for the single terms than for the combined search. Also, narrowing the publication date range from 1970-2005 to 1972-2005 increased the number of hits. To test whether such anomalies still exist, the searches were run again in February 2006 by De Guire, with results summarized in Table 1.
Table 1. Comparison of Google Scholar and CSA searches on the terms protein, proteins, and protein or proteins.
It is obvious that the Google Scholar results do not make sense.
How can it be that there are fewer records for "protein or proteins"
than for either individually? How is it that more records are
found in the smaller time frame? The numbers for CSA Illumina
are consistent with common sense. Also telling is the incredible
discrepancy in the number of records found by CSA Illumina vs.
Google Scholar. When Jacsó ran this search in 2005 there were
1,080,000 hits for protein, many more than this search found.
Where have they gone? It appears that many publications are lost
on Google Scholar. The rounding of Google Scholar results to the
nearest hundred also seems peculiar.
What happens if a very narrow search is performed? For this article De Guire searched the terms (aluminum and composite and automo* and welding) to find articles about welding of aluminum-based composite materials for automotive applications. The publication range was limited to 2001 and later.
CSA Illumina found 35 results; Google Scholar found 17. The average publication date of the Google Scholar records is 2003; the average of the first 17 CSA Illumina records is 2004. How many results in common did they find? None. This is worrisome since CSA shares records with Google Scholar on a monthly basis, and lots of them. CSA's METADEX alone (the database containing these 35 records) contributes about 3500 records per month. Keyterms are not included with the CSA-provided records but there should have been some overlap since the titles and abstracts are fully searchable.
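Checking how many results two services share amounts to a set intersection on some stable identifier such as the DOI. A minimal sketch follows; the identifiers are fabricated placeholders, not the actual records from these searches:

```python
# Sketch of an overlap check between two search services' result
# lists, keyed on (fabricated) DOIs.
csa_hits = {"10.0000/weld-001", "10.0000/weld-002", "10.0000/weld-003"}
gs_hits = {"10.0000/mag-104", "10.0000/refinish-207"}

common = csa_hits & gs_hits  # set intersection
print(sorted(common))  # [] -> no records in common
```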
How closely did the records match the purpose of the search? The first two Google Scholar hits (shown above) do not appear to have anything to do with the topic and they are pretty stale - 3 and 4 years old respectively. However, clicking on the first result shows that the title was truncated, the full one being more promising: "Study of diffusion welding SiCp/ZL 101 with Cu interlayer." The searcher has to go to some extra effort to determine that this first record is, in fact, useful. The second result is about magnesium and, obviously, not on the topic. Seven of the last nine Google Scholar results are on unrelated topics: automotive refinishing, hot isostatic pressing, corrosion of magnesium alloys, adhesive technology, green industrial engineering, transistor design, and microchip fabrication. Of the 35 CSA Illumina records, 25 were directly on topic; 8 were on related topics like welding of aluminum, heat transfer during welding, etc. One record was on magnesium alloy matrix composites as a substitute for aluminum matrix composites in automobiles and 2 records were unrelated to the topic (adhesive bonding for aircraft, welding of polymers and wood). In all, about 10% of the CSA Illumina records were not useful compared to about 40% of the Google Scholar records.
These examples (protein search and aluminum search) demonstrate the effectiveness of the different search methods. By surveying records and websites, Google Scholar skims the surface of available information and returns incomplete, inconsistent results. As a collector of information, CSA Illumina allows searchers to dig deep for target information.
Anyone who has prepared a meal knows that certain tools are preferred over others. What difference does it make which spoon is used to stir the pot? A favorite spoon will be just the right size, shape, and heft. The cook interacts well with it and it works well with the food. The cooking experience is satisfying because the tool used was effective and pleasant to use. The cook and tool worked well together.
Similarly, the way in which a user interacts with a literature search tool can make a difference in how satisfying and effective the search is. There are two complementary points of interaction: the search screen and the results screen.
Setting up a good search is the key to getting the desired results. The CSA Illumina Advanced Search screen offers multiple pathways to target articles. Search terms can be linked with Boolean operators and searches can be restricted to specified fields.
The above search is looking for articles on aluminum base or magnesium base composite materials that were not written by De Guire. It is limited to the Technology Research Database which is actually a set of 25 databases. It can be saved for running again later, run in combination with another search, or revised for better precision. The search screen might look complicated at first glance, but this only reflects its goal of telling CSA Illumina where to dig deep in the collection of records for the target information.
In contrast to the CSA Illumina Advanced Search, the Google Scholar Advanced Scholar Search is quite a bit simpler because its highly general results don't require precision; its purpose is to tell the search engine which articles to grab when it surveys the web.
As shown above, only a few parameters can be set. As many search terms as desired can be entered in fields with Boolean functions, but they cannot be narrowed beyond the whole article or the title. It is not possible to search on fields like ISSN or DOI, which are useful for finding specific articles or a set of articles in a publication (e.g., a special issue dedicated to one topic). Because the search screen is so simple, it is useful for searchers who are not sure what they are looking for and for beginners. The search can be narrowed by choosing up to 7 very broad subject areas (Chemistry and Materials Science; Engineering, Computer Science and Mathematics; etc.). Searches cannot be saved or combined, so it's back to the beginning every time.
The purpose of a literature search is to get useful results, and the way in which results are presented and can be managed after they are found is important. Usually, search results are presented either by relevance or by date. Ranking by relevance means that each record has been evaluated and is listed from most relevant to least relevant. Results presented by date show the most recent record by publication date first.
Google Scholar presents results only by relevance ranking, which is computed from
extracted metadata, citation frequency, and other information.
The ranking can be manipulated by authors who make available multiple
versions of the same material: web site, preprint, conference
presentation, and published article. Google Scholar's website
gives tips on how to improve an article's relevance ranking, perhaps
of more interest to the author than to the searcher. Providing
tips on improving relevance rankings of scholarly articles seems
ironic given the assumption of objectivity underlying scientific publishing.
CSA Illumina's default is to present results by date but results can also
be ranked by relevance. Relevance ranking is determined by comparing
search terms with the first eight descriptors (keyterms) provided
by the abstract editor. More matches mean a higher relevance ranking.
The downside is that abstract editors might put the most important
descriptors later than eighth in a list. Also, effort is made
to assign keyterms that are not redundant with other parts of
the record so a word or phrase that is prominent in the title
or abstract might not be included in the keyterm list. However,
a well-designed search will generate only relevant results, and
with the ability to save searches, the most recent results are
likely to be relevant.
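A toy sketch of this ranking scheme follows (the records and descriptors are invented, and the real CSA algorithm is surely more involved): count how many search terms appear among a record's first eight descriptors and sort on that score.

```python
# Illustrative relevance ranking, per the scheme described above:
# a record scores one point for each search term that appears among
# its first eight descriptors (keyterms).
def relevance(record, search_terms):
    first_eight = [d.lower() for d in record["descriptors"][:8]]
    return sum(
        any(term.lower() in d for d in first_eight)
        for term in search_terms
    )

records = [
    {"title": "Welding aluminum composites",
     "descriptors": ["Welding", "Aluminum base alloys",
                     "Composite materials", "Automobiles"]},
    {"title": "Corrosion of magnesium alloys",
     "descriptors": ["Corrosion", "Magnesium base alloys"]},
]

terms = ["welding", "aluminum", "composite"]
ranked = sorted(records, key=lambda r: relevance(r, terms), reverse=True)
print([r["title"] for r in ranked])
```

This also makes the stated downside concrete: a descriptor placed ninth or later contributes nothing to the score, no matter how relevant it is.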
The CSA Illumina results screen allows the searcher to see how many results were found for several types of source material. If the searcher is only interested in a certain type of source, such as peer-reviewed journals, a simple click is all that is needed.
This snapshot shows CSA Illumina's short format but records can also be presented
in full format (full abstract, bibliographic, and classification
information) with or without references, or in a custom format
determined by the user. Even in short format, the record includes
full title, authors, bibliographic information, and three lines
of abstract. Combined with the descriptors (right side of the
record), the user can very quickly determine whether or not this
record is useful. Records can be selected from the list by clicking
on the box next to the record and then saved, printed, or emailed.
Google Scholar results generally offer only two lines of fragmented abstract, so it is necessary to click on each result individually to see what it is. There are no mechanisms to save the records other than cutting and pasting them into another file for saving.
Google Scholar has received plenty of attention since its launch in November 2004, mostly on the strength of the Google brand name. A Google search is a popular, quick way to get basic information, and Google Scholar can do the same, but the user should be aware that it simply does not measure up to CSA (or any other A & I database) in terms of coverage, currency, quality, and ease of use. Probably the best use for Google Scholar is to get some quick, basic information that can then be used to design an effective search in an abstract database.
Authors have no control over the means by which their articles are found, but they can exert some control over the process by writing abstracts that are effective. A well-written abstract will contain vocabulary that is likely to match search terms. Also, an understanding of how abstract databases are put together will improve an author's ability to write an abstract that attracts readers.
ABSTRACT DATABASE CONSTRUCTION
Database construction is a volume operation; more records are
better because the goal of abstract databases is to comprehensively
cover the literature in a given subject area. At CSA abstract
editors work on several databases simultaneously and can assign
an article to as many databases as are appropriate. For example,
CSA's Technology Research editors in Ohio produce 25 database
files; 16 are constructed directly by the editors and 9 are derivative,
i.e. specialized subsets of the other 16. About 3,300 serials
and hundreds of non-serial publications, such as conferences,
books, monographs, standards, etc., are monitored so that well
over 4,000 sources are covered. A full-time Technology Research
editor edits about 80 abstracts per day and, on average, will
assign a record to 2.1 databases. The abstract editor needs to work both quickly and accurately.
Abstract editors are responsible for proofreading records. Strange, even humorous, things can happen during record production since the process is automated with scanners and optical character recognition. CSA's Technology Research editors know to change "suicides" to "silicides" and "theological properties" to "rheological properties." Every CSA TRD record is looked at individually to correct spelling and to verify the accuracy of all bibliographic information, ensuring that the records are of the highest quality. As a surveyor of information, Google Scholar has no ability to edit any part of a record.
While some customers will purchase only one database, purchase
of database sets is more common. The more databases that an article
legitimately can be in, the more likely it is that the article
will be found.
KEYTERMS AND CLASSIFICATIONS
In the pre-computer days of literature searching, keyterms and classifications were the primary way to find an article. Now that computers search all fields in the records, the function of keyterms and classifications has changed. Classifications are still similar to chapter headings but not many searches are restricted to only one classification/chapter heading. Rather, classifications guide the editor by defining topical coverage (what articles to include in the database) and directing the emphasis of the keyterms.
Beyond improving the searchability and retrievability of a record, keyterms serve other useful purposes. They help "tell the story," i.e., by skimming the keyterms it should be possible to determine an article's main points. In this example, the reader knows just from the keyterms whether or not the article is of interest:
base alloys; Steels; Automotive engines; Decoration;
Corrosion prevention; Piston rings; Corrosion resistance;
Case hardenability; Chemical vapor deposition; Thermal spraying
This example demonstrates another function of the keyterms, which is to build derivative databases such as CSA's Corrosion Abstracts. Records are not assigned directly to Corrosion Abstracts; instead they are culled during the production process to meet the database's topical coverage. The corrosion-related keyterms here ensure that this record will be included. Finally, keyterms are useful in helping to develop better search parameters by suggesting additional search terms.
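The culling described above is, in effect, a keyterm filter: a record enters the derivative database when its keyterms overlap the database's topical scope. As a purely illustrative sketch (the record structure, function, and keyterm list here are hypothetical, invented for this example, not CSA's actual production system), the idea might look like:

```python
# Hypothetical sketch of keyterm-based culling for a derivative
# database such as Corrosion Abstracts. Record format is invented.
CORROSION_KEYTERMS = {"corrosion prevention", "corrosion resistance"}

records = [
    {"title": "Coatings for piston rings",
     "keyterms": ["Steels", "Corrosion prevention", "Thermal spraying"]},
    {"title": "Fatigue of aluminum alloys",
     "keyterms": ["Fatigue", "Crack propagation"]},
]

def cull(records, wanted):
    """Keep records whose keyterms overlap the derivative database's scope."""
    return [r for r in records
            if {k.lower() for k in r["keyterms"]} & wanted]

corrosion_abstracts = cull(records, CORROSION_KEYTERMS)
print([r["title"] for r in corrosion_abstracts])  # only the coatings record
```

The point of the sketch is simply that the filter can only match keyterms that are actually present, which is why well-chosen keyterms determine whether a record reaches a derivative database.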
The abstract is an author's best opportunity to interest the reader, making it one of the most important parts of an article. Abstracting and indexing editors, as well as computer indexers (like Google Scholar), rely heavily on the abstract for information. It needs to be long enough to cover all the important information, but short enough to hold the reader's interest (the so-called bikini principle). The target length of an abstract for the Technology Research Database is 300 words or less; longer abstracts will be shortened. By keeping within this limit, the author retains maximum control over the abstract's content.
The abstract should answer the reader's question: "Will this article tell
me something new that I need to know?" It should tell the reader
what was done, how it was done, and why it was done (the reason
the reader should care). A materials science article, for example,
would tell the reader the material, the form of the material (film,
bulk, powder, etc.), the experiment, and the application. Here
is an example of an excellent opening sentence for an abstract:
"Effects of aging treatment on high temperature strength of Nb added ferritic stainless steels for automotive parts were investigated." [Ahn, Sim, Lee, 2005]
Right away, the reader knows the material, the experiment, and the application.
Readers who are interested in any of these aspects will continue
to the rest of the abstract and the article. The abstract editor
knows immediately that this article will be placed in two abstract
databases: METADEX and Mechanical and Transportation Engineering
Abstracts. While this is primarily a metallurgy article, the author's
mention of the automotive application guarantees that the record
will be placed in the database that automotive engineers use most,
Mechanical and Transportation Engineering Abstracts. The article
can be found by all interested readers.
COMMON ABSTRACTING MISTAKES
A few examples of abstracts recently added to CSA's Technology Research Database illustrate the most common abstracting shortcomings. By far, the most frequent mistake is too little information.
Example #1 - incomplete description
In this example [Kharkovsky, Hepburn, Walker, Zoughi, 2005] the
material is never specifically mentioned, which narrows the possible
audience for the article. The abstract editor can guess, but cannot
assume, that the foam is a polymer, so the keyterms make no mention
of polymers. No search in which 'polymer' was a required term
would find this article. Also, it is entirely possible that the
technique described in this abstract could be useful for other
applications, so mention of the material would have expanded the
audience to include researchers interested in nondestructive testing
methods for the particular material.
The space shuttle Columbia's catastrophic failure has been attributed to
a piece of external tank spray on foam insulation striking the
left wing of the orbiter, causing significant damage to some of
the reinforced carbon/carbon leading edge wing panels. Subsequently,
several nondestructive testing (NDT) techniques have been considered
for testing the external tank. One such technique involves using
millimeter waves, which have been shown to easily penetrate the
foam and provide high resolution images of its interior structures.
This paper presents the results of testing three different spray
on foam insulation covered panels by reflectometers at millimeter
wave frequencies, specifically at 100 GHz. Each panel was fitted
with various embedded discontinuities/inserts representing voids
and unbonds of different shapes, sizes and locations within each
panel. In conjunction with these reflectometers, radiators, including
a focused lens antenna and a small horn antenna, were used. The
focused lens antenna provided for a footprint diameter of approximately
12.5 mm (0.5 in.) at 254 mm (10 in.) away from the lens surface.
The horn antenna was primarily operated in its near field for
obtaining relatively high resolution images. These images were
produced using two dimensional scanning mechanisms. Discussion
of the difference between the capabilities of these two types
of antennas (radiators) for the purpose of testing the spray on
foam insulation as it relates to the produced images is also presented.
Panels; Discontinuity; Nondestructive testing; Microwaves;
Space shuttles; Millimeter waves; Fuel tanks; Tiles; Insulation;
Life cycle engineering
Example #2 - lack of relevance
In this case the application for the work, i.e., the "why should we care" aspect, was not included. The abstract describes everything well except the application, which appears in the first sentence of the introduction and was added by the abstract editor (see italics). As written, the article would be included in two databases: METADEX and Aluminum Industry Abstracts. With the editorial change, the article will also be included in Mechanical and Transportation Engineering Abstracts and Aerospace Abstracts - twice the coverage. Because this article came to the abstract editor as a full journal article, the important sentence could be found. [Gao, Starink, Davin, Cerezo, Wang, Gregson, 2005]
Al-6Li-1Cu-1Mg-0.2Mn (at.-%) (Al-1.6Li-2.2Cu-0.9Mg-0.4Mn, wt-%)
and Al-6Li-1Cu-1Mg-0.03Zr (at.-%) (Al-1.6Li-2.3Cu-1Mg-0.1Zr,
wt-%) alloys developed for age forming were studied by tensile
testing, electron backscatter diffraction (EBSD), three-dimensional
atom probe (3DAP), transmission electron microscopy (TEM) and
differential scanning calorimetry (DSC). For both alloys, DSC
analysis shows that ageing at 150°C leads initially to formation
of zones/clusters, which are later gradually replaced by S phase.
On ageing at 190°C, S phase formation is completed within 12
h. The precipitates identified by 3DAP and TEM can be classified
into (a) Li rich clusters containing Cu and Mg, (b) a plate shaped
metastable precipitate (similar to GPB2 zones/S"), (c) S phase
and (d) delta spherical particles rich in Li. The Zr containing
alloy also contains beta' (Al3Zr) precipitates and composite beta'/delta'
particles. The beta' precipitates reduce recrystallisation and
grain growth leading to fine grains and subgrains. Age forming
is a key innovation in the fabrication of curved structural components
for aerospace applications, e.g. wing skin.
Example #3 - distracting information
Abstracts should not contain information that does not serve the primary purpose of interesting readers in the article. The most common extraneous information includes references and experimental details such as suppliers or model numbers (unless the article is about the latest model of an instrument or piece of equipment). This kind of information is important to the article but belongs in the paper's references and experimental procedure sections. In the abstract it is distracting, and the reader does not learn anything new from it. Abstract editors will remove this information to improve readability.
Here is an example of distracting referencing. The first four lines and the last two lines of the abstract are consumed with references. The reader has to read 5 lines before finding out what the article is about. In this case the abstract editor removed the references (shown in italics) before assigning the article to a database. If it is necessary to refer to specific work then just mention the names, as in "Numerical simulation was done to validate the model of Jones and West in the higher temperature range." The reader will correctly assume that Jones and West are in the references. [Zhang, He, Du, 2005]
In situ SEM observations (Zhang JZ. A shear band decohesion model
for small fatigue crack growth in an ultra-fine grain aluminum
alloy. Eng Fract Mech 2000;65:665-81; Zhang JZ, Meng ZX. Direct
high resolution in situ SEM observations of very small fatigue
crack growth in the ultra fine grain aluminum alloy IN 9052. Scripta
Mater 2004;50:825-28; Halliday MD, Poole P, Bowen P. New perspective
on slip band decohesion as unifying fracture event during fatigue
crack growth in both small and long cracks. Mater Sci Technol
1999; 15:382-90) have revealed that fatigue crack propagation
in aluminium alloys is caused by the shear band decohesion around
the crack tip. The formation and cracking of the shear band is
mainly caused by the plasticity generated in the loading part
of a load cycle. This shear band decohesion process has been observed
to occur in a continuous way over the time period during the loading
part of a cycle. Based on this observation, in this study, a new
parameter has been introduced to describe fatigue crack propagation
rate. This new parameter, da/dS, defines the fatigue crack propagation
rate with the change of the applied stress at any moment of a
stress cycle. The relationship between this new parameter and
the conventional da/dN parameter which describes fatigue crack
propagation rate per stress cycle is given. Using this new parameter,
it is proven that two loading parameters are necessary in order
to accurately describe fatigue crack propagation rate per stress
cycle, da/dN. An analysis is performed and a general fatigue crack
propagation model is developed. This model has the ability to
describe the four general types of fatigue crack propagation behaviours
summarised by Vasudevan and Sadananda (Vasudevan AK, Sadananda
K. Fatigue crack growth in advanced materials. In: Fatigue 96,
Proceedings of the sixth international conference on fatigue and
fatigue threshold, vol. 1, Oxford: Pergamon Press; 1996. p. 473-8).
Example #4 - too little information
Sometimes the abstract editor suspects that the article has broader interest than the abstract indicates but the article never quite gives enough information. In this example, the abstract editor might suspect that the article should be included in Civil Engineering Abstracts and Earthquake Engineering Abstracts; however, neither the abstract nor the article provides enough information for doing so. It is possible, even likely, that this article is not in as many databases as it should be. [Wakatsuki, Watanabe, Okada, 2005]
In previous studies, it has been found that the shape memory effect of the
embedded straight and wavy shape memory alloy (SMA) fibers enhance
the strength and energy absorption prior to fracture of the composite,
where the embedded SMA fibers shrink due to their shape memory
effect. In the case of wavy fiber reinforced composites, the SMA
fibers were subjected to pre-tensile strain using fiber holder
with rotatable rollers to maintain the constant periodicity and
amplitude of wavy fibers. In this study, on the other hand, the
wavy SMA fibers were subjected to pre-tensile strain without using
fiber holder, and therefore, periodicity and amplitude of wavy
fibers were varied during the deformation. Then the wavy SMA fiber
reinforced smart composite is fabricated. For the mechanical property
characterization, three-point bending test is performed for the
Here are a few other practices that are not abstracting mistakes but can decrease the effectiveness of abstracts:
- Hyphenated terms
This tends to decrease the effectiveness of machine-aided indexing since the controlled vocabulary thesaurus minimizes use of hyphens. Use the online thesaurus to verify hyphenation.
- Bullet points
These do not compress well into easy-to-read, paragraph-style copy. The bullet points are still completely searchable and the article will be found, but the abstract will be clumsy to read.
- Non-standard nomenclature
Strong efforts are made by abstracting and indexing providers to limit the controlled vocabulary terms to standard nomenclature. For example "chemical vapor evaporation" is better described as "chemical vapor deposition." CSA Illumina has online thesauri, which can be helpful to authors as they write.
- Vague or unqualified terms
This is most common when describing properties such as strength or conductivity. The abstract editor would much prefer to apply a specific keyterm whenever possible, like tensile strength, bend strength, ionic conductivity, electrical conductivity, superconductivity, etc.
- Undefined mathematical symbols
Mathematical symbols should be defined as they are used. Without a definition, an abstract editor may be unable to assign critical keyterms. It is also possible that the same symbol has a completely different meaning in other disciplines.
- Mathematical and chemical equations
Equations are the essential language of mathematicians, physicists, and chemists but, generally, equations do not scan well and are often removed by the abstract editor. Describe what the equation says to guard against information being lost if the equation is edited out, for example: "Differential geometry was used to model heat transfer in piston rings."
- Figure and table references
The abstract should be independent from other parts of the paper. Figures and tables should not be referred to in the abstract, only in the article itself.
Publish or perish - the only thing worse is to publish and perish! This article has shown that the quality of information retrieved in a literature search depends on the tool used to perform it. Google Scholar searches are useful for gathering basic information, which can then be used to build a more robust search in an abstract database like those on CSA Illumina. However, Google Scholar will never be able to replace abstract databases because of the fundamental differences between the two approaches to gathering information. Google Scholar surveys the available information, which leaves it susceptible to inconsistency. Abstract databases collect information so that current and historical articles are always available. In either case, authors need not be at the mercy of the whims of search engines. This article has also shown how abstracts can be written to maximize searchability and prevent articles from perishing after publication.
- Jacsó, Péter. Google Scholar: the pros and the cons. Online Information Review, Vol. 29, No. 2, 2005, pp. 208-214.
- Doldi, Luisa M. and Erwin Bratengeyer. The web as a free source for scientific information: a comparison with fee-based databases. Online Information Review, Vol. 29, No. 4, 2005, pp. 400-411.