Publish or Perish: Afterlife of a Published Article
(Released April 2006)

 
  by Eileen J. De Guire  


ABSTRACT

Publication of work in peer-reviewed journals is the critical final step of the scientific research process. Abstract databases are the primary means by which readers search the literature, and the abstract is the author's best opportunity to interest readers in the full article. However, many abstracts fall short of the goal by not including enough of the right information or by including irrelevant information. By understanding the basic means by which literature databases are constructed and searched, authors can write abstracts that will increase the number of times their articles are cited in literature searches. In this Discovery Guide, an abstracting and indexing editor explores the effectiveness of two approaches to literature searching by comparing CSA Illumina and Google Scholar and shares some abstract writing tips to ensure published articles reach the broadest audience possible.

Review Article

It is a common experience. A university professor recalls an important paper, published a few years back, that applies directly to the proposal he is writing. He remembers what the paper was about but is utterly clueless as to its title, author, and journal. It's late and the proposal deadline is only hours away, but this article really must be cited in the literature survey. What to do? A quick literature search based on what he remembers about the article should do the job. Among the available options are the tried-and-true abstract databases he has been using since his undergraduate days and the all-new, highly touted Google Scholar. Does it matter which he chooses?

INTRODUCTION

Probably the most effective means by which researchers disseminate scientific data and establish their reputations in their fields is through publications. Motivated by a desire to disseminate scientific information, as well as by non-scientific reasons such as promotion and tenure, authors want the widest possible audience to find their papers.

Researchers tend to write articles with a specific journal's readership in mind. Doing so, however, can lead to abstracts that are narrower than a work's true scope. For example, an abstract may describe the casting of aluminum alloys when the article is really about casting them for weight reduction in automobiles, a topic that appeals to a much wider audience.

By sharing some basic tips about database construction and search methods, this article shows authors how to maximize an abstract's effectiveness and expose the article to the widest possible audience, thereby extending its life-span. A good abstract will facilitate the literature search process. Let's take a look at how literature searches are conducted.

SURVEY OR COLLECT?

It would be difficult to find anyone today who pines for the old-fashioned methods of searching the literature. In the pre-computer, pre-internet days, a good literature search involved many hours, often days, in the library, poring over thick volumes of abstract databases and scouring the shelves for articles in bound journal volumes. Nowadays an effective literature search can be done in a matter of hours (or even minutes) from any computer with internet access. In moments, hundreds of papers can be found on any topic, sometimes with full-text access. This must be a better system, right? The answer is yes, mostly. More than ever, it is important for those searching the literature, and those contributing to it, to understand how modern search methods affect their goals.

There are basically two vehicles for searching the literature: abstract databases and web-based search engines. The interfaces for both methods feel quite similar, so a user can easily believe that the results will be comparable; however, significant differences exist between the two methods.

A web-based search is just that: a search of the World Wide Web based on terms the user provides. Among the most popular search engines are Google and Yahoo!. The word "google" has entered the culture to such an extent that it has become a verb in the vernacular, as in "I googled my doctor's name to find out what hospitals she is affiliated with." Specialized search engines can improve the quality of the results (hits) by searching a subset of all websites and through access to normally restricted archives and abstract databases. Examples include Google Scholar, a search engine for scholarly literature, and Scirus, a science-oriented search engine. These search engines crawl their sources (websites, journal archives, abstract databases) and automatically index them. The search engine approach is to survey information, hence the popular phrase "surfing the web."
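To make the survey approach concrete, here is a minimal sketch in Python of what crawling and automatic indexing amount to: fetch whatever pages are reachable and build an inverted index from their raw text. The function and the whitespace tokenization are illustrative assumptions, not a description of any real search engine's internals.

```python
from collections import defaultdict
import urllib.request

def crawl_and_index(urls):
    """Survey approach: fetch whatever is reachable, index the raw text.

    A hypothetical illustration: no human review takes place, and a
    source that cannot be fetched simply never enters the index.
    """
    index = defaultdict(set)  # term -> set of URLs containing that term
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                text = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable sources are silently skipped
        for term in text.lower().split():
            index[term].add(url)
    return index
```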

An abstract database takes a philosophically different approach. Instead of information being surveyed, it is collected, organized and stored. Abstract databases are built around specific topics by abstracting and indexing (A & I) companies or by professional societies like the American Chemical Society (Chemical Abstracts) and the Institution of Electrical Engineers (Inspec). The leading A & I companies include CSA, Elsevier, and the Emerald Group Publishing Limited. CSA Illumina hosts over 100 databases, both proprietary and nonproprietary, with titles like Library and Information Sciences Abstracts, Engineering Material Abstracts, Criminal Justice Abstracts, Sociological Abstracts, Corrosion Abstracts, and many more.

Abstract database producers are secondary publishers in that they publish information that has been gathered from primary publications like scholarly journals, trade magazines, conference proceedings, etc. The collection of information is intended for digging deep into the literature.

In addition to collecting abstracts, A & I services add value by organizing the abstracts through the application of keyterms and classification codes. The abstract database is like a book; the classifications are like chapter headings and the keyterms are the index at the back of the book. Information can be found either by going directly to the chapter or by using the index.
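The book analogy maps naturally onto a simple data structure. In the sketch below (the field names and sample record are illustrative assumptions, not CSA's actual schema), a record can be found either through its classifications, the "chapter headings," or through its keyterms, the "index at the back of the book."

```python
from dataclasses import dataclass

@dataclass
class AbstractRecord:
    title: str
    abstract: str
    classifications: list  # the "chapter headings"
    keyterms: list         # the "index at the back of the book"

records = [
    AbstractRecord(
        title="Diffusion welding of SiCp/aluminum composites",
        abstract="Joining of particle reinforced aluminum composites...",
        classifications=["Joining"],
        keyterms=["Aluminum base alloys", "Metal matrix composites"],
    ),
]

# Find records by going "directly to the chapter"...
by_chapter = [r for r in records if "Joining" in r.classifications]
# ...or by "using the index".
by_index = [r for r in records if "Metal matrix composites" in r.keyterms]
```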

When doing a literature search, the searcher assumes that the search method will cover the target information, that the information will be timely, and that records will be retrievable. The user benefits from a flexible and intuitive user interface, and from results that are presented in a useful, practical fashion. This article compares CSA Illumina and Google Scholar on each of these points to demonstrate the differences between abstract database searches and web searches.

CONTENT

More than anything, literature searchers rely on the underlying assumption that the information they seek is there to be found. A searcher can very quickly determine the content covered by any of CSA's abstract databases by checking the website (www.csa.com). All of this information is available to anyone; one need not be a subscriber.

As an example, CSA's Technology Research Database (TRD) alone has over 4,000 sources. Journals are listed online by title and ISSN along with other important information, such as scope of coverage, update frequency, number of records in the database, yearly increase in records, etc. The website also indicates the extent to which a particular source is covered. There are two coverage grades for sources in CSA's TRD: core and selective. Core coverage means that every article of technical merit in the journal is included in the database; basically, it is cover-to-cover coverage regardless of topical relevance. Selective, however, means that only articles relevant to TRD's scope are taken. Often selective journals will be covered in their entirety, but CSA Illumina's users know not to expect it.

At CSA (and probably other fee-based A & I services) Acquisitions Specialists aggressively identify and pursue new sources, adding a level of purposefulness which search engines lack. They are alert to new publications and to new opportunities to acquire existing publications. They ensure that source coverage is complete (no missing issues) and that new sources will improve the topical coverage of a database. Search engines are at the mercy of the material to which they have been given access and to what their crawlers can find on the web.

By comparison, Google Scholar (www.scholar.google.com) is not forthcoming about its sources, making it difficult to evaluate the quality of any given search. Here is how Google Scholar describes its coverage:

Google Scholar covers peer-reviewed papers, theses, books, abstracts, and other scholarly literature from all broad areas of research. You'll find works from a wide variety of academic publishers and professional societies, as well as scholarly articles available across the web. Google Scholar may also include multiple versions of an article, possibly preliminary, which you may be able to access.

Google Scholar covers most important scholarly publishers, but there are substantive gaps. In particular, Elsevier, the largest publisher of scholarly literature, did not give Google Scholar's crawlers access to its resources, which is not surprising since SCOPUS (Elsevier's abstract database product) is a competitor. Interestingly, Google Scholar has some access to the Elsevier literature through a back door: stripped-down CSA abstracts are available through Google Scholar. Google Scholar's crawlers were given access to some important databases, such as those of IEEE, ACM, Wiley, Macmillan, and the University of Chicago, as well as to archived information at professional societies and government agencies like the American Physical Society, the National Institutes of Health, and NOAA. However, this source information is not readily available. The Google Scholar approach seems to be "trust us." Is that trust earned?

In an independent study, Péter Jacsó at the University of Hawaii found significant gaps in Google Scholar's coverage. For instance, the 65 journals published by the Nature Publishing Group (NPG), which include the prestigious journal Nature, were incompletely indexed both for the most recent eight years and for archives going back to 1987. In Jacsó's simple test, a search of the NPG website (www.nature.com) found 87,000 records for articles published in Nature alone. A Google Scholar query found only 13,700 records for the entire nature.com website, which includes Nature plus 64 other publications. Similarly, Google Scholar found only about one-fourth of the known records for Science magazine (sciencemag.org: ~40,000 records; Google Scholar: ~12,000 records). The pattern continued, with Google Scholar finding only 268,600 of the 4.1 million records in Harvard's well-regarded abstract database, the Astrophysics Data System [Jacsó, 2005]. This discrepancy between the number of records found by Google Scholar and the number known to exist is disturbing. A searcher cannot tell whether information is nonexistent or just not found.

In a comparative study of the web vs. fee-based databases as sources of scientific information, Doldi and Bratengeyer at Danube University in Austria found that results from search engines tend to favor US websites, perhaps because of the difficulty of writing indexing algorithms in multiple languages. They also found that fee-based databases cover sources from a wider geographic range [Doldi and Bratengeyer, 2005]. At CSA the overwhelming majority of records are in English, either because that is the source language or because foreign-language sources often include an English title and abstract. In the absence of English, the abstracting editor will translate the title and write a brief abstract; in some cases the abstract and title will be outsourced for translation. The goal is to give the user at least a basic idea of what the article is about and to provide enough information to make the article findable. The assumption is that users themselves will be able to read or thoroughly translate the article. Google Scholar offers an option to search exclusively in English, Chinese, or Portuguese, which excludes articles in many significant languages. Looking again at CSA's TRD, non-English source languages include Chinese, Japanese, German, French, Romanian, Spanish, Italian, Portuguese, Polish, Russian, and Danish.

These days, coverage of print sources alone is not enough, because websites contain significant information. How do CSA Illumina and Google Scholar compare in their coverage of websites? CSA covers websites through its Web Resources Related to Technology Database, which runs in conjunction with searches in the other databases. Every website has been reviewed by an editor. Each URL is checked monthly, and dead links are removed from the database on a continuing basis. The database is updated daily and currently has over 300,000 records. Websites can be viewed by clicking on a tab that appears with the search results.

[Figure: CSA Illumina results page, including the websites tab]

The value of human judgment is demonstrated by a very low dead-link rate of less than 1%. Google Scholar's coverage statement above does not say whether websites are covered; presumably they are.
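The monthly URL check amounts to a simple maintenance loop. The sketch below is a hypothetical illustration of such a sweep, not CSA's actual tooling: each stored URL is requested, and any that no longer resolve are flagged for removal.

```python
import urllib.request

def find_dead_links(urls, timeout=10):
    """Return the subset of urls that no longer resolve.

    A hypothetical stand-in for the editorial dead-link sweep; a real
    checker would also handle redirects, retries, and rate limiting.
    """
    dead = []
    for url in urls:
        try:
            request = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(request, timeout=timeout)
        except OSError:  # URLError and HTTPError are subclasses of OSError
            dead.append(url)
    return dead
```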

CURRENTNESS

In addition to comprehensive content, searchers expect databases to be updated frequently with records of recently published articles. CSA Illumina abstract databases are updated monthly with fully edited records. Details are provided online for each database: the number of records per update, the number of records in the database, the oldest record, and how recent the latest 50% of records are. METADEX (the metals database), for example, has over 1,425,000 records; approximately 45,000 records are added per year in monthly increments. Coverage runs from about 1966 to the present, with the oldest record having a 1939 publication date, and about 50% of the records have publication dates of 1986 or later. This depth of detail gives CSA Illumina its credibility as a high-quality abstracting and indexing provider.

CSA knows how important it is to be as current as possible and offers an innovative supplemental database called Recent References, which is updated weekly. This database contains unedited records, i.e., records that have not yet been proofread, edited for clarity, and indexed. However, the article title, journal title, authors, and abstract are all available for searching. It is a terrific tool since it gives access to the newest literature before records are assigned to a permanent database.

How about Google Scholar? Google has not revealed how often it intends to update it. The Jacsó study reports that between November 2004 and January 2005 there was no evidence of an update. As mentioned, Google Scholar has access to some well-regarded sources, but no information is forthcoming on how often they are accessed. CSA provides monthly updates of skeleton records to Google Scholar, so at least that much is known with confidence. The best option is to narrow the search by publication date, but there is no way to know whether a particular record is absent because the journal is not covered at all or because Google Scholar just hasn't gotten around to indexing it.

RETRIEVABILITY OF RECORDS

The ultimate goal of any search is to retrieve useful records as quickly as possible. Retrievability is a function of the search terms and the record's searchable fields.

Most fields in an abstract database record are searchable, the most notable being the abstract, journal title, article title, author names, author affiliations, keyterms, classifications, and cited references. A word or set of words is designated as the search criteria, and the database returns records in which the search terms are found in any of the fields. The most important fields are the title, abstract, and keyterms. Some fields are useful for finding a specific article, such as the journal ISSN or the DOI (Digital Object Identifier). When a CSA record is viewed, the search terms found (in this case: aluminum, composite, automo*, welding) are highlighted with boldface and italics, as shown below. More importantly, the value of the keyterms is demonstrated; the terms themselves suggest possibilities for new searches. By clicking on them and choosing a search criterion (the Boolean AND, OR), better searches can be devised on the spot.

[Figure: sample CSA record for a metal matrix composites article, with matched search terms highlighted]
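A fielded Boolean search like the one above (aluminum AND composite AND automo* AND welding) can be sketched as follows. The record layout and the handling of the trailing-asterisk wildcard are illustrative assumptions; a production database engine is far more sophisticated.

```python
import re

def matches(record, terms, fields=("title", "abstract", "keyterms")):
    """AND all terms across the searchable fields; a trailing * acts as
    a prefix wildcard, so 'automo*' matches automotive, automobile, etc."""
    text = " ".join(str(record.get(f, "")) for f in fields).lower()
    for term in terms:
        if term.endswith("*"):
            pattern = r"\b" + re.escape(term[:-1])     # prefix match
        else:
            pattern = r"\b" + re.escape(term) + r"\b"  # whole-word match
        if not re.search(pattern, text):
            return False
    return True

record = {
    "title": "Welding of SiCp/aluminum composites for automotive frames",
    "abstract": "Diffusion welding of a particle reinforced aluminum "
                "matrix composite is reported.",
    "keyterms": "Aluminum base alloys; Metal matrix composites; Welding",
}
print(matches(record, ["aluminum", "composite", "automo*", "welding"]))  # True
```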

The abstract editor's primary task is to make the record as retrievable as possible, which means placing it in as many databases as appropriate and making the record as searchable as possible by editing the abstract, assigning keyterms, and assigning classification codes.

Keyterms are assigned from a controlled vocabulary, which ensures consistency within the databases, including historical data. Searches can be done using any combination of words, but controlled vocabulary terms from the thesaurus will generate more targeted results.

There are approximately 41,500 technology terms alone, so abstract editors have plenty of flexibility. On the other hand, few abstract editors will remember them all, so machine-aided indexing (a computer program that compares words in the record to the controlled vocabulary) is used to suggest terms. The abstract editor eliminates redundant and irrelevant terms and adds necessary ones. Efforts are made to assign keyterms that are not redundant with other parts of the record, especially the title and abstract. Machine-aided indexing is invaluable to the abstract editor because it increases the chances that all concepts are caught and suggests terms that might otherwise be overlooked. The controlled vocabulary and machine-aided indexing algorithms are regularly updated as the need for new keyterms arises. A good keyterm list will have some general terms and some specific ones; machine-aided indexing helps the editor think more creatively and thoroughly about this list. The limitation of machine-aided indexing is that only the "raw" record is searched, so important information that comes later in the article is not considered.
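Machine-aided indexing as described here, comparing the words of a record against a controlled vocabulary and proposing candidate keyterms for the editor to accept or reject, can be sketched roughly as below. The tiny thesaurus and the phrase-trigger matching are illustrative assumptions; CSA's actual algorithms are not public.

```python
# A tiny illustrative thesaurus; the real controlled vocabulary has
# roughly 41,500 technology terms alone.
CONTROLLED_VOCABULARY = {
    "aluminum base alloys": ["aluminum alloy", "al alloy"],
    "metal matrix composites": ["metal matrix composite", "mmc"],
    "welding": ["welding", "weld"],
}

def suggest_keyterms(record_text):
    """Suggest controlled-vocabulary terms whose trigger phrases appear
    in the record; the human editor still has the final say."""
    text = record_text.lower()
    return sorted(
        term
        for term, triggers in CONTROLLED_VOCABULARY.items()
        if any(trigger in text for trigger in triggers)
    )

print(suggest_keyterms("Welding of an aluminum alloy metal matrix composite"))
# -> ['aluminum base alloys', 'metal matrix composites', 'welding']
```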

As helpful as machine-aided indexing is, the abstract editor has the final responsibility for the keyterms. However, the article must in some way suggest the keyterms; the editor cannot make assumptions about the author's intent! The places in an article where the abstract editor most commonly looks for information are the abstract, the first and last paragraphs of the introduction, the first paragraph of the experiment description, the conclusions, and the figures and captions. Google Scholar uses computer algorithms, or crawlers, to index only the first 100-120 KB it finds, yet it is not unusual for a scholarly article to be nearly 1 MB. Good abstracting and indexing editors know that critical information is often found later in the article.

If the abstract editors know to look for information in other parts of the article, why are the abstract and title so important for indexing? Not all articles come in journal form; a significant portion of all records (about 30% for TRD, more for other databases) comes to the abstract editors in electronic form, and the original journals are never seen. In those cases, all the abstract editor has to work with are the abstract, article title, and journal title. CSA hires only college graduates (many with advanced degrees) to work on databases in their respective fields; a history major, for instance, would never work on the Entomology Abstracts database. This specialization means that CSA abstract editors are skilled at "reading between the lines" to find clues supporting keyterms for concepts that are not explicitly mentioned, something even the very best computer indexing algorithms cannot do.

Google Scholar's search results are not always sensible and consistent. In June 2005 Jacsó (http://www.gale.com) ran three simple searches using the terms protein, proteins, and "protein or proteins" and found that more records were returned for each single term than for the search on both terms. He also found that narrowing the publication date range from 1970-2005 to 1972-2005 increased the number of hits. To test whether such anomalies still exist, De Guire ran the searches again in February 2006, with the results summarized in Table 1.

Table 1 - Comparison of Google Scholar and CSA Illumina searches (February 2006)

Search term            Google Scholar   Google Scholar   CSA Illumina
                       (1970-2005)      (1972-2005)      (1970-2005)
protein                94,500                            2,320,575
proteins               98,700                            1,626,193
protein or proteins    90,800           118,000          2,821,185

It is obvious that the Google Scholar results do not make sense. How can there be fewer records for "protein or proteins" than for either term individually? How can more records be found in the narrower time frame? The CSA Illumina numbers, by contrast, are consistent with common sense. Also telling is the enormous discrepancy between the numbers of records found by CSA Illumina and by Google Scholar. When Jacsó ran this search in 2005 there were 1,080,000 hits for protein, many more than this search found. Where have they gone? It appears that many publications are lost on Google Scholar. The rounding of Google Scholar results to the nearest hundred also seems peculiar.

What happens if a very narrow search is performed? For this article De Guire searched the terms (aluminum and composite and automo* and welding) to find articles about welding of aluminum-based composite materials for automotive applications. The publication range was limited to 2001 and later.

CSA Illumina found 35 results; Google Scholar found 17. The average publication date of the Google Scholar records is 2003; the average of the first 17 CSA Illumina records is 2004. How many results did they have in common? None. This is worrisome, since CSA shares a large number of records with Google Scholar every month; CSA's METADEX alone (the database containing these 35 records) contributes about 3,500 records per month. Keyterms are not included with the CSA-provided records, but there should have been some overlap since the titles and abstracts are fully searchable.

[Figure: the first two Google Scholar results for the aluminum composite welding search]

How closely did the records match the purpose of the search? The first two Google Scholar hits (shown above) do not appear to have anything to do with the topic, and they are stale: three and four years old, respectively. However, clicking on the first result shows that its title was truncated; the full title is more promising: "Study of diffusion welding SiCp/ZL 101 with Cu interlayer." The searcher has to go to some extra effort to determine that this first record is, in fact, useful. The second result is about magnesium and obviously not on topic. Seven of the last nine Google Scholar results are on unrelated topics: automotive refinishing, hot isostatic pressing, corrosion of magnesium alloys, adhesive technology, green industrial engineering, transistor design, and microchip fabrication. Of the 35 CSA Illumina records, 25 were directly on topic and 8 were on related topics like welding of aluminum, heat transfer during welding, etc. One record was on magnesium alloy matrix composites as a substitute for aluminum matrix composites in automobiles, and 2 records were unrelated to the topic (adhesive bonding for aircraft, welding of polymers and wood). In all, about 10% of the CSA Illumina records were not useful, compared to about 40% of the Google Scholar records.

These examples (protein search and aluminum search) demonstrate the effectiveness of the different search methods. By surveying records and websites, Google Scholar skims the surface of available information and returns incomplete, inconsistent results. As a collector of information, CSA Illumina allows searchers to dig deep for target information.

INTERFACES

Anyone who has prepared a meal knows that certain tools are preferred over others. What difference does it make which spoon is used to stir the pot? A favorite spoon will be just the right size, shape, and heft. The cook interacts well with it and it works well with the food. The cooking experience is satisfying because the tool used was effective and pleasant to use. The cook and tool worked well together.

Similarly, the way in which a user interacts with a literature search tool can make a difference in how satisfying and effective the search is. There are two complementary points of interaction: the search screen and the results screen.

Setting up a good search is the key to getting the desired results. The CSA Illumina Advanced Search screen offers multiple pathways to target articles. Search terms can be linked with Boolean operators and searches can be restricted to specified fields.

[Figure: CSA Illumina Advanced Search screen with a search in progress]

The above search is looking for articles on aluminum base or magnesium base composite materials that were not written by De Guire. It is limited to the Technology Research Database which is actually a set of 25 databases. It can be saved for running again later, run in combination with another search, or revised for better precision. The search screen might look complicated at first glance, but this only reflects its goal of telling CSA Illumina where to dig deep in the collection of records for the target information.

In contrast to the CSA Illumina Advanced Search, the Google Scholar Advanced Scholar Search is quite a bit simpler because its highly general results don't require precision; its purpose is to tell the search engine which articles to grab when it surveys the web.

[Figure: Google Scholar Advanced Scholar Search screen]

As shown above, only a few parameters can be set. As many search terms as desired can be entered in fields with Boolean functions, but they cannot be restricted to any field narrower than the full article or the title. It is not possible to search on fields like ISSN or DOI, which are useful for finding specific articles or a set of articles in a publication (e.g., a special issue dedicated to one topic). Because the search screen is so simple, it is useful for searchers who are not sure what they are looking for and for beginners. The search can be narrowed by choosing up to 7 very broad subject areas (Chemistry and Materials Science; Engineering, Computer Science and Mathematics; etc.). Searches cannot be saved or combined, so it's back to the beginning every time.

The purpose of a literature search is to get useful results, and the way in which results are presented and can be managed after they are found is important. Usually, search results are presented either by relevance or by date. Ranking by relevance means that each record has been evaluated and is listed from most relevant to least relevant. Results presented by date show the most recent record by publication date first.

Google Scholar presents results only by relevance ranking, which is computed from extracted metadata, citation frequency, and other information. The ranking can be manipulated by authors who make multiple versions of the same material available: website, preprint, conference presentation, and published article. Google Scholar's website gives tips on how to improve an article's relevance ranking, perhaps of more interest to the author than to the searcher. Providing tips on improving the relevance rankings of scholarly articles seems ironic, given the assumption of objectivity underlying scientific research.

CSA Illumina's default is to present results by date, but results can also be ranked by relevance. Relevance ranking is determined by comparing the search terms with the first eight descriptors (keyterms) assigned by the abstract editor; more matches means a higher relevance ranking. The downside is that abstract editors might place the most important descriptors later than eighth in a list. Also, effort is made to assign keyterms that are not redundant with other parts of the record, so a word or phrase that is prominent in the title or abstract might not appear in the keyterm list. However, a well-designed search will generate only relevant results, and with the ability to save searches, the most recent results are likely to be relevant.
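As a rough sketch of descriptor-based relevance ranking (the scoring rule here is a simplified assumption based on the description above): count how many search terms appear among a record's first eight descriptors, then sort the results by that count.

```python
def relevance_score(search_terms, descriptors, window=8):
    """Count search-term matches among the first `window` descriptors,
    a simplified stand-in for the ranking rule described above."""
    head = " ; ".join(descriptors[:window]).lower()
    return sum(1 for term in search_terms if term.lower() in head)

def rank_by_relevance(records, search_terms):
    """records: list of (title, descriptor_list) pairs."""
    return sorted(
        records,
        key=lambda record: relevance_score(search_terms, record[1]),
        reverse=True,
    )

results = rank_by_relevance(
    [
        ("Welding of Al matrix composites",
         ["Aluminum base alloys", "Welding", "Metal matrix composites"]),
        ("Corrosion of magnesium alloys",
         ["Magnesium base alloys", "Corrosion"]),
    ],
    ["welding", "aluminum"],
)
print([title for title, _ in results])  # the welding paper ranks first
```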

The CSA Illumina results screen allows the searcher to see how many results were found for several types of source material. If the searcher is only interested in a certain type of source, such as peer-reviewed journals, a simple click is all that is needed.

[Figure: CSA Illumina results presented in short format]

This snapshot shows CSA Illumina's short format but records can also be presented in full format (full abstract, bibliographic, and classification information) with or without references, or in a custom format determined by the user. Even in short format, the record includes full title, authors, bibliographic information, and three lines of abstract. Combined with the descriptors (right side of the record), the user can very quickly determine whether or not this record is useful. Records can be selected from the list by clicking on the box next to the record and then saved, printed, or emailed.

Google Scholar results generally offer only two lines of fragmented abstract, so it is necessary to click on each result individually to see what it is. There is no mechanism to save records other than cutting and pasting them into another file.

Google Scholar has received plenty of attention since its launch in November 2004, mostly on the strength of the Google brand name. A Google search is a popular, quick way to get basic information, and Google Scholar can do the same, but the user should be aware that it simply does not measure up to CSA (or any other A & I database) in coverage, currentness, quality, or ease of use. Probably the best use for Google Scholar is to gather quick, basic information that can then be used to design an effective search in an abstract database.

Authors have no control over the means by which their articles are found, but they can exert some control over the process by writing abstracts that are effective. A well-written abstract will contain vocabulary that is likely to match search terms. Also, an understanding of how abstract databases are put together will improve an author's ability to write an abstract that attracts readers.

ABSTRACT DATABASE CONSTRUCTION

Database construction is a volume operation; more records are better, because the goal of an abstract database is to cover the literature in its subject area comprehensively. At CSA, abstract editors work on several databases simultaneously and can assign an article to as many databases as are appropriate. For example, CSA's Technology Research editors in Ohio produce 25 database files; 16 are constructed directly by the editors and 9 are derivative, i.e., specialized subsets of the other 16. About 3,300 serials and hundreds of non-serial publications (conference proceedings, books, monographs, standards, etc.) are monitored, so that well over 4,000 sources are covered. A full-time Technology Research editor edits about 80 abstracts per day and, on average, assigns a record to 2.1 databases. The abstract editor needs to work fast!

Abstract editors are responsible for proofreading records. Strange, even humorous, things can happen during record production since the process is automated with scanners and optical character recognition. CSA's Technology Research editors know to change "suicides" to "silicides" and "theological properties" to "rheological properties." Every CSA TRD record is looked at individually to correct spelling and to verify the accuracy of all bibliographic information, ensuring that the records are of the highest quality. As a surveyor of information, Google Scholar has no ability to edit any part of a record.
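Part of that proofreading can be assisted by a list of known scanner confusions. In the sketch below, the two confusion pairs come from the text; the flagging logic itself is a hypothetical illustration, not CSA's actual workflow, and a human editor still reviews every suggestion.

```python
# Known scanner/OCR confusions; the two pairs are the examples
# given in the text, and the mapping itself is hypothetical.
OCR_CONFUSIONS = {
    "suicides": "silicides",
    "theological properties": "rheological properties",
}

def flag_ocr_errors(record_text):
    """Return (suspect, suggested fix) pairs for an editor to review;
    the editor, not the script, decides whether a fix applies."""
    lowered = record_text.lower()
    return [(bad, good) for bad, good in OCR_CONFUSIONS.items() if bad in lowered]

print(flag_ocr_errors("The theological properties of the melt were measured."))
# -> [('theological properties', 'rheological properties')]
```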

While some customers will purchase only one database, purchase of database sets is more common. The more databases that an article legitimately can be in, the more likely it is that the article will be found.

KEYTERMS AND CLASSIFICATIONS

In the pre-computer days of literature searching, keyterms and classifications were the primary way to find an article. Now that computers search all fields in the records, the function of keyterms and classifications has changed. Classifications are still similar to chapter headings but not many searches are restricted to only one classification/chapter heading. Rather, classifications guide the editor by defining topical coverage (what articles to include in the database) and directing the emphasis of the keyterms.

Beyond improving the searchability and retrievability of a record, keyterms serve other useful purposes. They help "tell the story," i.e., by skimming the keyterms it should be possible to determine an article's main points. In this example, the reader knows just from the keyterms whether or not the article is of interest:

Descriptors: Aluminum base alloys; Steels; Automotive engines; Decoration; Corrosion prevention; Piston rings; Corrosion resistance; Case hardenability; Chemical vapor deposition; Thermal spraying

This example demonstrates another function of the keyterms, which is to build derivative databases such as CSA's Corrosion Abstracts. Records are not assigned directly to Corrosion Abstracts; instead they are culled during the production process to meet the database's topical coverage. The corrosion-related keyterms here ensure that this record will be included. Finally, keyterms are useful in helping to develop better search parameters by suggesting additional search terms.
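The culling step that routes records into a derivative file such as Corrosion Abstracts can be pictured as a keyterm filter. The trigger list below is an illustrative assumption based on the descriptors shown above, not the actual selection rules.

```python
# Illustrative trigger terms for the derivative corrosion database.
CORROSION_TRIGGERS = ("corrosion prevention", "corrosion resistance", "corrosion")

def belongs_in_corrosion_abstracts(descriptors):
    """A record is culled into the derivative database when any of its
    keyterms mentions corrosion (a simplified stand-in for the rules)."""
    return any(
        trigger in descriptor.lower()
        for descriptor in descriptors
        for trigger in CORROSION_TRIGGERS
    )

descriptors = [
    "Aluminum base alloys", "Steels", "Automotive engines",
    "Corrosion prevention", "Piston rings", "Corrosion resistance",
]
print(belongs_in_corrosion_abstracts(descriptors))  # True
```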

THE ABSTRACT

The abstract is an author's best opportunity to interest the reader, making it one of the most important parts of an article. Abstracting and indexing editors, as well as computer indexers (like Google Scholar), rely heavily on the abstract for information. It needs to be long enough to cover all the important information, but short enough to hold the reader's interest (the so-called bikini principle). The target length of an abstract for the Technology Research Database is 300 words or fewer, and longer abstracts will be shortened. By keeping within the limit, the author retains maximum control over the abstract's content.

The abstract should answer the reader's question: "Will this article tell me something new that I need to know?" It should tell the reader what was done, how it was done, and why it was done (the reason the reader should care). A materials science article, for example, would tell the reader the material, the form of the material (film, bulk, powder, etc.), the experiment, and the application. Here is an example of an excellent opening sentence for an abstract:

"Effects of aging treatment on high temperature strength of Nb added ferritic stainless steels for automotive parts were investigated." [Ahn, Sim, Lee, 2005]

Right away, the reader knows the material, the experiment, and the application. Readers who are interested in any of these aspects will continue to the rest of the abstract and the article. The abstract editor knows immediately that this article will be placed in two abstract databases: METADEX and Mechanical and Transportation Engineering Abstracts. While this is primarily a metallurgy article, the author's mention of the automotive application guarantees that the record will be placed in the database that automotive engineers use most, Mechanical and Transportation Engineering Abstracts. The article can be found by all interested readers.

COMMON ABSTRACTING MISTAKES

A few examples of abstracts recently added to CSA's Technology Research Database illustrate the most common abstracting shortcomings. By far, the most frequent mistake is too little information.

Example #1 - incomplete description
In this example [Kharkovsky, Hepburn, Walker, Zough, 2005] the material is never specifically mentioned, which narrows the possible audience for the article. The abstract editor can guess, but cannot assume, that the foam is a polymer, so the keyterms make no mention of polymers. No search in which 'polymer' was a required term would find this article. Also, it is entirely possible that the technique described in this abstract could be useful for other applications, so mention of the material would have expanded the audience to include researchers interested in nondestructive testing methods for the particular material.

The space shuttle Columbia's catastrophic failure has been attributed to a piece of external tank spray on foam insulation striking the left wing of the orbiter, causing significant damage to some of the reinforced carbon/carbon leading edge wing panels. Subsequently, several nondestructive testing (NDT) techniques have been considered for testing the external tank. One such technique involves using millimeter waves, which have been shown to easily penetrate the foam and provide high resolution images of its interior structures. This paper presents the results of testing three different spray on foam insulation covered panels by reflectometers at millimeter wave frequencies, specifically at 100 GHz. Each panel was fitted with various embedded discontinuities/inserts representing voids and unbonds of different shapes, sizes and locations within each panel. In conjunction with these reflectometers, radiators, including a focused lens antenna and a small horn antenna, were used. The focused lens antenna provided for a footprint diameter of approximately 12.5 mm (0.5 in.) at 254 mm (10 in.) away from the lens surface. The horn antenna was primarily operated in its near field for obtaining relatively high resolution images. These images were produced using two dimensional scanning mechanisms. Discussion of the difference between the capabilities of these two types of antennas (radiators) for the purpose of testing the spray on foam insulation as it relates to the produced images is also presented.

Descriptors: Foams; Panels; Discontinuity; Nondestructive testing; Microwaves; Space shuttles; Millimeter waves; Fuel tanks; Tiles; Insulation; Life cycle engineering

Example #2 - lack of relevance
In this case the application for the work, i.e. the "why should we care" aspect, was not included. The abstract describes everything well except for the application. That comes in the first sentence of the introduction and was added by the abstract editor (see italics). As written, the article would be included in two databases: METADEX and Aluminum Industry Abstracts. With the editorial change the article will also be included in Mechanical and Transportation Engineering Abstracts and Aerospace Abstracts - twice the coverage. This article came to the abstract editor as a journal and the important sentence was found. [Gao, Starink, Davin, Cerezo, Wang, Gregson, 2005]

Hot rolled Al-6Li-1Cu-1Mg-0.2Mn (at.-%) (Al-1.6Li-2.2Cu-0.9Mg-0.4Mn, wt-%) and Al-6Li-1Cu-1Mg-0.03Zr (at.-%) (Al-1.6Li-2.3Cu-1Mg-0.1Zr, wt-%) alloys developed for age forming were studied by tensile testing, electron backscatter diffraction (EBSD), three-dimensional atom probe (3DAP), transmission electron microscopy (TEM) and differential scanning calorimetry (DSC). For both alloys, DSC analysis shows that ageing at 150°C leads initially to formation of zones/clusters, which are later gradually replaced by S phase. On ageing at 190°C, S phase formation is completed within 12 h. The precipitates identified by 3DAP and TEM can be classified into (a) Li rich clusters containing Cu and Mg, (b) a plate shaped metastable precipitate (similar to GPB2 zones/S"), (c) S phase and (d) delta' spherical particles rich in Li. The Zr containing alloy also contains beta' (Al3Zr) precipitates and composite beta'/delta' particles. The beta' precipitates reduce recrystallisation and grain growth leading to fine grains and subgrains. Age forming is a key innovation in the fabrication of curved structural components for aerospace applications, e.g. wing skin.

Example #3 - distracting information
Abstracts should not contain information that does not serve the primary purpose of interesting readers in the article. The most common extraneous information includes references, experimental details such as suppliers, or model numbers (unless the article is about the latest model of an instrument or piece of equipment). This kind of information is important to the article but should be placed in the paper's references and experimental procedure sections. In the abstract, it is distracting and the reader does not learn anything new. Abstract editors will remove this information to improve readability.

Here is an example of distracting referencing. The first four lines and the last two lines of the abstract are consumed with references. The reader has to read 5 lines before finding out what the article is about. In this case the abstract editor removed the references (shown in italics) before assigning the article to a database. If it is necessary to refer to specific work then just mention the names, as in "Numerical simulation was done to validate the model of Jones and West in the higher temperature range." The reader will correctly assume that Jones and West are in the references. [Zhang, He, Du, 2005]

In situ SEM observations (Zhang JZ. A shear band decohesion model for small fatigue crack growth in an ultra-fine grain aluminum alloy. Eng Fract Mech 2000;65:665-81; Zhang JZ, Meng ZX. Direct high resolution in-situ SEM observations of very small fatigue crack growth in the ultra fine grain aluminum alloy IN 9052. Scripta Mater 2004;50:825-28; Halliday MD, Poole P, Bowen P. New perspective on slip band decohesion as unifying fracture event during fatigue crack growth in both small and long cracks. Mater Sci Technol 1999;15:382-90) have revealed that fatigue crack propagation in aluminium alloys is caused by the shear band decohesion around the crack tip. The formation and cracking of the shear band is mainly caused by the plasticity generated in the loading part of a load cycle. This shear band decohesion process has been observed to occur in a continuous way over the time period during the loading part of a cycle. Based on this observation, in this study, a new parameter has been introduced to describe fatigue crack propagation rate. This new parameter, da/dS, defines the fatigue crack propagation rate with the change of the applied stress at any moment of a stress cycle. The relationship between this new parameter and the conventional da/dN parameter which describes fatigue crack propagation rate per stress cycle is given. Using this new parameter, it is proven that two loading parameters are necessary in order to accurately describe fatigue crack propagation rate per stress cycle, da/dN. An analysis is performed and a general fatigue crack propagation model is developed. This model has the ability to describe the four general types of fatigue crack propagation behaviours summarised by Vasudevan and Sadananda (Vasudevan AK, Sadananda K. Fatigue crack growth in advanced materials. In: Fatigue 96, Proceedings of the sixth international conference on fatigue and fatigue threshold, vol. 1, Oxford: Pergamon Press; 1996. p. 473-8).

Example #4 - too little information
Sometimes the abstract editor suspects that the article has broader interest than the abstract indicates but the article never quite gives enough information. In this example, the abstract editor might suspect that the article should be included in Civil Engineering Abstracts and Earthquake Engineering Abstracts; however, neither the abstract nor the article provides enough information for doing so. It is possible, even likely, that this article is not in as many databases as it should be. [Wakatsuki, Watanabe, Okada, 2005]

In previous studies, it has been found that the shape memory effect of the embedded straight and wavy shape memory alloy (SMA) fibers enhance the strength and energy absorption prior to fracture of the composite, where the embedded SMA fibers shrink due to their shape memory effect. In the case of wavy fiber reinforced composites, the SMA fibers were subjected to pre-tensile strain using fiber holder with rotatable rollers to maintain the constant periodicity and amplitude of wavy fibers. In this study, on the other hand, the wavy SMA fibers were subjected to pre-tensile strain without using fiber holder, and therefore, periodicity and amplitude of wavy fibers were varied during the deformation. Then the wavy SMA fiber reinforced smart composite is fabricated. For the mechanical property characterization, three-point bending test is performed for the specimens.

Other situations
Here are a few other practices that are not abstracting mistakes but can decrease the effectiveness of abstracts:

  • Hyphenation
    This tends to decrease the effectiveness of machine-aided indexing since the controlled vocabulary thesaurus minimizes use of hyphens. Use the online thesaurus to verify hyphenation.
  • Bullets
    These do not compress well into easy-to-read, paragraph-style copy. The bullet points are still completely searchable and the article will be found but the abstract will be clumsy to read.
  • Non-standard nomenclature
    Strong efforts are made by abstracting and indexing providers to limit the controlled vocabulary terms to standard nomenclature. For example "chemical vapor evaporation" is better described as "chemical vapor deposition." CSA Illumina has online thesauri, which can be helpful to authors as they write.
  • Vague or unqualified terms
    This is most common when describing properties such as strength or conductivity. The abstract editor would much prefer to apply a specific keyterm whenever possible, like tensile strength, bend strength, ionic conductivity, electrical conductivity, superconductivity, etc.
  • Undefined mathematical symbols
    Mathematical symbols should be defined as they are used. An abstract editor might be unable to assign critical keyterms through lack of knowledge. It is also possible that the same symbol has a completely different meaning in other disciplines.
  • Mathematical and chemical equations
    Equations are the essential language of mathematicians, physicists, and chemists but, generally, equations do not scan well and are often removed by the abstract editor. Describe what the equation says to guard against information being lost if the equation is edited out, for example: "Differential geometry was used to model heat transfer in piston rings."
  • Figure and table references
    The abstract should be independent from other parts of the paper. Figures and tables should not be referred to in the abstract, only in the article itself.

SUMMARY

Publish or perish: the only thing worse is to publish and perish! This article has shown that the quality of information retrieved in a literature search depends on the tool used to seek it. Google Scholar searches are useful for gathering basic information and then building a more robust search in an abstract database like those on CSA Illumina. However, Google Scholar will never be able to replace abstract databases, because of the fundamental differences between the two approaches to gathering information. Google Scholar surveys the available information, which leaves it susceptible to inconsistency; abstract databases collect information, so that current and historical articles are always available. In either case, authors need not be at the mercy of the whims of search engines. This article has also shown how abstracts can be written to maximize searchability and keep articles from perishing after publication.

© Copyright 2006, All Rights Reserved, CSA