Building a Vocabulary



P. Battistini and L. Benacchio


This report describes the criteria used in building a vocabulary for the management of astronomical catalogues. The work originated and was developed within a programme to build an astronomical data base for the Astronet network. Creating such a vocabulary from scratch raises a series of mainly astronomical considerations, which we have chosen to separate from the software work as a whole so that they can be illustrated more fully.


1.1 Why a vocabulary is needed.

One of the main difficulties in the automatic management of astronomical catalogues is that each catalogue has its own peculiar characteristics (Benacchio, 1984). Up to now two completely different solutions have been used: ad hoc management of each catalogue, or creation of a homogeneous data base, i.e. of a single, unambiguous data set for each object. Both solutions have advantages and disadvantages, which we do not discuss here (Benacchio, 1985; Egret, 1983 and references therein). To quote only the most obvious: the first requires a level of computing skill generally higher than that of the average user, whereas the second lets the user retrieve only the information contained in the data base, since the choice has already been made by whoever inserted and homogenised the data.

The "Astronet Data Base Group" has approached the problem in a different way, partly complementary to the philosophy of the homogeneous data base. This has led to the development of DIRA, a software system for catalogue management in the DEC VAX environment (Hunt and Nanni, 1985). DIRA enables astronomers to manage astronomical catalogues themselves, both the existing ones and their own working catalogues.

The management of files that are complex from both an astronomical and a computing viewpoint is made easier by a series of logical assignments that link the software architecture to the user. One class of assignments is the names of the catalogues. To use a catalogue, the user must specify the logical name by which it is known in the system in use. For example, AGK3 is the logical name that gives access to the files containing the catalogue of the same name. Note that, to avoid ambiguities, the complete identification of each catalogue consists of several entries that together identify it fully, both astronomically and in computing terms. An interactive documentation system manages all this information, which the user can easily read on line as a "mask" (see the DIRA User Manual for details).

The quantities contained in the catalogues, that is, the values in each column, are treated in the same way. Each column is given a logical identifier that identifies unambiguously the variable or physical parameter it contains. For instance, the declination is indicated as DEC. Consequently, a user interested in a certain parameter of a particular catalogue simply has to know the appropriate identifiers, and if not, can find them through the DIRA documentation system. For example, to access the declination column of AGK3 it is enough to specify the catalogue name, AGK3, and the column name, DEC (see the DIRA User Manual for a precise description of the available commands and of the possibilities offered by the software).
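The two-level addressing described above (a logical catalogue name, then a logical column name) can be pictured with a minimal Python sketch. It is not DIRA's interface, which is documented in the DIRA User Manual: the `CATALOGUES` registry, the `get_column` function, the `RA` identifier and all numerical values are hypothetical and chosen only for illustration. Only the names AGK3 and DEC come from the text.

```python
# Hypothetical in-memory registry standing in for the catalogue files.
# Only the logical names "AGK3" and "DEC" come from the text; "RA" and
# all numerical values are invented for illustration.
CATALOGUES = {
    "AGK3": {
        "columns": ("RA", "DEC"),
        "rows": [
            (10.5, 41.2),
            (11.0, -5.3),
        ],
    },
}


def get_column(catalogue, column):
    """Return all values of one column, addressed by logical names only."""
    try:
        table = CATALOGUES[catalogue]
    except KeyError:
        raise KeyError(f"unknown catalogue name: {catalogue!r}") from None
    try:
        index = table["columns"].index(column)
    except ValueError:
        raise KeyError(
            f"catalogue {catalogue!r} has no column {column!r}"
        ) from None
    return [row[index] for row in table["rows"]]


# The user needs to know only the two identifiers, not the file layout.
declinations = get_column("AGK3", "DEC")
```

The point of the sketch is that the caller never refers to a file name or a column position: the two logical names are the whole interface.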

This is the basic concept of the DIRA structure as seen from the end user's side. The idea, so simple that it looks trivial, allows the untrained user to manage one or more catalogues (even simultaneously) very simply, provided the DIRA vocabulary is used.

1.2 Basic choices for a vocabulary.

The practical realization of a vocabulary of this kind involves many different problems. As stated above, the main advantage of a vocabulary is that it sets up a logical link between the data and the user. This kind of connection is well known and used by each of us every day. The main obstacle is unfamiliarity with the vocabulary itself. The problem has two aspects: a so-called local problem and a transport problem.

The local problem is faced by the user who approaches the management software without knowing the vocabulary, wholly or in part. DIRA tries to solve this problem through a system for consulting the basic vocabulary that is as flexible and friendly as possible. In addition, we try to make the single words meaningful in themselves, as will be seen later on.

The transport problem is met when several vocabularies are used. Since there is no internationally accepted standard nomenclature for the astronomical quantities contained in the catalogues, any vocabulary can only have a local value. This situation is improving owing to the extension of FITS to catalogues and tables (Grosbol, 1983).

We have approached the compilation of this vocabulary trying

a) to impose as few constraints as possible so as not to make it difficult for the user;

b) to make it "natural" (self-explaining) for an astronomer, of course;

c) not to block future developments, including those made by users. (DIRA also provides a "personal level" which users manage themselves and within which they are therefore allowed to modify the vocabulary.)

1.2.1 Rules for building single words.

When a vocabulary is compiled, the first problem to be faced is the composition of the single words. Problems arise here because different hardware and software systems handle character strings in different ways. In order to be as compatible as possible with most systems and peripherals, without wasting too much, we have decided to conform to the following rules for the composition of words:

i) Each word consists of at most 8 characters;

ii) Words are composed using only the letters of the English alphabet and the digits 0 ... 9; the characters "?" "," "/" may give unpredictable results and are therefore excluded.

We have built the single words of our vocabulary according to the rules stated above. Please note that rule ii) suggests, but does not impose, the use of a subset of the ASCII characters. With these rules it should be possible to accommodate most of the systems used in astronomy. Only capital letters are used, for backward compatibility with systems of the previous generation that are still in use.
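Rules i) and ii) are simple enough to be checked mechanically. The following Python sketch shows one possible check; the function name `is_valid_word` is hypothetical, and the sketch assumes the strictest reading of the rules (capital letters and digits only), although rule ii) itself only suggests this subset.

```python
import string

MAX_LENGTH = 8  # rule i): at most 8 characters

# Rule ii), read strictly: capital English letters and the digits 0 ... 9.
# This automatically excludes "?", "," and "/".
ALLOWED = set(string.ascii_uppercase + string.digits)


def is_valid_word(word):
    """Return True if `word` satisfies rules i) and ii) as read above."""
    if not 0 < len(word) <= MAX_LENGTH:
        return False
    return all(ch in ALLOWED for ch in word)


checks = {w: is_valid_word(w) for w in ("DEC", "BMVJ", "DECLINATION", "B/V")}
```

Here `DEC` and `BMVJ` pass, `DECLINATION` fails on length and `B/V` fails on the excluded character "/".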

1.2.3 General properties of the vocabulary.

Once the rules for the single words were established, we chose two guiding criteria for building the vocabulary as a whole, very simple but in our opinion very important.

The first is the significance criterion. This vocabulary is to be used in an astronomical context, so it must be meaningful to an astronomer. For each word we have tried to use the term most frequently used by the specialists in the branch of astronomy to which the word refers. This is particularly clear in the words concerning photometry. Obviously this implies that the vocabulary will be refined as we receive suggestions from users who are specialists in the various fields. In addition, each word is accompanied by an explanation and, where possible, by a bibliography, since each user knows only his own specific field well.

The second criterion we have followed is that our vocabulary should cover the existing terminology without precluding future developments.

This point is particularly important in view of the remarkable inflow of astronomical images taken with solid state detectors, whose production rate increases every year, and of images coming from satellites. Such images have characteristics that are sometimes very different from those of conventional images. Given the small number of rules, we think this vocabulary lends itself to the subsequent addition of words, which are simply inserted among those already in the vocabulary. Furthermore, the existence of a list, this one or another, might be of great importance in avoiding duplicates, which have often occurred in the past.


2.0 Examination of Strasbourg CDS catalogues.

For the practical compilation of the vocabulary, we first decided to examine the list of catalogues available at the CDS in Strasbourg and, if necessary, to supplement it later, because this is the largest and most detailed list of catalogues on magnetic tape currently available to astronomers. We thought it a good approach to take the present situation into account. We compiled statistics of the single words corresponding to parameters present in the catalogues, grouped into logical and operational classes. Table 1 reports, for some classes, the percentage incidence of the various items on the total of words considered.


Table 1. Percentage of designations as a function of the number of times each is used.

Frequency     1      2      3-5    5-10   >10
Percentage    50%    17%    14%    8%     11%

Unfortunately, half of the designations are evidently used only once. The conclusion, as obvious as it is disappointing, is that this list, like probably others, has the same troubles as the catalogues listed in it, such as ambiguity and duplication. It is therefore not usable for our purposes.
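A statistic of the kind reported in table 1 can be obtained by counting how many times each designation occurs and then grouping the counts into bins. The Python sketch below illustrates the idea; it is not the procedure actually used for the CDS list. The function names and the sample words `X1` and `X2` are hypothetical, and because the printed bins "3-5" and "5-10" overlap, the sketch assumes that a count of 5 belongs to "3-5".

```python
from collections import Counter

# Bin labels follow table 1. Assumption: a count of exactly 5 goes to "3-5".
BINS = ("1", "2", "3-5", "5-10", ">10")


def bin_label(n):
    """Map a number of uses to the label of its frequency bin."""
    if n == 1:
        return "1"
    if n == 2:
        return "2"
    if n <= 5:
        return "3-5"
    if n <= 10:
        return "5-10"
    return ">10"


def usage_percentages(designations):
    """Percentage of distinct designations falling in each frequency bin."""
    uses = Counter(designations)            # designation -> number of uses
    per_bin = Counter(bin_label(n) for n in uses.values())
    total = len(uses)
    if total == 0:
        return {label: 0.0 for label in BINS}
    return {label: 100.0 * per_bin[label] / total for label in BINS}


# Invented sample: DEC used 3 times, RA twice, X1 and X2 once each.
sample = ["DEC"] * 3 + ["RA"] * 2 + ["X1", "X2"]
result = usage_percentages(sample)
```

A large share in the first bin, as found for the CDS list, means that most designations are peculiar to a single catalogue.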

After this attempt to solve the problem 'a posteriori', we tackled it more systematically, including in some cases also the words used in the many, generally small, catalogues in the literature.

2.2 Practical compilation of the vocabulary

The vocabulary has been divided into 9 sections that broadly reflect the classes of variables present in the greater part of the catalogues examined. The complete list is reported in table 2: the first column gives the word selected, the second the symbol most commonly used to indicate the variable or physical quantity, and the third a concise definition. In compiling the various sections of the vocabulary we did not aim at completeness: the whole list is indicative and must be revised and completed.

The section [Magnitudes, color indexes, photometric parameters] has been compiled with particular care. The most important photometric systems in use have been examined and we have introduced, besides magnitudes and colors, some derived parameters characteristic of the various systems. In building words we have made wide use of suffixes that distinguish the various photometric systems. For example: BMV (generic B-V color), BMVJ, BMVS, ... (B-V colors in the Johnson, Stromgren, ... systems). The "-" (minus sign) is always replaced by the letter "M".
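The naming convention for color indexes can be expressed as a small function. The Python sketch below is only an illustration of the convention, under stated assumptions: the function name `color_word` and the keys of `SYSTEM_SUFFIX` are hypothetical, and only the two suffixes quoted in the text (J and S) are included.

```python
MAX_LENGTH = 8  # rule i) of section 1.2.1: at most 8 characters per word

# Suffixes taken from the examples in the text (BMVJ, BMVS); the keys are
# hypothetical labels chosen for this sketch.
SYSTEM_SUFFIX = {"JOHNSON": "J", "STROMGREN": "S"}


def color_word(index, system=None):
    """Build the vocabulary word for a color index such as "B-V".

    The minus sign is replaced by "M"; an optional one-letter suffix
    identifies the photometric system.
    """
    word = index.upper().replace("-", "M")
    if system is not None:
        word += SYSTEM_SUFFIX[system.upper()]
    if len(word) > MAX_LENGTH:
        raise ValueError(f"{word!r} is longer than {MAX_LENGTH} characters")
    return word


generic = color_word("B-V")               # generic B-V color
johnson = color_word("B-V", "Johnson")    # B-V in the Johnson system
stromgren = color_word("B-V", "Stromgren")
```

With these inputs the sketch reproduces the three words quoted in the text: BMV, BMVJ and BMVS.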


We have defined a vocabulary with a minimum of rules and some hundreds of words. The large number of words is due mainly to the photometric parameters. In the DIRA environment there are no problems in managing so many words, thanks to an automatic retrieval procedure, but for mnemonic use the vocabulary could be too large. On the other hand, we expect that each astronomer will be interested only in the part pertaining to the research currently in progress.

We will be very grateful to anyone, user or not, who sends us impressions, comments or suggestions.


Benacchio, L., 1984, Astronet 1983, Ed. G. Sedmak, 109

Benacchio, L., 1985, this volume

Egret, D., 1983, CDS Bulletin, 24, 109

Grosbol, P., 1983, private communication

Hunt, L., Nanni, M., 1985, this volume
