The Novial98 Dictionary Project

Phase 1: NL finished

The Novial Lexike (NL) must be put online. Since N30 is the baseline from which we are working, the NL is the baseline for our vocabulary, so we need to get it online in its entirety in some form.

Phase 2: SGML

The NL, being a printed book, is in flat text format. To be more useful to us online, it needs to be converted into a format that conveys more information: SGML. I have written a DTD and assorted software for an SGML-formatted dictionary, but the text itself still needs to be tagged.

Phase 3: Internal dictionary

There are two types of dictionary. Multilingual dictionaries are for translating between languages, and as such they typically have short, one- or two-word definitions. Monolingual dictionaries define each word in its own language. We should go through the dictionary and write out full definitions in Novial for each word.

Phase 4: Other languages

English, French, and German do not a quorum make. They do cover a large portion of our target audience, but we will need to expand the languages covered by the dictionary. Good early targets (imo) would be Spanish and Esperanto.


If you're interested in helping out on this, email me. The work may seem daunting, but doing a page or two at a time is not much of a commitment. At the moment, most of the work that needs to be done is converting to SGML. That is not as hard as it sounds, as I can do some preprocessing on the file to get you started.

So here's what you do if you want to help. Ideally, you email me, I set aside a section for you, and I email you back to confirm that it has been set aside. This completely avoids duplication of work. However, I know that when I am helping other people on projects, I tend to have time right now, and I don't want to wait for approval before I can start work. If you are of this sort, then the best we can do to avoid a collision is for you to pick a section at random, download it, and email me so I can officially set it aside ASAP. Just make sure you don't take one of the "in progress" sections, as those are already being worked on by someone.

SGMLizing: If you speak SGML, you might want to read the DTD itself, but otherwise: the SGML format is a lot like HTML, but with different tags. In our project, each letter is stored in its own file, and each file has this basic format:

<section letter=a>
<pron type=pref><fra>a <eng>as in card, calm <deu>a

...

<group root=abat>
<word pos=sb>abate <deu>Abt <eng>abbot <fra>abbé
<word pos=sb>abata <fra>abbesse <eng>abbess <deu>Äbtissin
<word pos=sb>abatia <deu>Abtei <fra>abbaye <eng>abbey
</group>

...

<suffix from=sb to=sb>aje
<fra>quelquechose composé de ou ayant le caractère de <eng>something
made of, consisting of, having the character of <deu>bestehend aus, nach
Art von
<ex>lanaje <fra>article de laine <eng>woollen goods <deu>Wollware
<ex>infantaje <fra>enfantillage <eng>childish act <deu>Kinderei
</suffix>

...

<prefix>arki
<deu>Archi-, Erz- <eng>arch- <fra>arch-
<ex>arkianjele
<ex>arkiepiskope
<ex>arkiduke
</prefix>

...
</section>
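
As a rough illustration of how one entry line in this format breaks down, here is a minimal Python sketch. It is not the project's own software: the function name parse_word_line and the returned dictionary layout are assumptions made for this example, and it only handles a single line in which end tags are omitted, as in the sample above.

```python
import re

# Matches a start tag such as <word pos=sb> or <deu>, capturing the tag
# name and its (possibly empty) unquoted attribute list.
TAG = re.compile(r"<(\w+)((?:\s+\w+=\w+)*)\s*>")


def parse_word_line(line: str) -> dict:
    """Split one <word ...> line into headword, attributes and glosses.

    Hypothetical helper: it assumes the text belonging to a tag runs
    until the next start tag, because the sample omits end tags.
    """
    # re.split with two capturing groups yields
    # [text_before, name, attrs, text, name, attrs, text, ...]
    parts = TAG.split(line.strip())
    entry = {"headword": None, "attrs": {}, "glosses": {}}
    for name, attrs, text in zip(parts[1::3], parts[2::3], parts[3::3]):
        text = text.strip()
        if name == "word":
            entry["attrs"] = dict(a.split("=", 1) for a in attrs.split())
            entry["headword"] = text
        else:
            # Language tags (deu, eng, fra, ...) carry the glosses.
            entry["glosses"][name] = text
    return entry


entry = parse_word_line("<word pos=sb>abate <deu>Abt <eng>abbot <fra>abbé")
print(entry["headword"], entry["attrs"], entry["glosses"])
```

The order of the language tags does not matter here, which matches the sample, where deu, eng and fra appear in varying order.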

Here is what each tag does:

section: Self-explanatory.

pron: Pronunciation. Its type can be one of pref(erred), pos(itional), or alt(ernate). There can be several prons for a certain letter; each contains a tag for some number of languages which have a corresponding pronunciation.

group: Contains a group of words which are derived from each other. Its root is the root form; for nouns and adjectives, just hack off the last vowel, and for verbs keep the vowel but drop the R. We may come up with a better way to do this later, but if you do it this way now it can be mass-converted later.

word: An individual word; its pos (part of speech) can be one of vb sb adj adv prep pron konj interj num. Also, it can have an irr argument if the word is irregularly derived (represented in NL by a double bar).

suffix: Is, of course, a suffix; from and to are what the suffix takes, and what it converts to; they can be any part of speech, plus any and same.

prefix: Is just that.

eng, fra, deu: These, and eventually others, are the languages we support.

nov: Used when a definition is given in Novial.

cf: A reference to another word (in NL, designated by "kp.").

ex: An example, which can itself be followed by definitions.
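
The rule for a group's root can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name group_root is made up for this example, the rule is applied to the base word of the group (abate gives abat, as in the sample), and the verb "amar" is a hypothetical headword used only to show the drop-the-R case.

```python
VOWELS = "aeiou"


def group_root(word: str, pos: str) -> str:
    """Derive the root attribute of a group from its base word.

    Follows the rule as stated: nouns (sb) and adjectives (adj) lose
    their last vowel; verbs (vb) keep the vowel but lose the final r.
    Other parts of speech are returned unchanged (an assumption).
    """
    if pos == "vb":
        return word[:-1] if word.endswith("r") else word
    if pos in ("sb", "adj"):
        return word[:-1] if word and word[-1] in VOWELS else word
    return word


print(group_root("abate", "sb"))  # noun from the sample group
print(group_root("amar", "vb"))   # hypothetical verb headword
```

Derived words inside a group (for example abatia) share the root of the base word, so the function is meant to be called once per group, not once per word.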

Above all, take a look at one of the completed SGML sections to get a good idea of what to do.


Don Blaheta / dpb@cs.brown.edu