It is possible to create usable web-based multimedia dictionaries using pure HTML.
If the dictionary is small enough, this can be done entirely by hand. If it is larger,
it becomes tedious to generate by hand, but the programming necessary to generate
it from a database is quite simple.
I demonstrate these claims here by providing the code for
a simple set of programs that generate a web-based lexicon from a database,
as well as a sample of the result.
The programs take as input a dictionary database in a simple format and generate
from it a web-based dictionary. The software is not intended to compete with more sophisticated
approaches, such as Kirrkirr or HyperLex, but rather to demonstrate how
one can generate a usable web-based dictionary using only the most trivial computer
programming. It does not provide sophisticated search tools, and it makes no attempt
to handle exotic writing systems.
However, it is more than a theoretical demonstration. For languages written in ASCII
characters and lexica of only a few thousand words, where fancy searches are not
required, the dictionaries it generates are perfectly usable.
It is generally easiest to download all of the files at once, in which case you will
get a compressed tar archive.
Download pmwd.tgz
If you have GNU tar, you can decompress and unpack this by giving the single command:
tar xzf pmwd.tgz
If you have a version of tar that does not know how to decompress such archives, you will
have to decompress first using gunzip. Then use tar without the z flag
to unpack. I am told that on Microsoft Windows systems WinZip can unpack compressed tar
archives.
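If your tar cannot decompress by itself, the two steps can also be combined in a pipeline. The following is a minimal sketch, assuming that gunzip and tar are both on your path; the function name unpack_tgz is only illustrative and is not part of the distribution.

```shell
# Unpack a gzip-compressed tar archive without relying on tar's z flag.
# unpack_tgz is a hypothetical helper name.
unpack_tgz() {
    # gunzip -c writes the decompressed archive to standard output and
    # leaves the original file untouched; "tar xf -" reads the archive
    # from standard input and unpacks it into the current directory.
    gunzip -c "$1" | tar xf -
}

# Usage: unpack_tgz pmwd.tgz
```

Because the compressed file is left in place, the command can be repeated safely if something goes wrong.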
If for some reason you cannot deal with the compressed tar archive, you can also
download the files individually. See the descriptions of the files below. The
entry for each file contains a link allowing you to download it.
The files provided include a sample dictionary. To look at it, open the file sdtop.htm
in your browser. The browser window will be divided into two parts known as frames.
The upper frame, which will occupy most of the browser window, will contain the index
to the dictionary, that is, a list of the words in alphabetical order. If the list is too long
to fit into the frame at once, you can use the scrollbar to show other parts of it.
Each of these words is a link. Clicking on a word
will cause the definition to be displayed in the smaller frame at the bottom of the screen.
Try clicking on tsachun. Notice that the definition is followed by the
words "show picture". Click on them to see the picture. Now try clicking on hoonliz.
Notice that the definition is followed by the words "play sound". Click on them to hear
the word.
The lexical database is assumed to be in the format used by the Summer Institute
of Linguistics Shoebox program since this is very widely used.
Records are separated by blank lines. Each field begins with a backslash followed by
the tag that identifies the field.
The tag is followed by one or more spaces and then the contents of the field.
The headword should be in a field with the tag head. The definition should be in
a field with the tag def. Four optional fields are also used. The tag cat
specifies the category of the word. The tag sci gives the scientific name
of biological organisms.
The tag snd gives the name of a sound file containing the headword.
The tag pic gives the name of an image file illustrating the headword.
Your records may contain additional fields.
They will simply be ignored. Here is a sample of what such a file might look like:
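\head duchun
\def tree, stick, wood in general
\cat N

\head tsachun
\def cache for storing food in the form of a little cabin on posts
\pic tsachun.jpg
\cat N

\head hoonliz
\def skunk
\sci Mephitis mephitis
\snd hoonliz.wav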
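
The format described above is easy to process with AWK's paragraph mode, in which a blank line ends a record and each line of the record is a field. The following sketch is not one of the programs in the distribution; it simply reports records that lack the head or def field. The function name check_lexicon is hypothetical.

```shell
# Report records that are missing the \head or \def field.
# check_lexicon is a hypothetical helper, not part of the distribution.
check_lexicon() {
    # RS = "" selects paragraph mode: records are separated by blank lines.
    # FS = "\n" makes each line of a record one awk field.
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        has_head = 0; has_def = 0
        for (i = 1; i <= NF; i++) {
            # A field is a backslash, the tag, then one or more spaces.
            if ($i ~ /^\\head +/) has_head = 1
            if ($i ~ /^\\def +/)  has_def = 1
        }
        if (!has_head) print "record " NR ": missing \\head"
        if (!has_def)  print "record " NR ": missing \\def"
    }' "$1"
}

# Usage: check_lexicon lexicon.ldb
```

Fields with other tags are skipped by the loop, which mirrors the way additional fields are ignored.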
The software is most easily run on a GNU/Linux system. If you have such a system
with msort installed, all you need to do is make a copy of your database file
named lexicon.ldb, edit the file language so that it contains the
name of your language, and type make.
The make program will then follow the instructions in the file makefile
and generate the HTML files.
The HTML files generated are:
To use these files, just open dtop.htm in your browser.
If you do not have msort but can get your lexicon into the desired order
in some other way, after renaming your lexicon database lexicon.ldb,
make a copy of it called lexicon.srt. Then give the command:
touch lexicon.srt
If you do not have access to make, you can just give the necessary commands by hand,
assuming that you have awk:
Depending on the kind of system you are using, you may have to go about executing
awk differently. In the commands above, a filename following a less-than sign (<) is input to
awk; a filename following a greater-than sign (>) is output from awk. Also recall
that on some systems the newer version of awk is called nawk or gawk.
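
As an illustration of the redirection convention and of the differing names for awk, here is a small sketch. It assumes a POSIX shell; the function names find_awk and count_records and the file name count.txt are hypothetical, and the one-line AWK program is an example of my own choosing, not one of the programs in the distribution.

```shell
# Pick whichever awk variant is installed, preferring the newer ones.
find_awk() {
    for candidate in gawk nawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Read a lexicon on standard input and write the number of
# blank-line-separated records to standard output.
count_records() {
    "$(find_awk)" 'BEGIN { RS = "" } END { print NR }'
}

# Usage: the file after < is the input, the file after > is the output.
#   count_records < lexicon.ldb > count.txt
```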
The bulk of the work is done by small programs written in AWK, a pattern-scanning
and text-processing language. More information about AWK is available
here.
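
To give a sense of how little programming is involved, here is a sketch of the kind of AWK program that could turn each record into a link in the index. It is not the code from the distribution: the function name make_index, the target file name defs.htm, and the frame name def are all assumptions made for illustration, and the headwords are assumed to be plain ASCII that needs no HTML escaping.

```shell
# Write one index link per record, pointing at an anchor named after the headword.
# make_index, defs.htm and the frame name "def" are hypothetical.
make_index() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        head = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\\head +/) {
                head = $i
                sub(/^\\head +/, "", head)   # strip the tag, keep the headword
            }
        if (head != "")
            printf "<a href=\"defs.htm#%s\" target=\"def\">%s</a><br>\n", head, head
    }' "$1"
}

# Usage: make_index lexicon.srt > index.htm
```

The target attribute is what would make a click in the index frame load the definition into the other frame.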
The other piece of software that you need is a sorting program that is capable
of sorting the lexicon database file. Many sorting programs cannot do this
because they can only handle single lines. The program used here, msort, is my
own sophisticated sorting program. The program and the manual can be downloaded
from my web page. However, msort is only available for UNIX systems. If you are
on a non-UNIX system, you will have to find some other way to get your lexicon file
into the order desired.
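
If msort is not available, one workaround is to flatten each record onto a single line, sort with the ordinary sort program, and then restore the line breaks. The sketch below does this under several assumptions: fields contain no tab characters, plain byte order (LC_ALL=C) is an acceptable ordering, and every record has a head field. The function name sort_lexicon is hypothetical. It is not a substitute for msort when the alphabetical order of the language differs from ASCII order.

```shell
# Sort a blank-line-separated lexicon by headword using only awk and sort.
# sort_lexicon is a hypothetical helper; it assumes no field contains a tab.
sort_lexicon() {
    # Step 1: emit one line per record: headword, then every field, tab-separated.
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        key = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\\head +/) { key = $i; sub(/^\\head +/, "", key) }
        line = key
        for (i = 1; i <= NF; i++) line = line "\t" $i
        print line
    }' "$1" |
    # Step 2: sort the flattened lines on the headword column, in byte order.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    # Step 3: drop the sort key and restore one field per line,
    # with a blank line between records.
    awk 'BEGIN { FS = "\t" }
    {
        if (NR > 1) print ""
        for (i = 2; i <= NF; i++) print $i
    }'
}

# Usage: sort_lexicon lexicon.ldb > lexicon.srt
```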
\def tree, stick, wood in general
\cat N
\def cache for storing food in the form of a little cabin on posts
\pic tsachun.jpg
\cat N
\def skunk
\sci Mephitis mephitis
\snd hoonliz.wav
The touch command makes it look as though lexicon.srt was created more recently than lexicon.ldb,
so the make program will just use lexicon.srt instead of trying to generate it from
lexicon.ldb.
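
The effect of touch on the comparison that make performs can be seen in a small experiment. This is only a sketch: it works in a scratch directory, the helper name is_newer is hypothetical, and it assumes, as the description above implies, that make decides whether to regenerate lexicon.srt by comparing modification times.

```shell
# is_newer A B succeeds if file A was modified more recently than file B.
is_newer() {
    [ -n "$(find "$1" -newer "$2")" ]
}

demo_dir=$(mktemp -d)
echo sample > "$demo_dir/lexicon.srt"
echo sample > "$demo_dir/lexicon.ldb"

# Give both files old timestamps, with the sorted copy older than the database.
touch -t 200001010000 "$demo_dir/lexicon.srt"
touch -t 200101010000 "$demo_dir/lexicon.ldb"
is_newer "$demo_dir/lexicon.srt" "$demo_dir/lexicon.ldb" || echo "lexicon.srt looks out of date"

# After touch, the sorted copy carries the current time and so looks newer.
touch "$demo_dir/lexicon.srt"
is_newer "$demo_dir/lexicon.srt" "$demo_dir/lexicon.ldb" && echo "lexicon.srt looks up to date"
```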
Files Provided
Sample Files
Program-Related Files