MSORT Full Change Log
8.53 - 2010-01-10
- Adapted to be compatible with libtre 0.8
- Removed unnecessary conditioning of Hybrid mapping code on availability of locale support.
- Added -Z option for copying the first record to the output without sorting it. This is useful for sorting files with a header.
- Considerably reduced the memory used for exclusions
- Fixed a bug in the reporting of exclusions
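The effect of the -Z option can be pictured with a short Python sketch. It only illustrates the behaviour described above and is not msort's implementation; the function name sort_keeping_header is hypothetical.

```python
def sort_keeping_header(lines, key=None):
    """Copy the first record through unchanged and sort the remainder."""
    if not lines:
        return []
    header, body = lines[0], lines[1:]
    return [header] + sorted(body, key=key)


records = ["name,age", "zoe,31", "adam,45", "mia,27"]
result = sort_keeping_header(records)
print(result)
```

The header stays in first position even though it would otherwise sort between "mia" and "zoe".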
8.52 - 2008-12-06
- ISO8601 keys may now have an optional leading sign.
- If a key has comparison type "random", it is no longer stored
since it won't be used. This saves a little time and possibly a good bit of
storage.
- If one or more records have been discarded due to problems in key
extraction but the run is otherwise successful, the exit code is
now RECORDEXCLUDED (13) rather than BADRECORD (8).
- Cleaned up and improved the log.
- Made error-checking and reporting finer-grained in GetMonthNames.
- A few of the regression tests depend on the locale system, which may fail
for reasons independent of msort. These tests have now been separated
so that their failure will not suggest that msort itself is not working.
Typing "make test" runs the main set of tests. Typing "make localetest"
runs the locale-dependent tests, the results of which are written to
LocaleTestResults.
- Split time and iso8601 date/time regression tests so as not to
mix data with and without time zone offsets since mixing them causes
tests to fail if executed in some time zones.
- Added regression test for more complex substitution.
- Added information to the manual section on random comparison.
If you don't know how random comparison can be useful other than
for unsorting, you might want to check this out.
8.51 - 2008-10-14
- It is now possible to set the random number generator seed from the command-line,
allowing replication of random sorts. Whatever its origin, the seed used is now
reported in the log.
- Added regression tests for angles and collating sequences.
- Rearranged the start and completion time stamps in the log so that
the former immediately precedes the latter, facilitating comparison.
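Why a settable seed allows random sorts to be replicated can be shown with a minimal Python sketch. It uses Python's own generator, not the one in msort, and the function name random_order is hypothetical.

```python
import random


def random_order(records, seed):
    """Return the records in a pseudo-random order determined by the seed."""
    rng = random.Random(seed)   # private generator, so the seed fully fixes the order
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


data = ["alpha", "beta", "gamma", "delta", "epsilon"]
first_run = random_order(data, seed=12345)
second_run = random_order(data, seed=12345)
print(first_run == second_run)
```

Two runs with the same seed and input give the same order, which is why reporting the seed in the log is enough to reproduce a run.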
8.50 - 2008-09-29
- Null keys resulting from the presence of a tag with no following value are now
detected and handled like key fields that are missing in their entirety.
- Fixed a bug in which compound positional keys (those involving portions of more
than one field) were not properly null terminated, resulting in various sorts
of errors.
- Positional key specifications with a character offset of zero are now treated
as fatal errors and reported to the user.
- Made the report of failure of key extraction on one or more records subject to the Verbose flag,
so that giving the -q option completely silences msort so long as there are no fatal
errors.
- ISO8601 time/date stamps are now written into the beginning and end of the log file.
- Made failure of check for libintl during configuration non-fatal.
- Added 17 more regression tests.
8.49 - 2008-09-24
- Fixed bug resulting from allocation of insufficient memory in ws2u8.
- Removed inappropriate free of stack storage in GetSubstitutions.
- Removed debugging line inadvertently left in GetWhiteSpaceDefinition.
- Eliminated printing of message "Consult the log for details" if logging is not enabled.
- Added section on Dependencies to README, discussing use of non-standard
libraries and how to build the simplest configuration.
- Eliminated direct call to libgmp, which should simplify installation.
Now get gmp version info from function in libuninum if a sufficiently recent
version of libuninum is available.
- Added check for libintl.
- Added nine more regression tests.
8.48 - 2008-09-20
- Updated case-folding to Unicode 5.1.
- Fixed a bug that caused inappropriate rejection of ISO8601
date/time stamps lacking a numerical time zone offset.
- Fixed a bug that caused inappropriate rejection of well-formed
ISO8601 times.
- Fixed a bug that caused a segmentation fault if
an attempt was made to analyze as a date a string not
containing the expected first date component separator.
- Added -A flag for lexicographic sorts using only the
first character (after substitutions, exclusions, etc.). This
emulates pre-modern alphabetization practice.
- Information about the system and machine type is now provided
along with the version information.
- Replaced definitions of Unicode character types in terms of unsigned long
etc. with minimum size types such as u_int32_t for greater portability.
- Added some regression tests, which may be executed by typing "make test".
- Added some sample sort order definition files to the distribution.
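The first-character-only sort provided by -A can be sketched in Python as follows. This is an illustration only. It assumes a stable sort, so that records sharing a first character keep their input order; whether msort does so depends on the sort algorithm in use.

```python
def first_char_key(text):
    """Use only the first character of the key for comparison."""
    return text[:1]


words = ["banana", "avocado", "bob", "apple", "cherry"]
by_first_char = sorted(words, key=first_char_key)   # Python's sort is stable
fully_sorted = sorted(words)
print(by_first_char)
print(fully_sorted)
```

With only the first character compared, "avocado" stays ahead of "apple" because nothing after the "a" is examined.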
8.47 - 2008-07-01
- Fixed a bug causing occasional segfaults when using utf8proc for normalization.
- Modified domain name processing to handle optional protocol segment,
which is treated as lowest priority component like the username in an email address.
8.46 - 2008-05-27
- Fixes bugs in rearrangement of email addresses.
- Made license for msg consistently GPLv3.
8.45 - 2008-05-19
- Fixes a bug in Turkic case-folding.
- Fixes a bug in hybrid comparison arising on machines in which sizeof(int) != sizeof(long).
- Fills in missing text in section 10.7 of reference manual.
8.44
- Fixes a bug in which a character range with a positive first
index and negative second index was not accepted.
- Fixes a bug that prevented compilation with --disable-uninum
when uninum/uninum.h is not present.
- Msg now checks for Tk 8.5 and adapts font sizes accordingly.
8.43
- Fixes bug that prevented compilation with configuration flag --disable-uninum if
file <uninum/nsdefs.h> is not present.
8.42
- The input and output files may now be specified by means of command-line options.
- If the same file is specified for input and output the input will be overwritten by
the output. (The output is written to a temporary file, which replaces the input file only
if the sort is successful.)
- A new command-line flag allows generation of the log file to be suppressed.
- The license has been changed to GPL version 3.
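The write-to-temporary-then-replace approach described above for identical input and output files can be sketched in Python. This is not msort's code: the function name sort_file_in_place is hypothetical and the sort itself is a plain line sort standing in for the real one.

```python
import os
import tempfile


def sort_file_in_place(path):
    """Sort a file's lines, replacing the original only if everything succeeds."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, encoding="utf-8") as src:
        lines = src.readlines()
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.writelines(sorted(lines))
        os.replace(tmp_path, path)      # reached only if writing succeeded
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)         # leave the input untouched on failure
        raise
```

If anything fails before the final rename, the input file is left exactly as it was.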
8.41
- Time and date-time keys may now contain ISO 8601 timezone specifiers.
- The hyphen and colon separators in ISO 8601 date-time strings are now optional, as per the standard.
8.40
- This version provides a choice between the International Components for Unicode
and utf8proc libraries, which we use only for Unicode normalization. utf8proc
is a smaller library with fewer dependencies and is likely easier to install.
utf8proc is now used by default. To use ICU instead give the option
--disable-utf8proc to configure.
8.39
- All input is now normalized by default to Unicode Normalization Form C.
A new command-line option allows for normalization
to be disabled or for Normalization Form D to be used instead.
- Several bugs were fixed by triggering an error exit when a file cannot be opened.
- A configure option is now available that disables linking of libuninum and libgmp.
This eliminates the ability to handle numbers in non-Western number systems but makes
installation easier for those who do not need this capability.
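The difference between Normalization Forms C and D can be seen in a small Python sketch using the standard unicodedata module. It illustrates the Unicode forms themselves, not msort's normalization code.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' followed by a combining acute accent
composed = "\u00e9"      # the single precomposed character

nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

print(decomposed == composed)   # the raw strings differ
print(nfc == composed)          # Form C composes
print(nfd == decomposed)        # Form D decomposes
```

Without normalization the two spellings of the same letter would compare as different keys.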
8.38
- Date keys now permit month names to be used in the month field.
- The numbers in date keys may now be in any supported number system.
- Dates may now consist of just a year and a day-of-year component.
- The validity of numbers in date keys is more carefully checked than before.
8.37
- Adapted msort to a change in the libuninum API.
- Adapted the list of number systems in msg to version 2.1 of libuninum.
8.36.1
- This version now includes the same unicode.h as used by libuninum
and has the includes all sorted out so as to eliminate various installation and
portability problems.
- Changed AC_FUNC_MALLOC in configure.ac to AC_CHECK_FUNCS([...malloc...])
since on some systems malloc fails the GNU test of returning non-null
when passed an argument of zero. This causes the compilation to fail
on unresolved references to rpl_malloc.
8.36
- The names of some macros have been changed to adapt to version 2.0 of libuninum.
- The version message now reports the version of libuninum.
- An option has been added to the configure script that disables comparison counting.
For runs involving a very large number of comparisons this produces a small speed up.
8.35
- A bug has been fixed that caused a segmentation fault when certain key-specific command-line
options were used prior to any key-selector.
- Makefile.am has been modified to force linkage with GMP since on OpenBSD and Mac OS X
the autoconfiguration system apparently doesn't take care of this.
8.34
- The ability to use keys in number systems other than the usual Western Indo-Arabic
system has been extended to numeric string comparison.
- Support has been added for the Arabic, Armenian, Cyrillic, Glagolitic and Greek
alphabetic number systems and for Limbu, New Tai Lue and Osmanya.
- Chinese numbers with units as large as 極 (1.0e48) are now supported.
- The code that deals with non-Western number systems is no longer part of msort. It is
instead necessary to link msort with the Uninum library.
- The command line flag -N now lists the available number systems.
- When ill-formed UTF-8 is encountered in input, msort now produces an informative
error message rather than just halting silently.
- A bug that produced segmentation faults on some systems for some combinations of options has, I think, been fixed. As a result, it should no longer be necessary to install the TRE library
with --enable-system-abi.
8.33
- Numeric keys are no longer limited to the usual Indo-Arabic number system.
Integers written in any of the following number systems are now accepted:
Arabic,
Arabic (South Asian),
Bengali,
Burmese,
Chinese,
Devanagari,
Egyptian hieroglyphic,
Ethiopic (Amharic and Tigrinya),
Gujarati,
Gurmukhi (Panjabi),
Hebrew,
Kannada,
Klingon,
Lao,
Malayalam,
Nko,
Old Italic,
Old Persian cuneiform,
Oriya,
Phoenician,
Roman numerals,
Tamil,
Telugu,
Tengwar,
Thai,
and Tibetan.
The writing system for a key is specified by the -y flag. You may require a particular
writing system, have msort autodetect the writing system but require all records to use the
same writing system for that key, or have msort autodetect the writing system for each
record independently.
8.32.2
- Fixes bug introduced in 8.32 that caused crashes on numeric keys.
8.32
- Redid the handling of non-locale-dependent hybrid keys.
This eliminates occasional, apparently random, segfaults.
- Fixed a subtle non-fatal bug in multigraph mapping.
- Corrected an asymptomatic bug in temporary storage allocation
in the processing of hybrid keys.
- In command-line help, informational options (flag -H) are now listed separately
from other general options (flag -F) in order to shorten the latter listing.
8.31
- Domain name comparison has been extended to handle email addresses as well.
- Log messages now reflect the use of domain name comparison and have been improved in other minor ways.
8.30
- A new comparison type for domain names was added. It compares domain names as if they
were parsed into their components and the order of the components reversed, so that
subdomains belonging to the same domain are grouped together.
- In the log the copy of the command line by which msort was invoked is now indented for
greater readability.
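The idea behind domain name comparison, reversing the components before comparing, can be sketched in Python. This is a minimal illustration under the assumption that components are separated by dots and compared as plain strings; the function name domain_key is hypothetical.

```python
def domain_key(name):
    """Split a domain name into components and reverse their order."""
    return tuple(reversed(name.split(".")))


hosts = ["mail.example.org", "www.alpha.com", "example.org", "ftp.alpha.com"]
grouped = sorted(hosts, key=domain_key)
print(grouped)
```

Because the top-level domain is compared first and the host name last, all subdomains of alpha.com end up next to one another, as do those of example.org.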
8.29
- The reference manual has been updated. Among other things, a comparison
with GNU sort is now provided, including a list of equivalent command-line options.
- A new command-line option, -G, prints a list of equivalents to GNU sort
command-line options.
- The list of defaults printed by the -D option now includes the sort algorithm.
8.28.1
- Added an ifdef for alloca.h, which FreeBSD does not have.
This fixes a bug that prevented msort from building under FreeBSD.
8.28
- Fixes a segmentation fault on combination of -c n and -w options (Debian bug #383230).
- Fixes a free of an already freed pointer on failed key extraction (Debian bug #383232).
8.27
- Case-folding was updated to Unicode 5.0.
- If there is only one well-formed record in the input,
we still mention that there is no point in sorting only one
record but we now exit successfully after writing out that record.
This is desirable for non-interactive use.
8.26.2
- Fixes a bug that incorrectly treats a read interrupted in the middle
of a UTF-8 sequence as a truncated UTF-8 sequence.
8.26
- Case-folding was updated to Unicode 4.1.
- Turkic case-folding has been added as an option.
- The help menu buttons that bring up popups have been changed to toggles.
- Since the GUI was beginning to take up too much space, it has been
restructured so that each section can individually be displayed or rolled up. By
default everything starts out rolled up, which greatly reduces the footprint.
Sections may be displayed or rolled up interactively or by means of initialization
file commands. I couldn't decide whether I liked it better for the rolled up section
to be a button in its entirety or for most of it to be an inert label with just a
checkbutton active, so I implemented both. By default the inert label with checkbutton
is used. If you prefer the entire section to be a button, add the line "UseCheckbuttonP F"
to your .msgrc. See the file Docs/InitializationFiles in the distribution directory for
more information. The list of commands can also be obtained from the Help menu.
8.25.2
- This bug fix release corrects errors in misc.c that prevent compilation
on systems lacking gettext. If gettext is present on your system this does not affect you.
8.25
- Hybrid comparison using locale collation rules for the textual portions has been completely revamped and appears to be bug free.
- A new command-line option, -Q, causes msort to check whether the
input is sorted.
- Conversion of stylistic variants to their plain equivalents now includes
superscripts, subscripts, and the Hebrew presentation forms.
- A bug in the conversion of stylistic variants that erroneously converted "9" to "y" has been fixed.
- A small memory leak was eliminated.
- Explanatory popups were added to msg for conversion of stylistic variants
and enclosed forms and stripping of diacritics.
8.24
- Sorting by month names and abbreviations is now supported. Month names may be read from
a file or, if the glibc locale system is available, obtained from the locale.
8.23.2
- A space that causes trouble with some older versions of Tcl
somehow crept back into msg. This patch fixes it. If you don't use msg or
if msg starts up without any problem, you don't need this new version.
8.23
- A new command-line option -T allows the user to specify for a particular
key that certain classes of characters should be replaced by simpler
counterparts. One suboption strips separately encoded diacritics and replaces ASCII characters
with diacritics with their plain counterparts. A second suboption replaces characters enclosed
in circles or parentheses with their plain counterparts. A third suboption replaces "fancy
styles" with their plain counterparts. The "fancy styles" replaced include:
small capitals (e.g. U+1D04),
script forms (e.g. U+212C),
black letter forms (e.g. U+212D),
Arabic presentation forms (e.g. U+FE81),
fullwidth forms (e.g. U+FF01),
halfwidth forms (e.g. U+FF7B),
and the mathematical alphanumeric symbols (e.g. U+1D400).
8.22
- Memory usage has been considerably reduced by the elimination of some leaks.
- Some potential memory corruption was eliminated.
- The messages and log records of memory usage, number of comparisons, and number of
records processed have been neatened up and the format made locale-dependent (on systems
that support it).
8.21
- If the system supports getopt_long, the long options are now available.
- Help for command-line options is now split into general and key-specific help.
- A workaround was added for a bug in some MS Windows implementations of Tcl that
triggered an error in msg.
8.20
- It is now possible to specify that comparisons on a particular key are to follow
the collation rules for a locale rather than supplying one's own sort order
definition. This is done by giving a locale name as argument to the -s flag
instead of a file name. If the argument is "locale", the collation rules for the current locale
will be used. This affects lexicographic, hybrid, and string length comparisons.
- A bug that prevented multigraph definitions from affecting string length has been fixed.
- The amount of memory dynamically allocated is now recorded in the log.
8.19
- The reference manual has been revised and expanded.
- In msg an error in the help popup for key selection by position range has been corrected.
- The -F option now shows both format options for angles.
8.18
- Space characters (as defined in the current locale)
are now ignored in numeric keys so as to
prevent errors from arising when records are not lines
and field-final whitespace is encountered.
Otherwise msort would conclude that the field does not consist
entirely of numbers.
- The configure script now checks for libtre
so that its absence will be detected prior to compilation.
8.17
- An additional comparison type, numeric string, has been added. In
ordinary numeric comparison, the strings are converted to floating
point numbers and stored and compared as such. In numeric
string comparison, numbers are stored as strings and compared using a
specialized form of string comparison. This guarantees no loss of
precision and allows the sort to take into account differences in the
representation of the numbers as strings, such as the difference between
5.00 and 5.0. It uses more memory than numeric comparison and is
slightly slower. Numeric string comparison is available only for decimal
numbers in standard format. It is not available for numbers in scientific
notation or for bases other than ten.
- A bug was fixed where msort failed to open the input file but tried to
read from it anyhow.
- A funky extern function declaration that prevented compilation under Mac OS X was fixed.
- Detection of invalid command line arguments was improved.
- In msg the selection is automatically copied from the message region to the clipboard,
facilitating copying and pasting.
- In msg a change was made to avoid triggering what appears to be a bug
in some older versions of Tcl.
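The contrast between ordinary numeric comparison and numeric string comparison can be sketched in Python. This is a rough illustration, not msort's algorithm. It handles only unsigned decimal numbers, and the rule that "5.0" sorts before "5.00" is an assumption made for the example; the function name numeric_string_key is hypothetical.

```python
def numeric_string_key(num):
    """Sort key for unsigned decimal strings that never converts to float."""
    whole, _, frac = num.partition(".")
    whole = whole.lstrip("0")
    # A longer integer part means a larger number; equal lengths compare
    # digit by digit. Fractions compare digit by digit once trailing zeros
    # are dropped. The raw text breaks ties (assumed ordering), so "5.0"
    # and "5.00" remain distinguishable.
    return (len(whole), whole, frac.rstrip("0"), num)


a = "0.1000000000000000000001"
b = "0.1000000000000000000002"
print(float(a) == float(b))                                # precision lost as floats
print(numeric_string_key(a) < numeric_string_key(b))       # still ordered as strings

values = ["5.00", "10", "5.0", "4.999"]
ordered = sorted(values, key=numeric_string_key)
print(ordered)
```

The two long fractions are indistinguishable after conversion to floating point but still order correctly when compared as strings.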
8.16
- Key selection by position has been extended to allow keys to span contiguous fields
from one position to another, where a position consists of a field number and a character offset.
This makes key selection by position approximately the same as in GNU sort. One
difference is that unlike GNU sort, msort permits field numbers to be negative,
indicating a count from the end. The other is that in msort if only a single position
is specified only that field is included in the key, whereas in GNU sort
the key extends from the specified position to the end of the record.
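The two differences from GNU sort noted above can be sketched in Python. This is a simplified illustration that ignores character offsets and joins fields with a single space; the function names are hypothetical and field numbers are assumed to start at 1.

```python
def field_index(number, count):
    """Map a 1-based field number to a list index; negatives count from the end."""
    if number == 0 or abs(number) > count:
        raise ValueError("field number out of range")
    return number - 1 if number > 0 else count + number


def single_field_key(fields, number):
    """A single position selects just that field, as described for msort."""
    return fields[field_index(number, len(fields))]


def to_end_of_record_key(fields, number):
    """A single position selects everything from that field on, as in GNU sort."""
    return " ".join(fields[field_index(number, len(fields)):])


record = ["Smith", "John", "1970", "Paris"]
print(single_field_key(record, 2))
print(single_field_key(record, -1))
print(to_end_of_record_key(record, 2))
```

A field number of -1 picks the last field whatever the length of the record, which is convenient when records have a varying number of fields.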
8.15
- Adds to the accepted formats for angles the format in which the components are
separated by whitespace rather than a colon.
- Angles with absolute values greater than 360 are now accepted.
- A bug affecting negative angles was fixed.
8.14
- Fixed length records are now supported.
- Angles are now supported as a comparison type.
- A wider range of time formats is now accepted.
- A bug that tacked an extra newline onto the end of the output has been fixed.
8.13
- A choice of four sorting algorithms is now available: Insertionsort, Mergesort, Quicksort,
and Shellsort. This means that stable sorts are now available and a sort
efficient for nearly sorted input is available.
- The random number generator is now reseeded using the current time each time
msort is executed. This means that msort will generate different output
on different runs with the same input and parameters if random comparison is used.
- If the -B flag is used to specify that the input consists entirely of characters
within the Basic Multilingual Plane, msort now checks the input and if a character outside
the BMP is found, reports it and aborts.
- The appearance of the GUI has been improved.
- The formatting and alignment of various numbers reported by msort has been improved.
8.12
- If the machine/compiler combination supports the long double type, numeric keys are
now stored as long doubles rather than doubles. This uses more memory but allows a much
greater range of numeric keys to be distinguished.
- The -L flag now gives the maximum number of multigraphs per key for all conditions.
8.11
- A bug was fixed in which msort crashed if given a very long numeric string as key.
- Numeric keys are now checked for well-formedness and for internal representability.
- The reference manual has been updated and some information added.
- In msg only the base names of the input and output files are shown if they are in the current directory.
- Codepoint validation in the popup for entering characters by Unicode codepoint has been
improved. The message window is now cleared at the beginning of each attempt to insert a character
so that it will be clear whether an error has been detected on the current attempt.
- In msg the key field position and character ranges are now validated and invalid values rejected.
- Better defaults have been set for the color of active menu items.
8.10
- A bug was fixed that caused the program to crash if certain key-specific options were
specified before any keys were specified. (This bug has no effect if you use the GUI
since it always specifies keys before specifying options.)
8.9
- A thorough code cleanup has been done. gcc -Wall -pedantic now produces no warnings.
- Autoconfiguration has been added.
- There is no longer any limit on the number of keys that may be used.
- In msg a bug was fixed that made the widget for inserting characters
by their Unicode codepoint insert them into itself.
8.8
Please note that these changes create some incompatibilities with previous versions.
One incompatibility is the change in format of custom character widget specifications.
A second is that some commands now end in P that previously did not.
- A number of additional properties can now be set from the initialization file.
In particular, it is now possible to define custom character widgets from the init file,
either by loading a file, as from the menu, or by including the widget definition
directly in the init file.
- Errors in the initialization file are now handled more gracefully. In most
cases the error is trapped in such a way that execution of the init file is not
aborted.
- All Boolean initialization file commands now end in P.
- Since the row and column labels of the IPA consonant and vowel charts take up a lot
of space, they may now be removed if desired.
- Custom character entry widget specifications now have a different format.
On the first line, the title should now come first, before the number of columns.
Characters are now specified using \u escapes rather than raw hex, e.g. \u00E9
in place of 00E9.
- Msg uses an improved font control panel.
- Msg's character entry widgets have been improved.
- All Msg scrollable windows in which a significant number of lines might appear
now have an added binding on the right mouse button (Control-Button for Mac users)
that scrolls in larger increments than the left mouse button.
- Msort now tries to open its log file in /tmp if it cannot open
it in the current directory. Only if that fails does it exit.
8.7
- A hybrid comparison type is now provided. It is identical to lexicographic comparison
except for the fact that sequences of digits are converted to numbers and compared
numerically. This is useful for sorting section headings and things like that.
- A random comparison type is now provided. It causes comparisons on the specified
key to be determined by a random number generator rather than by the data in the key
fields.
- In the GUI positional key fields now default to the key number.
- The font control panel was improved in various ways.
- Some small changes were made in the information written on stderr and to the log.
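The hybrid comparison introduced in this version can be sketched in Python. This is an illustration of the idea rather than msort's implementation: it recognizes only ASCII digit runs and compares the text portions as plain strings. The function name hybrid_key is hypothetical.

```python
import re


def hybrid_key(text):
    """Split text into alternating text and digit runs; digit runs become ints."""
    parts = re.split(r"([0-9]+)", text)
    # With one capturing group, odd positions always hold the digit runs,
    # so two keys never compare an int against a str.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


headings = ["Section 10", "Section 2", "Section 1"]
lexicographic = sorted(headings)
hybrid = sorted(headings, key=hybrid_key)
print(lexicographic)
print(hybrid)
```

Lexicographic comparison puts "Section 10" before "Section 2", whereas the hybrid key orders the headings by their numeric value.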
8.6.1
- Fixes a bug in the exclusion entry widget in the GUI.
8.6
- The GUI can now be configured via an initialization file.
It is possible to set both properties of the GUI, such as fonts and
colors, and defaults for sorting parameters. It is also possible to
load custom character entry widgets, either from another file or by
including the information in the initialization file.
- The fonts used for various purposes may now be selected interactively.
- A Configure menu was added to the menubar. The balloon help toggle was
moved from the Help menu to the new Configure menu.
- The appearance of the GUI has been considerably improved.
- Substitution files may now contain comments. Lines beginning with a crosshatch
(#) are treated as comments.
- A bug that truncated long substitution specifications has been fixed.
- A bug that prevented browser invocation from the help system from working
under Mac OS X has probably been fixed. (I haven't had the opportunity to try it
on a Mac yet.)
- Command line options were added for providing version and usage information,
preventing the init file from being read, and setting the debug flag.
- Various informational messages from the GUI have been improved in small ways.
8.5
- It is now possible to specify a set of regular expression substitutions for each key.
This allows such things as making names beginning with Mc sort as if they began
with Mac.
- It is now possible to specify a key as a range of characters, e.g. the third through sixth
characters. Negative indices may be used to count from the end of the record instead of
the beginning.
- A bug in Unicode case-folding was fixed.
- A bug in character exclusion was fixed.
- It is now possible to specify that end-of-line in the input data is marked by
Carriage Return (0x0D), as is usual on Macintoshen, rather than Line Feed (0x0A), as is
usual on Unix systems.
- The General page of the GUI was rearranged.
- Some unused code and images were removed from the GUI.
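The Mc/Mac example of a regular expression substitution can be sketched in Python. This uses Python's re module rather than the regular expression library used by msort, and it assumes that the substitution affects only the sort key, leaving the record itself unchanged. The function name name_key is hypothetical.

```python
import re


def name_key(name):
    """Build the sort key by rewriting a leading 'Mc' as 'Mac'."""
    return re.sub(r"^Mc", "Mac", name)


names = ["Mason", "McDonald", "MacArthur", "Mabry"]
plain = sorted(names)
substituted = sorted(names, key=name_key)
print(plain)
print(substituted)
```

With the substitution applied to the key, "McDonald" sorts alongside "MacArthur" instead of after "Mason", while keeping its original spelling in the output.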
8.4
- A bug in msort was fixed that made numeric keys unreliable.
- A bug in msort was fixed that garbled the copy of an ill-formed record written to the log
when key extraction fails.
- Detection of the operating system and base graphics system by msg was improved.
- A bug in msg was fixed that triggered an error when the "How to Use this Program" popup
was popped up.
- The missing balloon help for the save and cancel buttons of the sort order file separator
control panel was added.
- Additional balloon help was provided for the sort order file separator control panel in msg.
- The Tk-Aqua adaptation of msg was improved. The command buttons are now positioned at the
top of the window and are available at all times, not just when the General page is selected.
8.3
- The GUI does a better job of identifying the operating system.
- The GUI's adaptation to Tk-Aqua under Mac OS X has been improved.
8.2
- A bug that caused apparently random segmentation faults with optional keys was fixed.
- A bug in case-folding in the ASCII range was fixed.
- Msort error messages regarding invalid UTF-8 input have been improved.
- A flag (-W) was added to msort to allow the user to set which characters are treated
as separators in the sort order file.
- The GUI was modified to handle the -W flag.
- A bug in sort order specification in the GUI was fixed.
- The manual page and reference manual have been updated.
8.1
- A bug that interfered with the use of multigraphs was fixed.
- A modified configuration of msg adapted to Tk-Aqua is now available.
msg attempts to detect whether it is running under Tk-X11 or Tk-Aqua and
configures itself accordingly.
- msort can now be conditionally compiled without the internationalization and
localization libraries that are not available under Mac OS X.
8.0
- Msort now understands UTF-8 Unicode.
- The command line flag -B was added.
This flag, if present, informs the program that the characters in the input
are restricted to the Basic Multilingual Plane, which permits a significant
reduction in memory usage.
- The command line flag -p was added.
This flag instructs msort not to make internal use of the
Unicode Private Use areas.
- Regular expressions are now executed by the TRE library rather than the old Henry Spencer library.
- A variety of character insertion widgets were added to msg so as to facilitate
the use of non-ASCII characters.
- Panels controlling the -B and -p command line flags were added to msg.
- Instead of just trying to use the default browser, msg now
works through a list of browsers, trying each in turn.
- A help link was added in msg for regular expression syntax.
7.1
Note: changes are entirely in msg. If you already have the package installed, it is not
necessary to recompile msort itself.
- Keys can now be re-ordered in the GUI. Dragging one key selection button over another with the right mouse button swaps them.
- Miscellaneous small improvements were made in the GUI's handling of errors in msort and in preventing such errors.
7.0
- A graphical user interface called Msg (for Msort GUI) is now available. It is a separate program from Msort, written in Tcl/Tk. It helps the user to set the various
parameters, then executes Msort as a child process.
- The restriction of exclusions to lexicographic and string length keys
has been eliminated.
- The default date format has been changed to the International Date Format.
- ISO8601 date/time combinations are now supported as a key type.
- A command-line flag for case-folding has been added.
6.19
Version 6.19 corrects the information on the man page and in the flag usage message,
which incorrectly indicated that date format was a general rather than key-specific option.
6.18
This version contains a more informative man page (thanks to Kai) and slightly improved
reference manual.
6.17
This release updates the man page and manual, eliminates the -a option, which was not
very useful, and cleans up code a little bit.
6.16
This release fixes a bug that arises when characters with values above the ASCII
range are present in the input. Anyone with non-ASCII input using an earlier
version should upgrade.