MSORT

Screenshot: Msort's graphical user interface

Contents

  1. Description
  2. Comparison with GNU Sort and BSD Sort
  3. Details
  4. Documentation
  5. Downloads
  6. Environment
  7. Change Log
  8. Bugs
  9. Roadmap

Description

msort is a program for sorting files in sophisticated ways. It was originally developed for alphabetizing dictionaries of "exotic" languages in formats like those used by Shoebox and Toolbox, and it has been used extensively for that purpose, but it is useful for many other purposes as well. msort differs from typical sort utilities in offering more flexibility in parsing the input into records and identifying key fields, and more control over the sort order. Its main distinctive features are:

msort understands UTF-8 Unicode. Unicode may be used anywhere that text is entered: in the text to be sorted, in sort order and exclusion definitions, as a field or record separator, or as a field tag. Full Unicode case-folding is available.
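To make the record, key-field and sort-order ideas concrete, here is a minimal Python sketch of the kind of processing involved. It is not msort's implementation or interface (msort is a C program controlled by its own command-line options, described in the manual). The input layout is an assumption for illustration: records separated by blank lines, in the style of Shoebox/Toolbox files, with the key in a field tagged `\lx`. The alphabet, in which the digraph "ch" is a single letter sorting after "c", is likewise hypothetical, and Python's `str.casefold()` stands in for full Unicode case-folding.

```python
import re

# Hypothetical alphabet for illustration: "ch" is one letter that sorts after "c".
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
# Try longer letters first so that "ch" is matched before "c".
TOKEN = re.compile("|".join(
    re.escape(letter) for letter in sorted(ALPHABET, key=len, reverse=True)))


def parse_records(text):
    """Split the input into records at blank lines (assumed record separator)."""
    return [r for r in re.split(r"\n\s*\n", text.strip()) if r.strip()]


def key_field(record, tag="\\lx"):
    """Return the content of the first field carrying the given tag, or ""."""
    for line in record.splitlines():
        if line.startswith(tag + " "):
            return line[len(tag) + 1:].strip()
    return ""


def sort_key(word):
    """Case-fold the key, then map each letter of the alphabet to its rank."""
    folded = word.casefold()
    ranks, pos = [], 0
    while pos < len(folded):
        m = TOKEN.match(folded, pos)
        if m:
            ranks.append(RANK[m.group()])
            pos = m.end()
        else:
            # Characters outside the alphabet sort after every letter, by codepoint.
            ranks.append(len(ALPHABET) + ord(folded[pos]))
            pos += 1
    return ranks


def sort_sfm(text):
    """Sort whole records by the custom collation of their key field."""
    return sorted(parse_records(text), key=lambda r: sort_key(key_field(r)))


data = "\\lx chalo\n\\ge go\n\n\\lx Cura\n\\ge cure\n\n\\lx dam\n\\ge dam\n"
for record in sort_sfm(data):
    print(key_field(record))
```

With an ordinary case-insensitive alphabetical order, "chalo" would precede "Cura"; treating "ch" as a separate letter moves it after every word beginning with plain "c". Case-folding also makes "ß" compare equal to "ss".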

Review by Ben Martin at linux.com
     (Japanese translation of the above)

If you are looking for the specialized Hungarian sort program also called msort, try here.



Comparison with GNU Sort and BSD Sort

Msort's capabilities are very close to a superset of those of GNU sort and BSD sort. Msort provides greater flexibility in selecting key fields, more comparison types, the ability to use collation rules from different locales on different keys, the ability to handle numbers in non-Western number systems, and a variety of other options lacking in GNU sort and BSD sort. Whereas msort understands Unicode, GNU sort and BSD sort do not. Because of a property of the UTF-8 transfer format, a binary (byte-by-byte) sort of UTF-8 text puts it in Unicode codepoint order, so for some purposes GNU sort will behave acceptably on Unicode input. However, operations that require an understanding of the encoding of the input do not work properly in GNU sort and BSD sort with Unicode input. The capabilities of GNU sort and BSD sort that msort lacks are the ability to merge already-sorted files without re-sorting them (the --merge option) and the ability to emit only the first of each run of lines that compare equal (the --unique option).
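The UTF-8 property mentioned above can be checked directly. In this Python sketch, sorting strings by their raw UTF-8 bytes (what a byte-wise comparison such as GNU sort in the C locale does) yields the same order as sorting by codepoint, which is how Python compares `str` values. The word list is arbitrary. Note that codepoint order is not dictionary order: "Äpfel" lands after "zebra".

```python
# Arbitrary sample words: ASCII, accented Latin, CJK and a non-BMP character.
words = ["zebra", "Äpfel", "éclair", "apple", "日本", "𝔘nicode", "ÿ", "Ā"]

# Order produced by comparing raw UTF-8 bytes, as a byte-wise sort does.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
# Order produced by comparing Unicode codepoints (Python's default for str).
by_codepoints = sorted(words)

print(by_bytes == by_codepoints)  # True
print(by_codepoints)
```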

Generally speaking, msort is the more powerful program: whenever you need something other than a standard sort of positionally selected fields, it is either the only choice or the more convenient one. On the other hand, if GNU sort or BSD sort can do what you want, it will generally be faster. The exact speed ratio varies with the details of the sort and the nature of the input, but in my tests, where msort and GNU sort can perform the same sort, GNU sort is typically several times faster than msort. BSD sort seems to be slightly faster than GNU sort.


Details

Language:
    C (main program)
    Tcl/Tk (GUI only)
Dependencies:
    TRE regular expression library (required)
    ICU - International Components for Unicode, or Utf8proc (one or the other required)
    Uninum number conversion library (optional)
    GNU MP multiple precision arithmetic library (optional; used by Uninum)
    Tcl/Tk version 8.3 or higher (GUI only)
    Iwidgets Tcl/Tk library (GUI only)
License: GNU General Public License, Version 3
Current version: 8.53
Last modified: 2010-01-10



Documentation

A standard Unix manual page is included in the package, or you can read it here. The full documentation is the reference manual (PDF), a copy of which is included in the package.

The manual contains a number of examples, including how to use msort to sort SIL Standard Dictionary Format files as used by Shoebox and Toolbox.


Downloads

File                 Size (bytes)   MD5 sum
msort-8.53.tar.bz2   440,307        01e78967b4e4197f867831f8c8f4c48d
msort-8.53.tar.gz    476,722        a6468fbb8503bb52331994f96eb7b54c
msort-8.53.zip       535,715        255966cfcf0470de93572e4f714707f8

If you would like to be notified of new releases, subscribe to msort at Freshmeat.

Packages

Debian: Debian package (testing), Debian package (unstable)
FreeBSD: FreeBSD Freshport
Mac OS X: Macport
Mac OS X binaries: Softpedia (PPC and Intel), Darwinports
Nexenta/GNU Solaris: Nexenta packages
OpenPKG: OpenPKG package
Redhat Linux: Redhat RPMs
SUSE Linux: Source and i686 executable RPMs courtesy of Pascal Bleser (SUSE RPMs)
Solaris (SPARC and Intel): Solaris Package Index
T2: T2
Ubuntu: Ubuntu packages



Environment

The underlying command-line program msort should compile and run without difficulty on any POSIX-conformant system on which the requisite libraries are available. In practice, this should mean just about anywhere. It is known to compile and run without modification under GNU/Linux, FreeBSD, Mac OS X, and SunOS. I am not sure whether the current version will compile and run properly under MS Windows, even under Cygwin, because MS Windows uses UTF-16 Unicode internally while msort expects UTF-32.
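The UTF-16/UTF-32 difference matters because UTF-16 represents characters outside the Basic Multilingual Plane with two 16-bit code units (a surrogate pair), whereas UTF-32 always uses exactly one 32-bit unit per codepoint. Code that assumes one unit per character, as a program expecting UTF-32 may, can miscount or split such characters when handed UTF-16. This small Python sketch counts code units; the helper name `code_units` is hypothetical.

```python
def code_units(text, encoding, unit_size):
    """Count the fixed-size code units needed to encode text."""
    # The -le codecs are used so that no byte-order mark is added.
    return len(text.encode(encoding)) // unit_size


# Only the last character (U+1D518, outside the BMP) needs two UTF-16 units.
for ch in ["a", "é", "日", "\U0001d518"]:
    print(f"U+{ord(ch):04X}: UTF-16 units = {code_units(ch, 'utf-16-le', 2)}, "
          f"UTF-32 units = {code_units(ch, 'utf-32-le', 4)}")
```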

Note also that msort may be configured to compile without the GMP and Uninum libraries, at the cost of forgoing the ability to handle numbers in non-Western number systems. If you cannot or do not want to install these libraries, run configure with the option --disable-uninum. This will also disable linkage with libgmp.

The graphical user interface, msg, should run anywhere that Tcl/Tk is available, but a few features may not work on non-Unix systems. In particular, the Abort Sort command depends on the existence of a Unix-style kill program that can be used to send a signal to another process. msg is known to run under GNU/Linux, FreeBSD, and SunOS. It will run properly under Mac OS X if you have installed X11 and use Tk-X11. msg now adapts itself to Tk-Aqua well enough to be usable, but some details remain to be dealt with.
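Aborting a sort amounts to sending a signal to the process that is doing the sorting. msg itself is written in Tcl/Tk; this Python sketch only illustrates the underlying POSIX mechanism. It assumes a POSIX system, uses a sleeping child process as a stand-in for a running sort, and sends SIGTERM, which is an assumption: the text above does not say which signal msg uses. `os.kill` makes the same `kill(2)` system call that the `kill` program does.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a long-running sort: a child Python process that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# "Abort": send SIGTERM to the child by PID, as `kill -TERM <pid>` would.
os.kill(child.pid, signal.SIGTERM)

# On POSIX, Popen reports termination by a signal as a negative return code.
returncode = child.wait()
print(returncode == -signal.SIGTERM)  # True
```

On systems without Unix-style signals, such as MS Windows, this mechanism is not available in the same form, which is why Abort Sort may not work there.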


Note: obtaining the necessary Tcl/Tk environment.

The GUI requires both the basic Tcl/Tk distribution and the iwidgets library. If you already have Tcl/Tk and just need to add iwidgets, you can obtain the package from the Sourceforge project site. On the download page you will find source and binary packages for both [incr Tcl/Tk], which is the basic part of this package, and [incr widgets], which is the part that contains the widgets. You will need to install both. (iwidgets is an alternative name for [incr widgets].)

The easiest way to obtain the Tcl/Tk environment you need is to install the ActiveTcl distribution from ActiveState. This distribution provides the Tcl language, the Tk graphics library, and a bunch of extensions, including [incr tcl] and [incr widgets]. Don't be concerned by the fact that ActiveState is a commercial outfit. The Tcl/Tk distribution that they provide is free as in both beer and speech. They make their money selling services and programming tools. The ActiveTcl distribution is currently available for: GNU/Linux, HP-UX, AIX, Solaris, Mac OS X, and MS Windows.

For FreeBSD, Tcl and Tk are available at:



Changes

8.53 - 2010-01-10

8.52 - 2008-12-06

8.51 - 2008-10-14


Full Change Log



Known Bugs

Under obscure conditions date sorts may produce a segmentation fault or valid date fields may be rejected as invalid. I have been unable to reproduce this bug on my own system. It may or may not be significant that the machine on which this bug has been reported is a 64-bit machine.

Known bugs in the GUI are:


Roadmap

If you care about any of these, please feel free to drop me a line.




Back to Bill Poser's software page.