This package provides conversion in both directions between UTF-8 Unicode and a variety of 7-bit ASCII equivalents. Such ASCII equivalents are useful when including Unicode text in program source, when debugging, and when entering text into web programs that can handle the Unicode character set but are not 8-bit safe. For example, Movable Type, the blog software, truncates posts as soon as it encounters a byte with the high bit set. However, if Unicode is entered in the form of HTML numeric character references, Movable Type will not garble the post.
For example, here is the Chinese for "regular expression" in Unicode:
正規表達式
and here is the HTML hexadecimal numeric character reference output from uni2ascii:
&#x6B63;&#x898F;&#x8868;&#x9054;&#x5F0F;
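To make the hexadecimal numeric character reference format concrete, here is a minimal Python sketch of the same kind of conversion. It is an illustration only, not the uni2ascii implementation: the function name to_hex_ncr is hypothetical, and the choice of uppercase hex digits without zero padding is an assumption.

```python
def to_hex_ncr(text):
    """Replace each non-ASCII character with an HTML hexadecimal
    numeric character reference; ASCII characters pass through."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


if __name__ == "__main__":
    # Prints the ASCII-only form of the Chinese example text.
    print(to_hex_ncr("正規表達式"))
```

Because only characters above U+007F are rewritten, text that is already pure ASCII is returned unchanged.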
The package consists of three programs. The actual work is done by uni2ascii and ascii2uni. The third program is u2a, a graphical interface to uni2ascii and ascii2uni.
The Unicode escapes handled include:
Microsoft-style HTML character entities and numeric character references without the final semicolon are converted with a warning message.
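The lenient handling of a missing semicolon can be sketched in Python as follows. This is only an illustration of the idea for hexadecimal references, not the parser used by ascii2uni; the function name decode_hex_ncr and the wording of the warning are hypothetical.

```python
import re

# &#x followed by hex digits, with an optional trailing semicolon.
_HEX_NCR = re.compile(r"&#[xX]([0-9A-Fa-f]+)(;?)")


def decode_hex_ncr(text):
    """Return (decoded_text, warnings). References that lack the
    final semicolon are still converted, but a warning is recorded."""
    warnings = []

    def replace(match):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            # Not a valid Unicode code point; leave the text untouched.
            return match.group(0)
        if not match.group(2):
            warnings.append("missing semicolon after " + match.group(0))
        return chr(code_point)

    return _HEX_NCR.sub(replace, text), warnings
```

Note that without the semicolon the end of the reference is ambiguous when the following text begins with a hex digit; this sketch simply consumes every hex digit it finds.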
The package can also be used to convert from one type of ASCII representation to another by passing through Unicode. For example, the pipeline:
ascii2uni -a U | uni2ascii -a J
will convert from \u-escapes (e.g. \u00e9) to RFC2396 URI format (e.g. %C3%A9).
ascii2uni -a H | uni2ascii -a D
will convert HTML hexadecimal numeric character references to decimal numeric character references.
ascii2uni -a H | uni2ascii -a H -a Q
will convert HTML hexadecimal numeric character references to HTML character entities where equivalent character entities exist, and
ascii2uni -a M | uni2ascii -a H
will convert SGML hexadecimal numeric character entities to HTML.
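The idea of converting between two ASCII representations by passing through Unicode can be sketched in Python. The following mirrors the first pipeline above (\u-escapes to percent-encoded UTF-8) under simplifying assumptions: only four-digit \u escapes are recognized, surrogate pairs are not handled, and ASCII characters are passed through untouched. The function names are hypothetical and this is not how the two programs are implemented.

```python
import re
from urllib.parse import quote


def decode_u_escapes(text):
    r"""Step 1 (ASCII -> Unicode): turn \uXXXX escapes into characters."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


def encode_percent(text):
    """Step 2 (Unicode -> ASCII): percent-encode the UTF-8 bytes of
    each non-ASCII character, leaving ASCII characters as they are."""
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in text
    )


def u_escapes_to_percent(text):
    """Chain the two steps, as the shell pipeline chains the programs."""
    return encode_percent(decode_u_escapes(text))


if __name__ == "__main__":
    print(u_escapes_to_percent(r"caf\u00e9"))  # prints caf%C3%A9
```

Any other pair of formats can be handled the same way, by swapping in a different decoder for the first step or a different encoder for the second.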
uni2ascii can also replace non-ASCII characters with approximate ASCII equivalents. For example, it can replace stylistic variants (e.g. bold-face) with their plain counterparts, or characters with accents with their unaccented equivalents.
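A rough Python approximation of this kind of substitution uses Unicode compatibility decomposition (NFKD) and then drops combining marks. This is a different technique from whatever tables uni2ascii uses internally, and the function name approximate_ascii is hypothetical; it is shown only to illustrate the two cases mentioned above.

```python
import unicodedata


def approximate_ascii(text):
    """Decompose characters (NFKD), then drop combining marks.
    Accented letters lose their accents and stylistic variants such
    as mathematical bold letters fall back to their plain forms."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )


if __name__ == "__main__":
    print(approximate_ascii("caf\u00e9"))  # prints cafe
    print(approximate_ascii("\N{MATHEMATICAL BOLD CAPITAL A}"))  # prints A
```

Characters that have no decomposition (for example ø) are left unchanged by this sketch, so the result is not guaranteed to be pure ASCII.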
uni2ascii and ascii2uni are provided with standard Unix manual pages:
Both programs also provide a detailed summary of their command line options in response to the -h command line option.
The graphical user interface U2A provides balloon help and an explanation of how to use the program.
If you need to convert between UTF-8 Unicode and other encodings, you may find enca, iconv, recode, and uniconv useful. If you need to convert between textual representations of numbers and machine representations, you may find the programs ascii2binary and binary2ascii helpful. If you need to find out more about what is in a Unicode file (e.g. if you don't know the writing system, don't have the necessary font, think that the Unicode may be ill-formed, or need to examine details of representation such as composition) you may find the Unicode Utilities suite of programs useful.
| Language | C [basic programs], Tcl/Tk [GUI] |
| Environment | POSIX |
| License | GNU General Public License, version 3 |
| Current version | 4.9 |
| Last modified | 2008-05-06 |
| Contact | Bill Poser |
uni2ascii-4.9.tar.gz
[md5: 6a5a8c43bd02447710024d39937303d3]
uni2ascii-4.9.tar.bz2
[md5: bdc88503c395930fb7a15a5f72157a62]
uni2ascii-4.9.zip
[md5: 860263f9942bec31e5ce6d80ea7a6460]
If you wish to be informed of new releases, subscribe to uni2ascii at Freshmeat.
uni2ascii and ascii2uni have been compiled and tested under FreeBSD, GNU/Linux, Mac OS X and SunOS. They should compile and run without modification in any POSIX-compliant environment. u2a should run on any platform on which Tcl/Tk is available, which includes all major platforms.
The GUI requires both the basic Tcl/Tk distribution and the Tablelist library. If you do not have the Tablelist library installed, you will find out when you execute U2A because it will generate the error message:
Error in startup script: can't find package Tablelist
and abort. You can also check by starting wish and at the prompt entering the following:
lsearch [package names] Tablelist
If the result is -1, Tablelist is not present. If the result is zero or greater, Tablelist is present.
If you already have Tcl/Tk and just need to add Tablelist, you can obtain the package from the website of the author, Csaba Nemethi: http://www.nemethi.de/. To install Tablelist, just copy the directory (currently named tablelist4.2) into the Tcl lib directory. For example, if you are using the ActiveTcl distribution and Tcl is installed in the default Unix location, /usr/local/ActiveTcl, copy the Tablelist directory into /usr/local/ActiveTcl/lib:
cp -r tablelist4.2 /usr/local/ActiveTcl/lib
The easiest way to obtain the Tcl/Tk environment you need is to install the ActiveTcl distribution from ActiveState. This distribution provides the Tcl language, the Tk graphics library, and a bunch of extensions including Tklib. Tablelist is included in Tklib as of Tcl/Tk version 8.4.12, the current stable release (as of 2005-12-12). Don't be concerned by the fact that ActiveState is a commercial outfit. The Tcl/Tk distribution that they provide is free as in both beer and speech. They make their money selling services and programming tools. The ActiveTcl distribution is currently available for: GNU/Linux, HP-UX, AIX, Solaris, Mac OS X, and MS Windows.
For FreeBSD, Tcl and Tk are available at:
ascii2uni contains a bug that affects impure mode conversions of standard hex (-X option). Version 3.9.2 fixes the bug for inputs within the BMP, that is, for hex values less than or equal to 0xFFFF. A more general fix is anticipated.