Uni2ascii Change Log
4.18 - 2011-05-15
- Fixed bug in uni2ascii in which the substitution count was too high
in certain cases, fixing Debian bug #626268.
- Patched to build on NetBSD, which lacks getline.
- Clarified semantics of pure option as converting characters
in the ASCII range other than space and newline. Fixed bug in which
this was not implemented correctly for UTF-8 types.
4.17 - 2011-02-16
- Added to uni2ascii the following conversions to the nearest ASCII equivalent:
U+2022 bullet to 'o', U+00B7 middle dot to period, U+0085 next line to newline,
U+2028 line separator to newline.
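
The four mappings listed for 4.17 amount to a small translation table. The sketch below is an illustration in Python only, not the C code in uni2ascii; the names NEAREST_ASCII and to_nearest_ascii are hypothetical.

```python
# Illustration only: uni2ascii itself is written in C.
# NEAREST_ASCII and to_nearest_ascii are hypothetical names.
NEAREST_ASCII = {
    0x2022: "o",   # BULLET -> 'o'
    0x00B7: ".",   # MIDDLE DOT -> period
    0x0085: "\n",  # NEXT LINE -> newline
    0x2028: "\n",  # LINE SEPARATOR -> newline
}

def to_nearest_ascii(text: str) -> str:
    """Replace the four characters above; leave everything else unchanged."""
    return text.translate(NEAREST_ASCII)
```

Characters outside the table pass through untouched, so the sketch says nothing about how uni2ascii treats other non-ASCII input.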
4.16 - 2010-12-12
- The Q format works again in ascii2uni.
- Added U+2033 DOUBLE PRIME to the characters converted to their closest ASCII equivalent
when using the e format in uni2ascii.
4.15 - 2010-08-29
- Renamed endian.h to u2a_endian.h to eliminate conflict with external endian.h.
- Removed copy of GNU getline from ascii2uni.c as it is in the standard library as of POSIX.1-2008.
4.14 - 2009-08-04
- Fixed a bug that interfered with the use of the Q format in uni2ascii.
- Fixed bug in which conversion of U+2502 and U+2503 to ASCII added a double quote to the output.
- Fixed a bug in which the -a S option generated a "Converted so many chars" line
for each character because debugging code had been left in.
4.13 - 2009-04-22
- Fixed Debian bug #511527 which caused the count
of characters converted to ASCII by uni2ascii to be excessive.
4.12 - 2009-03-25
- Both programs now allow the input file name to be specified on the command line without redirection.
4.11 - 2008-10-02
- Adds support for the %uXXXX format.
- Adds support for the <XX><XX><XX> format.
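
As an illustration of the %uXXXX format, the following Python sketch decodes such escapes. It assumes XXXX is exactly four hexadecimal digits naming one code point; that assumption, and the name decode_percent_u, are not taken from the ascii2uni source.

```python
import re

# Assumption: "%u" followed by exactly four hex digits names one code point.
_PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")

def decode_percent_u(text: str) -> str:
    """Replace each %uXXXX escape with the character it names."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Text that does not match the pattern, such as a lone "%u" with fewer than four hex digits, is passed through unchanged in this sketch.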
4.10 - 2008-08-30
- Fixed bug that made Y argument to -a flag in ascii2uni a no-op.
- Added documentation for Y argument to -a flag to man page for ascii2uni.
- Corrected documentation for Q argument to -a flag in man page for ascii2uni.
- The help flag for both programs now provides correct information about the Q argument to the -a flag.
- Giving Y as the argument to the -a flag is now an error for uni2ascii.
- The action summary is now more informative. The new version is incompatible
with u2a, which I am no longer updating. The old action summary can be
chosen at configuration time by using the option --disable-newsummary.
- More informative version information is now provided.
- Fixed a bug that produced bad output or a segmentation fault if a line
ended in the prefix to an escape.
- In quoted-printable format, if a line ends in an equals sign, both the equals sign
and the immediately following newline are now skipped by ascii2uni
in accordance with RFC 2045.
- Fixes a bug in ascii2uni that can cause incorrect output or a segmentation fault.
- Added -P option to uni2ascii to pass through untransformed Unicode.
- Added -B option to uni2ascii as shorthand for cdefx.
- Added characters to -e and -x options to uni2ascii (see ChangeLog for details).
- Since hardly anyone seems to use it, the GUI u2a is no longer being developed.
- Ascii2uni now handles arbitrarily long input lines. This fixes a bug in
which spurious breaks were introduced at the end of very long input lines.
- Added support for OOXML format (e.g. _x00E9_).
- Fixed bug affecting O, T, and U formats (three low byte formats).
- Microsoft-style HTML entities lacking final semi-colon are now
passed on by default rather than converted by ascii2uni. The new -m flag
causes them to be converted.
- Ascii2uni now uses the default format if the user does not specify one, eliminating a bug.
- Error messages and warnings from ascii2uni now include the line number.
- Adds -y option which produces single-character approximations for the same
characters given multi-character approximations by the -x option as per patch
by Jesse Peterson.
- The license has been changed to version 3 of the GPL.
- Fixed a bug in ascii2uni that deleted blank lines in certain cases.
- Removed the obsolete -8 flag from the usage message of ascii2uni.
- Adds to uni2ascii a new option that allows custom mappings of Unicode characters
to ASCII characters to be defined from the command line. (The GUI has not yet been modified
to allow the use of this option.)
- Added U+2500-U+2503 to the characters handled by the -e flag.
- Both programs now complain about unrecognized format specification instead of crashing.
- The pattern matches for exemplars of the I, J, and K formats, previously missing, have been added.
- Several format names have been added to the list recognized as format specifications.
- Removed inadvertently introduced direct calls to gettext() which prevent compilation
on systems without gettext.
- Fixed bugs that prevented several formats from working in ascii2uni.
- Eliminated leftover information about format options from ascii2uni usage message.
- The numerous options for specifying formats other than the -Z flag
have been replaced by a single option, -a, which takes an argument. The argument
may be the same letter or number as before (without the hyphen), an example of the
desired format, or in some cases a name such as "SGML_hexadecimal".
- Formats supplied as arguments to the -Z flag are now checked to ensure
that they do not contain more than one conversion specification.
- Fixed a bug introduced in version 3.10 in which an HTML numeric character
reference lacking the final semi-colon led to the program not terminating.
- A number of additional expansions are now performed by uni2ascii with the -x flag.
- The list of expansions performed by uni2ascii with the -x flag
is now provided by a new option, the -E flag.
- Corrected some typos in the usage information for uni2ascii.
- Adds support for hexadecimal numbers with prefix "16#" as in PostScript.
- Adds support for hexadecimal numbers with prefix "#16r" as in Common Lisp.
- Adds support for hexadecimal numbers with prefix "16#" and suffix "#" as in Ada.
- The look of the GUI has been improved.
- Adds support for decimal numbers preceded by "v".
- Adds support for hexadecimal numbers preceded by "$".
- 38 missing characters were added to the set from which diacritics are stripped by
uni2ascii with the -d flag.
- A bug was fixed in which U+013C LATIN SMALL LETTER L WITH CEDILLA
was mapped to upper case L rather than lower case l by uni2ascii
with the -d flag.
- Fixed bug in ascii2uni in which -G command line flag was rejected, fixing
Debian bug #401084.
- Fixed bug in ascii2uni and uni2ascii in which a missing argument to the -Z flag
produced an error message incorrectly identifying the error as an invalid flag, fixing
Debian bug #401084.
- Fixed a bug in the -J format of uni2ascii in which the first hex digit was omitted,
fixing Debian bug #401084.
- Adds support for Common Lisp format hexadecimal numbers, e.g. #x00E9.
- If no Unicode Replacement Characters were emitted, nothing is said about them
in the informational message at the end of the run.
- Information is now printed about individual ill-formed HTML entities missing their
- Fixes a bug in uni2ascii in which a space was not added after space characters
(as per -s option) and newlines (as per -n option) when the -w flag
was given for UTF-32 formats.
- Fixes a bug in uni2ascii in which a read interrupted in the middle of a UTF-8
sequence was incorrectly treated as truncated. Thanks to Dylan Thurston for the
- Corrects errors in ascii2uni manual page, fixing Debian bug #367546.
- Adds information to both manual pages about default ASCII format.
- This release fixes a bug in ascii2uni that produces incorrect results
in impure mode conversions of standard hex (-X option). The fix does not work
outside the BMP, that is, for hex values above 0xFFFF.
A more general fix will be made available shortly.
- Two bugs were fixed in uni2ascii in which the -f option erroneously
converted 9 to y and Z to a.
- The -f option of uni2ascii now converts superscript and subscript
forms to their ASCII equivalents.
- A bug in u2a that reversed the value of the switch for converting
ASCII characters in going from Unicode to ASCII was fixed.
- Uni2ascii now reports the total number of characters processed and the number
- Fixed miscellaneous bugs in u2a in the reporting of the number of characters
converted, replaced, etc.
- Various errors in the handling of small capitals with the -f option were
corrected. Some that had been omitted have been added. Small caps are all now
changed to the corresponding plain lower case letter as per the Unicode classification.
Previously, they were changed to the corresponding plain upper case letter. One
small capital letter that was changed to the wrong plain letter was fixed.
- The characters expanded by the -x option now include
the ellipses U+2026 … and U+22EF ⋯ and the arrows
U+2190 ←, U+2192 →, U+21D0 ⇐, and U+21D2 ⇒.
- The characters replaced by an approximate ASCII equivalent with the -e
option now include the union symbol U+222A ∪.
- The three formats used in POSIX portable charmap files
(octal byte, e.g. \115\141\171, hex byte, e.g. \x4d\x61\x79,
and decimal byte, e.g. \d77\d97\d121) are now supported.
- This patch fixes a bug that triggered an error in u2a when neither TMP nor TEMP
is defined as an environment variable. This would usually be under MS Windows, but
to my surprise these variables are unset on some Unix systems as well.
- The manual pages have been expanded and several typos corrected.
- Several typos in the usage messages have been corrected.
- A popup in u2a now lists the approximate replacements performed if the -e option is given.
- The handling of invalid Microsoft-style HTML character entities and
numeric character references has been improved.
- A warning is now issued by ascii2uni if invalid Microsoft-style
HTML character entities or numeric character references lacking the
final semi-colon are detected.
- Ascii2uni now provides conversion of HTML character entities in pure mode.
- The number of Microsoft-style tokens converted and the number of
unrecognized character entities replaced with the Unicode replacement
character are now reported by u2a along with the number of tokens
- Some typos in the manual pages were corrected.
- The SGML hexadecimal and decimal numeric character reference formats are now supported.
- The color scheme of the GUI has been improved.
- A link to the Unicode Consortium web site has been added to the Help menu.
- A bug that interfered with links from the Bug Reports page has been fixed.
- The Expand button is now disabled when converting from ASCII to Unicode.
- The format \uN with decimal N used in RTF is now supported.
- An option has been added to uni2ascii that expands certain single Unicode characters
into a sequence of plain ASCII characters. For example, German eszett may be expanded to
- The options for replacing non-ASCII characters with related ASCII characters have been
extended to apply to the UTF-8 output formats.
- A bug that interfered with the -G option of uni2ascii has been fixed.
- Options have been added to uni2ascii for converting non-ASCII characters
to related ASCII characters instead of to a textual representation. One option
converts stylistic variants (e.g. boldface). A second converts characters
enclosed in circles or parentheses to the unenclosed character. A third option
removes diacritics. A fourth option converts functional equivalents, such as
dashes of various lengths to hyphen.
- An optional graphical user interface u2a has been added.
- A new option (-Q) in uni2ascii converts Unicode to
HTML character entities where possible.
- In uni2ascii the -q option now works.
- The format X'xxxx', consisting of a hexadecimal number within single quotes (apostrophes)
with the prefix X, was added to both programs.
- The code has been cleaned up and autoconfiscated. This minimizes portability
problems and simplifies installation.
- The format \ooo, consisting of a backslash followed by three octal digits, where ooo is the
octal representation of one UTF-8 byte, was added to both programs.
- Two formats =XX and %XX, where XX is the hexadecimal representation of one UTF-8 byte,
were added to both programs.
- The ability to convert HTML character entities was added to ascii2uni.
- A command line flag was added to ascii2uni to convert all three types of HTML escape.
- Added the command line flag -q to suppress chat.
- Added four more formats: <U00E9>, U00E9, u00E9, U+00E9
- Both programs now provide a command-line option for defining the input/output
- Patches a bug in uni2ascii that caused a segmentation fault
if the program was called with no command line arguments. The problem was
that the initialization of the conversion format to the default had
- uni2ascii now offers three new output formats: \x-escapes, \x-escapes with
braces, and \u-escapes within the BMP but \U- beyond the BMP.
- uni2ascii now offers a choice of upper- or lower-case a-f in hexadecimal
- The program ascii2uni has been added. This program is the inverse
of uni2ascii. It generates UTF-8 Unicode from 7-bit ASCII files containing
various escapes for non-ASCII characters.
This version replaces the original Python program with a C program that generates
four additional major types of output with a further 8 variants for each.
The C program is also 20 times faster, has better error-reporting, and handles
the entire Unicode range rather than just the BMP.
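
The three POSIX portable charmap byte formats mentioned in this log (octal byte, hex byte, decimal byte) all spell out text one byte per escape. The Python sketch below shows one way such escapes could be decoded. It is an illustration, not the ascii2uni implementation: it assumes the bytes are UTF-8, and the name decode_charmap_bytes is hypothetical.

```python
import re

# One escape per byte: \xHH (hex), \dN..NNN (decimal), or \OOO (octal).
# Assumption: the escaped bytes form UTF-8, as with the other byte formats.
_BYTE_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|d([0-9]{1,3})|([0-3][0-7]{2}))"
)

def decode_charmap_bytes(text: str) -> str:
    """Decode charmap-style byte escapes; copy all other text unchanged."""
    out = bytearray()
    pos = 0
    for m in _BYTE_ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("utf-8")  # literal text before escape
        hex_digits, dec_digits, oct_digits = m.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        elif dec_digits is not None:
            value = int(dec_digits, 10)  # append() raises ValueError above 255
        else:
            value = int(oct_digits, 8)
        out.append(value)
        pos = m.end()
    out += text[pos:].encode("utf-8")  # trailing literal text
    return out.decode("utf-8")
```

Applied to the three examples given in the log, \115\141\171, \x4d\x61\x79, and \d77\d97\d121, the sketch yields the same three-letter string, "May", in each case.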