Uni2ascii Change Log

4.18 - 2011-05-15

Fixed a bug in uni2ascii in which, in certain cases, the substitution count was too high, fixing Debian bug #626268.
Patched to handle NetBSD, which lacks getline.
Clarified the semantics of the pure option: it converts characters in the ASCII range other than space and newline. Fixed a bug in which this was not implemented correctly for UTF-8 types.

4.17 - 2011-02-16

Added to uni2ascii the following conversions to nearest ascii equivalent: U+2022 bullet to 'o', U+00B7 middle dot to period, U+0085 next line to newline, U+2028 line separator to newline.

4.16 - 2010-12-12

The Q format works again in ascii2uni.
Added U+2033 DOUBLE PRIME to the characters converted to their closest ASCII equivalent when using the e format in uni2ascii.

4.15 - 2010-08-29

Renamed endian.h to u2a_endian.h to eliminate conflict with external endian.h.
Removed the copy of GNU getline from ascii2uni.c, since getline is in the standard library as of POSIX.1-2008.

4.14 - 2009-08-04

Fixed a bug that interfered with the use of the Q format in uni2ascii.
Fixed bug in which ascification of U+2502 and U+2503 added double quote to output.
Fixed a bug in which -a S option generated a "Converted so many chars" line for each character due to leaving in debugging code.

4.13 - 2009-04-22

Fixed Debian bug #511527 which caused the count of characters converted to ASCII by uni2ascii to be excessive.

4.12 - 2009-03-25

Both programs now allow the input file name to be specified on the command line without redirection.

4.11 - 2008-10-02

Adds support for the %uXXXX format.
Adds support for the <XX><XX><XX> format.

4.10 - 2008-08-30

Fixed bug that made Y argument to -a flag in ascii2uni a no-op.
Added documentation for Y argument to -a flag to man page for ascii2uni.
Corrected documentation for Q argument to -a flag in man page for ascii2uni.
The help flag for both programs now provides correct information about the Q argument to the -a flag.
Giving Y as the argument to the -a flag is now an error for uni2ascii.
The action summary is now more informative. The new version is incompatible with u2a, which I am no longer updating. The old action summary can be chosen at configuration time by using the option --disable-newsummary.
More informative version information is now provided.

4.9

Fixed a bug that produced bad output or a segmentation fault if a line ended in the prefix to an escape.
In quoted-printable format if a line ends in an equal-sign, both the equal sign and the immediately following newline are now skipped by ascii2uni in accordance with RFC 2045.
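To illustrate the RFC 2045 soft-line-break rule, here is a minimal Python sketch (not ascii2uni's actual C code) of decoding quoted-printable text in which each =XX escape holds one UTF-8 byte. The function name decode_qp is hypothetical.

```python
import re


def decode_qp(text: str) -> str:
    """Decode quoted-printable =XX escapes that carry UTF-8 bytes.

    Per RFC 2045, an '=' at the end of a line is a soft line break:
    both the '=' and the line break that follows it are dropped.
    """
    # Remove soft line breaks (accepting CRLF or a bare LF).
    text = re.sub(r"=\r?\n", "", text)
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i + 1:i + 3]
        if text[i] == "=" and re.fullmatch(r"[0-9A-Fa-f]{2}", pair):
            out.append(int(pair, 16))  # one escaped byte
            i += 3
        else:
            out.extend(text[i].encode("utf-8"))  # literal character
            i += 1
    return out.decode("utf-8")
```

For example, decode_qp("caf=\n=C3=A9") yields "café": the trailing '=' and its newline vanish, and =C3=A9 is the UTF-8 encoding of U+00E9.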

4.8

Fixes a bug in ascii2uni that can cause incorrect output or a segmentation fault.

4.7

Added -P option to uni2ascii to pass through untransformed Unicode.
Added -B option to uni2ascii as shorthand for cdefx.
Added characters to -e and -x options to uni2ascii (see ChangeLog for details).
Since hardly anyone seems to use it, the GUI u2a is no longer being developed.

4.6

Ascii2uni now handles arbitrarily long input lines. This fixes a bug in which spurious breaks were introduced at the end of very long input lines.
Added support for OOXML format (e.g. _x00E9_).
Fixed bug affecting O, T, and U formats (three low byte formats).

4.5

Microsoft-style HTML entities lacking final semi-colon are now passed on by default rather than converted by ascii2uni. The new -m flag causes them to be converted.
Ascii2uni now defaults the format if the user fails to specify it, eliminating a bug.
Error messages and warnings from ascii2uni now include the line number.

4.4

Adds the -y option, which produces single-character approximations for the same characters that the -x option approximates with multiple characters, as per a patch by Jesse Peterson.
The license has been changed to version 3 of the GPL.

4.3.2

Fixed a bug in ascii2uni that deleted blank lines in certain cases.
Removed the obsolete -8 flag from the usage message of ascii2uni.

4.3

Adds to uni2ascii a new option that allows custom mappings of Unicode characters to ASCII characters to be defined from the command line. (The GUI has not yet been modified to allow the use of this option.)
Added U+2500-U+2503 to the characters handled by the -e flag.

4.2

Both programs now complain about an unrecognized format specification instead of crashing.
The pattern matches for exemplars of the I, J, and K formats, previously missing, have been added.
Several format names have been added to the list recognized as format specifications.

4.1.1

Removed inadvertently introduced direct calls to gettext() which prevent compilation on systems without gettext.

4.1

Fixed bugs that prevented several formats from working in ascii2uni.
Eliminated leftover information about format options from ascii2uni usage message.

4.0

The numerous options for specifying formats other than the -Z flag have been replaced by a single option, -a, which takes an argument. The argument may be the same letter or number as before (without the hyphen), an example of the desired format, or in some cases a name such as "SGML_hexadecimal".
Formats supplied as arguments to the -Z flag are now checked to ensure that they do not contain more than one conversion specification.
Fixed a bug introduced in version 3.10 in which an HTML numeric character reference lacking the final semi-colon led to the program not terminating.
A number of additional expansions are now performed by uni2ascii with the -x flag.
The list of expansions performed by uni2ascii with the -x flag is now provided by a new option, the -E flag.
Corrected some typos in the usage information for uni2ascii.

3.13

Adds support for hexadecimal numbers with prefix "16#" as in PostScript.
Adds support for hexadecimal numbers with prefix "#16r" as in Common Lisp.
Adds support for hexadecimal numbers with prefix "16#" and suffix "#" as in Ada.
The look of the GUI has been improved.

3.12

Adds support for decimal numbers preceded by "v".
Adds support for hexadecimal numbers preceded by "$".

3.11

38 missing characters were added to the set from which diacritics are stripped by uni2ascii with the -d flag.
A bug was fixed in which U+013C LATIN SMALL LETTER L WITH CEDILLA was mapped to upper case L rather than lower case l by uni2ascii with the -d flag.

3.10

Fixed bug in ascii2uni in which -G command line flag was rejected, fixing Debian bug #401084.
Fixed bug in ascii2uni and uni2ascii in which a missing argument to the -Z flag produced an error message incorrectly identifying the error as an invalid flag, fixing Debian bug #401084.
Fixed a bug in the -J format of uni2ascii in which the first hex digit was omitted, fixing Debian bug #401084.
Adds support for Common Lisp format hexadecimal numbers, e.g. #x00E9.
If no Unicode Replacement Characters were emitted, nothing is said about them in the informational message at the end of the run.
Information is now printed about individual ill-formed HTML entities missing their final semi-colon.

3.9.5

Fixes a bug in uni2ascii in which a space was not added after space characters (as per -s option) and newlines (as per -n option) when the -w flag was given for UTF-32 formats.

3.9.4

Fixes a bug in uni2ascii in which a read interrupted in the middle of a UTF-8 sequence was incorrectly treated as truncated. Thanks to Dylan Thurston for the patch.

3.9.3

Corrects errors in ascii2uni manual page, fixing Debian bug #367546.
Adds information to both manual pages about default ASCII format.

3.9.2

This release fixes a bug in ascii2uni that produces incorrect results in impure mode conversions of standard hex (-X option). The fix does not work outside the BMP, that is, for hex values above 0xFFFF. A more general fix will be made available shortly.

3.9

Two bugs were fixed in uni2ascii in which the -f option erroneously converted 9 to y and Z to a.
The -f option of uni2ascii now converts superscript and subscript forms to their ASCII equivalents.

3.8

A bug in u2a that reversed the value of the switch for converting ASCII characters in going from Unicode to ASCII was fixed.
Uni2ascii now reports the total number of characters processed and the number actually converted.
Fixed miscellaneous bugs in u2a in the reporting of the number of characters converted, replaced, etc.

3.7

Various errors in the handling of small capitals with the -f option were corrected. Some that had been omitted have been added. Small caps are all now changed to the corresponding plain lower case letter as per the Unicode classification. Previously, they were changed to the corresponding plain upper case letter. One small capital letter that was changed to the wrong plain letter was fixed.
The characters expanded by the -x option now include the ellipses U+2026 … and U+22EF ⋯ and the arrows U+2190 ←, U+2192 →, U+21D0 ⇐, and U+21D2 ⇒.
The characters replaced by an approximate ASCII equivalent with the -e option now include the union symbol U+222A ∪.

3.6

The three formats used in POSIX portable charmap files (octal byte, e.g. \115\141\171, hex byte, e.g. \x4d\x61\x79, and decimal byte, e.g. \d77\d97\d121) are now supported.
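The three byte formats can be sketched in Python as follows. This is an illustration under stated assumptions, not uni2ascii's implementation: the helper names encode_bytes and decode_bytes are hypothetical, octal escapes are assumed to be exactly three digits (\000 to \377), and decimal escapes are assumed to be unpadded, matching the examples above.

```python
import re

# Hypothetical format table for the three POSIX charmap byte styles.
_FORMATS = {
    "octal": "\\{:03o}",   # e.g. \115
    "hex": "\\x{:02x}",    # e.g. \x4d
    "decimal": "\\d{}",    # e.g. \d77
}

# One escape of any kind; octal is limited to \000-\377 so it fits in a byte.
_BYTE_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|d([0-9]{1,3})|([0-3][0-7]{2}))")


def encode_bytes(text: str, style: str) -> str:
    """Write every UTF-8 byte of text as one escape in the given style."""
    fmt = _FORMATS[style]
    return "".join(fmt.format(b) for b in text.encode("utf-8"))


def decode_bytes(text: str) -> str:
    """Turn byte escapes back into bytes and decode the result as UTF-8.

    A decimal escape above 255 makes bytearray.append raise ValueError.
    """
    out = bytearray()
    pos = 0
    for m in _BYTE_ESCAPE.finditer(text):
        out.extend(text[pos:m.start()].encode("utf-8"))  # literal text before the escape
        hex_digits, dec_digits, oct_digits = m.groups()
        if hex_digits is not None:
            out.append(int(hex_digits, 16))
        elif dec_digits is not None:
            out.append(int(dec_digits))
        else:
            out.append(int(oct_digits, 8))
        pos = m.end()
    out.extend(text[pos:].encode("utf-8"))  # trailing literal text
    return out.decode("utf-8")
```

Because each escape stands for one UTF-8 byte, a non-ASCII character such as é becomes two escapes (\303\251 in octal). Note that an unpadded decimal escape followed directly by a literal digit is ambiguous; this sketch reads up to three digits greedily.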

3.5.2

This patch fixes a bug that triggered an error in u2a when neither TMP nor TEMP is defined as an environment variable. This would usually be under MS Windows, but to my surprise these variables are unset on some Unix systems as well.

3.5

The manual pages have been expanded and several typos corrected.
Several typos in the usage messages have been corrected.
A popup in u2a now lists the approximate replacements performed if the -e option is given.
The handling of invalid Microsoft-style HTML character entities and numeric character references has been improved.

3.4

A warning is now issued by ascii2uni if invalid Microsoft-style HTML character entities or numeric character references lacking the final semi-colon are detected.
Ascii2uni now provides conversion of HTML character entities in pure mode.
The number of Microsoft-style tokens converted and the number of unrecognized character entities replaced with the Unicode replacement character are now reported by u2a, along with the number of tokens converted.
Some typos in the manual pages were corrected.

3.3

The SGML hexadecimal and decimal numeric character reference formats are now supported.
The color-scheme of the GUI has been improved.
A link to the Unicode Consortium web site has been added to the Help menu.
A bug that interfered with links from the Bug Reports page has been fixed.
The Expand button is now disabled when converting from ASCII to Unicode.

3.2

The format \uN with decimal N used in RTF is now supported.
An option has been added to uni2ascii that expands certain single Unicode characters into a sequence of plain ASCII characters. For example, German eszett may be expanded to ss.
The options for replacing non-ASCII characters with related ASCII characters have been extended to apply to the UTF-8 output formats.
A bug that interfered with the -G option of uni2ascii has been fixed.

3.1

Options have been added to uni2ascii for converting non-ASCII characters to related ASCII characters instead of to a textual representation. One option converts stylistic variants (e.g. boldface). A second converts characters enclosed in circles or parentheses to the unenclosed character. A third option removes diacritics. A fourth option converts functional equivalents, such as dashes of various lengths to hyphen.

3.0

An optional graphical user interface u2a has been added.

2.8

uni2ascii now has an option (-Q) for converting Unicode to HTML character entities where possible.
In uni2ascii the -q option now works.

2.7

The format X'xxxx', consisting of a hexadecimal number within single quotes (apostrophes) with the prefix X, was added to both programs.

2.6

The code has been cleaned up and autoconfiscated. This minimizes portability problems and simplifies installation.

2.5

The format \ooo, consisting of a backslash followed by three octal digits, where ooo is the octal representation of one UTF-8 byte, was added to both programs.

2.4

Two formats =XX and %XX, where XX is the hexadecimal representation of one UTF-8 byte, were added to both programs.

2.3

The ability to convert HTML character entities was added to ascii2uni.
A command line flag was added to ascii2uni to convert all three types of HTML escape.

2.2

Added the command line flag -q to suppress chat.
Added four more formats: <U00E9>, U00E9, u00E9, U+00E9
Both programs now provide a command-line option for defining the input/output format directly.

2.1.1

Patches a bug in uni2ascii that caused a segmentation fault if the program was called with no command line arguments. The problem was that the initialization of the conversion format to the default had been omitted.

2.1

uni2ascii now offers three new output formats: \x-escapes, \x-escapes with braces, and \u-escapes within the BMP but \U-escapes beyond the BMP.
uni2ascii now offers a choice of upper- or lower-case a-f in hexadecimal output formats.
The program ascii2uni has been added. This program is the inverse of uni2ascii. It generates UTF-8 Unicode from 7-bit ASCII files containing various escapes for non-ASCII characters.
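The \u/\U output format and the upper/lower-case choice described for 2.1 can be sketched in Python. This is a hedged illustration, not uni2ascii's code: the function names are hypothetical, and the 4-digit \u and 8-digit \U widths follow the usual C/Python escape convention, which is an assumption here.

```python
def u_escape(ch: str, upper: bool = True) -> str:
    """Escape one character as \\uXXXX inside the BMP or \\UXXXXXXXX beyond it.

    The 4- and 8-digit widths follow the C/Python convention and are an
    assumption, not taken from uni2ascii's source.
    """
    cp = ord(ch)
    in_bmp = cp <= 0xFFFF
    digits = f"{cp:04X}" if in_bmp else f"{cp:08X}"
    if not upper:
        digits = digits.lower()  # choice of a-f case in the hex digits
    prefix = "\\u" if in_bmp else "\\U"
    return prefix + digits


def u_escape_text(text: str, upper: bool = True) -> str:
    """Leave ASCII characters alone and escape everything else."""
    return "".join(c if ord(c) < 0x80 else u_escape(c, upper) for c in text)
```

For instance, u_escape_text("café") returns caf\u00E9, while U+1F600, which lies outside the BMP, becomes \U0001F600.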

2.0

This version replaces the original Python program with a C program that generates four additional major types of output with a further 8 variants for each. The C program is also 20 times faster, has better error-reporting, and handles the entire Unicode range rather than just the BMP.