Libuninum

NumberConverter

Contents

  1. News
  2. Description
  3. Details
  4. Environment
  5. Documentation
  6. Downloads
  7. Change Log
  8. Bugs
  9. Roadmap

News

Version 2.7 adds support for Kayah Li, Lepcha, Ol Chiki, Saurashtra, Shan, Sundanese, and Vai. Full-width characters are now accepted in Western numbers.
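In terms of code points, accepting full-width characters in Western numbers means treating FULLWIDTH DIGIT ZERO through NINE (U+FF10 to U+FF19) the same as the ASCII digits. Here is a minimal sketch of that mapping. It assumes UTF-32 input stored as `uint32_t`, and the function name is made up for illustration. It is not part of libuninum.

```c
#include <stdint.h>

/* Hypothetical helper (not a libuninum function): return the value of a
   Western digit, accepting both ASCII '0'..'9' and the full-width digits
   U+FF10..U+FF19. Returns -1 for any other code point. */
int western_digit_value(uint32_t c) {
    if (c >= 0x30 && c <= 0x39)       /* ASCII DIGIT ZERO..NINE */
        return (int)(c - 0x30);
    if (c >= 0xFF10 && c <= 0xFF19)   /* FULLWIDTH DIGIT ZERO..NINE */
        return (int)(c - 0xFF10);
    return -1;
}
```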

Description

This is a library for converting Unicode strings to numbers and numbers to Unicode strings. Standard functions like strtoul, strtod, and sprintf do this for numbers written in the usual Western number system using the Indo-Arabic numerals, but they do not handle other number systems. The main functions take as input a UTF-32 Unicode string and compute the corresponding unsigned integer. For example, they will convert the Chinese string 五十九万四千三百二十一 to the integer 594,321 and the Devanagari string ७८४९२ to the integer 78,492. Internal computation is done using arbitrary precision arithmetic, so there is no limit on the size of the integer that can be converted.
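The Chinese example does not use place value. Each digit multiplies the unit character that follows it: 十 (10), 百 (100), or 千 (1000). The character 万 (10,000) closes off a group. The sketch below shows one way to evaluate such a string, but it is a simplified illustration and not the library's code:

- The function names are hypothetical.
- Only simplified characters up to 万 are recognized.
- Overflow is not checked.
- It uses plain `unsigned long` rather than the arbitrary precision arithmetic the library relies on.

```c
#include <stdint.h>

/* Value of a Chinese digit character, or -1 if it is not one. */
static int cn_digit(uint32_t c) {
    switch (c) {
    case 0x96F6: return 0; /* 零 */
    case 0x4E00: return 1; /* 一 */
    case 0x4E8C: return 2; /* 二 */
    case 0x4E09: return 3; /* 三 */
    case 0x56DB: return 4; /* 四 */
    case 0x4E94: return 5; /* 五 */
    case 0x516D: return 6; /* 六 */
    case 0x4E03: return 7; /* 七 */
    case 0x516B: return 8; /* 八 */
    case 0x4E5D: return 9; /* 九 */
    default:     return -1;
    }
}

/* Multiplier of a unit character below the myriad, or 0 if not a unit. */
static unsigned long cn_unit(uint32_t c) {
    switch (c) {
    case 0x5341: return 10;   /* 十 */
    case 0x767E: return 100;  /* 百 */
    case 0x5343: return 1000; /* 千 */
    default:     return 0;
    }
}

/* Hypothetical sketch: evaluate a NUL-terminated UTF-32 Chinese numeral.
   Sets *ok to 0 and returns 0 on an unrecognized character. */
unsigned long chinese_to_ulong(const uint32_t *s, int *ok) {
    unsigned long total = 0;   /* value contributed by completed myriad groups */
    unsigned long section = 0; /* value below 10,000 being built */
    unsigned long pending = 0; /* digit still waiting for a unit */
    int d;
    unsigned long u;

    *ok = 1;
    for (; *s != 0; s++) {
        if ((d = cn_digit(*s)) >= 0) {
            pending = (unsigned long)d;
        } else if ((u = cn_unit(*s)) != 0) {
            /* A bare unit, as in 十五 (15), counts as one of that unit. */
            section += (pending ? pending : 1) * u;
            pending = 0;
        } else if (*s == 0x4E07) { /* 万: scale everything so far by 10,000 */
            total = (total + section + pending) * 10000;
            section = 0;
            pending = 0;
        } else {
            *ok = 0;
            return 0;
        }
    }
    return total + section + pending;
}
```

With this logic, 五十九万 yields (50 + 9) × 10,000 and 四千三百二十一 adds 4,321, giving 594,321.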

The value of the string is returned in one of three forms. One option is a string of ASCII characters containing the decimal representation of the integer using the Indo-Arabic digits. This option has the virtue of avoiding any possibility of overflow or truncation. The second option is to obtain the value as a GNU MP mpz_t object. This is only useful if you are going to do further computation using GNU MP. The final option is to obtain the value as an unsigned long integer. If you are going to do internal calculations, this is probably the most convenient option, but some numbers (in fact, infinitely many) will not fit into an unsigned long integer. The library guarantees that no overflow or truncation will occur; if the number will not fit, it sets an error flag and returns 0.
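The following sketch makes the difference between the result forms concrete. It uses two hypothetical functions:

- `place_digits_to_ascii` turns digits from a single place-value script into an ASCII decimal string. The script is identified by the code point of its zero, such as U+0966 for Devanagari. This string form cannot overflow.
- `ascii_to_ulong` converts that string to `unsigned long`. When the value does not fit, it sets a flag and returns 0.

The names `place_digits_to_ascii`, `ascii_to_ulong`, and `ulong_overflow` are illustrative only and are not the library's interface.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: copy a NUL-terminated UTF-32 string of digits from one
   place-value script (whose zero is `zero`) into `out` as ASCII.
   Returns 0 on success, -1 on a non-digit or if `out` is too small. */
int place_digits_to_ascii(const uint32_t *s, uint32_t zero, char *out, size_t cap) {
    size_t n = 0;
    if (cap == 0)
        return -1;
    for (; *s != 0; s++) {
        /* reject non-digits, and keep room for this digit plus the NUL */
        if (*s < zero || *s > zero + 9 || n + 1 >= cap)
            return -1;
        out[n++] = (char)('0' + (*s - zero));
    }
    out[n] = '\0';
    return 0;
}

/* Hypothetical error flag, in the spirit of the one described above. */
int ulong_overflow = 0;

/* Convert an ASCII decimal string to unsigned long. If the value does not
   fit, set ulong_overflow and return 0 instead of a truncated value. */
unsigned long ascii_to_ulong(const char *s) {
    unsigned long val = 0;
    ulong_overflow = 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        unsigned long d = (unsigned long)(*s - '0');
        if (val > (ULONG_MAX - d) / 10) { /* val * 10 + d would wrap */
            ulong_overflow = 1;
            return 0;
        }
        val = val * 10 + d;
    }
    return val;
}
```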

An inverse function accepts as input an unsigned long integer, an mpz_t object, or an ASCII decimal string and converts it to a Unicode string in a selected number system.
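Some scripts have digits that form a contiguous run of ten code points. For those scripts, the inverse direction reduces to ordinary base-10 conversion plus an offset. Here is a minimal sketch, assuming the caller supplies the code point of the script's zero (U+0E50 for Thai, for instance). The name `ulong_to_place_digits` is hypothetical. Non-place systems such as Chinese or Roman numerals would need more elaborate logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: write n as NUL-terminated UTF-32 digits of the script whose
   zero is `zero`. Returns the number of digits written, or 0 if `out`
   (of capacity `cap` elements) cannot hold them plus the terminator. */
size_t ulong_to_place_digits(unsigned long n, uint32_t zero, uint32_t *out, size_t cap) {
    uint32_t tmp[40];   /* enough decimal digits for an unsigned long of up to 128 bits */
    size_t len = 0, i;

    do {                /* produce digits least significant first */
        tmp[len++] = zero + (uint32_t)(n % 10);
        n /= 10;
    } while (n > 0);

    if (len + 1 > cap)
        return 0;
    for (i = 0; i < len; i++)   /* reverse into most-significant-first order */
        out[i] = tmp[len - 1 - i];
    out[len] = 0;
    return len;
}
```

For example, 546 with zero U+0E50 gives U+0E55 U+0E54 U+0E56, the Thai ๕๔๖ shown in the table of supported systems.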

If you use the library, I would be interested in hearing what you are using it for. My own use of it is in my sort utility, msort.

Along with the library, the command-line program numconv is provided, both as an example of how to use the library and as a utility that may be useful in its own right. Besides the number system conversions that are its main use, numconv provides a convenient way to delimit numbers produced by other programs that are either undelimited or delimited in a way inappropriate for the locale. To do this, set both input and output to Western numbers. Then either set the output delimitation parameters directly on the command line or use the -L flag to obtain them from the locale. For example, both:

echo "123456789" | numconv -f Western_Lower -t Western_Lower -g 2 -G 3 -s ' '

and

echo "123,456,789" | numconv -f Western_Lower -t Western_Lower -g 2 -G 3 -s ' '

will produce the output:

12 34 56 789
which might be appropriate in an Indian locale.
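The delimiting that numconv performs can be pictured as two steps: drop whatever delimiters the input already has, then insert the separator counting from the right. The sketch below illustrates that idea, assuming the Indian-style pattern above: three digits in the rightmost group and two in each group to its left. `group_digits` is a hypothetical function. It does not reproduce numconv's source or the exact meaning of its -g and -G flags.

```c
#include <stddef.h>

/* Hypothetical: copy the digits of `in`, ignoring existing delimiters, and
   insert `sep` so that the rightmost group has `first` digits and every
   group to its left has `rest` digits. Returns 0, or -1 on bad arguments
   or if `out` (capacity `cap`) is too small. */
int group_digits(const char *in, int first, int rest, char sep,
                 char *out, size_t cap) {
    char digits[128];   /* this sketch handles at most 128 digits */
    size_t nd = 0, o = 0, i;

    if (first <= 0 || rest <= 0)
        return -1;

    /* Keep only the digits, so "123,456,789" and "123456789" are treated alike. */
    for (; *in != '\0'; in++) {
        if (*in >= '0' && *in <= '9') {
            if (nd == sizeof digits)
                return -1;
            digits[nd++] = *in;
        }
    }

    for (i = 0; i < nd; i++) {
        size_t r = nd - i;   /* digits remaining, counting this one */
        /* A separator precedes this digit when the digits from here to the
           end make up the first group plus a whole number of further groups. */
        if (i > 0 && r >= (size_t)first && (r - (size_t)first) % (size_t)rest == 0) {
            if (o + 1 >= cap)
                return -1;
            out[o++] = sep;
        }
        if (o + 1 >= cap)
            return -1;
        out[o++] = digits[i];
    }
    if (cap == 0)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Calling `group_digits("123,456,789", 3, 2, ' ', buf, sizeof buf)` fills `buf` with `12 34 56 789`. With `3, 3, ','` the same function yields the familiar Western grouping.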

There is also a graphical number converter, NumberConverter, which performs a similar function to numconv.

The number systems currently supported (with some variants omitted) are the following. (Unless you have an unusually comprehensive set of fonts, your browser will not display all of them.)

Aegean: 𐄝𐄓𐄌
Arabic: ٥٤٦
Arabic Alphabetic: ث​م​و
Armenian Alphabetic: ՇԽԶ
Balinese: ᭕᭔᭖
Bengali / Assamese: ৫৪৬
Burmese: ၅၄၆
Chinese: 五百四十六
Chinese Accounting: 伍佰肆拾陸
Chinese Counting Rods: 𝍤𝍬𝍥
Chinese Place: 五四六
Chinese Suzhou: 〥〤〦
Common Braille: ⠑⠙⠋
Cyrillic Alphabetic: ФМЅ
Devanagari (Hindi, Marathi, Sanskrit): ५४६
Egyptian (hieroglyphic): 𔌻𔌻𔌻𔌻𔌻 𔍓𔍓𔍓𔍓 𔎡𔎡𔎡𔎡𔎡𔎡
Ethiopic: ፭፻፬፲፮
Ewellic Decimal
Ewellic Hexadecimal
French/Czech Braille: ⠱⠹⠫
Georgian (Mxedruli): ფმვ
Georgian (Xucuri): ႴႫႥ
Glagolitic Alphabetic: ⰗⰍⰅ
Greek Alphabetic: ΦΜϚ
Gujarati: ૫૪૬
Gurmukhi: ੫੪੬
Hebrew: רתמו
Hexadecimal: 0x222
Kannada: ೫೪೬
Kayah Li: ꤅꤄꤆
Kharoshthi: ‭𐩀𐩀𐩃𐩅𐩅𐩆𐩀𐩃
Khmer: ៥៤៦
Klingon
Lao: ໕໔໖
Lepcha: ᱅᱄᱆
Limbu: ᥋᥊᥌
Malayalam: ൫൪൬
Mongolian: ᠕᠔᠖
New Tai Lue: ᧕᧔᧖
Nko: ߅߄߆
Ol Chiki: ᱕᱔᱖
Old Italic: 𐌣𐌣𐌣𐌣𐌣𐌣𐌣𐌣𐌣𐌣​𐌢𐌢𐌢𐌢​𐌡 𐌠
Old Persian: 𐏕𐏕𐏕𐏕𐏕 𐏔𐏔 𐏒𐏒𐏒
Oriya: ୫୪୬
Osmanya: 𐒥𐒤𐒦
Perso-Arabic: ۵۴۶
Phoenician: 𐤙𐤙𐤙𐤙𐤙​𐤘𐤘𐤘𐤘​𐤖𐤖𐤖𐤖𐤖𐤖
Roman numerals: DXLVI
Russian Braille: ⠢⠲⠖
Saurashtra: ꣕꣔꣖
Shan: ႕႔႖
Sinhala: ෫෾෸෬
Sundanese: ᮵᮴᮶
Tamil Place: ௫௪௬
Tamil Traditional: ௫௱௪௰௬
Telugu: ౫౪౬
Tengwar (mortal)
Tengwar (Elvish)
Thai: ๕๔๖
Tibetan: ༥༤༦
Vai: ꘥꘤꘦
Verdurian
Western: 546

Ewellic, Klingon, Tengwar, and Verdurian do not have official Unicode encodings. The library assumes that they are encoded in the Private Use Area according to the encodings registered in the ConScript Unicode Registry. Kayah Li, Lepcha, Ol Chiki, Saurashtra, Shan, Sinhala, Sundanese, and Vai are encoded according to the not-quite-final draft of Unicode 5.1.

In some cases, both traditional non-place based systems and their modern place-based counterparts are supported. In addition to the specialized Counting Rod and Suzhou numbers, a total of fifteen variants of the "ordinary" Chinese numbers are supported.
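Tamil illustrates how a traditional system pairs with its place-based counterpart: ௫௪௬ and ௫௱௪௰௬ both denote 546. The sketch below reads either form. It assumes only the signs for 10, 100, and 1000 (U+0BF0 to U+0BF2), and it handles only values below 10,000 in the traditional form. `tamil_to_long` is a hypothetical name, and this is not the library's implementation.

```c
#include <stdint.h>

/* Hypothetical: evaluate a NUL-terminated UTF-32 Tamil numeral written
   either with place-value digits (U+0BE6..U+0BEF) or traditionally with
   ௰ (10), ௱ (100) and ௲ (1000). Traditional values must be below 10,000;
   inputs are assumed short (no overflow check). Returns -1 on an
   unexpected character. */
long tamil_to_long(const uint32_t *s) {
    long place = 0;    /* value under the place-value reading */
    long trad = 0;     /* value under the traditional reading */
    long pending = 0;  /* traditional: digit waiting for a sign */
    int saw_sign = 0;  /* any ௰ ௱ ௲ means the string is traditional */

    for (; *s != 0; s++) {
        uint32_t c = *s;
        if (c >= 0x0BE6 && c <= 0x0BEF) {
            long d = (long)(c - 0x0BE6);
            place = place * 10 + d;
            pending = d;
        } else if (c >= 0x0BF0 && c <= 0x0BF2) {
            long unit = (c == 0x0BF0) ? 10 : (c == 0x0BF1) ? 100 : 1000;
            trad += (pending ? pending : 1) * unit; /* bare ௱ means 100 */
            pending = 0;
            saw_sign = 1;
        } else {
            return -1;
        }
    }
    return saw_sign ? trad + pending : place;
}
```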

The primary interface is in C, but a Tcl interface is also provided.



Details

Language: C, Tcl
Dependencies: GMP arbitrary precision arithmetic library
Current version: 2.7
Last modified: 2007-12-08
License: GNU Lesser General Public License

Environment

The GNU arbitrary precision arithmetic package GMP is required. The library should work on any POSIX-compliant system on which GMP is available, which means just about any POSIX-compliant system. Systems on which it is reported to work include FreeBSD, Linux, Mac OS X, and OpenBSD. I would appreciate reports of success or failure on other systems.

The installation process does not seem to work properly on OpenBSD. First, the configure script may not detect GNU MP even when it is properly installed. Second, the -I and -L flags must be passed to gcc, but autoconf does not add them to the makefile automatically. I haven't yet figured out how to make things work automatically on OpenBSD. If you don't know either, please bear with me; if you do know, I would be glad to hear from you.



Documentation

Numconv has a manual page. For the library, for the time being, consult the README files and the sample programs in the Examples directory, as well as numconv.c.

Downloads

Source

libuninum-2.7.tar.gz

libuninum-2.7.tar.bz2

libuninum-2.7.zip

If you would like to be notified of new releases, subscribe to libuninum at Freshmeat.

Packages

Debian: Debian packages
Fedora Core: RPMs
FreeBSD: Freshport
Mac OS X: Mac OS X
Red Hat: RPMs
T2: T2


Changes

2.7

2.6


Full Change Log


Known Bugs

The conversion of Ethiopic strings to numbers is buggy and so has been temporarily disabled. A corrected version is under construction.


Roadmap


Back to Bill Poser's software page.