|
| |
| | Unicode HOWTO |
 | | Unicode code points 0-255 are identical to the Latin-1 values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can't be encoded into Latin-1. |  | | Unicode character U+FEFF is used as a byte-order mark (BOM), and is often written as the first character of a file in order to assist with autodetection of the file's byte ordering. |  | | To summarize the previous section: a Unicode string is a sequence of code points, which are numbers from 0 to 0x10ffff. |
|
http://www.amk.ca/python/howto/unicode
(4145 words)
|
|
| |
| | UTF-8 and Unicode FAQ |
 | | FriBidi is Dov Grobgeld’s free implementation of the Unicode bidi algorithm. |  | | Unicode database) is now also available, which is implemented by just overstriking (logical OR-ing) a base-character glyph with up to two combining-character glyphs. |  | | UTF-32 was introduced in Unicode to describe a 4-byte encoding of the extended “21-bit” Unicode. |
|
http://www.cl.cam.ac.uk/~mgk25/unicode.html
(14389 words)
|
|
| |
| | ongoing · On the Goodness of Unicode |
 | | There's a lot of history behind this simple label; Unicode proper is a consortium of technology vendors that, many years ago in a flash of intelligence and public-spiritedness, decided to unify their work with that going on at the ISO. |  | | Encodings · From Unicode's point of view, text is stored on a computer as a series of numbers, one per character. |  | | Unicode itself defines several different encoding schemes, the two best known of which are UTF-8 and UTF-16. |
|
http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode
(2094 words)
|
|
| |
| | Test: Unicode |
 | | Unicode (as of version 3.2, early 2002) differentiates between this koppa (epigraphical) and the numerical koppa; this does not affect the TLG data bank, since the Beta Code distinction between the two koppas is no longer observed. |  | | Since polytonic Greek is a relatively low priority in the computer industry, and the proper handling of Unicode diacritics is still incipient (requiring sophisticated font engines like OpenType still not widely available), it is safer to use the precomposed characters. |  | | Case in numerals is currently distinguished in Aisa (but not for sampi), Alphabetum, Antioch, Aristarcoj, Athena and New Athena Unicode (but not in Athena for Q-like Koppa), Cardo, Code 2000, FreeMono (only for stigma), Galilee Unicode Gk (only for sampi and koppas), Galatia SIL (not for Q-like Koppa), Lucida Grande, TITUS Cyberbit. |
|
http://www.tlg.uci.edu/help/UnicodeTest.html
(2311 words)
|
|
| |
| | 9.4. Unicode |
 | | Unicode now has been extended to handle ancient Chinese, Korean, and Japanese texts, which had so many different characters that the 2-byte unicode system could not represent them all. |  | | To solve these problems, unicode represents each character as a 2-byte number, from 0 to 65535. |  | | method, available on every unicode string, to convert the unicode string to a regular string in the given encoding scheme, which you pass as a parameter. |
|
http://diveintopython.org/xml_processing/unicode.html
(1508 words)
|
|
| |
| | ongoing · Characters vs. Bytes |
 | | This is the first of a three-part essay on modern character string processing for computer programmers. |  | | Here I explain and illustrate the methods for storing Unicode characters in byte sequences in computers, and discuss their advantages and disadvantages. |  | | UTF · Along with the characters, Unicode also defines methods for storing them in byte sequences in a computer. |
|
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
(2675 words)
|
|
| |
| | Character sets |
 | | One possible way to create a string with Unicode hex values is to create a regular string and then coerce it to a Unicode string (while paying attention to byte order). |  | | Unicode is essentially a superset of every Windows ANSI, Windows DBCS and DOS OEM character set. |  | | In particular, add code to map data "at the boundary" to and from Unicode using the Win32 functions WideCharToMultiByte and MultiByteToWideChar, or using the C run-time functions mbtowc, mbstowcs, wctomb, and wcstombs. |
|
http://www.microsoft.com/typography/unicode/cs.htm
(2428 words)
|
|
| |
| | XML.com: Unicode Secrets |
 | | Poor understanding of Unicode is probably the biggest obstacle users face when trying to learn how to process XML, and Python users are no exception. |  | | Again, you need to grab only the first item from the encode function's return value, the second of which is the number of characters that were encoded in the given Unicode object. |  | | This is primarily an admonition for XML API designers, but it also applies to users because many APIs allow you to pass in strings or Unicode objects interchangeably. |
|
http://www.xml.com/pub/a/2005/05/18/unicode.html
(2056 words)
|
|
| |
| | Unicode in XML and other Markup Languages |
 | | The issues of using Unicode characters with marked-up text depend to some degree on the rules of the markup language in question and the set of elements it contains. |  | | As a result, fewer Unicode implementations support these characters, than would be the case otherwise. |  | | See Unicode Technical Report #9, The Bidirectional Algorithm [UAX 9]. |
|
http://www.w3.org/TR/unicode-xml
(6853 words)
|
|
| |
| | Sacred-texts.com: Unicode |
 | | This solves a major problem for creators of etexts, as it is now possible to fully transcribe texts in multiple languages without requiring ASCII transliterations, special fonts or browsing software. |  | | The major version 4 and up browsers support Unicode if you have a decent Unicode font installed, provided you designate that font as your default font. |  | | This is a variable-length binary compression scheme which encodes Unicode efficiently. |
|
http://www.sacred-texts.com/unicode.htm
(1348 words)
|
|
| |
| | Biblical Language Fonts and Unicode |
 | | Galilee Unicode Gk differs from other Greek font offerings in that it is intended and optimized for legibility in reading on screen and for video projection rather than for printed materials. |  | | These pages use CSS and Unicode UTF-8 encoding; most Greek text is now in Unicode format, though some remnants of the older, non-standard Galilee encoding remain. |  | | Unicode fonts for Macintosh OS X computers (Alan Wood) |
|
http://faculty.bbc.edu/RDecker/unicode.htm
(2330 words)
|
|
| |
| | IPA transcription in Unicode |
 | | There is also another version, with no font specified, that you can use to test fonts. |  | | You must be running Windows 95 or later, or, on a Macintosh, the System X browser Safari; (otherwise, and for Unix or Linux, see advice from the Unicode site) |  | | August 2002: Microsoft has removed the Arial Unicode MS Font for Publisher 2000 free download. |
|
http://www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm
(675 words)
|
|
| |
| | Unicode fonts for Windows computers - Page 1 |
 | | The following list of Unicode fonts is probably not comprehensive; it is just the ones that I have acquired with various operating systems and applications, or found while learning about Unicode from the Web. |  | | There are even shareware Unicode fonts, such as Code2000. |  | | You can find out if your Windows fonts support Unicode by using the extensions that Microsoft supplies for the Properties tab that is available when a TrueType (.TTF) font file is right-clicked in Windows Explorer. |
|
http://www.alanwood.net/unicode/fonts.html
(6350 words)
|
|
| |
| | What is Unicode? - A Word Definition From the Webopedia Computer Dictionary |
 | | Many analysts believe that as the software industry becomes increasingly global, Unicode will eventually supplant ASCII as the standard character coding format. |  | | This is a bit of overkill for English and Western-European languages, but it is necessary for some other languages, such as Greek, Chinese and Japanese. |
|
http://www.webopedia.com/TERM/U/Unicode.html
(186 words)
|
|
| |
| | Unicode™ : Java Glossary |
 | | Apparently these are all handled by using a separate font, with the same Unicode encodings. |  | | You can use lowly Notepad in Windows NT/W2K/XP to edit existing documents but not earlier Windows versions. |  | | In Java programs, intractable Unicode characters are represented in the form '\uffff', with four hex digits. |
|
http://mindprod.com/jgloss/unicode.html
(938 words)
|
|
| |
| | i18n/l10n: HTML - base character set |
 | | The IETF recommends in RFC 2277 that all (new) Internet protocols and formats that deal with text use the UCS, and in particular its |  | | Unicode and ISO/IEC 10646 are codepoint by codepoint identical and developed in close synchronization. |  | | The Unicode Standard is available as a book: |
|
http://www.w3.org/International/O-unicode.html
(139 words)
|
|
| |
| | Rosette Core Library for Unicode - Basis Technology Products |
 | | Rosette Core Library for Unicode (RCLU) enables software engineers to quickly add support for the world’s languages to their applications. |  | | Unicode is an international standard that provides a single encoding for all the world’s languages. |
|
http://www.basistech.com/unicode
(204 words)
|
|
| |
| | Vietnamese Professionals Society |
 | | If you can't read the Vps.org Vietnamese section, you probably don't have it on your computer. |  | | This page uses Tahoma and Verdana Unicode fonts. |  | | Free Viet-Pali-Sanskrit Unicode fonts from Buddhas' Sasana (Binh Anson) |
|
http://www.vps.org/rubrique.php3?id_rubrique=73
(54 words)
|
|
| |
| | Unicode Home Page |
 | | Proposed Update to UAX #9 The Bidirectional Algorithm |  | | Proposed Update UTR #25 Unicode Support for Mathematics |  | | Proposed Update to UAX #34 Unicode Named Character Sequences |
|
http://www.unicode.org
(99 words)
|
|
| |
| | Unicode Support in Your Browser |
 | | Unicode is the world's standard for encoding text. |  | | Almost all of the characters used in modern writing systems have already been assigned to unique code positions and work is under way to add some fairly exotic modern scripts as well as provide standardized encoding for ancient scripts. |  | | But, if you're using an older operating system, you may have tried to see some of those special characters and become fairly frustrated when your font viewer failed or totally choked-up. |
|
http://home.att.net/~jameskass
(479 words)
|
|
| |
| | Debian -- unicode |
 | | unicode is a simple command-line utility that displays properties for a given Unicode character, or searches the Unicode database for a given name. |
|
http://packages.debian.org/unstable/utils/unicode.html
(86 words)
|
|
| |
| | Akerbeltz.org - Unicode |
 | | Unicode is a completely different encoding. |  | | Follow the steps below and your computer will be able to handle Unicode within 10 minutes (it will help to print this page, as you will need to restart your computer). |  | | Your web browser should recognize pages that are in Unicode, but if you still see a jumble of letters on our Fuaimean na Gàidhlig pages, do the following: |
|
http://www.akerbeltz.org/fuaimean/unicode.htm
(514 words)
|
|
|