I have found that this method works best when one is (1) entering text in English plus either Cyrillic or an East European (non-Cyrillic) language, but not both, and (2) entering whole sections of text rather than just a character here or there. It is also only really feasible if your computer is set up to type in the language you are inserting.
- Open a new HTML document and set the proper character encoding for the page in the source code (see the section on Character Encoding below; for Cyrillic, this will generally be Windows-1251).
- Using the editor (not in the source code), type in the text you want to insert, having switched to the proper keyboard/keyboard driver for the language you are inserting (Russian, Polish, Bulgarian, etc.).
- You may be able to cut and paste Cyrillic (or other) text directly into your webpage editor. Be sure to change the character encoding first. If this doesn't work, you may want to try using Otpad to cut and paste (see the section on Software Options).
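To see why the declared encoding must match the one the text was typed in, here is a small Python sketch. It is only an illustration, not part of the editor workflow above. It encodes a Russian word as Windows-1251 bytes (roughly what an editor set to that encoding saves to disk), then decodes those bytes twice: once correctly, and once as ISO-8859-1 (Latin-1), which is what a browser would effectively do if the page declared the wrong charset. The sample word is an arbitrary choice.

```python
# A minimal sketch: the same bytes read under two different encoding assumptions.
text = "Привет"  # sample Russian word ("hello"), chosen arbitrarily

# Roughly what an editor set to Windows-1251 writes to disk.
raw = text.encode("cp1251")

# Correct declaration: the bytes are decoded with the encoding actually used.
correct = raw.decode("cp1251")

# Wrong declaration: the same bytes read as ISO-8859-1 (Latin-1) come out garbled.
garbled = raw.decode("latin-1")

print(raw)      # one byte per Cyrillic letter
print(correct)  # the original word
print(garbled)  # accented Latin letters instead of Cyrillic
```

Nothing in the bytes themselves says "Cyrillic"; only the declared encoding tells the browser how to read them.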
I have found that hand coding (using Unicode) works best when entering individual characters (rather than several words or sentences) and when one is entering characters from a diverse set of scripts (Cyrillic, East European, Greek, etc.). Say I want to put my paper on Balkan linguistics online and need to represent an "S with Caron," a Cyrillic "sh" and a Greek "psi."
- I look up the hexadecimal code for each character on the code tables at the Unicode web-site.
- I find that "S with Caron" is 0160, Cyrillic capital "sha" is 0428 and Greek capital psi is 03A8.
- In the source code I change the character encoding to Unicode (see the Character Encoding section below).
- I then enter &#x0160;, &#x0428; and &#x03A8; in the HTML source code.
- This will produce Š, Ш and Ψ.
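The lookup-and-enter steps above can also be scripted. The following Python sketch is a hypothetical helper, not something from the original presentation: it uses the standard `unicodedata` module to find each character by its official Unicode name (a stand-in for browsing the code tables) and formats it as a hexadecimal numeric character reference of the form `&#xHHHH;`. The helper name `to_ncr` is made up for illustration.

```python
import html
import unicodedata


def to_ncr(ch: str) -> str:
    """Hypothetical helper: return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):04X};"


# Official Unicode names for the three characters in the example.
names = [
    "LATIN CAPITAL LETTER S WITH CARON",
    "CYRILLIC CAPITAL LETTER SHA",
    "GREEK CAPITAL LETTER PSI",
]

chars = [unicodedata.lookup(n) for n in names]
refs = [to_ncr(c) for c in chars]

for name, ch, ref in zip(names, chars, refs):
    print(f"{name}: U+{ord(ch):04X} -> {ref}")

# A browser turns the references back into characters; html.unescape does the same.
print(html.unescape(" ".join(refs)))
```

Decimal references work the same way: `&#352;` is the decimal form of hex 0160 and also produces Š.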
This approach will work for any of the 40,000+ characters included in the Unicode standard, provided the reader's browser has a font that can display them.
Whether you are creating a document using an HTML editor or by hand-coding (or a combination of both), you will need to specify the character encoding in a META tag in the <HEAD> section of the document, so that web browsers know which encoding your document uses and can display it properly.
In the <HEAD> section of your HTML document, add a tag (or modify the existing tag):
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=utf-8">
The critical part is at the very end. In this example the value is "utf-8" (Unicode). If you were typing in an HTML editor using Windows-based Cyrillic fonts, the proper value would be "windows-1251".
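The META tag only describes the file; the bytes must actually be saved in the encoding it names. Here is a Python sketch, assuming the page is generated by a script rather than typed in an editor. The function name `make_page` and the sample text are hypothetical. It builds a minimal page with the META tag and encodes it with the same charset, and it also shows why mixed scripts call for Unicode: Windows-1251 has no slot for Latin Š or Greek Ψ.

```python
def make_page(body: str, charset: str) -> bytes:
    """Hypothetical helper: build a minimal HTML page whose META tag names `charset`,
    then encode the whole page with that same charset."""
    page = (
        "<HTML>\n<HEAD>\n"
        f'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET={charset}">\n'
        "</HEAD>\n<BODY>\n"
        f"<P>{body}</P>\n"
        "</BODY>\n</HTML>\n"
    )
    # Encoding with the declared charset keeps the bytes and the META tag in agreement.
    # Raises UnicodeEncodeError if the text holds a character the charset lacks.
    return page.encode(charset)


# Cyrillic text fits in either encoding.
utf8_page = make_page("Шапка", "utf-8")
cp1251_page = make_page("Шапка", "windows-1251")

# Mixed scripts (Latin Š and Greek Ψ) do not fit in windows-1251.
try:
    make_page("Š Ψ", "windows-1251")
except UnicodeEncodeError as err:
    print("windows-1251 cannot represent:", err.object[err.start:err.end])

# UTF-8 can hold all of them.
mixed_page = make_page("Š Ш Ψ", "utf-8")
```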
Much of the text of this page was adapted from a presentation given by Andy Spencer, Slavic Bibliographer at the University of Wisconsin, at AAASS in 2003.