Java Forum


Converting Farsi charset

  Asked By: Caleb    Date: Jul 26    Category: Java    Views: 5462

I want to convert Farsi characters from ISO-8859-1 to UTF-8.
Please help if you know anything about it.



4 Answers Found

Answer #1    Answered By: Tara Ryan     Answered On: Jul 26

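Is your ISO-8859-1 content stored in a file? If so, read it through an ISO-8859-1 reader and copy the characters to a UTF-8 writer:

// needs: import java.io.*; the enclosing method must declare throws IOException
try (Reader in = new InputStreamReader(new FileInputStream("iso8859fileName"), "ISO-8859-1");
     Writer out = new OutputStreamWriter(new FileOutputStream("utf8fileName"), "UTF-8")) {
    char[] buf = new char[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n); // chars decoded from ISO-8859-1 are re-encoded as UTF-8
    }
} // try-with-resources (Java 7+) flushes and closes both streams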
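A self-contained variant of the same copy loop, sketched with java.nio.file and Charset objects (Java 7+), so no charset name can be misspelled. The class RecodeFile and its recode method are hypothetical names; the file names are the placeholders from the answer. Keep in mind that ISO-8859-1 has no Persian letters, so a file that really holds Farsi text is probably in some other charset (for example windows-1256); passing that Charset as `from` works the same way, assuming your JDK supports it (check with Charset.isSupported).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RecodeFile {

    // Decode src with `from`, then encode the same characters into dst with `to`.
    static void recode(Path src, Charset from, Path dst, Charset to) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(src, from);
             BufferedWriter out = Files.newBufferedWriter(dst, to)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } // both files are flushed and closed here
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names taken from the answer above.
        recode(Paths.get("iso8859fileName"), StandardCharsets.ISO_8859_1,
               Paths.get("utf8fileName"), StandardCharsets.UTF_8);
    }
}
```

For example, the ISO-8859-1 bytes of "café" (63 61 66 E9) come out as the UTF-8 bytes 63 61 66 C3 A9.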

Answer #2    Answered By: Sam Anderson     Answered On: Jul 26

Thanks for your help, but my ISO-8859-1 content is stored in an Oracle 9i
database and I want to use it in my JSP web pages.

Answer #3    Answered By: Mehreen Malik     Answered On: Jul 26

String class: http://java.sun.com/j2se/1.3/docs/api/java/lang/String.html
Useful methods for your request:
public byte[] getBytes(String enc) throws UnsupportedEncodingException
    Convert this String into bytes according to the specified character encoding, storing the result into a new byte array.
    Parameters: enc - The name of a supported character encoding
    Returns: The resultant byte array
    Throws: UnsupportedEncodingException - If the named encoding is not supported

public String(byte[] bytes, String enc) throws UnsupportedEncodingException
    Construct a new String by converting the specified array of bytes using the specified character encoding. The length of the new String is a function of the encoding, and hence may not be equal to the length of the byte array.
    Parameters: bytes - The bytes to be converted into characters
                enc - The name of a supported character encoding
    Throws: UnsupportedEncodingException - If the named encoding is not supported
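A minimal round trip through the two methods quoted above, using the Persian word "salaam" (U+0633 U+0644 U+0627 U+0645) written as Unicode escapes so the source file's own encoding does not matter. EncodeDecodeDemo and its helpers are hypothetical names. It also illustrates the last sentence of the constructor description: 8 UTF-8 bytes decode to a 4-character String.

```java
import java.io.UnsupportedEncodingException;

public class EncodeDecodeDemo {

    // Persian "salaam", spelled with Unicode escapes.
    static final String SALAAM = "\u0633\u0644\u0627\u0645";

    // String -> bytes via getBytes(String enc)
    static byte[] toUtf8(String s) throws UnsupportedEncodingException {
        return s.getBytes("UTF-8");
    }

    // bytes -> String via new String(byte[] bytes, String enc)
    static String fromUtf8(byte[] b) throws UnsupportedEncodingException {
        return new String(b, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] bytes = toUtf8(SALAAM);
        String back = fromUtf8(bytes);
        // Each of these letters needs 2 bytes in UTF-8, so bytes.length != back.length().
        System.out.println(bytes.length + " bytes, " + back.length() + " chars, equal=" + SALAAM.equals(back));
        // prints "8 bytes, 4 chars, equal=true"
    }
}
```

An unknown name such as "no-such-charset" makes both methods throw UnsupportedEncodingException, as the documentation says.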

Character Encodings
Various constructors and methods in the java.lang and java.io packages accept string arguments that specify the character encoding to be used when converting between raw eight-bit bytes and sixteen-bit Unicode characters. Such encodings are named by strings composed of the following characters:
The uppercase letters 'A' through 'Z' ('\u0041' through '\u005a'),
The lowercase letters 'a' through 'z' ('\u0061' through '\u007a'),
The digits '0' through '9' ('\u0030' through '\u0039'),
The dash character '-' ('\u002d', HYPHEN-MINUS),
The colon character ':' ('\u003a', COLON), and
The underscore character '_' ('\u005f', LOW LINE).

An encoding name must begin with either a letter or a digit. The empty string is not a legal encoding name.
An encoding may have more than one name. One of an encoding's names is considered to be its canonical name. The canonical name of an encoding is the name returned by the getEncoding methods of the InputStreamReader and OutputStreamWriter classes.

Encoding names generally follow the conventions documented in RFC2278: IANA Charset Registration Procedures. If an encoding listed in the IANA Charset Registry is supported by an implementation of the Java platform then one of its names must be the name listed in the registry. Many encodings are given more than one name in the registry, in which case the registry identifies one of the names as MIME-preferred. An implementation of the Java platform must support the MIME-preferred registry name for a supported encoding if there is one; for convenience it may additionally support other registry names. The IANA MIME-preferred name of an encoding, if there is one, is often, but not necessarily, its canonical name. Following IANA convention, the mapping from IANA registry names to encodings is not case-sensitive.
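The naming rules above can be checked with java.nio.charset.Charset: forName accepts any registered alias in any letter case, and name() returns the canonical name. A minimal sketch; the class CharsetNames and its helpers are hypothetical, and the exact alias set printed by aliases() varies between JDKs.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetNames {

    // Canonical name for any supported name or alias (lookup is case-insensitive).
    static String canonical(String name) {
        return Charset.forName(name).name();
    }

    // True if the name breaks the syntax rules, e.g. the empty string.
    static boolean isIllegalName(String name) {
        try {
            Charset.isSupported(name);
            return false;
        } catch (IllegalCharsetNameException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonical("utf-8"));   // UTF-8
        System.out.println(canonical("UTF8"));    // UTF-8 (alias)
        System.out.println(canonical("latin1"));  // ISO-8859-1 (alias)
        System.out.println(Charset.forName("UTF-8").aliases()); // JDK-dependent set
        System.out.println(isIllegalName(""));    // true
    }
}
```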

Every implementation of the Java platform is required to support the following character encodings. Consult the release documentation for your implementation to see if any other encodings are supported.

Charset     Description
US-ASCII    Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1  ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8       Eight-bit Unicode Transformation Format
UTF-16BE    Sixteen-bit Unicode Transformation Format, big-endian byte order
UTF-16LE    Sixteen-bit Unicode Transformation Format, little-endian byte order
UTF-16      Sixteen-bit Unicode Transformation Format, byte order specified by a mandatory initial byte-order mark (either order accepted on input, big-endian used on output)

The various Unicode Transformation Formats are described in detail in The Unicode Standard and in the Unicode FAQ.
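To see the table in practice, the sketch below encodes the Persian word "salaam" with each required charset via the java.nio.charset.StandardCharsets constants (Java 7+). RequiredCharsets is a hypothetical name. It also shows the core problem behind the question: US-ASCII and ISO-8859-1 contain no Persian letters, so String.getBytes(Charset) substitutes the replacement byte '?' for each one and the text cannot be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequiredCharsets {

    // The six charsets every Java platform must support.
    static final Charset[] REQUIRED = {
        StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
        StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
    };

    // getBytes(Charset) replaces unmappable characters with the charset's replacement bytes.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String salaam = "\u0633\u0644\u0627\u0645"; // Persian "salaam"
        for (Charset cs : REQUIRED) {
            byte[] b = encode(salaam, cs);
            boolean ok = salaam.equals(new String(b, cs));
            System.out.printf("%-10s %2d bytes, round trip ok: %b%n", cs.name(), b.length, ok);
        }
    }
}
```

Running it should report 4 bytes and a failed round trip for US-ASCII and ISO-8859-1, 8 bytes for UTF-8, UTF-16BE and UTF-16LE, and 10 bytes for UTF-16, whose encoder writes a big-endian byte-order mark (FE FF) first.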
Every instance of the Java virtual machine has a default character encoding. The default encoding is determined during virtual-machine startup and typically depends upon the locale and encoding being used by the underlying operating system.
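Because the default varies by platform (on JDK 18 and later it is UTF-8 unless overridden, per JEP 400), code that handles Farsi text should name the charset explicitly rather than rely on it. A minimal sketch; DefaultCharsetDemo and its methods are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Name of this JVM's default charset; depends on platform, locale and JDK version.
    static String defaultName() {
        return Charset.defaultCharset().name();
    }

    // Explicit charset: same result on every JVM.
    static String decodeExplicit(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // No charset argument: silently uses the default, so the result can differ between machines.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    public static void main(String[] args) {
        String salaam = "\u0633\u0644\u0627\u0645";
        byte[] utf8 = salaam.getBytes(StandardCharsets.UTF_8);
        System.out.println("default charset:      " + defaultName());
        System.out.println("explicit UTF-8 ok:    " + salaam.equals(decodeExplicit(utf8)));
        System.out.println("default decoding ok:  " + salaam.equals(decodeWithDefault(utf8)));
    }
}
```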

Answer #4    Answered By: Daya Sharma     Answered On: Jul 26

You can convert your ISO-8859-1 String in your ActionForm with this statement:

name = new String(name.getBytes("ISO-8859-1"), "UTF-8"); // may throw UnsupportedEncodingException
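This one-liner only repairs a specific mistake: text that was sent as UTF-8 bytes but decoded as ISO-8859-1, which is what many servlet containers do with request parameters by default. Re-encoding with ISO-8859-1 returns the original bytes exactly, because ISO-8859-1 maps each byte value 0x00 to 0xFF to exactly one character; decoding those bytes as UTF-8 then yields the intended text. The sketch below simulates that round trip. MojibakeFix is a hypothetical name, and it uses the Charset overloads with StandardCharsets (Java 7+), which do not throw the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeFix {

    // Undo "UTF-8 bytes decoded as ISO-8859-1": recover the bytes, then decode them properly.
    static String fixLatin1DecodedUtf8(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1); // lossless for such strings
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String farsi = "\u0633\u0644\u0627\u0645"; // Persian "salaam"
        // Simulate a container decoding the UTF-8 request bytes as ISO-8859-1.
        String garbled = new String(farsi.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        String fixed = fixLatin1DecodedUtf8(garbled);
        System.out.println(garbled.length() + " garbled chars -> " + fixed.length() + " chars, equal=" + farsi.equals(fixed));
        // prints "8 garbled chars -> 4 chars, equal=true"
    }
}
```

Applied to a string that was already decoded correctly, the same conversion corrupts it (each Persian letter becomes '?'), so it is usually safer to fix the decoding at the source, for example by calling request.setCharacterEncoding("UTF-8") before the first getParameter call.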
