Tuesday 21 June 2016

How to Convert Byte array to String in Java with Example

There are multiple ways to convert a byte array to a String in Java, but the most straightforward is the String constructor that accepts a byte array, i.e. new String(byte[]). The key thing to remember is character encoding. Since bytes are binary data and a String is character data, it's very important to know the original character encoding of the text from which the byte array was created. If you use a different character encoding, you may not get the original String back. For example, if you read the byte array from a file encoded in "ISO-8859-1" and you don't provide any character encoding when converting it with the new String(byte[]) constructor, there is no guarantee that you will get the same text back. Why? Because new String(byte[]) uses the platform's default encoding (that of the machine where your JVM is running, e.g. a Linux box), which could be different from "ISO-8859-1".

If it is different, you may see garbage characters, or even different characters that change the meaning of the text completely. I am not saying this from reading a few books; I faced this issue in one of my projects, where we were reading data from a database that contained some French characters. In the absence of any specified encoding, our platform defaulted to something that could not convert all those special characters properly (I don't remember the exact encoding). The issue was solved by providing "UTF-8" as the character encoding while converting the byte array to a String. Yes, there is another overloaded constructor in the String class that accepts a character encoding, i.e. new String(byte[], "character encoding").
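To make the point concrete, here is a minimal sketch of the round trip described above. The class name ExplicitCharsetDemo and the sample text are my own choices for illustration; it uses the String(byte[], Charset) overload with constants from java.nio.charset.StandardCharsets, which avoids the checked exception thrown by the String-name overload.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Decodes bytes with an explicitly given charset, never the platform default
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café français", written with Unicode escapes so the result does not
        // depend on the encoding of this source file
        String original = "caf\u00e9 fran\u00e7ais";

        // Pretend these bytes were read from a file saved as ISO-8859-1
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false, accented letters are lost
    }
}
```

Decoding with the encoding that produced the bytes returns the original text; decoding the same bytes as UTF-8 does not, because the ISO-8859-1 bytes for the accented letters are not valid UTF-8 sequences.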

BTW, if you are new to the world of character encoding and don't understand what UTF-8 or UTF-16 is, I recommend you read my article on the difference between UTF-8, UTF-16 and UTF-32 encoding. It not only explains the difference but also gives you a basic idea of character encoding. Another article I recommend is about how Java deals with the default character encoding. Since many classes that perform conversion between bytes and characters cache the character encoding, it's important to learn how to provide the proper encoding at the JVM level. If this interests you, here is the link to the full article.





How to convert byte array to String in Java
Everything is 0 and 1 in the computer world, yet we are able to see different things, e.g. text, images, music files etc. The key to converting a byte array to a String is character encoding. In simple words, byte values are numeric values, and a character encoding is a map that provides a character for a particular byte value or sequence of byte values. For example, in most character encoding schemes, e.g. UTF-8, if the value of a byte is 65 the character is A, and for 66 it's B. Since ASCII characters, which include digits, letters and some special characters, are very popular, they have the same values in most encoding schemes. But that's not true for every byte value; for example, -10 (0xF6 as an unsigned value) is interpreted differently in UTF-8 and Windows-1252. Now someone may point out that since a byte has 8 bits, it can represent at most 256 distinct values, which is far too few given how many languages there are in the world. That's why we have multi-byte character encoding schemes, which can represent many more characters. Why do we need to convert bytes to a String? One real-world example is displaying Base64-encoded data as text. In some cases you instead need to convert the byte array to a hex String, as shown in that tutorial.
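The mapping described above can be sketched in a few lines. The class name ByteMappingDemo and the helper decode are hypothetical, chosen only for this illustration; the charset names are the same ones used elsewhere in this article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteMappingDemo {

    // Decodes bytes using the charset with the given name
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // ASCII values map to the same characters in both encodings
        byte[] ascii = { 65, 66 };
        System.out.println(decode(ascii, "UTF-8"));        // AB
        System.out.println(decode(ascii, "windows-1252")); // AB

        // -10 is 0xF6: a letter in Windows-1252, but not a valid UTF-8 sequence
        byte[] high = { -10 };
        System.out.println(decode(high, "windows-1252").charAt(0) == '\u00f6'); // true (o with diaeresis)
        System.out.println(decode(high, "UTF-8").charAt(0) == '\ufffd');        // true (replacement character)

        // Multi-byte encoding: the euro sign (U+20AC) needs three bytes in UTF-8
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(euro.length); // 3
    }
}
```

The comparisons are printed as booleans rather than as the characters themselves, so the output does not depend on what your console is able to display.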




Byte array to String in Java with Example
Now that we know a little bit of theory about how to convert a byte array to a String, let's see a working example. To keep the example simple, I have created a byte array in the program itself and then converted it into a String using different character encoding names: Cp1252, which is the default character encoding in Eclipse on many Windows installations; Windows-1252, which in Java is another name for that same encoding; and UTF-8, which is the de facto standard character encoding across the world. If you run this program and look at the output, you will notice that most of the characters are the same in every case. They are ASCII characters, here upper-case letters, but the special character at the end is rendered differently in UTF-8 than in Cp1252/Windows-1252. This is where using an incorrect character encoding can create trouble. The rest of the example is pretty straightforward, as we already have a byte array and are just using the overloaded String constructor that also accepts an encoding. For a more complex example, where we read content from an XML file, see this tutorial. ASCII also includes both printable and non-printable (control) characters, and how the latter are displayed can vary.


import java.io.UnsupportedEncodingException;

public class ByteArrayToStringDemo {

    public static void main(String[] args) throws UnsupportedEncodingException {

        // "CAFEBABE" in ASCII, followed by one non-ASCII byte (-20, i.e. 0xEC)
        byte[] random = new byte[] { 67, 65, 70, 69, 66, 65, 66, 69, -20 };

        // Decode the same bytes using explicitly named character encodings
        String utf = new String(random, "UTF-8");
        String cp1252 = new String(random, "Cp1252");
        String windows1252 = new String(random, "Windows-1252");

        System.out.println("String created from byte array in UTF-8 encoding : " + utf);
        System.out.println("byte array to String in Cp1252 encoding : " + cp1252);
        System.out.println("byte array to String in Windows-1252 encoding : " + windows1252);

    }

}

Output :
String created from byte array in UTF-8 encoding : CAFEBABE?
byte array to String in Cp1252 encoding : CAFEBABEì
byte array to String in Windows-1252 encoding : CAFEBABEì
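Notice that the String constructor did not complain about the last byte in UTF-8; it silently substituted a replacement character. If you would rather detect such a mismatch, one option is to decode through a CharsetDecoder configured to report errors. The following is only a sketch under that assumption; the class StrictDecodeDemo and its method names are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {

    // Decodes bytes and throws instead of substituting a replacement character
    public static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Returns true if the bytes form valid text in the given charset
    public static boolean isValid(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same bytes as in the example above: "CAFEBABE" plus 0xEC
        byte[] random = { 67, 65, 70, 69, 66, 65, 66, 69, -20 };

        System.out.println(isValid(random, StandardCharsets.UTF_8));          // false
        System.out.println(isValid(random, Charset.forName("windows-1252"))); // true
    }
}
```

A check like this only tells you that the bytes are not valid in a given encoding. It cannot prove which encoding was originally used, since the same bytes can be valid in several encodings.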


That's all about how to convert a byte array to a String in Java. Always provide a character encoding while converting bytes to characters, and make sure it is the same encoding that was used for the original text. If you don't know the original encoding, UTF-8 is a good default, but don't rely on the platform's default character encoding, because it is subject to change and might not be UTF-8. A better option is to set the character encoding for your application at the JVM level, so that you have complete control over how a byte array gets converted to a String.
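As a small sketch of what "platform default" means in practice, the program below prints the JVM's default charset and shows that new String(byte[]) behaves like passing that default explicitly. The class name DefaultCharsetDemo is hypothetical. One common way to set the default at startup is the file.encoding system property, e.g. java -Dfile.encoding=UTF-8 DefaultCharsetDemo; note that from JDK 18 onwards the default is UTF-8 unless configured otherwise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Relies on the JVM's default charset (what new String(byte[]) does)
    public static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Equivalent call with the default charset spelled out
    public static String decodeWithExplicitDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The value printed here depends on your JVM version, OS and startup flags
        System.out.println("Default charset: " + Charset.defaultCharset());

        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Both calls decode with the same (default) charset
        System.out.println(decodeWithDefault(data).equals(decodeWithExplicitDefault(data))); // true

        // Naming the charset makes the result independent of the machine
        System.out.println(new String(data, StandardCharsets.UTF_8).equals("caf\u00e9")); // true
    }
}
```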
