Package com.google.common.base
Class Utf8
- java.lang.Object
  - com.google.common.base.Utf8
@Beta @GwtCompatible public final class Utf8 extends java.lang.Object

Low-level, high-performance utility methods related to the UTF-8 character encoding. UTF-8 is defined in section D92 of The Unicode Standard Core Specification, Chapter 3.

The variant of UTF-8 implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. One implication of this is that it rejects "non-shortest form" byte sequences (encodings that use more bytes than a code point requires, such as 0xC0 0x80 for U+0000), even though the JDK decoder may accept them.

- Since:
- 16.0
Method Summary

- static int encodedLength(java.lang.CharSequence sequence): Returns the number of bytes in the UTF-8-encoded form of sequence.
- static boolean isWellFormed(byte[] bytes): Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0.
- static boolean isWellFormed(byte[] bytes, int off, int len): Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]).
Method Detail

encodedLength
public static int encodedLength(java.lang.CharSequence sequence)
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
- Throws:
- java.lang.IllegalArgumentException - if sequence contains ill-formed UTF-16 (unpaired surrogates)
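
The byte count described for encodedLength can be illustrated with a JDK-only sketch. This is not Guava's implementation; the class name Utf8LengthSketch is hypothetical, and the code only mirrors the documented contract: count bytes per UTF-16 code unit without allocating a byte array, and throw IllegalArgumentException on an unpaired surrogate. Unlike a production implementation, it does not guard against int overflow on very long input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Guava.
final class Utf8LengthSketch {

    /** Counts the UTF-8 bytes needed for the given UTF-16 sequence without encoding it. */
    static int encodedLength(CharSequence sequence) {
        int bytes = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 < sequence.length()
                        && Character.isLowSurrogate(sequence.charAt(i + 1))) {
                    bytes += 4;                   // supplementary code point
                    i++;                          // skip the low surrogate
                } else {
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                }
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                bytes += 3;                       // rest of the Basic Multilingual Plane
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";                  // 'h', U+00E9, 'l', 'l', 'o'
        System.out.println(encodedLength(s));     // prints 6
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // prints 6
    }
}
```

Note that for a string containing an unpaired surrogate, string.getBytes(UTF_8) substitutes a replacement byte instead of failing, so the equivalence with getBytes holds only for well-formed UTF-16 input.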
 
isWellFormed
public static boolean isWellFormed(byte[] bytes)
Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte sequences, but encoding never reproduces these. Such byte sequences are not considered well-formed.

This method returns true if and only if Arrays.equals(bytes, new String(bytes, UTF_8).getBytes(UTF_8)) does, but is more efficient in both time and space.
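
The round-trip expression quoted above can be run directly with the JDK, which may help when reasoning about which inputs count as well-formed. The sketch below is only that reference expression wrapped in a hypothetical helper (Utf8RoundTripSketch.roundTrips); it allocates a String and a second byte array, which is exactly the cost the documented method is said to avoid.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Guava.
final class Utf8RoundTripSketch {

    /** Decodes the bytes as UTF-8, re-encodes them, and checks for an exact match. */
    static boolean roundTrips(byte[] bytes) {
        byte[] reencoded =
                new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(bytes, reencoded);
    }

    public static void main(String[] args) {
        byte[] valid = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] nonShortest = {(byte) 0xC0, (byte) 0x80}; // overlong encoding of U+0000
        byte[] truncated = {(byte) 0xC3};                // lead byte with no continuation

        System.out.println(roundTrips(valid));       // prints true
        System.out.println(roundTrips(nonShortest)); // prints false
        System.out.println(roundTrips(truncated));   // prints false
    }
}
```

The non-shortest input fails the check whether the decoder rejects it (the bytes become replacement characters) or accepts it (the decoded U+0000 re-encodes as the single byte 0x00); in both cases the re-encoded bytes differ from the input.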
isWellFormed
public static boolean isWellFormed(byte[] bytes, int off, int len)
Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]). Note that this can be false even when isWellFormed(bytes) is true.
- Parameters:
- bytes - the input buffer
- off - the offset in the buffer of the first byte to read
- len - the number of bytes to read from the buffer
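
To see why a slice can be ill-formed while the whole array is well-formed, consider a slice boundary that cuts through a multi-byte character. The sketch below is a JDK-only illustration using the round-trip criterion on a copied slice; the helper name sliceRoundTrips is hypothetical, and copying the slice is done here for clarity rather than efficiency.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Guava.
final class Utf8SliceSketch {

    /** Applies the decode/re-encode check to bytes[off .. off + len). */
    static boolean sliceRoundTrips(byte[] bytes, int off, int len) {
        byte[] slice = Arrays.copyOfRange(bytes, off, off + len);
        byte[] reencoded =
                new String(slice, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(slice, reencoded);
    }

    public static void main(String[] args) {
        // U+00E9 encodes as the two bytes 0xC3 0xA9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(sliceRoundTrips(bytes, 0, 2)); // whole array: prints true
        System.out.println(sliceRoundTrips(bytes, 0, 1)); // lead byte only: prints false
        System.out.println(sliceRoundTrips(bytes, 1, 1)); // continuation byte only: prints false
    }
}
```

In practice this matters when validating chunks of a stream: chunk boundaries should fall on character boundaries, or the tail of one chunk has to be carried over into the next before checking.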
 
 