On ByteBuffers

Java’s Buffer API is one of the most commonly misused APIs. Buffers encapsulate the (byte[] buffer, int offset, int length) tuple that is common in C and Java libraries. It is fairly easy to use Buffers in a way that works today but causes problems later.

Buffers have three index fields that describe how the Buffer is to be used:

  • Capacity - the number of elements the Buffer contains, which cannot be changed after creation

  • Position - the index of the next element to be read or written, i.e. the start of the area to be interacted with.

  • Limit - the index of the first element that should not be read or written, i.e. the end of the area to be interacted with.
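
As a rough illustration of how these three fields move, here is a minimal sketch. The class name BufferIndexDemo and the describe() helper are made up for this example and are not part of the Buffer API.

```java
import java.nio.ByteBuffer;

class BufferIndexDemo {
  // Hypothetical helper: formats the three index fields of a buffer.
  static String describe(ByteBuffer buffer) {
    return "position=" + buffer.position()
        + " limit=" + buffer.limit()
        + " capacity=" + buffer.capacity();
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.allocate(16);
    System.out.println(describe(buffer)); // position=0 limit=16 capacity=16

    buffer.putInt(42); // writes 4 bytes and advances the position past them
    System.out.println(describe(buffer)); // position=4 limit=16 capacity=16

    buffer.limit(8); // shrinks the area of interaction; capacity is unchanged
    System.out.println(describe(buffer)); // position=4 limit=8 capacity=16
  }
}
```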

A Buffer can be either direct or non-direct. Direct Buffers are backed by native memory outside the garbage-collected Java heap. Non-direct Buffers are backed by an ordinary Java array on the heap.

Methods Should Respect The Area of Interaction

One common pitfall is the incorrect use of the capacity, position and limit fields. For example, here’s a method that (incorrectly) gets the contents of a ByteBuffer as a byte array.

static byte[] getContents(ByteBuffer byteBuffer) {
  byteBuffer.clear();
  byte[] data = new byte[byteBuffer.capacity()];
  byteBuffer.get(data);
  return data;
}

The clear() method used here sets the Buffer’s position to zero and its limit to its capacity, but does not clear any data. If the Buffer holds data outside the area the caller intended to expose, this method disregards the boundaries set by the caller and reads all of it. It’s also fairly common to see the Buffer’s position used as the end of the area, instead of the beginning.

static byte[] getContents(ByteBuffer byteBuffer) {
  byte[] data = new byte[byteBuffer.position()];
  byteBuffer.position(0);
  byteBuffer.get(data);
  return data;
}

This is an easy mistake to make because of how Buffer write operations work. Each write operation moves the Buffer’s position to just after the data that was written. After a set of write operations, the Buffer’s position is at the end of the area to be read. The correct usage is for the caller to update the position and limit before passing the Buffer to a method.

static byte[] getContents(ByteBuffer byteBuffer) {
  byte[] data = new byte[byteBuffer.remaining()];
  byteBuffer.get(data);
  return data;
}

static void myCallingMethod() {
  ByteBuffer myBuffer = ByteBuffer.allocate(Double.BYTES);
  myBuffer.putDouble(Math.PI);
  myBuffer.flip();
  byte[] data = getContents(myBuffer);
}

The remaining() method returns the number of elements between the position and the limit, which is the size of the area to be interacted with. The flip() method used in myCallingMethod() updates the position and limit so the Buffer’s contents can be read. It sets the limit to the position, and the position to zero.
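
To make the “area of interaction” concrete, the sketch below calls the same getContents() implementation on a Buffer whose position and limit select only the middle of the backing array. The class name AreaOfInteractionDemo and the sample bytes are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class AreaOfInteractionDemo {
  // Same implementation as the corrected getContents() above:
  // copies only the bytes between position and limit.
  static byte[] getContents(ByteBuffer byteBuffer) {
    byte[] data = new byte[byteBuffer.remaining()];
    byteBuffer.get(data);
    return data;
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.wrap(new byte[] {10, 11, 12, 13, 14, 15});
    buffer.position(2); // start of the area to be read
    buffer.limit(5);    // first index that must not be read

    byte[] data = getContents(buffer);
    System.out.println(Arrays.toString(data)); // [12, 13, 14]
    System.out.println(buffer.remaining());    // 0, the position is now at the limit
  }
}
```

The bytes at indices 0, 1 and 5 are never touched, which is what a caller who set those boundaries would expect.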

Methods Should Document Their Updates

Any method that interacts with a Buffer should document any changes it makes to the Buffer as part of its API contract. This is not a technical requirement, but it reduces cognitive load on users of the API. For example, the previous getContents() method could be documented as:

/**
  * Gets the contents of a ByteBuffer as a byte array. The Buffer's position will be
  * set to its limit.
  */
static byte[] getContents(ByteBuffer byteBuffer) { ... }

If a caller does not want the Buffer’s position and limit to be updated, they can pass the result of the duplicate() method instead, so that the method modifies only the duplicate’s position and limit.

static void myCallingMethod() {
  // ...
  byte[] data = getContents(myBuffer.duplicate());
}

The duplicate() method returns a new Buffer that has the same backing content, but an independent position and limit.
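
A small sketch of that behaviour, using a made-up class name DuplicateDemo: a write through the duplicate is visible through the original, but only the duplicate’s position moves.

```java
import java.nio.ByteBuffer;

class DuplicateDemo {
  public static void main(String[] args) {
    ByteBuffer original = ByteBuffer.allocate(4);
    ByteBuffer copy = original.duplicate();

    copy.put((byte) 7); // relative write: advances only the duplicate's position

    System.out.println(original.position()); // 0
    System.out.println(copy.position());     // 1
    System.out.println(original.get(0));     // 7, because the content is shared
  }
}
```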

Buffers Are Always Mutable

Buffers can be made read-only (via the asReadOnlyBuffer() method) to prevent modification of the backing data. However, since read and write operations on Buffers update their position, all Buffers are still mutable, regardless of whether they are read-only. For example:

public class BufferSupplier {
  private final ByteBuffer buffer = ByteBuffer.wrap("Howdy".getBytes())
                                              .asReadOnlyBuffer();
  public ByteBuffer getBuffer() {
    return buffer;
  }
}

static void myCallingMethod() {
  byte[] data = getContents(bufferSupplier.getBuffer());
}

This will all work fine - once. Since getContents() updates the position of the buffer, the Buffer is now empty, and subsequent uses of the Buffer will have no data. Since Buffers are always mutable, any method that is supposed to return a Buffer with a fixed content must return a new Buffer with independent position and limit. Conveniently, there is the duplicate() method:

public ByteBuffer getBuffer() {
  return buffer.duplicate();
}
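
Putting the pieces together, the following sketch shows the supplier returning a usable Buffer on every call. The class name BufferSupplierDemo is an assumption for this example, and the explicit UTF-8 charset is added here only to make the output predictable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferSupplierDemo {
  private final ByteBuffer buffer =
      ByteBuffer.wrap("Howdy".getBytes(StandardCharsets.UTF_8)).asReadOnlyBuffer();

  // Each caller gets its own position and limit; the duplicate is still read-only.
  ByteBuffer getBuffer() {
    return buffer.duplicate();
  }

  static byte[] getContents(ByteBuffer byteBuffer) {
    byte[] data = new byte[byteBuffer.remaining()];
    byteBuffer.get(data);
    return data;
  }

  public static void main(String[] args) {
    BufferSupplierDemo supplier = new BufferSupplierDemo();
    byte[] first = getContents(supplier.getBuffer());
    byte[] second = getContents(supplier.getBuffer());
    System.out.println(new String(first, StandardCharsets.UTF_8));  // Howdy
    System.out.println(new String(second, StandardCharsets.UTF_8)); // Howdy
  }
}
```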

Don’t Abuse The array() Method

The Buffer API has an array() method, which returns the Buffer’s backing array. This method is an optional operation, which throws an exception if the Buffer does not support the method. Many types of Buffers do not support the array() method.

  • Direct buffers - they use off-heap memory, so there is no backing Java array

  • Read-only buffers - allowing access to the array would allow modification

  • “View” buffers - methods like ByteBuffer.asCharBuffer() return a Buffer of a different type with the same backing data, so the array would be the wrong type
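
The hasArray() method reports whether array() and arrayOffset() can be called safely. The sketch below checks each of the cases listed above. The class name HasArrayDemo is made up, and the results shown are what OpenJDK’s implementations return.

```java
import java.nio.ByteBuffer;

class HasArrayDemo {
  public static void main(String[] args) {
    ByteBuffer heap = ByteBuffer.allocate(8);

    System.out.println(heap.hasArray());                         // true
    System.out.println(ByteBuffer.allocateDirect(8).hasArray()); // false: direct
    System.out.println(heap.asReadOnlyBuffer().hasArray());      // false: read-only
    System.out.println(heap.asCharBuffer().hasArray());          // false: view
  }
}
```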

The desire to use the array() method usually comes from one of two places: it’s easier, or it’s faster. For example, the getContents() method could be (incorrectly) implemented as:

static byte[] getContents(ByteBuffer byteBuffer) {
  return byteBuffer.array();
}

That sure was easier, but there are quite a few problems with this implementation.

  • It does not work for all ByteBuffers (ones that don’t support the array() method)

  • It does not respect the area of interaction

  • It does not update the position (assuming it was documented as such)

  • It returns the same backing array as the Buffer, which could be undesirable if you want to modify the array and buffer independently in the future.

We can fix these problems (excluding the first), but doing so greatly reduces how “easy” the array() method is to use.

static byte[] getContents(ByteBuffer byteBuffer) {
  byte[] data = new byte[byteBuffer.remaining()];
  byte[] array = byteBuffer.array();
  System.arraycopy(array, byteBuffer.position(), data, 0, byteBuffer.remaining());
  byteBuffer.position(byteBuffer.limit());
  return data;
}

This method still has one subtle problem: array-backed Buffers do not necessarily start at index zero in the array. A Buffer created by the slice() method, for example, shares its parent’s array but can start at a non-zero offset within it. (ByteBuffer.wrap(byte[] array, int offset, int length) does not have this effect: it sets the position to offset and leaves the array offset at zero.) The correct implementation would be:

static byte[] getContents(ByteBuffer byteBuffer) {
  byte[] data = new byte[byteBuffer.remaining()];
  byte[] array = byteBuffer.array();
  System.arraycopy(array, 
                   byteBuffer.arrayOffset() + byteBuffer.position(),
                   data,
                   0, 
                   byteBuffer.remaining());
  byteBuffer.position(byteBuffer.limit());
  return data;
}

This is essentially what the non-direct ByteBuffer.get() method does under the hood, so there should be no real performance benefit. It is also significantly more difficult to write, and it still doesn’t work for all Buffers. For this type of operation, sticking with the built-in read methods is a much better option.
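
To see why arrayOffset() matters in the array-based implementation, the sketch below builds a slice whose first element sits at index 2 of the backing array. The class name ArrayOffsetDemo and the sample bytes are assumptions for illustration.

```java
import java.nio.ByteBuffer;

class ArrayOffsetDemo {
  public static void main(String[] args) {
    byte[] backing = {0, 1, 2, 3, 4, 5};
    ByteBuffer whole = ByteBuffer.wrap(backing);
    whole.position(2);

    // The slice starts at the parent's position and has its own indices.
    ByteBuffer slice = whole.slice();

    System.out.println(slice.position());    // 0
    System.out.println(slice.arrayOffset()); // 2
    System.out.println(slice.get(0));        // 2, the slice's real first element

    // Ignoring arrayOffset() reads the wrong element of the shared array.
    System.out.println(slice.array()[slice.position()]);                       // 0
    System.out.println(slice.array()[slice.arrayOffset() + slice.position()]); // 2
  }
}
```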

There are cases when using the array() method may be appropriate, but most of these cases would likely fall under premature optimization until proven otherwise. If the method is widely used (like in a library) it may be beneficial to make these optimizations so that they can benefit all users. For example, a utility method that writes a ByteBuffer to an OutputStream could be written as:

static void writeTo(ByteBuffer byteBuffer, OutputStream outputStream)
    throws IOException {
  outputStream.write(getContents(byteBuffer));
}

There is going to be some performance overhead due to the allocation and copying of data to the intermediate byte array. There could also be significant garbage collection and memory impact if the Buffer is large. By using the array() method when it is available, this overhead can be avoided.

static void writeTo(ByteBuffer byteBuffer, OutputStream outputStream)
    throws IOException {
  if (byteBuffer.hasArray()) {
    outputStream.write(byteBuffer.array(), 
                       byteBuffer.arrayOffset() + byteBuffer.position(),
                       byteBuffer.remaining());
    byteBuffer.position(byteBuffer.limit());
  } else {
    outputStream.write(getContents(byteBuffer));
  }
}
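
As a quick sanity check, both branches should write the same bytes and leave the Buffer’s position at its limit. The sketch below exercises the array() branch with a heap Buffer and the fallback branch with a direct Buffer. The class name WriteToDemo and the written() helper are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WriteToDemo {
  static byte[] getContents(ByteBuffer byteBuffer) {
    byte[] data = new byte[byteBuffer.remaining()];
    byteBuffer.get(data);
    return data;
  }

  static void writeTo(ByteBuffer byteBuffer, OutputStream outputStream)
      throws IOException {
    if (byteBuffer.hasArray()) {
      outputStream.write(byteBuffer.array(),
                         byteBuffer.arrayOffset() + byteBuffer.position(),
                         byteBuffer.remaining());
      byteBuffer.position(byteBuffer.limit());
    } else {
      outputStream.write(getContents(byteBuffer));
    }
  }

  // Hypothetical helper: runs writeTo() and returns the bytes that were written.
  static byte[] written(ByteBuffer byteBuffer) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeTo(byteBuffer, out);
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] text = "Howdy".getBytes(StandardCharsets.UTF_8);

    ByteBuffer heap = ByteBuffer.wrap(text); // takes the array() branch

    ByteBuffer direct = ByteBuffer.allocateDirect(text.length);
    direct.put(text);
    direct.flip();                           // takes the getContents() branch

    System.out.println(Arrays.equals(written(heap), written(direct))); // true
    System.out.println(heap.remaining() + " " + direct.remaining());   // 0 0
  }
}
```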

Utilities

With so many potential pitfalls, it is useful to be able to unit test Buffer-using methods against various valid but potentially non-standard Buffer layouts. There is a small GitHub repository which can assist. It provides a set of Buffer factories which can be used during unit tests. For example, with JUnit 5, you could use a parameterized test:

@ParameterizedTest
@MethodSource("com.brandontoner.ByteBufferFactory#allFactories")
void getContents_isCorrect(final ByteBufferFactory factory) {
  byte[] array = "A Test String".getBytes();
  assertArrayEquals(array, getContents(factory.copyOf(array)));
}