In Java, data can be transferred between programs, from files to programs, or from networks to programs using I/O streams. Programs can read from or write to a stream. A stream can be a simple pipe for data, or it can transform the data as it passes through. Streams can carry bytes, other primitives, localized characters or serialized objects.
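The most basic stream types are byte streams, which inherit from the abstract classes InputStream and OutputStream. InputStream is the parent of a large number of byte stream classes; its direct subclasses include FileInputStream, ByteArrayInputStream, FilterInputStream, ObjectInputStream, PipedInputStream and SequenceInputStream.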
Reading data from a FileInputStream is straightforward; the only complication is that the file handling process throws exceptions. The simplest program to read from a file calls read() in a loop and prints each value, which gives the int values of the characters, not the characters themselves. A stream that is never closed is a potential resource leak, so in production code close it in a finally block or use try with resources.
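The finally-block approach mentioned above might look like the following minimal sketch. The class name, method name and the fallback file name input.txt are hypothetical, chosen only for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class name, used only for this sketch.
class ByteReadSketch {

    // Prints the int value of every byte in the file, one per line,
    // and returns the number of bytes read.
    static int printByteValues(String path) throws IOException {
        InputStream in = null;
        int count = 0;
        try {
            in = new FileInputStream(path);
            int i;
            // read() returns the next byte as an int in 0..255, or -1 at end of stream.
            while ((i = in.read()) != -1) {
                System.out.println(i);
                count++;
            }
        } finally {
            // Runs even if read() threw. If close() also throws here,
            // its exception replaces the one from read().
            if (in != null) {
                in.close();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is an assumed default file name.
        printByteValues(args.length > 0 ? args[0] : "input.txt");
    }
}
```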
A version with improved exception handling (although without logging, for simplicity) uses try with resources. There is no explicit call to close the stream, as the try with resources construct closes it automatically on exiting the try block. That implicit close() can itself throw, which is why the IOException still has to be caught or declared.
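A try with resources version could be sketched as follows. The class and method names are hypothetical, and returning -1 on failure is an assumption made here to keep the example short, not a recommendation.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class name, used only for this sketch.
class TryWithResourcesSketch {

    // Prints the int value of every byte in the file and returns the count,
    // or -1 if the file could not be opened, read or closed.
    static int printByteValues(String path) {
        int count = 0;
        // The stream is closed automatically when the try block exits.
        try (InputStream in = new FileInputStream(path)) {
            int i;
            while ((i = in.read()) != -1) {
                System.out.println(i);
                count++;
            }
        } catch (FileNotFoundException e) {
            // Thrown by the FileInputStream constructor.
            System.err.println("File not found: " + path);
            return -1;
        } catch (IOException e) {
            // Thrown by read() or by the implicit close().
            System.err.println("I/O error on " + path + ": " + e.getMessage());
            return -1;
        }
        return count;
    }

    public static void main(String[] args) {
        // "input.txt" is an assumed default file name.
        printByteValues(args.length > 0 ? args[0] : "input.txt");
    }
}
```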
Aside: if errors occur in both read() and close(), try with resources throws the read() exception and attaches the close() exception to it as a suppressed exception. In a finally block it is the other way round: the close() exception replaces the read() exception, which is lost.
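The difference can be observed with a small experiment. The sketch below uses a hypothetical FailingResource class, invented for illustration, that throws from both read() and close().

```java
import java.io.IOException;

// Hypothetical class names, used only for this sketch.
class SuppressionSketch {

    // A made-up resource that fails on both read() and close().
    static class FailingResource implements AutoCloseable {
        void read() throws IOException {
            throw new IOException("read failed");
        }

        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    // try with resources: the read() exception wins, close() is suppressed.
    static String withTryWithResources() {
        try (FailingResource r = new FailingResource()) {
            r.read();
            return "no exception";
        } catch (IOException e) {
            return e.getMessage() + " / suppressed: " + e.getSuppressed()[0].getMessage();
        }
    }

    // finally block: the close() exception replaces the read() exception.
    static String withFinally() {
        try {
            FailingResource r = new FailingResource();
            try {
                r.read();
            } finally {
                r.close();
            }
            return "no exception";
        } catch (IOException e) {
            return e.getMessage() + " / suppressed count: " + e.getSuppressed().length;
        }
    }

    public static void main(String[] args) {
        System.out.println(withTryWithResources()); // read failed / suppressed: close failed
        System.out.println(withFinally());          // close failed / suppressed count: 0
    }
}
```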
So far, so working, but this would be an unusual use of I/O streams. The stream contains character data but the read() method returns int values. The values could be cast to characters, assuming the file contains no code points above 127 (see Joel’s old article for a reminder about Unicode, which is untested in this program):
char c = (char) i;
Aside: the LANG environment variable determines the default encoding of files created by vim, unless overridden by a setting in .vimrc or by :set fileencoding.
The preferred method is to use a stream that accounts for the character data. Character stream classes inherit from the abstract classes Reader and Writer and, again, have a number of concrete subclasses, for example FileReader. With a FileReader, the stream’s read() method returns int values that are based on the 2 byte char primitive rather than byte values. The constructor assumes the default encoding of the platform (in my case UTF-8). Note that the cast to char is still needed, but it will handle code points over 127 correctly for default encoded files.
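A minimal sketch of reading characters with FileReader is shown here. The class and method names are hypothetical, and the file is assumed to be in the platform's default encoding.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical class name, used only for this sketch.
class FileReaderSketch {

    // Reads the whole file as characters using the default encoding.
    static String readAll(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new FileReader(path)) {
            int i;
            // read() returns a char value as an int, or -1 at end of stream.
            while ((i = reader.read()) != -1) {
                sb.append((char) i);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is an assumed default file name.
        System.out.print(readAll(args.length > 0 ? args[0] : "input.txt"));
    }
}
```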
If a file is supplied in a different encoding then the correct strategy is to construct a FileInputStream, which is a byte stream specialized for reading from the file system, and wrap it in an InputStreamReader, which converts the bytes provided by the FileInputStream to chars using the encoding specified in the constructor.
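The wrapping described above could be sketched as follows. The class and method names are hypothetical, and ISO-8859-1 in main is only an assumed example of a non-default encoding.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this sketch.
class EncodedReadSketch {

    // Reads the whole file, decoding its bytes with the given charset.
    static String readAll(String path, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Two resources separated by a semicolon; both are closed
        // automatically, in reverse order of declaration.
        try (InputStream in = new FileInputStream(path);
             Reader reader = new InputStreamReader(in, charset)) {
            int i;
            while ((i = reader.read()) != -1) {
                sb.append((char) i);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed file name and encoding, for illustration only.
        String path = args.length > 0 ? args[0] : "latin1.txt";
        System.out.print(readAll(path, StandardCharsets.ISO_8859_1));
    }
}
```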
Note the use of the semicolon separator in the resource setup: declaring each resource separately is generally preferable to anonymous nested construction, because it is easier to localise which constructor threw an exception. There is not much in it in this case, as the two constructors throw different exceptions that have different handlers, but that may not always be the case.
Both byte and character streams have a method called read() that acquires a small part of the stream at a time, usually by delegating to the underlying O/S. If the acquisition of bytes or characters involves a network or disk access then this method of operation will be inefficient.
A buffered I/O stream resolves this problem by running a continuous set of read operations to fill a local area of memory (an array) called the buffer. Once the buffer is full, reads are served from it until it is empty, and it is then refilled in another continuous set of I/O operations.
The wrapping idiom used above, where a byte stream (FileInputStream) was wrapped by a character encoding stream (InputStreamReader), is also used to achieve buffering. Buffered streams are used to wrap byte or character streams.
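As a minimal sketch of buffering by wrapping, the following reads a text file line by line through a BufferedReader. The class and method names are hypothetical, and the file is assumed to be in the default encoding.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class BufferedReadSketch {

    // Returns the lines of the file, without their line terminators.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // The BufferedReader wraps the FileReader and fills its buffer
        // in bulk rather than hitting the file for every character.
        try (Reader fileReader = new FileReader(path);
             BufferedReader reader = new BufferedReader(fileReader)) {
            String line;
            // readLine() returns null at end of stream.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is an assumed default file name.
        for (String line : readLines(args.length > 0 ? args[0] : "input.txt")) {
            System.out.println(line);
        }
    }
}
```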
Note the use of the readLine() method rather than the read() method. It treats “\r\n”, “\r” or “\n” as a line terminator and returns the line without it.
The Scanner API
The stream readers above can read characters, bytes and lines, but to customise tokenisation (i.e. to read words separated by whitespace, or tokens separated by any other regex) Java provides a class called Scanner. Scanner has constructors that take InputStream objects, and methods that allow the reading of all primitives other than char and of user defined tokens as String objects, as well as BigInteger and BigDecimal values.
Note that an IOException on the underlying stream is swallowed by the Scanner rather than thrown (the last one can be retrieved with ioException()), and that the Scanner’s close() method must be called to release the stream, hence the try with resources. Scanners are reasonably adept at reading numbers in different formats; see the Oracle I/O trail for an example.
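The Scanner usage described above could be sketched as follows. The class name, method names and the choice of delimiter are hypothetical, and the file is assumed to be in the default encoding.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name, used only for this sketch.
class ScannerSketch {

    // Splits the file into tokens using a caller-supplied regex delimiter.
    static List<String> readTokens(String path, String delimiterRegex) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (InputStream in = new FileInputStream(path);
             Scanner scanner = new Scanner(in)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // Sums the int tokens in a whitespace separated file, skipping anything else.
    static int sumInts(String path) throws IOException {
        int sum = 0;
        try (InputStream in = new FileInputStream(path);
             Scanner scanner = new Scanner(in)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    sum += scanner.nextInt();
                } else {
                    scanner.next(); // discard a non-int token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is an assumed default file name; tokens split on commas.
        System.out.println(readTokens(args.length > 0 ? args[0] : "input.txt", ",\\s*"));
    }
}
```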