I/O                                                                                      Peter Komisar
                                                                                                             revision Aug 2, 2001

references: 1) 'Java 1.2 Developer's Handbook', Heller & Roberts, 2) 'Just Java', P. van der Linden,
3) 'Unicode support in Solaris Operating Systems', a Sun Site white paper, URL noted below


IO stands for input and output. Using a computer depends on the input and output
process: information has to be entered into the machine for it to process. Information
might be entered into the computer from the keyboard, and data can also be supplied
to the computer from a file. Initially the standard means of outputting data from a
computer was to send the processed information to a printer; today the monitor has
become the commonest means of viewing a computer's output. Frequently data is input
from files located on hard drives, and when a program has finished, the results are
typically output to a file for storage. Java supplies the io package and its classes to
give Java programs the capability to input and output data.

Streams

It is easy to picture a stream of water running from a lake down a grade to a
small pond. The lake is the source for the stream of water. The pond is the
destination. Drawing the comparison to the world of computers, the water is
the carrier of data or information. The source might be a data source such as
a file or a data structure in a program. The destination might be another file,
or an internet connection. In the water metaphor, the stream runs through
a trench or perhaps is carried in pipes or aquifers. In Java, the conduit for
data is supplied by the different classes of the io package.

Encoding

Before streams can be used effectively, there has to be agreement on both
ends of a transmission about just what the data means. A series of 1s and 0s can
be interpreted in a number of ways. Do you count 8, 16, 24 or 32 bits per
character? In what order are the bits to be interpreted? Are the most significant
bits first or last in the transmission?
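To make the byte-order question concrete, here is a minimal sketch that encodes the letter 'A' as UTF-16 in big-endian and little-endian order, then decodes the big-endian bytes using the wrong order. It assumes a modern JDK (Java 7 or later, for StandardCharsets); the class name ByteOrderDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {
    // Formats bytes as two-digit hex values separated by spaces, e.g. "00 41"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] big = "A".getBytes(StandardCharsets.UTF_16BE);    // most significant byte first
        byte[] little = "A".getBytes(StandardCharsets.UTF_16LE); // least significant byte first
        System.out.println("UTF-16BE: " + hex(big));              // 00 41
        System.out.println("UTF-16LE: " + hex(little));           // 41 00

        // Same bytes, no agreement: 00 41 read as little-endian is U+4100, not 'A'
        String misread = new String(big, StandardCharsets.UTF_16LE);
        System.out.println("misread as U+" + Integer.toHexString(misread.charAt(0)).toUpperCase());
    }
}
```

When both ends agree on UTF-16BE, the bytes 00 41 mean 'A'; when they disagree, the same two bytes mean a different character entirely.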

ASCII and ISO-8859-1

Encoding describes the schemes used to translate characters into binary bit
patterns, represented by 0s and 1s. The ASCII letter capital 'A' has the decimal
value 65. In binary this can be described in a single byte as 01000001. The original
ASCII character set uses the first seven bits of a byte to describe 128 characters.
Using all 8 bits to describe 256 characters gives the character set the ISO standards
organization calls ISO-8859-1.
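These values are easy to check from Java itself. The sketch below (hypothetical class AsciiDemo; assumes Java 7 or later) prints the numeric value of 'A', pads its binary form to a full byte, and shows that 'é', which lies outside 7-bit ASCII, still fits in one ISO-8859-1 byte.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDemo {
    // Pads the binary form of a byte value to 8 digits, e.g. 65 -> "01000001"
    static String toByteBinary(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        while (bits.length() < 8) {
            bits = "0" + bits;
        }
        return bits;
    }

    public static void main(String[] args) {
        char a = 'A';
        System.out.println((int) a);              // 65
        System.out.println(toByteBinary(a));      // 01000001

        // 'é' (U+00E9) is not 7-bit ASCII, but it is a single ISO-8859-1 byte
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " byte, value " + (latin1[0] & 0xFF)); // 1 byte, value 233
    }
}
```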

As time went on, different characters were mapped to the different numeric values
of the byte to derive other character sets (e.g. ISO-8859-9). A bewildering array
of these character sets has been created, describing not just different languages
but also different platform versions of each of these languages. Soon schemes
were sought to bring these character sets into single manageable groups.

Unicode and UCS-2

One of these was Unicode, which uses 2 bytes to describe up to 65,536 characters.
This is the character set Java uses; Java was internationalized from its inception.
Unicode currently has about 35,000 of its values assigned to the characters that
make up the world's languages. Unicode is essentially the same set, under a different
name, as UCS-2. UCS is an abbreviation for Universal Character Set, an ISO standard,
and like Unicode, UCS-2 uses two bytes to describe each character.

UCS-4

Though it is likely that Unicode will supply everyone's character set needs for some
time to come, ISO recognizes that in addition to spoken languages there are non-spoken
languages used in mathematics, science and commerce. There are experimental invented
languages. There are also many dialects in the world being discovered or just now being
committed to script, and beyond these are ancient dialects that are being discovered and
need representation. Considering these facts, it is apparent that Unicode is too small!
The larger character set endorsed by ISO is UCS-4, which takes 4 bytes for each character
it describes. It is also referred to as the ISO/IEC 10646 character set.
 

UTF-8 and UTF-16

As you can imagine, if you were sending a stream of data halfway around the world
in UCS-4 encoding, but all your characters were ASCII values, you would be sending
3 zero bytes along with the one byte that held the ASCII information: a very inefficient
use of bandwidth. (Bandwidth in this context is the number of bytes of data used per
character.)

UTF-8 and UTF-16 are clever schemes that allow a variable number of bytes of data
to be used depending on what character is being sent. UTF stands for UCS (Universal
Character Set) Transformation Format. UTF-8 looks at the numeric value of each
character. ASCII values always have the most significant bit of the byte set to 0, so
UTF-8 sends an ASCII character as a single, unchanged byte. Other Unicode characters
take two or three bytes, and UCS-4 characters need up to six bytes. This means UTF-8
may be more or less efficient in terms of bandwidth than transmissions based solely on
fixed-length character sets: more efficient for mostly-ASCII text, less efficient for
characters that need three or more bytes.
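The variable-length effect can be measured with String.getBytes. In this sketch (hypothetical class Utf8Lengths; Java 7 or later for StandardCharsets) an ASCII letter, an accented Latin letter and the euro sign take 1, 2 and 3 bytes in UTF-8, while UTF-16BE spends 2 bytes on each.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes needed to encode a string in UTF-8
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = { "A", "\u00e9", "\u20ac" };   // 'A', 'é', euro sign
        for (String s : samples) {
            System.out.println("U+" + String.format("%04X", (int) s.charAt(0))
                + "  UTF-8: " + utf8Length(s) + " byte(s)"
                + "  UTF-16BE: " + s.getBytes(StandardCharsets.UTF_16BE).length + " bytes");
        }
    }
}
```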

The good news is that character sets were devised to be backwards compatible with
earlier character sets, so ASCII is a subset of ISO-8859-1, ISO-8859-1 is a subset of
UCS-2 (Unicode), and UCS-2 (Unicode) is a subset of UCS-4.

Diagram depicting character set relationships (each set contains the sets indented below it)

    UCS-4 // ISO/IEC 10646
      UCS-2 // Unicode
        ISO-8859-1
          ASCII

Common Data Formats

 Format      Description                        Size                Notes
 ASCII       American Standard Code for         7 bits [1 byte]     128 mostly readable characters
             Information Interchange
 ISO 8859-1  256-character ISO code             8 bits [1 byte]     adds many non-English characters
 Unicode     synonymous with UCS-2              16 bits [2 bytes]   most of the world's characters
 UCS-2       Universal Character Set,           16 bits [2 bytes]   1st plane of ISO/IEC 10646
             two-byte encoding                                      in two bytes (0 to 64K)
 UCS-4       Universal Character Set,           32 bits [4 bytes]   full ISO/IEC 10646 implementation
             four-byte encoding                                     in 4 bytes
 UTF-8 *     UCS Transformation Format,         [1 to 6 bytes]      if 1st bit is 0      --> 1 byte (ASCII)
             versatile but complex                                  if 1st bits are 110  --> 2 bytes
                                                                    if 1st bits are 1110 --> 3 bytes, etc.
 UTF-16      extended variant of UCS-2          [2 to 4 bytes]
 binary      data transfer in numeric form      [1 to 8 bytes]      binary form of Java primitive types
 objects     streaming Java objects             [variable length]   the serialization process

Note: ASCII is a subset of ISO 8859-1, which is a subset of Unicode (UCS-2),
      which is a subset of UCS-4 (ISO/IEC 10646)
 

* Details of UTF-8 encoding

 Bits  Hex Min   Hex Max   UTF-8 Binary Encoding
  7    00000000  0000007F  0xxxxxxx
 11    00000080  000007FF  110xxxxx 10xxxxxx
 16    00000800  0000FFFF  1110xxxx 10xxxxxx 10xxxxxx
 21    00010000  001FFFFF  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
 26    00200000  03FFFFFF  111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
 31    04000000  7FFFFFFF  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
 from 'Unicode support in Solaris Operating Systems', a white paper at the Sun Site.
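To see how the table is applied, here is a hand-written encoder for its first three rows (code points up to U+FFFF), checked against the JDK's own UTF-8 encoder. It is a teaching sketch, not a replacement for the library: the class name Utf8Encoder is hypothetical, it stops at U+FFFF, and it does not reject surrogate values as a conforming encoder must.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Encoder {
    // Encodes one code point using the first three rows of the table above.
    // Sketch only: no 4- to 6-byte rows, and surrogates (U+D800 to U+DFFF) are not rejected.
    static byte[] encode(int cp) {
        if (cp <= 0x7F) {                                  // 7 bits:  0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp <= 0x7FF) {                          // 11 bits: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp <= 0xFFFF) {                         // 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        throw new IllegalArgumentException("this sketch handles only U+0000 to U+FFFF");
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xE9, 0x20AC };            // 'A', 'é', euro sign
        for (int cp : samples) {
            byte[] mine = encode(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.println(Integer.toHexString(cp) + " -> " + mine.length
                + " byte(s), matches JDK: " + Arrays.equals(mine, jdk));
        }
    }
}
```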

Character Set Translation between Java & the Operating System

As mentioned earlier, Java uses Unicode internally. The operating system that Java
is running on may not be using Unicode. Solaris, for instance, may be using ASCII,
ISO-8859-1 or UTF-8. The Mac uses ASCII coupled with a proprietary character format.
NT can use Unicode, ASCII or UCS. The Java IO classes take care of translating
between the character set(s) used by the underlying operating system and the
Unicode character set Java uses.
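In practice the translation happens when bytes are wrapped in a Reader with an encoding, either named explicitly or taken from the platform default. The sketch below (hypothetical class DecodeDemo) decodes the same two bytes, which a UTF-8 system would write for 'é', once as UTF-8 and once as ISO-8859-1. The file.encoding system property reports the default, though its exact value depends on the platform and JDK version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class DecodeDemo {
    // Decodes bytes into Java (Unicode) chars using the named encoding
    static String decode(byte[] bytes, String encoding) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), encoding);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = { (byte) 0xC3, (byte) 0xA9 };      // 'é' as written by a UTF-8 system
        System.out.println(decode(bytes, "UTF-8").length());       // 1 char: U+00E9
        System.out.println(decode(bytes, "ISO-8859-1").length());  // 2 chars: U+00C3 U+00A9
        System.out.println("default: " + System.getProperty("file.encoding"));
    }
}
```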

For more details on character encoding you may wish to see the white paper on
Unicode support in Solaris Operating Systems at

www.sun.com/software/white-papers/wp-unicode/#1014899
 

The Package Gets Bigger

When Java was first released it had a set of IO classes that were based on working
with bytes of data. Unfortunately, when this byte-based system of classes was put
into service with the various character encodings described earlier, problems arose.
The fix was simple: wherever applicable, a char-based counterpart was created for
each byte-based stream class. The new set of classes was designed to work with
2-byte Unicode characters instead of bytes. So, in addition to the 8-bit
InputStream/OutputStream classes of JDK 1.0, the Reader/Writer classes were
added in JDK 1.1.
 

This has made the io package larger by 13 classes. The io package had already been
criticized for creating more classes than were needed to provide input/output services.
In any case, it is what it is. Although it is a little more work to learn than perhaps it
needed to be, once you get used to the io package, you will find it easy to use.

Readers and InputStreams read in, while Writers and OutputStreams write out. The
8-bit streams are still used to exchange binary data in many of the Java APIs.

There are three streams that are automatically opened.

1) System.in  // an InputStream that reads bytes from the keyboard
2) System.out // a PrintStream that writes to the screen, famous for System.out.println( )
3) System.err // a separate PrintStream to the screen, for reporting errors

The System class has a static method to redirect each one: setIn( ), setOut( ) and setErr( ).

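Example showing setIn( ) & setOut( )   // from 'Just Java', P. van der Linden, page 615

// both constructors can throw FileNotFoundException
FileOutputStream fs1 = new FileOutputStream("stdlog.txt");
System.setOut(new PrintStream(fs1));    // System.out now writes to stdlog.txt

FileInputStream fi = new FileInputStream("input.txt");
System.setIn(fi);                        // System.in now reads from input.txt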
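The fragment above writes to real files. The following self-contained sketch (hypothetical class RedirectDemo) instead redirects System.out into a ByteArrayOutputStream and System.in from a ByteArrayInputStream, restoring the originals afterwards so the rest of the program is unaffected.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;

public class RedirectDemo {
    // Runs println with System.out redirected to memory, then restores the console
    static String captureOut(String message) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            System.out.println(message);
        } finally {
            System.setOut(original);
        }
        return buffer.toString();
    }

    // Reads one line with System.in redirected to the given text, then restores it
    static String readLineFrom(String input) throws IOException {
        InputStream original = System.in;
        System.setIn(new ByteArrayInputStream(input.getBytes()));
        try {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            return in.readLine();
        } finally {
            System.setIn(original);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("captured: " + captureOut("hello").trim());   // captured: hello
        System.out.println("read: " + readLineFrom("first line\nsecond line\n")); // read: first line
    }
}
```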



Levels of Organization

One nice feature of the stream classes is that they follow predictable naming patterns.
The stream class names can be determined from the categories they fall into.

1)  Stream width, 8 or 16 bit        // InputStreams & OutputStreams are 8 bit, Readers & Writers are 16 bit
2)  Source / Destination or Function // for example File, ByteArray, Pushback etc.
3)  Direction, Input or Output       // InputStream & Reader are IN, OutputStream & Writer are OUT

Most of the time you can deduce the name of the class you need by deciding on the
source/destination or function, whether you need an 8 or 16 bit stream, and which I/O
direction you are going in. So to write bytes out to an in-memory byte array you would
use ByteArray + OutputStream, or ByteArrayOutputStream. To read in a file of 16-bit
char data you would use File + Reader, or FileReader.
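The naming pattern can be read straight off the constructors. This small sketch (hypothetical class NamingDemo) builds one class from each of three combinations.

```java
import java.io.ByteArrayOutputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.StringWriter;

public class NamingDemo {
    static String demo() throws IOException {
        // 8 bit + ByteArray + Output -> ByteArrayOutputStream (writes into a byte array)
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        bytesOut.write(new byte[] { 1, 2, 3 });

        // 16 bit + CharArray + Input -> CharArrayReader (reads from a char array)
        CharArrayReader charsIn = new CharArrayReader(new char[] { 'h', 'i' });
        char first = (char) charsIn.read();

        // 16 bit + String + Output -> StringWriter (writes into a string buffer)
        StringWriter textOut = new StringWriter();
        textOut.write("java.io");

        return bytesOut.size() + " " + first + " " + textOut;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());   // 3 h java.io
    }
}
```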

Basic Methodology

To use the IO classes, follow these steps.

1) Select the appropriate I/O class for the objective.
2) Instantiate the class using the most appropriate constructor.
3) High-level streams may be layered over low-level streams by making the
    low-level stream objects the arguments to the high-level stream's constructor.
4) Call the appropriate read( ) or write( ) methods on the top-level stream.
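The four steps map onto very little code. In this sketch (hypothetical class StepsDemo) a StringReader stands in for a real source so that the example is self-contained, and a BufferedReader is layered over it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class StepsDemo {
    // Counts the lines of some text, following the four steps listed above
    static int countLines(String text) throws IOException {
        // Steps 1 and 2: choose and construct a low-level Reader on the source
        StringReader low = new StringReader(text);
        // Step 3: layer a high-level stream over it via its constructor
        BufferedReader high = new BufferedReader(low);
        // Step 4: call read methods on the top-level stream
        int count = 0;
        while (high.readLine() != null) {
            count++;
        }
        high.close();   // closing the outer stream also closes the one it wraps
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines("one\ntwo\nthree"));   // 3
    }
}
```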
 



IO's abstract superclasses
InputStream & OutputStream
Reader & Writer

In general, the methods of the stream classes throw IOException. The methods that do
not are involved with processes other than streaming. IOException is the parent of 15
specialized exception classes in the io package, described in the JDK documentation.
A try{ } catch( ){ } construct is usually present when doing IO to handle any exceptions
that are thrown, unless you choose to pass the exception handling on to the calling code
by declaring the method with throws IOException.
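Both options look like this in a minimal sketch (hypothetical class and method names): firstByte passes IOException on with a throws clause, while canRead catches it. A temporary file is created and then deleted so that opening it reliably raises FileNotFoundException, one of the IOException subclasses.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ExceptionDemo {
    // Passes exception handling on to the caller by declaring throws IOException
    static int firstByte(File f) throws IOException {
        FileInputStream in = new FileInputStream(f);   // FileNotFoundException if missing
        try {
            return in.read();
        } finally {
            in.close();
        }
    }

    // Handles the exceptions locally with try/catch
    static boolean canRead(File f) {
        try {
            firstByte(f);
            return true;
        } catch (FileNotFoundException e) {   // a subclass of IOException, so caught first
            return false;
        } catch (IOException e) {             // any other I/O failure
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".tmp");
        System.out.println(canRead(f));        // true: the (empty) file exists
        f.delete();                            // make sure the file no longer exists
        System.out.println(canRead(f));        // false
    }
}
```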

Note that the abstract superclasses are not all that abstract! Only one to three methods
in each class are abstract and require implementation. Both Reader and Writer have an
additional constructor that takes an Object instance as an argument; that object's lock
is used to synchronize thread access to shared code in a multithreaded environment. The
functions contained in the 8-bit classes are mirrored in the 16-bit classes, distinguished
primarily by the argument types: byte for InputStream/OutputStream methods and char for
Reader/Writer methods.
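To show how little a subclass must supply, here is a sketch with two hypothetical classes: an InputStream that implements only read( ), and a Reader that implements only its two abstract methods, read(char[], int, int) and close( ). The other read methods are inherited and work unchanged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

public class MinimalStreams {
    // Hypothetical InputStream yielding the bytes 1..n; only read() is implemented
    static class CountingInputStream extends InputStream {
        private final int n;
        private int next = 1;
        CountingInputStream(int n) { this.n = n; }

        public int read() {
            return next <= n ? next++ : -1;   // -1 signals end of stream
        }
    }

    // Hypothetical Reader over a String; only the two abstract methods are implemented
    static class TinyReader extends Reader {
        private final String text;
        private int pos = 0;
        TinyReader(String text) { this.text = text; }

        public int read(char[] dest, int offset, int length) {
            if (pos >= text.length()) return -1;
            int count = Math.min(length, text.length() - pos);
            text.getChars(pos, pos + count, dest, offset);
            pos += count;
            return count;
        }

        public void close() { }
    }

    public static void main(String[] args) throws IOException {
        // read(byte[]) is inherited from InputStream and calls our read() repeatedly
        System.out.println(new CountingInputStream(3).read(new byte[10]));   // 3
        // read() is inherited from Reader and calls our read(char[], int, int)
        System.out.println((char) new TinyReader("xyz").read());             // x
    }
}
```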
 

InputStream

 abstract int read( )          reads one byte from the source, returning the value in the
                               low-order 8 bits of an int
 int read(byte[] dest)         reads bytes from the source into the dest array, returning
                               the number of bytes read
 int read(byte[] dest,         reads up to length bytes into the dest array, beginning at
          int offset,          offset; all three forms of read return -1 when no more
          int length)          data is available
 void close( )                 releases system resources associated with the source,
                               e.g. the file descriptor
 int available( )              returns the number of bytes that can be read or skipped
                               from the input stream without blocking
 long skip(long nbytes)        attempts to skip and discard nbytes; returns the number
                               actually skipped
 boolean markSupported( ) *    returns true if the mark/reset mechanism is supported
 void mark(int readlimit) *    sets a mark in the input stream
 void reset( )                 resets the stream so reads repeat from the mark
 * doesn't throw IOException
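The following sketch (hypothetical class InputStreamDemo) exercises several of these methods on a ByteArrayInputStream, which supports mark/reset. It uses the three-argument read, whose ByteArrayInputStream override doesn't declare IOException, so no exception handling is needed.

```java
import java.io.ByteArrayInputStream;

public class InputStreamDemo {
    // Marks the current position, reads a byte, resets, and reads it again
    static int[] rereadAfterReset(ByteArrayInputStream in) {
        in.mark(16);             // readlimit (ByteArrayInputStream ignores the value)
        int first = in.read();
        in.reset();              // back to the mark
        int second = in.read();
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] { 10, 20, 30, 40, 50 });
        System.out.println("markSupported: " + in.markSupported());     // true
        System.out.println("available: " + in.available());             // 5
        System.out.println("skipped: " + in.skip(2));                    // 2
        int[] r = rereadAfterReset(in);
        System.out.println("read " + r[0] + ", after reset " + r[1]);    // read 30, after reset 30
        byte[] buf = new byte[10];
        System.out.println("bulk read: " + in.read(buf, 0, buf.length)); // 2 (the bytes 40 and 50)
        System.out.println("at end: " + in.read());                     // -1
    }
}
```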
 

OutputStream

 abstract void write(int b)    writes the byte in the low-order 8 bits of b,
                               discarding the high 24 bits
 void write(byte[] b)          writes an array of bytes, b
 void write(byte[] b,          writes a subset of array b: length bytes, starting
            int offset,        at offset
            int length)
 void flush( )                 writes out any bytes which may have been buffered
 void close( )                 releases system resources associated with the destination
 1) OutputStream methods are declared as throwing IOException   2) flush( ) buffers before close( )
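Why flush( ) matters is easiest to see with a buffered stream in front of an in-memory destination. In this sketch (hypothetical class FlushDemo) nothing reaches the ByteArrayOutputStream until flush( ) is called; it also shows write(int) keeping only the low-order 8 bits.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {
    // Returns {bytes at destination before flush, bytes after flush, first byte written}
    static int[] flushSizes() throws IOException {
        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(destination);
        out.write(0x141);                   // only the low-order 8 bits (0x41, 'A') are kept
        out.write(new byte[] { 'B', 'C' });
        int before = destination.size();    // 0: the bytes are still in the buffer
        out.flush();
        int after = destination.size();     // 3: flush pushed them through
        out.close();
        return new int[] { before, after, destination.toByteArray()[0] };
    }

    public static void main(String[] args) throws IOException {
        int[] r = flushSizes();
        System.out.println("before flush: " + r[0] + ", after flush: " + r[1]
            + ", first byte: " + (char) r[2]);   // before flush: 0, after flush: 3, first byte: A
    }
}
```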
 

Reader

 int read( )                   reads one character from the source, returned in the
                               low-order 16 bits of an int
 int read(char[] dest)         reads characters from the source into the dest array,
                               returning the number read
 abstract int read(            reads up to length chars into the dest array, beginning
     char[] dest, int offset,  at offset; all three forms of read return -1 when no
     int length)               more data is available
 abstract void close( )        releases system resources associated with the source,
                               e.g. the file descriptor
 long skip(long nchars)        attempts to skip and discard nchars; returns the number
                               actually skipped
 boolean markSupported( ) *    returns true if the mark/reset mechanism is supported
 void mark(int readlimit)      sets a mark in the stream
 void reset( )                 resets the stream so reads repeat from the mark
 boolean ready( )              returns true if the stream has data immediately available
                               (so read( ) won't block)
 * doesn't throw IOException
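The usual pattern for the array form of read is a loop that stops at -1 and uses the returned count, since a read may fill only part of the buffer. A sketch, with the hypothetical class ReadLoopDemo and helper readAll:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadLoopDemo {
    // Reads everything from a Reader using read(char[], int, int) and a small buffer
    static String readAll(Reader in, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {  // -1 means no more data
            sb.append(buf, 0, n);                          // append only the chars read
        }
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("hello, reader");
        System.out.println(in.ready());       // true: data is immediately available
        System.out.println(readAll(in, 4));   // hello, reader
    }
}
```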
 

Writer // note Writer has two extra write methods that take a String or a String subset

 void write(int c)             writes the char in the low-order 16 bits of the argument
 void write(char[] c)          writes an array of characters
 abstract void write(          writes a subset of an array of characters: length chars,
     char[] c, int offset,     starting at offset
     int length)
 void write(String s)          writes a string
 void write(String s,          writes a subset of String s, length chars long, starting
     int offset, int length)   at offset
 abstract void flush( )        writes out any characters the stream has buffered
 abstract void close( )        releases system resources associated with the destination
 * flush( ) buffers before close( )
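A minimal sketch of the write variants (hypothetical class WriterDemo), using a StringWriter as the destination:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class WriterDemo {
    static String build() throws IOException {
        Writer out = new StringWriter();
        out.write('J');                       // write(int): the low-order 16 bits as a char
        out.write(new char[] { 'a', 'v', 'a' });
        out.write(" I/O classes", 0, 4);      // write(String, offset, length): " I/O"
        out.flush();
        out.close();                          // has no effect on a StringWriter
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(build());          // Java I/O
    }
}
```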


Miscellaneous

1) available( ) is replaced by ready( ) in Reader.

2) Because Reader and Writer methods convert native codesets properly, they should be used
    when processing character data.

3) read( ) methods return an int so that every 16-bit char value (0 to 0xFFFF) can be
    told apart from the end-of-stream (EOF) value, -1 (0xFFFFFFFF as an int).

4) The number of bytes a Reader consumes per character is dictated by the encoding in
    effect, by default the system's default encoding, e.g.

     (i)  if ASCII, one byte is read and promoted internally to two-byte Unicode.
     (ii) if Unicode, two bytes are read and no conversion is needed.
     (iii) if UTF-8 is in effect, 1 to 3 bytes are read, and the corresponding Unicode (Java)
           char is assembled.

5) Currently the encoding used by FileReader and FileWriter is the platform default and
    cannot be changed; when a specific encoding is needed, use InputStreamReader or
    OutputStreamWriter with an encoding name.
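Point 3 can be checked directly: the char value 0xFFFF is legal, so if read( ) returned a char there would be no value left over to signal EOF. A sketch (hypothetical class EofDemo):

```java
import java.io.IOException;
import java.io.StringReader;

public class EofDemo {
    // Returns the ints produced by successive read() calls, including the final -1
    static int[] readCodes(String text) throws IOException {
        StringReader in = new StringReader(text);
        int[] codes = new int[text.length() + 1];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = in.read();
        }
        return codes;
    }

    public static void main(String[] args) throws IOException {
        int[] codes = readCodes("A\uffff");
        System.out.println(codes[0]);   // 65
        System.out.println(codes[1]);   // 65535: the largest char value, still distinct from EOF
        System.out.println(codes[2]);   // -1: end of stream
        // Casting EOF to char would lose the distinction
        System.out.println((char) -1 == '\uffff');   // true
    }
}
```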


Survey of the IO classes

Ignoring the Exception classes, IO classes can be grouped by whether they deal
with 8 or 16-bit data types, and whether they are designed to do input or output. Of
the stream classes, a couple are unique and fall into a separate category. Further, there
are a number of non-stream classes that form another division. Finally, within the
general divisions there are low-level and high-level stream classes. Low-level streams
open on specific sources or destinations, while high-level streams open on other streams.

The tables below categorize the io classes along the lines just described. Notice there
is not perfect symmetry. Sometimes stream classes fall into only one or two categories
instead of having a version in all four general divisions.

                     8 Bit Input Output Stream Classes

 Low Level InputStreams         Low Level OutputStreams
  FileInputStream                FileOutputStream
  ByteArrayInputStream           ByteArrayOutputStream
  PipedInputStream               PipedOutputStream

 High Level InputStreams        High Level OutputStreams
  BufferedInputStream            BufferedOutputStream
  DataInputStream                DataOutputStream
  PushbackInputStream            PrintStream

                     16 Bit Reader Writer Classes

 Low Level Readers              Low Level Writers
  FileReader                     FileWriter
  CharArrayReader                CharArrayWriter
  PipedReader                    PipedWriter
  StringReader                   StringWriter

 High Level Readers             High Level Writers
  BufferedReader                 BufferedWriter
  PushbackReader                 PrintWriter
  LineNumberReader

                           Specialty Classes

 Unique Stream Classes                      Non-stream based IO Classes
  SequenceInputStream                        File
  InputStreamReader                          RandomAccessFile
  OutputStreamWriter                         StreamTokenizer
  ObjectInputStream & ObjectOutputStream     FileDescriptor

 

Low-Level Stream Classes  // open on files, arrays, strings, pipes
 
 FileInputStream a byte-based input stream that reads from a file
 FileOutputStream a byte-based output stream that writes to a file

 FileReader  a character-based stream for reading from a file
 FileWriter  a character-based stream for writing to a file
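A round trip through a temporary file shows both pairs at work. In this sketch (hypothetical class FileStreamsDemo) FileWriter and FileReader use the platform default encoding, while the byte streams copy raw values unchanged.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FileStreamsDemo {
    // Writes text with FileWriter, then reads it back with FileReader
    static String charRoundTrip(File f, String text) throws IOException {
        FileWriter out = new FileWriter(f);       // platform default encoding
        out.write(text);
        out.close();
        FileReader in = new FileReader(f);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        in.close();
        return sb.toString();
    }

    // Writes raw bytes with FileOutputStream, then counts them with FileInputStream
    static int byteRoundTrip(File f, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(f);   // replaces the earlier contents
        out.write(data);
        out.close();
        FileInputStream in = new FileInputStream(f);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        in.close();
        return count;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("iodemo", ".txt");   // temporary file, deleted below
        System.out.println(charRoundTrip(f, "stored text"));                    // stored text
        System.out.println(byteRoundTrip(f, new byte[] { 0, 1, (byte) 255 }));  // 3
        f.delete();
    }
}
```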



 ByteArrayInputStream takes input from a byte array or a subset of a byte array
 ByteArrayOutputStream writes to a byte array

 CharArrayReader reads characters from a character array
 CharArrayWriter writes characters to a character array



// Has no corresponding 8-bit streams. There was a StringBufferInputStream, but it has been
// deprecated since JDK 1.1 as it does not properly convert characters into bytes.
// The recommended way to create a stream from a string is via the StringReader class.

 StringReader reads characters from a string
 StringWriter  writes characters to a string buffer (not to a String)
 // has getBuffer( ) and toString( )
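A small sketch of the pair (hypothetical class StringStreamsDemo): characters are copied from a StringReader to a StringWriter, and getBuffer( ) exposes the StringBuffer the writer is filling.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

public class StringStreamsDemo {
    // Copies characters from a StringReader to a StringWriter, upper-casing them
    static String upperCopy(String text) throws IOException {
        StringReader in = new StringReader(text);
        StringWriter out = new StringWriter();
        int c;
        while ((c = in.read()) != -1) {
            out.write(Character.toUpperCase((char) c));
        }
        StringBuffer buffer = out.getBuffer();   // the live StringBuffer behind the writer
        buffer.append('!');                      // the change shows up in toString()
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(upperCopy("string streams"));   // STRING STREAMS!
    }
}
```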



 PipedInputStream reads bytes from a corresponding piped output stream
 PipedOutputStream writes bytes to a corresponding piped input stream

 PipedReader reads characters written to a corresponding piped writer
 PipedWriter writes characters to a corresponding piped reader

 // Piped streams are used for inter-thread communication in multithreaded environments. 
 // Piped streams can replace wait( )/notify( ) in the classic producer/consumer scenario.
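A minimal producer/consumer sketch (hypothetical class PipeDemo; Java 8 or later for the lambda): a producer thread writes into a PipedWriter while the calling thread reads from the connected PipedReader, blocking until data arrives. Closing the writer is what lets the reader see -1.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

public class PipeDemo {
    // Sends text from a producer thread to the calling (consumer) thread through a pipe
    static String transfer(String text) throws IOException, InterruptedException {
        PipedWriter out = new PipedWriter();
        PipedReader in = new PipedReader(out);      // connects the two ends

        Thread producer = new Thread(() -> {
            try {
                out.write(text);
                out.close();                         // lets the reader see end of stream (-1)
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        producer.start();

        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {              // blocks until data arrives or the writer closes
            sb.append((char) c);
        }
        in.close();
        producer.join();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer("through the pipe"));   // through the pipe
    }
}
```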


High-Level Stream Classes // Take their input from other streams



 
 
 BufferedInputStream uses an internal byte array to buffer data read from source
 BufferedOutputStream collects byte data until buffer is full then writes in one operation

 BufferedReader uses an internal character array to buffer data read from source
 BufferedWriter collects character data in its buffer until full, then writes in one operation



 DataInputStream reads bytes and translates them into primitives, char arrays & strings 
 DataOutputStream writes primitive data types, strings and byte arrays to output stream

 // No Reader/Writer equivalents



 // LineNumberInputStream is deprecated since JDK 1.1. It had no writer counterpart.

 LineNumberReader maintains a count of the lines it has read // BufferedReader subclass
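A short sketch (hypothetical class LineNumberDemo) that prefixes each line with the number LineNumberReader reports after reading it:

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

public class LineNumberDemo {
    // Returns the text with each line prefixed by its line number
    static String numbered(String text) throws IOException {
        LineNumberReader in = new LineNumberReader(new StringReader(text));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            // getLineNumber() has been incremented for the line just read
            sb.append(in.getLineNumber()).append(": ").append(line).append('\n');
        }
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(numbered("alpha\nbeta\n"));   // 1: alpha, then 2: beta
    }
}
```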



 PrintStream supports writing text
 // PrintWriter is preferred for text; PrintStream survives mainly as System.out,
 // a static PrintStream instance in the System class

 PrintWriter has PrintStream's print( ) and println( ) methods, but writes characters, not bytes

 // The System class defines a static PrintStream object, out, on which the PrintStream
 // method println( ) is often called, in the popular form System.out.println("literal");
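A sketch (hypothetical class PrintWriterDemo) that layers a PrintWriter over a StringWriter. Unlike the other Writers, PrintWriter's methods don't throw IOException; errors are reported through checkError( ) instead.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PrintWriterDemo {
    static String report(String name, int count) {
        StringWriter text = new StringWriter();
        PrintWriter out = new PrintWriter(text);
        out.print(name);          // print( ) and println( ) accept Strings and primitives
        out.print(" = ");
        out.print(count);
        out.flush();              // no try/catch needed: PrintWriter never throws IOException
        out.close();
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(report("lines", 42));   // lines = 42 (via the PrintStream System.out)
    }
}
```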



 PushbackInputStream allows byte(s) that were read to be pushed back onto the stream
 PushbackReader allows char(s) to be pushed back onto the stream // both use internal pushback buffers

 //  no corresponding subclasses of OutputStream or Writer
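A typical use of pushback is looking one character ahead while scanning. In this sketch (hypothetical class and method names) readNumber consumes digits and, on reaching a non-digit, unreads it so the caller still sees it. The default PushbackReader buffer holds a single character, which is all this needs.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

public class PushbackDemo {
    // Reads a run of ASCII digits, pushing back the first non-digit
    static int readNumber(PushbackReader in) throws IOException {
        int value = 0;
        int c = in.read();
        while (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            c = in.read();
        }
        if (c != -1) {
            in.unread(c);          // return the non-digit to the stream
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("123+45"));
        System.out.println(readNumber(in));     // 123
        System.out.println((char) in.read());   // + (the pushed-back char)
        System.out.println(readNumber(in));     // 45
    }
}
```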



 SequenceInputStream combines two or more input streams into one

 // takes two streams as args, or an Enumeration of streams:
 // Vector v = new Vector(); v.addElement(s1); v.addElement(s2); etc., then provide v.elements( ) as the arg

 // no counterparts
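The Vector approach from the comment above, as a runnable sketch (hypothetical class SequenceDemo, using Java 5 generics and varargs):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Vector;

public class SequenceDemo {
    // Concatenates several pieces of text by reading them as one SequenceInputStream
    static String concat(String... parts) throws IOException {
        Vector<InputStream> v = new Vector<InputStream>();
        for (String p : parts) {
            v.addElement(new ByteArrayInputStream(p.getBytes()));
        }
        SequenceInputStream in = new SequenceInputStream(v.elements());
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {    // moves on to the next stream when one runs out
            sb.append((char) c);
        }
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(concat("first ", "second ", "third"));   // first second third
    }
}
```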



 //  no corresponding opposite classes

 InputStreamReader wraps an InputStream, reading bytes and converting them to chars
 OutputStreamWriter wraps an OutputStream, converting chars to bytes and streaming them out
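OutputStreamWriter is where the choice of encoding takes effect on output. In this sketch (hypothetical class EncodeDemo) the same five characters produce a different number of bytes depending on the encoding name passed to the constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodeDemo {
    // Number of bytes produced when text is written through an OutputStreamWriter
    static int encodedSize(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, encoding);
        out.write(text);
        out.close();            // flushes the encoder's buffered bytes into the byte array
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo";   // "héllo": five chars
        System.out.println(encodedSize(text, "ISO-8859-1"));  // 5 bytes
        System.out.println(encodedSize(text, "UTF-8"));       // 6 bytes: 'é' takes two
        System.out.println(encodedSize(text, "UTF-16BE"));    // 10 bytes
    }
}
```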



 ObjectOutputStream writes serialized objects to stream
 ObjectInputStream reads serialized objects from stream

 //  no corresponding Reader/Writer classes 
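A minimal serialization round trip (hypothetical classes SerializeDemo and Point), using a byte array instead of a file so the sketch is self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {
    // Hypothetical class; implementing Serializable marks it as streamable
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Writes an object to a byte array, then reads a copy back
    static Point roundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(p);
        out.close();

        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Point copy = (Point) in.readObject();
        in.close();
        return copy;
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);   // 3,4 (a new object with equal values)
    }
}
```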


File class

In Java, file meta-data (information about a given file, but not its contents) is returned by
methods of the File class. The File class doesn't itself read or write a file's contents. It

1) represents a file path that a stream can be opened on
2) tests if a file exists, or can be read or written
3) tests whether a File object represents a file or a directory
4) returns the file's length in bytes and when it was last modified
5) has methods to create and delete files and directories

Some of File's methods // see the JDK API for fuller descriptions

 public int compareTo(File f)          lexicographic comparison of the two pathnames,
                                       returning a value < 0, = 0 or > 0
 public static File createTempFile(    creates an empty temporary file in directory dir
     String prefix, String suffix,     prefix: file name prefix (at least 3 chars long)
     File dir) throws IOException      suffix: file name suffix (null means ".tmp")
 public void deleteOnExit( )           marks the file to be deleted when the program ends
                                       (the call can't be rescinded)
 public boolean mkdir( )               creates one directory
 public boolean mkdirs( )              as mkdir( ), plus any needed parent directories
 public String[] list( )               returns a String array of the files and directories
                                       contained in the directory this File represents
 public File[] listFiles( )            same as list( ) but returns File objects, not Strings
 public static File[] listRoots( )     lists the available filesystem roots
                                       (e.g. "A:\", "C:\" etc.)
 public java.net.URL toURL( )          puts the File object into URL form (a file: URL);
     throws MalformedURLException      if the File is a directory, the URL ends in "/"
 public boolean createNewFile( )       creates a new file if it doesn't already exist;
     throws IOException                returns true if a new file was created
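Several of these methods together, in a sketch (hypothetical class FileDemo) that builds and then removes a small tree under a temporary directory. It uses the classic trick of calling createTempFile to obtain a unique name and then turning that name into a directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

public class FileDemo {
    // Builds a small directory tree under a fresh temporary directory and lists it
    static String[] demo() throws IOException {
        File base = File.createTempFile("filedemo", "");  // prefix must be 3+ chars
        base.delete();                                    // reuse the unique name as a directory
        File sub = new File(base, "a" + File.separator + "b");
        sub.mkdirs();                                     // creates base, base/a and base/a/b

        File note = new File(base, "note.txt");
        note.createNewFile();                             // true: the file is new
        System.out.println(note.exists() + " " + note.isFile() + " " + note.length()); // true true 0

        String[] names = base.list();                     // "a" and "note.txt", in no set order
        Arrays.sort(names);

        // clean up bottom-up: delete() removes only empty directories
        note.delete();
        sub.delete();
        sub.getParentFile().delete();
        base.delete();
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(demo()));      // [a, note.txt]
    }
}
```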


FileDescriptor

Instances of the FileDescriptor class serve as an opaque handle to the underlying
machine-specific structure representing an open file, an open socket, or another source
or sink of bytes. The main practical use for a file descriptor is to create a FileInputStream
or FileOutputStream to contain it. Applications should not create their own file descriptors.

// info straight from the JDK API docs
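A sketch of both uses (hypothetical class FileDescriptorDemo): wrapping the standard FileDescriptor.out in a FileOutputStream, and obtaining a stream's descriptor with getFD( ) to check valid( ) before and after the stream is closed.

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;

public class FileDescriptorDemo {
    // Reports whether a stream's descriptor is valid while open and after closing
    static boolean[] validity() throws IOException {
        File f = File.createTempFile("fddemo", ".tmp");
        FileOutputStream out = new FileOutputStream(f);
        FileDescriptor fd = out.getFD();       // the opaque handle behind the stream
        boolean whileOpen = fd.valid();        // true
        out.close();
        boolean afterClose = fd.valid();       // false: the handle was released
        f.delete();
        return new boolean[] { whileOpen, afterClose };
    }

    public static void main(String[] args) throws IOException {
        // FileDescriptor.out is the standard output handle; wrap it in a stream to use it
        FileOutputStream stdout = new FileOutputStream(FileDescriptor.out);
        stdout.write("written via FileDescriptor.out\n".getBytes());
        stdout.flush();                        // not closed: that would close standard output
        boolean[] v = validity();
        System.out.println(v[0] + " " + v[1]); // true false
    }
}
```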


RandomAccessFile

RandomAccessFile supports reading and writing as well as pointer positioning. It is not
based on the streaming model, so it cannot be chained or layered with other stream
classes. RandomAccessFile has the same wide variety of reading and writing methods that
DataInputStream and DataOutputStream have. RandomAccessFile also has the methods

void seek(long position)    sets the position of the file pointer
long getFilePointer( )      returns the current location of the file pointer
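A sketch of seek( ) and getFilePointer( ) (hypothetical class RandomAccessDemo): three ints are written, then the file pointer is moved straight to the second one, which starts at byte 4 because each int takes 4 bytes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessDemo {
    // Writes three ints, then jumps straight to the second one
    static int readSecondInt() throws IOException {
        File f = File.createTempFile("rafdemo", ".dat");
        RandomAccessFile raf = new RandomAccessFile(f, "rw");   // "rw": read and write
        raf.writeInt(100);        // bytes 0-3
        raf.writeInt(200);        // bytes 4-7
        raf.writeInt(300);        // bytes 8-11
        System.out.println("pointer after writes: " + raf.getFilePointer());  // 12

        raf.seek(4);              // the second int starts at byte 4
        int value = raf.readInt();
        System.out.println("pointer after read: " + raf.getFilePointer());    // 8
        raf.close();
        f.delete();
        return value;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readSecondInt());   // 200
    }
}
```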


StreamTokenizer

StreamTokenizer parses input into tokens, usually from inside a while loop that calls
nextToken( ) until the end of the input is reached. The int returned by nextToken( )
describes the type of the next token, as given by static fields such as TT_NUMBER,
TT_WORD, TT_EOL and TT_EOF. Many methods are available to recognize various
user-specified characters. StreamTokenizer is particularly suited for parsing Java, C or C++.
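A sketch of the usual nextToken( ) loop (hypothetical class TokenizerDemo) with the default settings, under which letters form words, numbers are parsed into nval, and characters such as '=' and '+' come back as their own char value in ttype:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class TokenizerDemo {
    // Labels each token the tokenizer finds in the text
    static String describe(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        StringBuilder sb = new StringBuilder();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    sb.append("WORD:").append(st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    sb.append("NUMBER:").append(st.nval);      // numbers arrive as doubles
                    break;
                default:
                    sb.append("OTHER:").append((char) st.ttype); // an ordinary character
                    break;
            }
            sb.append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe("width = 42 + height"));
        // WORD:width OTHER:= NUMBER:42.0 OTHER:+ WORD:height
    }
}
```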
 
 

//  Note there is also a StringTokenizer, but it is in the java.util package and not part of the
// IO package. It is described as being easier to use than StreamTokenizer, so it might be worth
// visiting for certain applications. Example from the JDK 1.3 documentation:

   StringTokenizer st = new StringTokenizer("this is a test");   // java.util.StringTokenizer
   while (st.hasMoreTokens()) {
       System.out.println(st.nextToken());
   }

//  prints the following output:

          this
          is
          a
          test



Using the IO Classes

The following example shows the basic approach the io classes use to stream data.

1) Opening an input stream to read bytes from a file
 

     file bytes  -->  retrieved via FileInputStream  -->  bytes  -->  an object in memory

 // getting bytes from a file for a program's use

example FileInputStream fis = new FileInputStream("disk_file.txt");

// once created, the object's read methods can be called to access the data of the file


2) Opening an output stream to stream bytes to a file
 

 From an object reference representing data  -->  FileOutputStream  -->  bytes  -->  to a file

// storing bytes produced in a program to a file

example  FileOutputStream fos = new FileOutputStream("disk_file.txt");

// once created, the object's write methods can be called to send bytes to file


3) Opening a high-level stream on a low-level stream
 

 bytes in a disk file --> FileInputStream --> bytes --> DataInputStream --> ints, doubles etc
 or
 ints, doubles etc --> DataOutputStream --> bytes --> FileOutputStream --> bytes in a disk file

example     FileInputStream fis = new FileInputStream("disk_file.txt");
                      DataInputStream dis = new DataInputStream(fis);

                      // a separate step: opening a FileOutputStream truncates an existing file
                      FileOutputStream fos = new FileOutputStream("disk_file.txt");
                      DataOutputStream dos = new DataOutputStream(fos);

// chained or coupled streams, where a 'high-level' stream takes a 'low-level' stream as input
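The same chaining, made self-contained by substituting ByteArrayOutputStream and ByteArrayInputStream for the disk file (hypothetical class DataStreamsDemo). Values must be read back in the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DataStreamsDemo {
    // ints, doubles etc --> DataOutputStream --> bytes --> ByteArrayOutputStream
    static byte[] encode(int i, double d, String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bytes);
        dos.writeInt(i);        // 4 bytes
        dos.writeDouble(d);     // 8 bytes
        dos.writeUTF(s);        // 2-byte length, then the encoded text
        dos.close();
        return bytes.toByteArray();
    }

    // bytes --> ByteArrayInputStream --> DataInputStream --> ints, doubles etc
    static String decode(byte[] data) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
        int i = dis.readInt();
        double d = dis.readDouble();
        String s = dis.readUTF();
        dis.close();
        return i + " " + d + " " + s;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(42, 3.5, "hi");
        System.out.println(data.length + " bytes");   // 16 bytes
        System.out.println(decode(data));             // 42 3.5 hi
    }
}
```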

The following example shows both input and output processes in one short piece
of code. First a PrintWriter object is created, ready to act as a conduit for streaming
data to a file. Then a BufferedReader object is built on System.in, with the keyboard
as the source. The readLine( ) method then reads data from the keyboard, which is
stored in a String object. This object is then written out to the file via the println( ) method.

Example using System.in and the IO Classes

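import java.io.*;

// Reads one line from the keyboard and writes it to a file named "target"
public class ToFile {
  public static void main(String[] args) {
    String line;
    PrintWriter out = null;
    try {
      // PrintWriter layered over a FileOutputStream (the file is created or truncated)
      out = new PrintWriter(new FileOutputStream("target"));
    }
    catch (FileNotFoundException f) {
      System.out.println("could not open file \"target\" for writing");
      System.exit(1);
    }
    // BufferedReader layered over an InputStreamReader layered over System.in
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

    System.out.println("Type in a line");
    try {
      line = in.readLine();        // null if input ends before a line is typed
      if (line != null) {
        out.println(line);
      }
      in.close();
    }
    catch (IOException io) {
      System.out.println("IOException: " + io.getMessage());
    }
    finally {
      out.close();                 // flushes buffered characters to the file
    }
  }
}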