The Filter Classes Guide

Introduction
Using Filters
Writing Filters
Filter Class Hierarchy

1. Introduction

UNIX programs can be combined so that the output from one program is passed to the next program as input. For example, the grep and sort programs could be combined as follows:

      grep 'pylon' < input.txt | sort > output.txt

The grep program reads its input from the file input.txt and outputs only those lines that contain the word 'pylon'. The output from the grep program is then passed as input to the sort program, which sorts the lines of its input and outputs them to the file output.txt.

The filter classes presented here allow classes derived from them to be combined with each other in a similar fashion. For example, assuming the existence of filter classes analogous to each of the programs in the above example, it could be written using filter classes as

      Grep grep("pylon");
      Sort sort;
      Append  input  = Append::file("input.txt");
      WriteTo output = WriteTo::file("output.txt", WriteTo::OVERWRITE);
      Filter f = grep | sort;
      f.connect(input, output);

or more concisely as

      Filter f = Grep("pylon") | Sort();
      f.connect(Append::file("input.txt"), 
                WriteTo::file("output.txt", WriteTo::OVERWRITE));

Note that filter classes can also be used by themselves. The code fragment

      Grep grep("pylon");
      grep.connect(cin, cout);

reads its input from the C++ standard input stream cin and outputs only those lines containing the word 'pylon' to cout, the C++ standard output stream.

The remainder of this document discusses the use and construction of filters in more detail. Section 2 describes how to use and combine existing filters; section 3 explains how to write new types of filters; and section 4 presents the hierarchy of filter base classes.

2. Using Filters

2.1 Using Single Filters

To use a filter to read data from an input stream, process it and output the results to an output stream, simply connect the input stream to the output stream through the filter using the filter's connect() member function. For example, the following code fragment constructs a filter and then connects the standard input stream to the standard output stream through it:

   Take take10(10);
   take10.connect(cin, cout);

The first line constructs a filter, specifically a filter that outputs the first 10 lines of its input (or all of the lines if there are fewer than 10). The second line connects the standard input stream (cin) to the standard output stream (cout) through the filter named take10. This causes the take10 filter to read data from cin, output the first 10 lines to cout, and discard the rest.

The above code fragment could be written more concisely as

   Take(10).connect(cin, cout);  // outputs first 10 lines of cin to cout

This form is usually more convenient, provided that the filter isn't going to be used again. Note that the constructors of some filters may throw an exception if the filter cannot be constructed, so this last form may make it more difficult to handle any such exceptions. This is especially true when several filters are combined, as discussed below.

2.2 Using Combined Filters

Filters can be combined into a sequence of filters so that the output of one filter in the sequence is used as the input to the next filter in the sequence. The following code fragment combines two filters to extract a range of lines from its input:

   Take take12(12);     // keeps first 12 lines of input
   Drop drop4(4);       // discards first 4 lines of input
   Filter extractRange = take12 | drop4;  // extracts lines 5-12
   extractRange.connect(cin, cout);

The first line constructs the Take filter take12 that outputs the first 12 lines of its input, and the second line constructs the Drop filter drop4 that outputs all but the first 4 lines of its input. In the third line the take12 and drop4 filters are combined using the filter concatenation operator (|) to create the extractRange filter, a filter that outputs the fifth through the twelfth lines of its input. The fourth line of code connects the standard input stream to the standard output stream through the extractRange filter: the fifth through the twelfth lines read from the standard input stream are written to the standard output stream.

As can be seen in the previous example, the result of combining two or more filters is a filter object of class Filter. A Filter object stores a copy of each of its component filters (i.e. the filters that are combined to create the Filter object) and connects these copies together in the proper order. Then when the Filter object is connected to an input stream and an output stream it reads data from the input stream and passes it on to the first of its component filters, which processes it and outputs it to the next component filter for use as input, and so on until the last component filter writes its output to the output stream to which the Filter object is connected. Thus, in the last line of the previous example, connecting the standard input stream and the standard output stream through the extractRange filter causes the following to occur:

the extractRange filter reads data from cin and passes it to the take12 filter as input;
take12 passes the first 12 lines of its input to the drop4 filter as input; and
drop4 writes all but the first 4 lines of its input to cout.
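
The data flow listed above can be sketched without the filter classes themselves. The following is an illustration only: takeLines(), dropLines(), extractRangeSketch() and makeLines() are hypothetical stand-ins that operate on a std::vector of lines, and are not part of the filter classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Lines;

// Hypothetical stand-in for a Take filter: keeps at most the first n lines.
Lines takeLines(const Lines &in, std::size_t n)
   {
   if (n > in.size())
      n = in.size();
   return Lines(in.begin(), in.begin() + static_cast<std::ptrdiff_t>(n));
   }

// Hypothetical stand-in for a Drop filter: discards the first n lines.
Lines dropLines(const Lines &in, std::size_t n)
   {
   if (n > in.size())
      n = in.size();
   return Lines(in.begin() + static_cast<std::ptrdiff_t>(n), in.end());
   }

// Stand-in for take12 | drop4: the output of the take step is the
// input of the drop step.
Lines extractRangeSketch(const Lines &in)
   {
   return dropLines(takeLines(in, 12), 4);
   }

// Builds the lines "line 1", "line 2", ..., "line <count>".
Lines makeLines(int count)
   {
   assert(count >= 0);
   Lines result;
   for (int i = 1; i <= count; ++i)
      result.push_back("line " + std::to_string(i));
   return result;
   }
```

Given 20 input lines, extractRangeSketch() returns the 8 lines from "line 5" through "line 12"; given 3 input lines it returns nothing, since all of the lines that take keeps are then dropped.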

Note that since a Filter object stores copies of its component filters, the filters that were combined to create the Filter object can be modified or deleted and the Filter object will still work properly. Thus the following code fragment will have the same effect as the previous one:

   Filter extractRange = buildLineExtractor(5, 12);
   extractRange.connect(cin, cout);
   .
   .
   .
   Filter buildLineExtractor(int startLine, int endLine)
      {
      // assumes startLine > 0 and startLine <= endLine
      Take take(endLine);
      Drop drop(startLine - 1);
      return take | drop;
      }

The filter returned by buildLineExtractor() will work correctly despite the fact that the filters take and drop are destroyed at the end of buildLineExtractor().

The buildLineExtractor() function could be rewritten as

   Filter buildLineExtractor(int startLine, int endLine)
      {
      // assumes startLine > 0 and startLine <= endLine
      return Take(endLine) | Drop(startLine - 1);
      }

but, as mentioned above, the constructors of some filters throw an exception if the filter can't be constructed, and this form of combining filters makes it difficult to determine which constructor threw a given exception.

Also note that a Filter object can be constructed from just one filter. The following version of buildLineExtractor(), which deals sensibly with all possible parameter values, illustrates this:

   Filter buildLineExtractor(int startLine, int endLine)
      {
      if (endLine <= 0 || startLine > endLine)  // empty line range
         return Take(0);
      else if (startLine <= 1)                  // start at line 1
         return Take(endLine);
      else
         return Take(endLine) | Drop(startLine - 1);
      }

The if and else if clauses construct a Filter object from a single Take filter to handle the degenerate cases.

2.3 Connecting Filters

Besides the version of the filter member function connect() that connects an input stream and an output stream through a filter, there are three other versions of connect() that connect, respectively,

an input stream and an output filter;
an input filter and an output stream; and
an input filter and an output filter.

Any filter can be used as an input filter or an output filter, but only certain filters are useful in each role. Input filters aren't passed any input data, so usually only filters that output data without being passed any input data are used as input filters. And the output of output filters is discarded, so usually only filters that write their output to one or more other destinations are used as output filters. A given filter's documentation should indicate whether it is useful as an input filter and/or an output filter.

In actual fact, filters can only connect together other filters: the three versions of connect() that connect a stream just create a filter that reads from or writes to each stream being connected, then connect the two resulting filters.

2.4 Filter Constructor Exceptions

As mentioned above, filter constructors may throw an exception if the filter can't be constructed. Usually the exception thrown will be derived from FilterException, though not always. For example, out-of-memory exceptions will not normally be derived from FilterException.

The FilterException class has a public member function named getMessage() that returns a String containing an explanation of why the exception was thrown. This message should make it easier to determine why a filter that threw a FilterException couldn't be constructed.

A common reason for the construction of a filter to fail is that its constructor has been passed an invalid argument. In such cases the filter's constructor will throw an exception of class InvalidFilterArgumentException, a class derived from FilterException.
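
The following sketch shows one way such exceptions might be caught. It is self-contained, so it defines its own simplified stand-ins: the FilterException and InvalidFilterArgumentException classes below only mimic the ones described above (using std::string in place of String), TakeSketch is a hypothetical filter, and the rule that it rejects a negative line count is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the FilterException class described above.
class FilterException
   {
   public:
      explicit FilterException(const std::string &msg): message(msg) {}
      virtual ~FilterException() {}
      const std::string &getMessage() const { return message; }
   private:
      std::string message;
   };

// Simplified stand-in for InvalidFilterArgumentException.
class InvalidFilterArgumentException: public FilterException
   {
   public:
      explicit InvalidFilterArgumentException(const std::string &msg):
         FilterException(msg) {}
   };

// Hypothetical filter whose constructor validates its argument.
// Rejecting a negative count is an assumption of this sketch.
class TakeSketch
   {
   public:
      explicit TakeSketch(int count): lineCount(count)
         {
         if (count < 0)
            throw InvalidFilterArgumentException(
               "TakeSketch: line count must not be negative");
         }
      int getCount() const { return lineCount; }
   private:
      int lineCount;
   };

// Returns an empty string if the filter was constructed, and the
// exception's message otherwise.
std::string tryBuildTake(int count)
   {
   try
      {
      TakeSketch take(count);
      assert(take.getCount() == count);
      return "";
      }
   catch (const FilterException &e)   // also catches derived classes
      {
      return e.getMessage();
      }
   }
```

Catching by reference to the base class handles InvalidFilterArgumentException as well. Exceptions that are not derived from FilterException, such as out-of-memory exceptions, would pass through this handler uncaught.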

2.5 Filter Limitations

Filters currently have one major limitation: they do not properly handle data that contains null characters. Since filters are usually used to process text, this limitation should not normally be a problem.

Usually all of the characters from the null character to the character before the next newline character, inclusive, are lost, though this is not guaranteed.

This limitation may be eliminated in a later version of the filter classes.

3. Writing Filters

Deriving from SingleFilter
Deriving from ShellFilter
Deriving from ComponentFilter
Throwing Exceptions from Filter Constructors
Utility Functions

New filter classes are created by deriving from the ComponentFilter class or one of its descendants. Deriving directly from ComponentFilter is rare: most filters are derived from SingleFilter, though those that just use another filter to implement their functionality are often derived from ShellFilter. A new filter class should never be derived from Filter.

3.1 Deriving from SingleFilter

At a minimum, a filter class derived directly from the SingleFilter abstract filter class must

define a copy constructor;
declare the assignment operator as private to prevent its being called accidentally;
call the DEFINE_FILTER_COPIER() macro function in its class declaration, passing it the name of the class; and
override the pure virtual member function processLine().

The class declaration for such a minimal filter would look like the following:

   class MinimalFilter: public SingleFilter
      {
      public:
         MinimalFilter(const MinimalFilter &f);
         void processLine(const String &line);
         DEFINE_FILTER_COPIER(MinimalFilter);
      private:
         const MinimalFilter &operator =(const MinimalFilter &f);
      };

A filter like this wouldn't be very useful, though, especially in the absence of another constructor besides the copy constructor. Thus most filters also do one or more of the following:

define additional constructors
define a destructor, if necessary
override zero or more of the virtual member functions beforeInput(), afterInput(), handleError() and minimize()

Defining constructors and destructors is a common C++ activity and isn't covered here, except to note that the copy constructor should do a deep copy so that the copy doesn't depend on the original after the copying is completed. For example, a copy should not keep a pointer to a data member that the original owns, but should make its own copy of that data member.
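
This guide does not show what DEFINE_FILTER_COPIER() expands to. As an assumption only, the sketch below shows a common C++ idiom that such a macro could plausibly use: a virtual copy function defined in terms of the class's copy constructor, which is why both would be required. SketchBase, SKETCH_FILTER_COPIER, Prefix and storeCopy() are all hypothetical names, not part of the filter classes.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base class standing in for the filter base classes.
class SketchBase
   {
   public:
      virtual ~SketchBase() {}
      virtual std::unique_ptr<SketchBase> copy() const = 0;
      virtual std::string apply(const std::string &line) const = 0;
   };

// Hypothetical macro: defines copy() using the class's copy constructor.
#define SKETCH_FILTER_COPIER(ClassName) \
   std::unique_ptr<SketchBase> copy() const override \
      { return std::unique_ptr<SketchBase>(new ClassName(*this)); }

// A filter-like class that owns a data member (its prefix string).
class Prefix: public SketchBase
   {
   public:
      explicit Prefix(const std::string &p): prefix(p) {}
      // Deep copy: the new object gets its own copy of the string.
      Prefix(const Prefix &other): SketchBase(), prefix(other.prefix) {}
      std::string apply(const std::string &line) const override
         { return prefix + line; }
      SKETCH_FILTER_COPIER(Prefix)
   private:
      Prefix &operator =(const Prefix &);   // private and never defined
      std::string prefix;
   };

// What a combining filter might do: keep its own copy of a component
// without knowing the component's concrete class.
std::unique_ptr<SketchBase> storeCopy(const SketchBase &component)
   {
   std::unique_ptr<SketchBase> stored = component.copy();
   assert(stored.get() != &component);   // a distinct object
   return stored;
   }
```

Because the copy owns its own data, the object returned by storeCopy() remains usable after the original Prefix object has been destroyed.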

The remainder of this section concentrates on when and how to override the virtual member functions mentioned above.

beforeInput(), processLine() and afterInput()

Whenever a filter f, or a filter of which f is a component, has one of its connect() member functions called, the following sequence of events occurs:

f.beforeInput() is called
f.processLine(inputLine) is called for each line inputLine passed to f as input
f.afterInput() is called

Thus beforeInput() is used to do any set-up necessary before input is processed, processLine() processes or stores each line of input as it is received, and afterInput() is used to do any post-input cleanup and/or process and output any remaining input. processLine() is declared to be pure virtual, and so must be overridden; the default implementations of beforeInput() and afterInput() do nothing.

Note that the sequence of events listed above can occur several times during the existence of a filter, so it is important that beforeInput() and afterInput() ensure that the filter works the same way in subsequent uses as it does the first time it is used.
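
The sequence of calls, and the need to reset state between uses, can be illustrated with the following self-contained sketch. SketchFilter is a much simplified stand-in for SingleFilter (its run() function plays the role of connect() and collects output in a vector), and CountLines is a hypothetical filter; neither is part of the filter classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for SingleFilter.
class SketchFilter
   {
   public:
      virtual ~SketchFilter() {}
      // Stand-in for connect(): drives the three-step sequence.
      std::vector<std::string> run(const std::vector<std::string> &input)
         {
         output.clear();
         beforeInput();
         for (std::size_t i = 0; i < input.size(); ++i)
            processLine(input[i]);
         afterInput();
         return output;
         }
   protected:
      virtual void beforeInput() {}
      virtual void processLine(const std::string &line) = 0;
      virtual void afterInput() {}
      void outputLine(const std::string &line) { output.push_back(line); }
   private:
      std::vector<std::string> output;
   };

// Hypothetical filter: outputs one line giving the number of input lines.
class CountLines: public SketchFilter
   {
   public:
      CountLines(): count(0) {}
   protected:
      void beforeInput() override { count = 0; }   // reset per-use state
      void processLine(const std::string &) override { ++count; }
      void afterInput() override
         { outputLine(std::to_string(count) + "\n"); }
   private:
      int count;
   };

// Usage: the same filter object gives the same answer on a second use.
void demonstrateReuse()
   {
   CountLines counter;
   std::vector<std::string> input(3, "x\n");
   assert(counter.run(input).front() == "3\n");
   assert(counter.run(input).front() == "3\n");   // not "6\n"
   }
```

Without the reset in beforeInput(), the second use would continue counting from where the first use stopped.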

The processLine(), beforeInput() and afterInput() member functions can use the outputLine() member function to output a line of data to the next filter. The argument to outputLine() must be a line: it can contain the end-of-line character ComponentFilter::EOL only as its last character, and all but the last line of output must have the end-of-line character as its last character. The following processLine() function, which simply outputs each line of input unchanged, demonstrates the use of outputLine():

   void PassThrough::processLine(const String &line)
      {
      outputLine(line);
      }

If an error occurs in a filter's beforeInput(), processLine() or afterInput() function, use the outputError() member function to pass a FilterError object, constructed from a message describing the error, to any filters following this one in a sequence of filters. Alternatively, you could use the version of outputError() that accepts the error message directly.

handleError()

A filter's handleError() member function receives and handles any FilterError objects output by a previous filter in a sequence of filters. The default implementation of handleError() simply calls outputError() to pass the FilterError on to the next filter, and usually isn't overridden. If a filter does override handleError(), it should either handle the error (which is usually difficult or impossible) or (eventually) pass the FilterError object to the next filter using outputError().

An example of a filter that would override handleError() is a filter that overwrites a file that might already exist: it could write its output to a temporary file, and if no errors occurred then it would replace the file with the temporary file. It would pass on any FilterError objects using outputError().

minimize()

The final virtual member function that can be overridden is the minimize() function. Some filters have resources that expand depending on the input they receive. For example, the size of a StringBuffer used to store the previous line of input will expand to be about the size of the longest input line. If a filter is going to be reused in the near future (a common occurrence), then it would be wasteful to reduce the size of such resources only to expand them again the next time the filter is used. Thus such resources are usually left at their expanded size at the end of the input (that is, they are not reduced in the filter's beforeInput() or afterInput() functions).

But there may be times when the user of a filter wants to minimize the resources used by the filter (for example, because the filter won't be used for a while, the system's resources are constrained, and/or abnormal input to the filter increased its reducible resources to wastefully large sizes). In such situations the user can call the filter's minimize() function to minimize the filter's resource usage.

If you override minimize() in your filter, then it should call its parent class's version of minimize() at some point. A filter must also be able to perform its function after its minimize() function is called: resources necessary to a filter's correct operation should not be eliminated by its minimize() function.

One common action that a filter's minimize() function performs is to minimize the size of any StringBuffers it uses during processing. Passing the filter member function minimizeBuffer() an empty (i.e. length() == 0) StringBuffer will minimize the amount of memory the StringBuffer uses.
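
The idea can be sketched with standard C++ types. In the following illustration std::string stands in for StringBuffer, minimizeBufferSketch() is a hypothetical stand-in for minimizeBuffer(), and RememberPrevious is a hypothetical filter-like class; none of them is part of the filter classes. Note that std::string::shrink_to_fit() is only a non-binding request, so how much memory is actually released depends on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for minimizeBuffer(): acts only on an empty buffer.
void minimizeBufferSketch(std::string &buf)
   {
   if (!buf.empty())
      return;
   const std::size_t before = buf.capacity();
   buf.shrink_to_fit();   // non-binding request to release unused capacity
   assert(buf.capacity() <= before);
   }

// Hypothetical filter-like class that stores the previous line of input.
class RememberPrevious
   {
   public:
      // The buffer grows to hold the longest line seen so far.
      void processLine(const std::string &line) { previous = line; }
      // Empties the buffer but does not try to release its memory.
      void afterInput() { previous.clear(); }
      // Releases memory on request, leaving the object usable.
      void minimize() { minimizeBufferSketch(previous); }
      std::size_t bufferCapacity() const { return previous.capacity(); }
   private:
      std::string previous;
   };
```

After minimize() is called the object can still process lines as before; its buffer simply grows again as needed.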

The virtual member functions discussed above are the only SingleFilter virtual member functions that should be overridden when creating a new filter class. Overriding any of the others could cause filters of that filter class, as well as any filters combined with such filters, to work incorrectly or not at all.

3.2 Deriving from ShellFilter

A filter that is derived from the ShellFilter filter class uses another filter (referred to as its 'implementation filter') to implement its functionality: it is just a 'shell' around the filter that does the actual processing. Among the types of filters derived from ShellFilter are

filters that represent a sequence of (one or more) filters; and
filters whose implementation varies depending on the parameters used to construct them.

To derive a filter class from ShellFilter,

define a copy constructor and any other desired constructors;
declare the assignment operator as private to prevent its being called accidentally; and
call the DEFINE_FILTER_COPIER() macro function in its class declaration, passing it the name of the class.

None of ShellFilter's virtual member functions need to be overridden, though they can be. The only member functions of ShellFilter that should be overridden are the ones that can be overridden when deriving from SingleFilter. Note that a ShellFilter's beforeInput() function is called after its implementation filter's beforeInput() function is called, and its afterInput() function is called before its implementation filter's afterInput() function is called.

ShellFilter's default implementation of processLine() just passes its line parameter on to its implementation filter. If you override processLine() in a class you derive from ShellFilter, you may need to access the implementation filter directly. This can be accomplished by using the ShellFilter member function getImplementation(), which returns a pointer to the implementation filter, or NULL if no implementation filter has been selected.

The implementation filter for a ShellFilter filter is usually selected in the filter's constructor and must be left unchanged for the rest of the filter's existence. To select an implementation filter, call ShellFilter's select() member function with the implementation filter as the parameter. The ShellFilter will make and use a copy of the filter passed to select(), so that filter can be destroyed after select() returns.

The following is an example of a constructor for a filter class derived from ShellFilter:

   ExtractRange::ExtractRange(int startLine, int endLine):
      ShellFilter()
      {
      if (endLine <= 0 || startLine > endLine)  // empty line range
         select(Take(0));
      else if (startLine <= 1)                  // start at line 1
         select(Take(endLine));
      else
         select(Take(endLine) | Drop(startLine - 1));
      }

Note that the filters passed to select() in the above example are local to the constructor.

While rarely done, the implementation filter can be selected after the ShellFilter has been constructed, so long as it is selected before one of the ShellFilter's connect() member functions is called, and before the ShellFilter is used as a component of another filter.

3.3 Deriving from ComponentFilter

Since the only difference between a ComponentFilter and a SingleFilter is that a SingleFilter has connect() member functions, the only filter classes that should be derived directly from ComponentFilter are those that shouldn't be used directly, but only as components of other filters. Filter classes derived directly from ComponentFilter are written the same way as filter classes derived directly from SingleFilter.

3.4 Throwing Exceptions from Filter Constructors

If a filter class's constructor cannot construct the filter object, then it should throw an exception. If the constructor throws the exception directly, then the exception should be of a class derived from FilterException; other exceptions (most notably exceptions thrown by the new operator when it fails) do not have to be of a class derived from FilterException.

FilterException objects are constructed from a String explaining why the exception was thrown. This explanation should be designed to help the user of the filter to determine why the filter couldn't be constructed.

If a filter's constructor is passed an invalid argument (or an invalid combination of arguments), it should throw an exception of class InvalidFilterArgumentException, which is a class derived from FilterException.

3.5 Utility Functions

The ComponentFilter class defines several utility member functions that may be useful in writing filter classes. This section lists and briefly describes these utility functions.

In the following, the end-of-line character is the character ComponentFilter::EOL. It is usually the newline character.

boolean getNextLine(istream &in, StringBuffer &buf)

This function reads the next line from in and copies the line, including the end-of-line character if present, into buf. It returns true if a line was successfully copied into buf, and false if there is no more data to read or an error occurred when reading from in.
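
The behaviour described above can be approximated with the standard library. The sketch below is not the library's implementation: getNextLineSketch() and SKETCH_EOL are hypothetical names, std::string stands in for StringBuffer, and it assumes that the line replaces the previous contents of buf and that the end-of-line character is the newline character.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

const char SKETCH_EOL = '\n';   // stand-in for ComponentFilter::EOL

// Reads the next line from in into buf, keeping the end-of-line
// character if the line had one. Returns false if nothing was read.
bool getNextLineSketch(std::istream &in, std::string &buf)
   {
   buf.clear();
   if (!std::getline(in, buf, SKETCH_EOL))
      return false;        // no more data, or a read error
   if (!in.eof())
      buf += SKETCH_EOL;   // getline() removed the delimiter: restore it
   return true;
   }

// Usage: the last line has no end-of-line character, so none is added.
void demonstrateGetNextLine()
   {
   std::istringstream in("first\nsecond");
   std::string buf;
   assert(getNextLineSketch(in, buf) && buf == "first\n");
   assert(getNextLineSketch(in, buf) && buf == "second");
   assert(!getNextLineSketch(in, buf));
   }
```

std::getline() stops without setting the end-of-file state when it finds the delimiter, so checking eof() afterwards distinguishes a complete line from a final, unterminated one.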

void outputLine(const String &str)

This function outputs str to the next filter. Note that str must actually be a line: it can only contain the end-of-line character as its last character, and it must have the end-of-line character as its last character unless it is the last line to be output.

void outputString(const String &str, StringBuffer &buf)

This function appends str to buf and outputs each complete line of the result to the next filter. If the last part of str is not a complete line (that is, if str doesn't end with the end-of-line character), then this last, partial line is copied into buf; otherwise buf will be empty when this function returns. Note that both str and buf may contain end-of-line characters (though buf won't when this function returns).
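
The buffering behaviour described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: outputStringSketch() and SKETCH_EOL are hypothetical names, std::string stands in for both String and StringBuffer, and the out vector stands in for the next filter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const char SKETCH_EOL = '\n';   // stand-in for ComponentFilter::EOL

// Appends str to buf, moves each complete line of the result to out,
// and leaves any trailing partial line in buf.
void outputStringSketch(const std::string &str, std::string &buf,
                        std::vector<std::string> &out)
   {
   buf += str;
   std::size_t start = 0;
   std::size_t eol;
   while ((eol = buf.find(SKETCH_EOL, start)) != std::string::npos)
      {
      // Each output line includes its end-of-line character.
      out.push_back(buf.substr(start, eol - start + 1));
      start = eol + 1;
      }
   buf.erase(0, start);   // keep only the partial line, if any
   assert(buf.find(SKETCH_EOL) == std::string::npos);
   }
```

For example, if buf holds "ab" and str is "c\nde\nf", the lines "abc\n" and "de\n" are output and buf is left holding "f". A later call with str equal to "\n" then outputs "f\n" and leaves buf empty.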

void outputFile(const String &filename, StringBuffer &buf)

This function appends the contents of the file named filename to buf and outputs each complete line of the result to the next filter. If the last part of the file is not a complete line (that is, if the file doesn't end with the end-of-line character), then this last, partial line is copied into buf; otherwise buf will be empty when this function returns. Note that both the file and buf may contain end-of-line characters (though buf won't when this function returns).

void outputStream(istream &in, StringBuffer &buf)

This function appends all of the data on in to buf and outputs each complete line of the result to the next filter. If the last part of the data read from in is not a complete line (that is, if the last character read from in isn't the end-of-line character), then this last, partial line is copied into buf; otherwise buf will be empty when this function returns. Note that both buf and the data read from in may contain end-of-line characters (though buf won't when this function returns).

void outputError(const FilterError &fe)
void outputError(const String &msg)

The first version of this function outputs fe to the next filter. The second version constructs a FilterError object from the message msg that describes the error, then passes that FilterError object to the next filter.

void minimizeBuffer(StringBuffer &buf)

If buf is empty (that is, if buf.length() == 0), then this function minimizes buf's capacity, and hence the amount of memory that buf uses. This function is usually called from a filter class's minimize() member function.

4. Filter Class Hierarchy

      ComponentFilter
         |
         +----SingleFilter
                 |
                 +----Filter
                 |
                 +----ShellFilter
