Types To Regular Expressions
  PickAxe Companion Notes
 
Peter Komisar                    v.1.2                      Conestoga College

references:
Programming Ruby 1.9, The Pragmatic Programmers Guide
Thomas, Fowler & Hunt
Tutorial Point,
http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm
Regular Expressions Info., http://www.regular-expressions.info/posixbrackets.html
http://ruby.about.com/od/regularexpressions/a/regexgrouping.htm


In Ruby, numbers fit four classifications: integers, floats, rationals and complex numbers.

Integers

Ruby seems to have a 'laissez-faire' philosophy regarding the
length of Integers which, rather than being regulated to a fixed
limit are allowed to be any value that system memory is able
to afford.

This apparently indeterminate condition does in fact follow a
strict mandate and is associated with the native machine word
length, as is described in the following quote from the text in
Chapter 22, the 3rd article.

"Ruby integers are objects of class Fixnum or Bignum. Fixnum
objects hold integers that fit within the native machine word minus
1 bit. Whenever a Fixnum exceeds this range, it is automatically
converted to a Bignum object, whose range is effectively limited
only by available memory. If an operation with a Bignum result has
a final value that will fit in a Fixnum, the result will be returned as
a Fixnum. "
                                   - Programming Ruby 1.9, Thomas, Fowler & Hunt

// some bits of 32 or 64 must be used for accessory purposes 


Fixnum or Bignum


In fact, at least on this machine, which is a 32-bit system, the
largest value stored as a Fixnum is 1073741823, or 2^30 - 1. Add
one and the number becomes a Bignum.  For the record, this is
the value that is described in PickAxe for the limit of a Fixnum
in the Numbers section of Chapter 6 on Standard Types.


Maximum Fixnum
// on this 32 bit system

This is not hard to prove. Object's class( )  method may be
used to report what class a number is being stored as. 
Numbers can be looped until the changeover point occurs.
 
A zero start will take quite a while to come to an answer
especially if the increment is by one each iteration.


The following example benefits from some earlier testing
and we can also say this reporting is being done on a 32
bit machine.



Example


i=1073741811;  # starting oddly close to 2^30
puts i.class

while( i.instance_of?(Fixnum))
     i = i + 1
     puts i
end
print i - 1
puts  " #{i-1} is a #{( i -1 ).class}"

print i
puts " is a #{i.class}"

OUTPUT // produces

Fixnum
1073741812
1073741813
1073741814
1073741815
1073741816
1073741817
1073741818
1073741819
1073741820
1073741821
1073741822
1073741823
1073741824
1073741823 1073741823 is a Fixnum  
1073741824 is a Bignum


We can go back to a text statement now and appreciate better
what it is saying.

"Integers within a certain range (normally −2^30...2^30-1 or −2^62...2^62-1)
are held internally in binary form and are objects of class Fixnum.
Integers outside this range are stored in objects of class Bignum "

                                          - Programming Ruby 1.9, Thomas, Fowler & Hunt


Underscores are Ignored in Digit Strings

The underscore stands in for the comma separator in large
numbers. It takes some getting used to, as is shown in
the next example.


Example

s = 1_1 + 2_2 + 1_0_0_0    # equivalent of 11 + 22 + 1000
puts s

OUTPUT // produces

1033

Numbers can be shown in different number systems using
the following forms.


Escapes for Octal, Decimal Hex and Binary

The following example shows the escapes in use.



Example


puts 010    # octal
puts 0d10   # decimal, the default
puts 0x10  # hex
puts 0b10  # binary


OUTPUT


8
10
16
2

// there are also methods that do these conversions
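
The conversion methods mentioned in the remark above can be sketched as
follows. This is only a small illustration using Integer's to_s( ) and
String's to_i( ), each of which accepts a base. The helper name in_base
is made up for this note and is not part of Ruby.

```ruby
# in_base is a hypothetical helper name used only in this sketch
def in_base(number, base)
  number.to_s(base)      # returns the digits as a string, without a prefix
end

puts in_base(255, 8)     # 377
puts in_base(255, 16)    # ff
puts in_base(255, 2)     # 11111111

# Going the other way, String#to_i reads digits in the given base
puts "ff".to_i(16)       # 255
puts "1010".to_i(2)      # 10
```

Note that to_s( ) does not add the 0, 0x or 0b prefix used in literals.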


Float Class Numbers


A numeric literal becomes a float object if a decimal point
and / or an exponent is included as part of the literal.

Following are examples showing a decimal pointed literal,
one that includes an exponent and a mix of both.


Example

 
puts 7.2
puts 7e2
puts 7.2e3

OUTPUT

7.2
700.0
7200.0


Note the decimal point must have a preceding numeric value.
The decimal point must also be followed by a digit. Writing
1.e3 is read as a call to a method named e3 on the integer 1.

Examples

.25         # illegal
0.25       # OK
1.           # illegal
1.0         # OK


Ruby 1.9 adds rational and complex number support.


Rational Numbers

Rational numbers are fractions formed from integer values.

Rational Number Example

3 / 4    // as written on paper; typed into Ruby, 3 / 4 is integer division and yields 0


 Complex Numbers


Complex numbers are points on the complex plane with
a real and imaginary part.

"A complex number is a number consisting of a real and
imaginary part. It can be written in the form a + bi, where
a and b are real numbers, and i is the standard imaginary
unit with the property i 2 = −1."                    - Wikipedia


Neither number type has a literal form in Ruby 1.9. Instead
constructors are supplied to store the two components
of each type of number.

Adding Rational Numbers Example

fraction_sum = Rational(3, 4) + Rational(1, 4)
puts fraction_sum

OUTPUT

1/1 

// the output is always expressed as a rational number


Mixing a Float Value Causes a Promotion of Rational To Float

fraction_sum =Rational(4, 4) + 0.25
puts fraction_sum

OUTPUT

1.25

// because there are few other promotions to consider,
// suffice it to say Float mixed with Complex yields Complex
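
The promotions described above can be checked with a few lines. This is
a minimal sketch that uses only the Rational( ) and Complex( )
constructors already shown in this note.

```ruby
puts Rational(1, 2) + 1        # Integer mixed with Rational stays Rational: 3/2
puts Rational(1, 2) + 0.25     # Float mixed with Rational gives Float: 0.75
puts Complex(3, 7) + 0.5       # Float mixed with Complex gives Complex: 3.5+7i

puts (Complex(3, 7) + 0.5).class    # Complex
```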



Complex Number Example


puts Complex(3, 7)

OUTPUT

3+7i   # i for imaginary part


Returning Numeric from String Values Representing Numbers

String values need to be converted to appropriate types
in order to process them as numbers. Consider the use
of Ruby's conversion methods Integer( ), Float( ) and Rational( ),
which play the role of 'wrapper classes', in the following example.


Example

s1 = "900"
s2 = "200"
s3= s1 + s2
puts s3   # concatenates string values

s4 = Integer(s1) + Integer(s2) + Float(2.5) + Rational(1,2)
# wrappers put them in numeric form that can be added
puts s4

OUTPUT

900200
1103.0
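
A related point worth a quick sketch is how strict the conversion is.
String's to_i( ) method quietly returns what it can, while Integer( )
raises an ArgumentError on text that is not a clean number. The helper
name strict_integer below is hypothetical and only serves this example.

```ruby
puts "900".to_i + "200".to_i   # 1100
puts "12abc".to_i              # 12, the trailing text is dropped
puts "abc".to_i                # 0, no error is raised

# strict_integer is a made-up helper name for this sketch
def strict_integer(text)
  Integer(text)
rescue ArgumentError
  nil                          # report bad input instead of guessing a value
end

p strict_integer("900")        # 900
p strict_integer("12abc")      # nil
```

Which form to use depends on whether bad input should be tolerated or
reported.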


Class Integer and Parent Numeric & Looping

Much looping behavior in Ruby is found in the Numeric class
and its child, Integer. In other languages such as Java, ceil( )
and floor( ) are found in the Math class. Ruby has a Math module
too but reserves it for harder core functions, such as you find
on a typical math calculator.

Integer Methods

ceil
chr
denominator
downto
even?
floor
gcd
gcdlcm
integer?
lcm
next
numerator
odd?
ord
pred
rationalize
round
succ
times
to_i
to_int
to_r
truncate
upto


Common useful looping methods include times( ), upto( ) and downto( ).
The parent, Numeric, adds the useful step( ) method.

The following examples show some of these in use.

A times( ) Method Example 

string = "0"
5.times { puts string = string  + string } 

OUTPUT

00
0000
00000000
0000000000000000
00000000000000000000000000000000


A downto( ) Method Example // upto( ) is the reverse

10.downto(0) {|i| print "#{i}  " }


OUTPUT


10  9  8  7  6  5  4  3  2  1  0


A step( ) Method Example



0.step(10, 1) {|i| print "#{i} " }    # step from 0 to 10 by ones


OUTPUT


0 1 2 3 4 5 6 7 8 9 10


The String Type 


Ruby's strings, like those of other languages, are sequences of characters.
They are objects of the String class. String literals are enclosed
within single or double quotes, the latter able to evaluate expressions.

Different elements which might otherwise not be amenable to
putting inside a string can be entered using escape sequences.

Following are a large number of Escape Sequences. All the
escapes listed at the web site worked in Ruby except for \x.

A few more are added from the text. These need to be enclosed
in double quotes to evaluate.


Escape Sequences     // just for reference

from J2EE Web Thumbnail Images
http://www.java2s.com/Code/Ruby/String/EscapeCharacterslist.htm


Escape       Description

// for reference, these escapes may be preceded by the question mark (making them
// Ruby character constants) but this is an unnecessary redundancy



Code to Test the Escapes


esc1 = " 1. \a 2. \b 3. \cx 4. \C-x 5. \e 6. \f 7. \M-\C-x 8. \n\' "
esc2 = "9. \001  10. \r 11. \s  12.  \t  13. \v  14.  15.  \x7D 16. \\ "
#  \xnn  - escapes a hexadecimal value
# \nnn - octal notation for different signals, 001 - SOH, 002 STX etc.

# new line and an apostrophe escape at 8.
puts esc1
puts esc2

OUTPUT //  facsimile, view output in Scite
 
 
 # 1. BEL 2. BS 3. CAN 4. CAN 5. ESC 6. FF 7. " 8.
  '
 # 9. SOH 10.
 # 11.  12.       13. VT    14.   15.   }  16. \
 

#{ expr }

We have seen the use of the #{ } form to interpolate the result
of an expression into a string.  The following serves as a
reminder on how to create a class, instantiate it, create
accessor methods and reference a variable value via the
accessor method.

Example

class V
    attr_accessor :v
    def initialize(v)
        @v = v
    end
end

tv=V.new("lizards")
puts "V is about #{tv.v}."

OUTPUT

V is about lizards.


Here is a new shortcut. "If the enclosed value is a global,
class or instance variable, the braces can be omitted."


Example
// from Programming Ruby 1.9, Thomas, Fowler & Hunt

"Safe level is #$SAFE" # => Safe level is 0


The code enclosed in the braces can be multi-lined.


Example // from Programming Ruby 1.9, Thomas, Fowler & Hunt

puts "now is #{ def the(a)
'the ' + a
end
the('time')
} for all bad coders..."
produces:

now is the time for all bad coders...


More Ways of Doing Quotes! %q, %Q

Lower-case q represents a single quote while the upper-case
stands for a double quote.  You define the delimiter as long as
it is non-alphanumeric and not a multi-byte character. There
can't be any space between the % sign and the character that
has been chosen as the delimiter.

Example

puts %q* single quote *
puts %Q^ double quote  ^

# the Q is optional! If the Q is omitted the quote is equivalent to a double quote!

puts %? Also a double quote ?
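
The difference between the forms shows up as soon as an expression is
placed inside the quote. The following is a small sketch; the variable
name is arbitrary.

```ruby
name = "Ruby"
puts %q(Hello #{name})    # single-quote rules, prints: Hello #{name}
puts %Q(Hello #{name})    # double-quote rules, prints: Hello Ruby
puts %(Hello #{name})     # Q omitted, still double-quote rules: Hello Ruby

# paired delimiters such as ( ), { } and [ ] may nest inside the quote
puts %q{paired {braces} may nest}
```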


The Here Document

 
If you can stand one more way, there is a form called the
'here document' which you can look up in the book.
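
For a quick taste before turning to the book, here is a minimal sketch
of a here document. The method name menu_text and the terminator
END_OF_MENU are made up for this note. With the plain << form shown,
the terminator must start in the first column.

```ruby
# menu_text is a hypothetical method name used only in this sketch
def menu_text(special)
  <<END_OF_MENU
Today's menu
  special: #{special}
END_OF_MENU
end

puts menu_text("soup")
```

Everything between the two END_OF_MENU markers becomes the string, and
the #{ } form is evaluated as it is inside double quotes.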


Encoding

In Ruby 1.9 the default encoding of a string literal is US-ASCII
(Ruby 2.0 and later default to UTF-8). To return the name of the
encoding used on a string the encoding( ) method may be called.

Example

string = "Ruby's default string literal encoding: "
print string
puts string.encoding

OUTPUT

Ruby's default string literal encoding: US-ASCII 

There is a whole Chapter dedicated to encoding, Chapter
17 so the story on encoding is stopped at a short visit.

// putting  # encoding: utf-8 in commented form changes
// the file encoding to UTF-8, but that is Chapter 17


String Code Example


The text states there are over 100 String methods and
shows one interesting and practical example where a
file with fields separated by vertical bars is iterated.

The tasks are described in the book as a ) breaking each
line into fields, b ) converting the running times from minutes
and seconds to seconds, and c ) removing extra spaces
from the artists’ names.

We mimic the text example below. You may refer back to
the original for a second take on it. We omit the part of
converting minutes and seconds to seconds. First a file is
created using the comma as a delimiter.


A produce.txt File

fruit       , apples   ,  6 ,          Macintosh ,   bag
vegetable   , carrots  , 12 , Ontario      Orange,   bundle
canned_good , soup     ,  2 , Chicken   Vegetable,   can


The text introduces some interesting new elements at this
stage. First is a Struct type, a data type that holds attributes.
The Struct is created with the new( ) method.

The following two methods are used in conjunction with a
regular expression that selects in our case for a comma.

chomp( )  - In Ruby docs, the method chomp( ) is described
as returning "a new String with the given record separator
removed from the end of str".

split( )  - "divides str into substrings based on a delimiter"

We also have a nested block in the code. The file is opened
and an array readied for use. The inner block creates
substrings based on the comma separations and stores them
in appropriate fields via a parallel assignment, a form that
is discussed later. You can see it at work here in the comma
separated set of fields that receive successive values. The
outer block resumes and outputs the loaded array elements
to console.

The Struct is then instantiated on the appropriate fields.
Each Struct created is added to the produce array. The
contents of each of these is then put to console.  Comment
in the squeeze( ) method which works on the description
field, and unnecessary white spaces are removed.


Example

Produce = Struct.new(:type, :name, :description)
File.open("produce.txt") do |produce_file|  # outer block
  produce_array = [ ]
  produce_file.each do |line|   # inner block
    # split on a comma plus any white space on either side of it
    type, name, quantity, description, container = line.chomp.split(/\s*,\s*/)
    # description.squeeze!(" ")
    produce_array << Produce.new(type, name, description)
  end

  puts produce_array[0]
  puts produce_array[1]
  puts produce_array[2]
end

OUTPUT // after description.squeeze is commented in

#<struct Produce type="fruit", name="apples", description="Macintosh">
#<struct Produce type="vegetable", name="carrots", description="Ontario Orange">
#<struct Produce type="canned_good", name="soup", description="Chicken Vegetable">


To make the above example easier to understand the
following example was generated using methods. You
can decide if it is easier to follow. Blocks win the economy
race here though there is some extra overhead encapsulating
the code as a class that needs to be counted in.


Example Reformed As a Class With Methods Rather Than With Blocks


class Chimp

  def initialize
    file = "produce.txt"
    produce_array = [ ]
    fileIn(file, produce_array)
  end

  def fileIn(file, produce_array)
    file = File.open(file)
    enum = file.each
    loop do   # loop ends quietly when enum.next raises StopIteration
      type, name, quantity, description, container = enum.next.chomp.split(/\s*,\s*/)
      produce_array << Produce.new(type, name, description)
    end
    file.close

    puts produce_array[0]
    puts produce_array[1]
    puts produce_array[2]
  end
end

Produce = Struct.new(:type, :name, :description)
Chimp.new




Ruby Ranges

Ruby uses ranges in three ways: as sequences, as conditions and as intervals.

Sequences


Sequences step through a given range using the '..' and '...' range
operators.

The .. and ... Range Operators

Notice in the following example the range is converted to an array
using the to_a( )  method.



Example


ary1 = (5..7).to_a        # two-dot form, inclusive
p ary1
ary2 = (5...7).to_a       # three-dot form, exclusive
p ary2

OUTPUT

[5, 6, 7]
[5, 6]              

// the highest value in the range is excluded in triple dot range form

The to_enum method will generate an enumerator.

Example

enum = ('a'..'g').to_enum
loop do            # loop exits cleanly when next raises StopIteration
  print " #{enum.next} "
end

OUTPUT

 a  b  c  d  e  f  g


Range Methods


Range is a class with the following core methods.

Range Class Methods

 ==
 ===
 begin
 cover?
 each
 end
 eql?
 exclude_end?
 first
 hash
 include?
 inspect
 last
 max
 member?
 min
 new
 step
 to_s

The next example shows a few of the methods in use.


Example


dozen = (1..12)
puts dozen.first
puts dozen.last
puts dozen.max
puts dozen.min
puts dozen.include?(14)
puts dozen.hash
 
OUTPUT

1
12
12
1
false
-468882126


Omission: See text for successive comparison of objects in
ranges. As the text states, "In reality, this isn’t something
you do very often, so examples tend to be a bit contrived."

Forward: Ranges as Conditions is covered in the Loops
section of Chapter 8.


Ranges as Intervals
 

The following range example uses the === operator which
has obvious utility. 


Case Equality Operator, ===

The case equality operator is introduced taking the form of
three successive equal signs


Example

puts((1..3) === 4)       # 4 is outside range, produces 'false'
puts((8...11) === 11)   # 11 is excluded by triple dot form so also shows false


The commonest use is in a case statement which will likely
show up in Chapter 8 again.


Case Example Using Range Intervals

age = 15
case age
when 0..1
puts "Infant"
when 2..3
puts "Toddler"
when 4..5
puts "Preschool"
when 6..12
puts "Older child"
when 13..19
puts "Adolescent"
when 20..30
puts "young adult"
# and so on
end

OUTPUT

Adolescent
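
Behind the scenes each when clause is tested with the === operator, so
the case statement above is shorthand for a chain of range tests. The
sketch below spells a shortened version out by hand. The method name
age_group is made up for this note.

```ruby
# age_group is a hypothetical helper name used only in this sketch
def age_group(age)
  if (0..12) === age          # same test that 'when 0..12' performs
    "Child"
  elsif (13..19) === age
    "Adolescent"
  else
    "Adult"
  end
end

puts age_group(15)            # Adolescent
```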


Regular Expressions


A regular expression is a set of characters that create
a pattern. This pattern when matched against a string
can act as a filter.

Regular expressions allow you to test a string for a
pattern match. One can also substitute in replacement
text for sections of string that match a given pattern.


Forward Slashes, the Common Regular Expression Delimiter

Forward slashes are the commonest delimiter for regular
expressions.


Example

/rip/   # matches trip, ripe and gripe but not Rip or r.i.p.

Within a pattern, all characters except the following match
themselves.  The following special characters must be
preceded by a backslash to be part of the match pattern.


Special Characters That Require Backslash Escapes

 parentheses, braces & brackets   (   )  {    }   [    ]
 the period & question mark   .  ?
 plus & multiply   +    *
 vertical bar & backslash   |     \
 caret and dollar sign   ^    $
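
A short sketch shows the backslash escapes at work on a string that
contains several of these characters. The sample string is invented for
this note. Regexp.escape( ), listed later among the Regexp methods, adds
the backslashes for you.

```ruby
price = "It costs $12.50 (plus tax)"

puts price =~ /\$12\.50/          # 9, dollar sign and period matched literally
puts price =~ /\(plus tax\)/      # 16, parentheses matched literally

# Regexp.escape returns the text with the special characters escaped
puts Regexp.escape("$12.50")      # \$12\.50
```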


The =~ Operator

The =~ operator is used in Ruby to match a string against a
pattern, returning the character offset into the string at which
the match occurs.

Example

offset = /sun/ =~ "where the sun never sets"
puts "Counting from zero the offset to the match is #{offset}"


OUTPUT

Counting from zero the offset to the match is 10


Pattern literals are like double quotes allowing the expression
substitution form, #{  }  to evaluate. 

Also, if a match is not found nil is returned, which is interpreted
as false in Ruby. This means matching expressions can be used in
conditional statements.


Example


regex = /T#{2*2}/
 
str = "I did not receive my T4"

if str =~ regex
puts "Attention: T4 request"
else puts "No problemo!"
end

OUTPUT

Attention: T4 request


The following example from PickAxe searches for lines
with the word 'on' in them.


Example
// from Programming Ruby 1.9, Thomas, Fowler & Hunt


File.foreach("testfile").with_index do |line, index|
puts "#{index}: #{line}" if line =~ /on/
end

produces:

0: This is line one
3: And so on...


The Is Not a Match Operator,  !~

The 'Is not a Match' operator can be used in the
negative to select when a match is not found. 



Example

regex = /Fudge/
str = "Fudge, I did not receive my T4"
if str !~ regex
puts "OK"
else puts "Censored"
end

OUTPUT

Censored


The sub( ) Method

The sub(  ) method allows text matched by a pattern to be
replaced by a given string.  Following is the notation used in Ruby
docs to describe the method signature.


Ruby Docs Notation for the sub( ) & gsub( ) Method

str.sub(pattern, replacement) → new_str


The method is called on a string. Its arguments are the
pattern and a replacement value, and it returns the resulting
string.

// gsub for global sub

The sub(  ) method only replaces the first occurrence
of the pattern in the string. The gsub( ) method replaces
all occurrences of the pattern.


sub( ) Example

regex = /Fudge/

str1 = "Fudge, I did not receive my T4"
str2= str1.sub(regex, "Heavens To Betsy")
puts str2

OUTPUT

Heavens To Betsy, I did not receive my T4


gsub( ) Example


str3 ="mississippi"
str4 = str3.gsub("i", "a")
puts str4

OUTPUT

massassappa


The sub!( ) and gsub!( ) Methods

Both sub( ) and gsub( ) return new strings that show the
changes; the original string is not changed.  Add
the exclamation mark and the original string is changed.
But this only happens if there is a match. Otherwise nil
is returned.

Example

str = "It's now or never"
str.sub!(/w/, "*")
str.gsub!(/v/, "_")
puts str

OUTPUT

It's no* or ne_er

The following example shows that where there is no
match, with the exclamation form of the methods, nil
is returned and the output shows nothing.


Example

s = "Anything but a number"
nomatch = s.sub!(/2/, "*")
nomatch2= s.gsub!(/3/, "_")
puts s
puts nomatch     
puts nomatch2
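
Because nil counts as false, the return value of the exclamation forms
can be tested directly. The following is a minimal sketch of that
idiom. The method name censor! is hypothetical and used only here.

```ruby
# censor! is a made-up helper name for this sketch
def censor!(text)
  if text.gsub!(/\d/, "#")          # nil (false) when no digit was found
    "changed: #{text}"
  else
    "unchanged: #{text}"
  end
end

# dup hands the method a fresh copy that is safe to modify in place
puts censor!("Call 911".dup)                    # changed: Call ###
puts censor!("Anything but a number".dup)       # unchanged: Anything but a number
```

The same nil return is a reason not to chain calls such as
s.sub!(...).upcase, which fails when there is no match.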


The Regexp Class

Here is what the documentation says about the Regexp class.

" Regexp holds a regular expression, used to match a pattern
against strings. Regexps are created using the /…/ and %r{…}
literals, and by the Regexp::new constructor. "


Regexp Methods

==
===
=~
casefold?
compile
encoding
eql?
escape
fixed_encoding?
hash
inspect
last_match
match
named_captures
names
new
options
quote
source
to_s
try_convert
union
~




Following are three forms that can be used to create regular
expression objects.


Three Notations, Forward Slashes, Constructor and %r Forms

Example
// put a space after syntax in rx3 and see the output


rx1 =  / X /
rx2 =  Regexp.new("constructor")
rx3 =  %r{syntax}

s=" X constructor the r syntax"

if rx1=~s && rx2 =~s && rx3=~ s
puts "ALL TRUE"
else
puts "AT LEAST ONE FALSE"
end

OUTPUT

ALL TRUE


Regular Expression Modifiers // from Tutorial Point
http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm

"Regular expression literals may include an optional modifier to control various aspects of matching. The modifier is specified after the second slash character . . . and may be represented by one of these characters:"

Modifier   Description
i          Ignore case when matching text.
o          Perform #{} interpolations only once, the first time the regexp literal is evaluated.
x          Ignores whitespace and allows comments in regular expressions
m          Matches multiple lines, recognizing newlines as normal characters
u,e,s,n    Interpret the regexp as Unicode (UTF-8), EUC, SJIS, or ASCII. If none of these modifiers is specified, the regular expression is assumed to use the source encoding.

"Like string literals delimited with %Q, Ruby allows you to begin your regular expressions with %r followed by a delimiter of your choice. This is useful when the pattern you are describing contains a lot of forward slash characters that you don't want to escape:"


The following example shows how these modifiers are applied.


Example

s= "X"

if s=~/x/
    puts "True"
else
    puts "False as case is not the same"
end
# add the i, ignore_case option

if s=~/x/i
    puts "True, with the ignore case switch on"
else
    puts "False as case is not the same"
end

OUTPUT

False as case is not the same
True, with the ignore case switch on
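
The x and m modifiers from the table can be sketched in the same way.
The date pattern and sample strings below are invented for this note.

```ruby
# x: white space and comments inside the pattern are ignored
date = /
  (\d{4}) -    # year
  (\d{2}) -    # month
  (\d{2})      # day
/x
puts "2012-03-15" =~ date           # 0

# m: the period also matches a newline
text = "first\nsecond"
p text =~ /first.second/            # nil, the period stops at the newline
p text =~ /first.second/m           # 0
```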

// for reference: see text for $& and $'


The MatchData Class


MatchData holds all the information available regarding
a given pattern match.  

Accessory Methods


PickAxe defines a pretty nice method that shows the
text before and after a match using MatchData's
pre_match( ) and post_match( ) methods.


Example
  // method from Programming Ruby 1.9, Thomas, Fowler & Hunt


s = "The fox jumped over the fence."

def show_regexp(string, pattern)
  match = pattern.match(string)
  if match
    "#{match.pre_match}->#{match[0]}<-#{match.post_match}"
  else
    "no match"
  end
end

puts show_regexp(s, /over/)

OUTPUT

The fox jumped ->over<- the fence.


The next example breaks out the contents of the above
example. Notice that match method is called on the pattern
and the string in question is passed in as an argument.

Subsequently the pre_match() and post_match() methods
are called on the match returned by the match() method.


Example

s = "Numerator / Denominator"
pattern = /\//  # that's a forward slash being escaped with a backslash
match=pattern.match(s)
pre_= match.pre_match
post_ = match.post_match

puts " #{pre_} --> #{match} <-- #{post_} "

OUTPUT

Numerator  --> / <--  Denominator


Ruby Anchors


Anchors constrain a pattern search to specific locations in
the string field.

Matches at the Beginning and End of Lines with ^ and $

The caret mark, ^ anchors a search to the beginning of a line.
The dollar sign, $ limits searches to the end of a line.

The following example shows these two anchors being
used. You will need to comment in different pattern values.
First do the search without a leading ^ or a trailing $ sign.
Then add them to select only for matches at starts and
ends of lines.


Example
array=%w{ABCDEFG GABCDEF FGABCDE EFGABCD}
pattern=/D$/   # repeat with ^A, ^F , E$ and D$
0.upto(3) {|i| puts "#{i}. #{pattern.match(array[i])}" }


OUTPUT


0.
1.
2.
3. D


Word and Non-Word Boundary Anchors, \b and \B


\b - matches at a word boundary, that is, between a word
      character and a non-word character or the start or end
      of the string
\B - matches only where there is no word boundary, for
      example part way through a word

Example

array=%w{ words and nonwords such have boundaries }
pattern=/\Bonwor/   # repeat with \bonwor
0.upto(5) {|i| puts "#{i}. #{pattern.match(array[i])}" }

OUTPUT
   // nothing with \b
0.
1.
2. onwor
3.
4.
5.


Example


array=%w{ words and nonwords such have boundaries }
pattern=/\band/   # repeat with \Band
0.upto(5) {|i| puts "#{i}. #{pattern.match(array[i])}" }

OUTPUT  // nothing with \B

0.
1. and
2.
3.
4.
5.


Character Classes

Character classes are sets of characters that match
for any of the contained characters. Inside square
brackets the following special function characters have
their actions turned off.


Special Characters Muted inside Square Braces
However standard escapes can still be used such as \t or \n.


Character Class Form


[ characters ]


Common Character Classes
 // for more see text

The caret ^ 'NOTs' the values

Following are the POSIX Character classes which
Ruby supports, however they are nested inside Ruby's
own square braces. The example after the listing
shows how this is done.


POSIX Character Classes
 // for more info see
http://www.regular-expressions.info/posixbrackets.html


Example 

puts s = "My Swiss account number is 9182736".gsub!(/[0-9]/, "-" )  
puts s2 = "tIhKeX vXirMuXsNX XruXinBed XevTerXythiXng".gsub!(/[[:upper:]]/,"")
puts s3 = "AB@CD$E#FG#H@I^JK_LMNOP".gsub!(/[^[:alnum:]]/,"")

OUTPUT

My Swiss account number is -------
the virus ruined everything
ABCDEFGHIJKLMNOP


Abbreviated Forms of Character Classes
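
As a brief reminder of the common abbreviations, the sketch below uses
\d, \s, \w and \D, which are standard in Ruby regular expressions. The
sample string is invented for this note.

```ruby
s = "Room 42, Floor 7"

puts s.gsub(/\d/, "#")     # \d is a digit:             Room ##, Floor #
puts s.gsub(/\s/, "_")     # \s is white space:         Room_42,_Floor_7
puts s.gsub(/\w/, "*")     # \w is a word character:    **** **, ***** *
puts s.gsub(/\D/, "")      # \D is anything but a digit: 427
```

The upper-case form of each abbreviation selects the opposite set.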



Creating Intersections with &&


Character classes can be combined by logically ANDing them
with the && operator.

 Example  // from Programming Ruby 1.9, Thomas, Fowler & Hunt

str = "now is the time"
str.gsub(/[a-z&&[^aeiou]]/, '*') # => "*o* i* **e *i*e"

// see PickAxe for info on \p operator introduced in Ruby 1.9
// for working with Unicode characters


A period Outside a Bracket Represents Any Character


An unbracketed period (.) represents any character except
newline in single line mode, and any character including
newline in multiline mode.


Example  // from Programming Ruby 1.9, Thomas, Fowler & Hunt

a = 'It costs $12.'
show_regexp(a, /c.s/)   # => It ->cos<-ts $12.
show_regexp(a, /./)       # => ->I<-t costs $12.
show_regexp(a, /\./)      # => It costs $12->.<-


Repetition  

A regular expression appended with one of the following
allows variations on the number of characters that are
matched.

Example

regex+      // Matches one or more occurrences of the regular expression

Repetition Symbols

symbol    action
+         one or more (occurrences of regex)
*         zero or more    // caution noted below
?         zero or one
{m,n}     at least m and at most n
{m,}      at least m
{,n}      at most n
{m}       exactly m

        
Note that the following example uses sub( ), which replaces only the
first match, rather than gsub( ), which replaces every match.
This lets the action of each repetition symbol show clearly.


Example


s = "ABBCCCDDDEEEEE"
# select for one C and replace with asterisk
puts s.sub(/C/, "*" )
# select for all C's using +, the group is replaced
puts s.sub(/C+/, "*" )
# select for exactly 2 C's, one is left
puts s.sub(/C{2}/, "*" )

OUTPUT

ABB*CCDDDEEEEE  // one C is replaced
ABB*DDDEEEEE        // all three C's are replaced by one *
ABB*CDDDEEEEE     // exactly two C's are replaced and one is left


Caution Using * Operator

The text cautions when using the asterisk *. Because it selects
for zero or more occurrences, there is always a positive result
even if a pattern isn't present.
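
The caution is easy to see in a short sketch. The strings here are
arbitrary.

```ruby
puts "abc" =~ /x*/           # 0, zero x's 'match' at the very start
p    "abc" =~ /x+/           # nil, at least one x is required
puts "abc".sub(/x*/, "-")    # -abc, the empty match at offset 0 is replaced
```

When at least one occurrence is really wanted, the + symbol is the
safer choice.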


Matching This OR That with |


The OR symbol, |, selects 'this or that'.


Example


s = "rocky mountain flat plain sandy island"
# OR condition
puts s.sub(/rocky mountain | frozen tundra/, "muddy marsh " )  

OUTPUT

muddy marsh flat plain sandy island


Grouping With Parenthesis Operator
http://ruby.about.com/od/regularexpressions/a/regexgrouping.htm

When looking for a repeated element in a pattern,
such as 'ab' in the following string, the parenthesis operator
can supply a sub-grouping mechanism for the search.

 

Example

s = "abababababababab "
puts s.sub(/(ab)+/, "AB" )

OUTPUT

AB     // the whole run of ab's is replaced by a single AB


For reference: See text for more complicated uses of
parenthesis including naming parenthesized groups.

" You give a group a name by placing ?<name> immediately after
the opening parenthesis. You can subsequently refer to this named
group using \k<name> (or \k’name’). " -PickAxe
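
The quoted syntax can be sketched as follows. The group names first,
last and pair and the method name split_name are invented for this
note.

```ruby
# split_name is a hypothetical helper name used only in this sketch
def split_name(text)
  match = /(?<first>\w+):(?<last>\w+)/.match(text)
  match && [match[:first], match[:last]]    # nil when there is no match
end

p split_name("first:last")                  # ["first", "last"]

# \k<name> refers back to whatever the named group matched
puts "abab" =~ /(?<pair>ab)\k<pair>/        # 0

# named groups can also be used in a replacement string
puts "first:last".sub(/(?<first>\w+):(?<last>\w+)/, '\k<last>, \k<first>')
```

The last line prints the two words in reverse order, separated by a
comma.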

Pattern Substitutions

Because the sub( ) and gsub( ) methods were used in
a lot of the examples the notion of substitutions based
on patterns that involve grouping, repetition and alternation
(ORing) have already been observed.

Following are PickAxe examples that elaborate on these
notions.

 Example  // from Programming Ruby 1.9, Thomas, Fowler & Hunt

a = "quick brown fox"
a.sub(/[aeiou]/, '*') # => "q*ick brown fox"
a.gsub(/[aeiou]/, '*') # => "q**ck br*wn f*x"
a.sub(/\s\S+/, '') # => "quick fox"
a.gsub(/\s\S+/, '') # => "quick"
 

Using Backslash Sequences in the Substitutions

Pattern groups can be referenced by back-slashed indexes
as in \1 and \2.

In the following the back slashes can be used to control
substitution order. In the first example all the word characters 
are selected for each of two words.

Example

puts "first:last".sub(/(\w+):(\w+)/, '\2, \1')

OUTPUT

last, first

In the next example each group selects for one character
and the back slashed indexes reorder them in reverse in
groups of three. 

puts "simsispispi".gsub(/(.)(.)(.)/, '\3\2\1')

OUTPUT

mississippi


Omission: Section 7.4 Advanced Regular Expressions

The section begins, " You may never need the information
in the rest of this chapter."  We have already reached our
maximum for the note so this section is left as a reference
for the day when you may need all the regular expression
power that Ruby has to offer.


Assignment


1. Using the examples at the top of the note or inventing your
own procedure, determine what numeric value on your home
or school machine is the maximum stored as a Fixnum, after
which numbers become Bignums.


2.  Use the fact that underscores are ignored in number notation
to write a lotto win of $ 50 billion to console using underscores
to replace commas.



3. The following example is supplied for your convenience.
Express 4 different values larger than 1000 to console in octal,
decimal, hex and binary forms.


Example


puts 010    # octal
puts 0d10   # decimal, the default
puts 0x10  # hex
puts 0b10  # binary


4. For the record, which of the following is a rational number?

a )  22
b )  4 / 5
c )  3.3 / 2.2
d )  6.25


5 )  For a quick hands on review, (technically the #{ } form
is being reviewed here ) mimic the following class definition,
instantiation, and variable access for a TV show you like
(or don't like).  A book would be good too! 

Example

class V
    attr_accessor :v
    def initialize(v)
        @v = v
    end
end

tv=V.new("lizards")
puts "V is about #{tv.v}."


6 ) a)  Refer to the following example to put to console two quotes
using the described alternate system, the first with lower-case q
and the second upper-case. Use different user defined delimiters
in the quotes.

b ) Prove the form that omits the Q is the equivalent of a double
quote by including an #{ } expression that will only evaluate within
a double quote system.

Example

puts %q* single quote *
puts %Q^ double quote  ^

# the Q is optional ! If the Q is omitted the quote is the same as a double quote!

puts %? Also a double quote ?


7. Refer to the following example if you wish. Create an array that
stores a 30 day month and one that stores a 31 day month using
the same 31 range, but varying the dot form on one to exclude the
final day.

 
Example

ary1 = (5..7).to_a        # two-dot form, inclusive
p ary1
ary2 = (5...7).to_a       # three-dot form, exclusive
p ary2

OUTPUT

[5, 6, 7]
[5, 6]              


8. You may refer to the following example to write a sentence,
and store it as a string that talks about 'Mothers Day'. Use the
gsub( ) method to replace occurrences of the word Mothers with
Fathers.

sub( ) Example

regex = /Fudge/
str1 = "Fudge, I did not receive my T4"
str2= str1.sub("Fudge", "Heavens To Betsy")
puts str2



9. Ruby supplies three forms for the regular expression as is
shown in the following example. Rework the example so that
all three regular expressions though in different forms select
for the same value. Reduce the string to the one value that is
patterned in the regular expressions. Then use the same logic
to show all three patterns are the same by getting an 'ALL TRUE'
or 'ALL THE SAME' output.


Example // put a space after syntax in rx3 and see the output

rx1 =  / X /
rx2 =  Regexp.new("constructor")
rx3 =  %r{syntax}

s=" X constructor the r syntax"

if rx1=~s && rx2 =~s && rx3=~ s
puts "ALL TRUE"
else
puts "AT LEAST ONE FALSE"
end


10. Use any regular expression form or syntax you like
to output to console the sentence without the intervening
periods.

"The.....low.....pressure.....area....is.....moving....our.....way".

// think this one through but if you are short on time search
// the note for 'virus' and you will find an example that can be
// adapted to this question.