Regular Expressions

What are Regular Expressions?

Regular expressions (regex) are patterns used to match character combinations in strings. Ruby uses the Onigmo regex engine.

Basic Syntax

Literals

# Match exact string
regex = /hello/
puts "hello world".match?(regex)  # true

# Case insensitive
regex = /hello/i
puts "HELLO world".match?(regex)  # true

Character Classes

# Any digit
/\d/     # [0-9]

# Any letter
/[a-zA-Z]/

# Any character except newline
/./

# Word character (letter, digit, underscore)
/\w/

# Whitespace
/\s/

# Negated classes
/\D/     # Non-digit
/\W/     # Non-word
/\S/     # Non-whitespace

Quantifiers

# Zero or more
/a*/

# One or more
/a+/

# Zero or one
/a?/

# Exactly n times
/a{3}/

# n or more times
/a{3,}/

# Between n and m times
/a{2,5}/

Anchors

# Start of string
/^hello/

# End of string
/world$/

# Word boundary
/\bword\b/

# Start of line
/^/

# End of line
/$/

Using Regex in Ruby

Match Method

string = "The quick brown fox"
match = string.match(/quick (\w+)/)
puts match[0]    # "quick brown"
puts match[1]    # "brown"
puts match.pre_match   # "The "
puts match.post_match  # " fox"

Match? Method (Ruby 2.4+)

puts "hello".match?(/^h/)  # true
puts "hello".match?(/^w/)  # false

Scan Method

text = "The year is 2023 and 2024 will be better"
years = text.scan(/\d{4}/)
puts years  # ["2023", "2024"]

Sub and Gsub

text = "Hello, world!"

# Replace first occurrence
puts text.sub(/l/, "L")    # "HeLlo, world!"

# Replace all occurrences
puts text.gsub(/l/, "L")   # "HeLLo, worLd!"

Split

text = "apple,banana,cherry"
fruits = text.split(/,/)
puts fruits  # ["apple", "banana", "cherry"]

Common Patterns

Email Validation

email_regex = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
puts "user@example.com".match?(email_regex)  # true

Phone Numbers

phone_regex = /\A\d{3}-\d{3}-\d{4}\z/
puts "555-123-4567".match?(phone_regex)  # true

URLs

url_regex = /\Ahttps?:\/\/[^\s]+\z/
puts "https://example.com".match?(url_regex)  # true

Dates

date_regex = /\A\d{4}-\d{2}-\d{2}\z/
puts "2023-12-25".match?(date_regex)  # true

Advanced Features

Groups and Captures

regex = /(\w+) (\w+)/
match = "John Doe".match(regex)
puts match[1]  # "John"
puts match[2]  # "Doe"

Named Captures

regex = /(?<first>\w+) (?<last>\w+)/
match = "John Doe".match(regex)
puts match[:first]  # "John"
puts match[:last]   # "Doe"

Lookahead/Lookbehind

# Positive lookahead
regex = /foo(?=bar)/
puts "foobar".match?(regex)  # true
puts "foobaz".match?(regex)  # false

# Negative lookbehind
regex = /(?<!not )foo/
puts "foo".match?(regex)     # true
puts "not foo".match?(regex) # false

Non-capturing Groups

regex = /(?:https?|ftp):\/\/[^\s]+/
# The ?: makes it non-capturing

Regex Options

# Case insensitive
/regex/i

# Multiline mode
/regex/m

# Extended mode (ignore whitespace and comments)
/regex/x

# Combined options
/regex/imx

Performance Considerations

# Pre-compile regex for reuse
EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

# Avoid catastrophic backtracking
# Bad: /(a*)*b/
# Good: /a*b/

Common Mistakes

# Forgetting to escape special characters
regex = /hello.world/  # Matches "helloXworld"
regex = /hello\.world/ # Matches "hello.world"

# Using ^ and $ incorrectly
regex = /^hello$/      # Matches exactly "hello"
regex = /hello/        # Matches "hello" anywhere

Tools and Libraries

Built-in String Methods

"hello".gsub(/[aeiou]/, "*")  # "h*ll*"
"test".start_with?(/te/)      # true
"test".end_with?(/st/)        # true

Oniguruma Features

Ruby's regex engine supports advanced features like:

Best Practices

Regular expressions are powerful but can be complex. Start simple and build up complexity as needed. Always test your regex patterns thoroughly.

Loading