Regular Expressions
What are Regular Expressions?
Regular expressions (regex) are patterns used to match character combinations in strings. Ruby uses the Onigmo regex engine.
Basic Syntax
Literals
# Match exact string
regex = /hello/
puts "hello world".match?(regex) # true
# Case insensitive
regex = /hello/i
puts "HELLO world".match?(regex) # true
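Ruby also accepts the %r{...} literal form and Regexp.new, which can be convenient when a pattern contains many slashes or is assembled at runtime. A minimal sketch, using a made-up sample path:

```ruby
# Three ways to write the same pattern.
slash_literal   = /\/usr\/local/           # slashes must be escaped
percent_literal = %r{/usr/local}           # no escaping needed for slashes
from_string     = Regexp.new("/usr/local") # built from a plain string

path = "/usr/local/bin" # hypothetical sample input
puts path.match?(slash_literal)   # true
puts path.match?(percent_literal) # true
puts path.match?(from_string)     # true
```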
Character Classes
# Any digit
/\d/ # [0-9]
# Any letter
/[a-zA-Z]/
# Any character except newline
/./
# Word character (letter, digit, underscore)
/\w/
# Whitespace
/\s/
# Negated classes
/\D/ # Non-digit
/\W/ # Non-word
/\S/ # Non-whitespace
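One detail that is easy to miss: in Ruby the shorthand classes \d, \w and \s match ASCII characters only, while POSIX bracket expressions such as [[:alpha:]] are Unicode-aware. A short sketch with a sample word chosen for illustration:

```ruby
word = "café" # sample input containing a non-ASCII letter

# \w is ASCII-only, so the match stops before the accented letter.
puts word[/\w+/]          # caf
# POSIX bracket expressions also match non-ASCII letters.
puts word[/[[:alpha:]]+/] # café
```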
Quantifiers
# Zero or more
/a*/
# One or more
/a+/
# Zero or one
/a?/
# Exactly n times
/a{3}/
# n or more times
/a{3,}/
# Between n and m times
/a{2,5}/
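The quantifiers above are greedy by default: they take as much text as they can. Adding ? after a quantifier makes it lazy. A small sketch on an illustrative string:

```ruby
html = "<b>bold</b> and <i>italic</i>" # sample input for illustration

# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
puts html[/<.+>/]  # <b>bold</b> and <i>italic</i>
# Lazy: .+? stops at the first ">" that lets the pattern succeed.
puts html[/<.+?>/] # <b>
```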
Anchors
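# Start of string
/\Ahello/
# End of string
/world\z/
# Word boundary
/\bword\b/
# Start of line (in Ruby, ^ matches at the start of every line, not only the start of the string)
/^/
# End of line (in Ruby, $ matches at the end of every line, not only the end of the string)
/$/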
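In Ruby, ^ and $ are line anchors, while \A and \z anchor to the whole string. The difference only shows up when the input contains a newline, as in this sketch with a two-line sample string:

```ruby
input = "hello\nworld" # a two-line sample string

puts input.match?(/^world/)  # true:  ^ matches at the start of the second line
puts input.match?(/\Aworld/) # false: \A matches only at the start of the string
puts input.match?(/hello$/)  # true:  $ matches just before the newline
puts input.match?(/hello\z/) # false: \z matches only at the very end
p input.scan(/^\w+$/)        # ["hello", "world"]
```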
Using Regex in Ruby
Match Method
string = "The quick brown fox"
match = string.match(/quick (\w+)/)
puts match[0] # "quick brown"
puts match[1] # "brown"
puts match.pre_match # "The "
puts match.post_match # " fox"
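match returns nil when the pattern is not found, so indexing the result directly can raise NoMethodError. A hedged sketch of two common ways to guard against that:

```ruby
string = "The quick brown fox"

# No match: the result is nil, not an empty MatchData.
match = string.match(/slow (\w+)/)
puts match.nil? # true

# Safe navigation (&.) skips the index call when the result is nil.
word = string.match(/slow (\w+)/)&.[](1)
p word # nil

# String#[] with a capture index returns the group, or nil if nothing matched.
puts string[/quick (\w+)/, 1] # brown
p string[/slow (\w+)/, 1]     # nil
```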
Match? Method (Ruby 2.4+)
puts "hello".match?(/^h/) # true
puts "hello".match?(/^w/) # false
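Besides returning a plain boolean, match? does not update the special match variable $~, which is part of why it is cheaper than match or =~. A small sketch showing that the previous match data survives a match? call:

```ruby
"hello" =~ /l+/      # =~ stores its match data in $~
before = $~[0]

"world".match?(/o/)  # match? leaves $~ untouched
after = $~[0]

puts before # ll
puts after  # ll
```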
Scan Method
text = "The year is 2023 and 2024 will be better"
years = text.scan(/\d{4}/)
p years # ["2023", "2024"]
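When the pattern contains capture groups, scan returns one array of captures per match instead of the full matched text. A sketch on a made-up key=value string:

```ruby
text = "width=80 height=24" # sample input for illustration

# Without groups, scan returns the full matches.
p text.scan(/\w+=\d+/) # ["width=80", "height=24"]

# With groups, scan returns one array of captures per match.
pairs = text.scan(/(\w+)=(\d+)/)
p pairs                # [["width", "80"], ["height", "24"]]

# The pairs convert naturally into a hash.
p pairs.to_h["width"]  # "80"
```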
Sub and Gsub
text = "Hello, world!"
# Replace first occurrence
puts text.sub(/l/, "L") # "HeLlo, world!"
# Replace all occurrences
puts text.gsub(/l/, "L") # "HeLLo, worLd!"
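The replacement does not have to be a fixed string. sub and gsub also accept backreferences, a block, or a hash, as in this sketch (the sample strings are illustrative):

```ruby
date = "2023-12-25"

# Backreferences (\1, \2, ...) in a single-quoted replacement string.
reordered = date.sub(/(\d{4})-(\d{2})-(\d{2})/, '\3/\2/\1')
puts reordered # 25/12/2023

# A block receives each match and returns its replacement.
titled = "hello world".gsub(/\b\w/) { |ch| ch.upcase }
puts titled # Hello World

# A hash maps each matched text to its replacement.
spelled = "cat & dog".gsub(/&/, { "&" => "and" })
puts spelled # cat and dog
```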
Split
text = "apple,banana,cherry"
fruits = text.split(/,/)
p fruits # ["apple", "banana", "cherry"]
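A regex delimiter becomes more useful than a plain string once the separators vary. A short sketch with illustrative inputs:

```ruby
text = "apple, banana;cherry  date" # sample input with mixed separators

# One pattern covers commas, semicolons and runs of whitespace.
p text.split(/[,;\s]+/)   # ["apple", "banana", "cherry", "date"]

# A capture group keeps the separators in the result.
p "1+2-3".split(/([+-])/) # ["1", "+", "2", "-", "3"]

# A limit caps the number of pieces.
p "a,b,c,d".split(/,/, 2) # ["a", "b,c,d"]
```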
Common Patterns
Email Validation
email_regex = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
puts "user@example.com".match?(email_regex) # true
Phone Numbers
phone_regex = /\A\d{3}-\d{3}-\d{4}\z/
puts "555-123-4567".match?(phone_regex) # true
URLs
url_regex = /\Ahttps?:\/\/[^\s]+\z/
puts "https://example.com".match?(url_regex) # true
Dates
date_regex = /\A\d{4}-\d{2}-\d{2}\z/
puts "2023-12-25".match?(date_regex) # true
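The date pattern only checks the shape of the string, so "2023-02-30" would pass. One possible way to combine it with a calendar check is sketched below; valid_iso_date? is a hypothetical helper name, and Date.valid_date? comes from Ruby's standard date library:

```ruby
require "date"

date_regex = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Hypothetical helper: checks the shape first, then the calendar.
def valid_iso_date?(string, pattern)
  match = pattern.match(string)
  return false unless match

  year, month, day = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day)
end

puts valid_iso_date?("2023-12-25", date_regex) # true
puts valid_iso_date?("2023-02-30", date_regex) # false (no such day)
puts valid_iso_date?("12/25/2023", date_regex) # false (wrong shape)
```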
Advanced Features
Groups and Captures
regex = /(\w+) (\w+)/
match = "John Doe".match(regex)
puts match[1] # "John"
puts match[2] # "Doe"
Named Captures
regex = /(?<first>\w+) (?<last>\w+)/
match = "John Doe".match(regex)
puts match[:first] # "John"
puts match[:last] # "Doe"
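Named groups can also be listed, or turned into local variables when a regex literal sits on the left side of =~. A brief sketch:

```ruby
regex = /(?<first>\w+) (?<last>\w+)/
match = "John Doe".match(regex)

# names lists the group names; named_captures pairs them with their values.
p match.names                  # ["first", "last"]
p match.named_captures["last"] # "Doe"

# With a regex literal on the left of =~, named groups become local variables.
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "2023-12"
  puts "#{month}/#{year}"      # 12/2023
end
```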
Lookahead/Lookbehind
# Positive lookahead
regex = /foo(?=bar)/
puts "foobar".match?(regex) # true
puts "foobaz".match?(regex) # false
# Negative lookbehind
regex = /(?<!not )foo/
puts "foo".match?(regex) # true
puts "not foo".match?(regex) # false
Non-capturing Groups
regex = /(?:https?|ftp):\/\/[^\s]+/
# The ?: makes it non-capturing
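The difference becomes visible with scan, which returns captures when any capturing group is present. A sketch on an illustrative sentence:

```ruby
text = "see https://example.com and ftp://files.example.org" # sample input

# With a capturing group, scan returns only the captured scheme.
capturing = text.scan(/(https?|ftp):\/\/[^\s]+/)
p capturing     # [["https"], ["ftp"]]

# With a non-capturing group, scan returns the whole match.
non_capturing = text.scan(/(?:https?|ftp):\/\/[^\s]+/)
p non_capturing # ["https://example.com", "ftp://files.example.org"]
```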
Regex Options
# Case insensitive
/regex/i
# Multiline mode (in Ruby, this makes . match newlines too; ^ and $ are always line anchors)
/regex/m
# Extended mode (ignore whitespace and comments)
/regex/x
# Combined options
/regex/imx
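A short sketch of what the x and m options do in practice. The phone pattern mirrors the one used earlier, laid out with comments:

```ruby
# Extended mode: whitespace and comments inside the pattern are ignored.
phone = /
  \A
  (\d{3}) -   # area code
  (\d{3}) -   # prefix
  (\d{4})     # line number
  \z
/x
puts "555-123-4567".match?(phone) # true

# In Ruby, the m option lets . match a newline as well.
puts "a\nb".match?(/a.b/)  # false
puts "a\nb".match?(/a.b/m) # true
```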
Performance Considerations
# Store reusable patterns in a constant (a literal without interpolation is compiled only once)
EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
# Avoid catastrophic backtracking
# Bad: /(a*)*b/
# Good: /a*b/
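The two patterns accept the same strings. The nested version simply gives the engine many more ways to split a run of "a" before it can give up. A sketch that compares them on deliberately short inputs (long non-matching inputs are where the nested form can become slow):

```ruby
nested = /\A(a*)*b\z/ # nested quantifiers: many ways to split the same run of "a"
simple = /\Aa*b\z/    # one quantifier: a single way to consume the run

samples = ["b", "ab", "aaab", "aaa", "aaac", "ba"] # short, illustrative inputs
samples.each do |s|
  puts "#{s.inspect}: nested=#{nested.match?(s)} simple=#{simple.match?(s)}"
end
```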
Common Mistakes
# Forgetting to escape special characters
regex = /hello.world/ # Matches "helloXworld"
regex = /hello\.world/ # Matches "hello.world"
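# Using ^ and $ when you mean the whole string
regex = /^hello$/ # Matches any line that is exactly "hello", so "hello\nworld" also matches
regex = /\Ahello\z/ # Matches only the exact string "hello"
regex = /hello/ # Matches "hello" anywhere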
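The escaping problem also appears when a pattern is built from text you do not control. Regexp.escape turns every metacharacter into a literal. A sketch, with user_input standing in for any externally supplied string:

```ruby
user_input = "1+1=2" # hypothetical text supplied by a user

# Interpolated directly, "+" is treated as a quantifier.
unescaped = /#{user_input}/
puts "1+1=2".match?(unescaped) # false
puts "11=2".match?(unescaped)  # true (probably not what was intended)

# Regexp.escape makes the text match literally.
puts Regexp.escape(user_input) # 1\+1=2
escaped = /#{Regexp.escape(user_input)}/
puts "1+1=2".match?(escaped)   # true
```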
Tools and Libraries
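Built-in String Methods
"hello".gsub(/[aeiou]/, "*") # "h*ll*"
"test".start_with?(/te/) # true (regex arguments are accepted since Ruby 2.5)
"test".end_with?("st") # true (end_with? accepts only strings; for a pattern use "test".match?(/st\z/))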
Onigmo Features
Ruby's regex engine supports advanced features like:
- Named groups
- Lookbehind assertions
- Atomic groups
- Possessive quantifiers
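Atomic groups and possessive quantifiers both stop the engine from giving back text it has already consumed. A small sketch of the effect on a toy input:

```ruby
# A greedy quantifier gives characters back so the rest of the pattern can match.
puts "aaa".match?(/a+a/)      # true:  a+ backtracks to leave one "a"

# A possessive quantifier (a++) never gives back what it consumed.
puts "aaa".match?(/a++a/)     # false

# An atomic group (?>...) behaves the same way for a whole subpattern.
puts "aaa".match?(/(?>a+)a/)  # false
puts "aaab".match?(/(?>a+)b/) # true
```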
Best Practices
- Prefer regex literals (/.../ or %r{...}) over patterns built from strings, which need extra escaping
- Keep frequently used regex in constants
- Test regex thoroughly
- Use non-capturing groups when you don't need captures
- Avoid complex regex when simple string methods suffice
- Comment complex regex
- Consider readability over cleverness
- Use regex testing tools online
Regular expressions are powerful but can be complex. Start simple and build up complexity as needed. Always test your regex patterns thoroughly.
