How do I split a token from the end of my string?

123 Views Asked by Dave At 17 May 2025 at 14:05

I want to separate a string into two parts if a token from an array is found at the end of the string. I have tried this:

x = "Canton Female"
GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]

x.partition(/(^|[[:space:]]+)[#{Regexp.union(GENDER_TOKENS)}]$/i)
 #=> ["Canton Female", "", ""]

But although the word "female" is part of my tokens, it is not getting split out. How do I adjust my regex so that it gets split properly?


There are 3 best solutions below

Tom Lord On 21 December 2017 at 18:21 BEST ANSWER

I'm a little unclear what you are asking - what is the desired result? However, here's what I think you're looking for:

GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]

"Canton Female".split(/\b(#{Regexp.union(GENDER_TOKENS).source})$/i)
#=> ["Canton ", "Female"]

"Tom Lord".split(/\b(#{Regexp.union(GENDER_TOKENS).source})$/i)
#=> ["Tom Lord"]

String#split splits the string at each match and, because the pattern contains a capture group, keeps the matched text in the result. String#partition always returns [head, match, tail], so a string with no match comes back as [str, "", ""]. I think split is probably what you wanted?
\b is a word boundary anchor. This is a cleaner solution than trying to match on "start of line or whitespace".
The Regexp union is wrapped in round brackets, not square brackets, to group the alternatives together. Square brackets make it a character set that matches a single character, which is clearly not what you wanted.
Regexp#source returns only the pattern text, unlike the (implicit) Regexp#to_s you were using, which also embeds the option toggles, i.e. (?-mix:m|male|men|f|w|female|wom). The -i inside that group switches case-insensitivity back off for the tokens, so the outer /i no longer applies to them.
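
To make the points above concrete, here is a small sketch comparing to_s with source, and partition with split, on the same pattern. The constant names GENDER_UNION, TOKEN_AT_END and TOKEN_VIA_TO_S are made up for this illustration and do not appear in the question.

```ruby
GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]
GENDER_UNION  = Regexp.union(GENDER_TOKENS) # hypothetical name

# Interpolating a Regexp calls to_s, which wraps the pattern in an option group.
GENDER_UNION.to_s    #=> "(?-mix:m|male|men|f|w|female|wom)"
# source returns only the pattern text, so the outer /i flag applies to it.
GENDER_UNION.source  #=> "m|male|men|f|w|female|wom"

TOKEN_AT_END   = /\b(#{GENDER_UNION.source})$/i # hypothetical name
TOKEN_VIA_TO_S = /\b(#{GENDER_UNION})$/i        # hypothetical name

# The embedded (?-mix:...) group turns /i off again, so "Female" does not match.
"Canton Female" =~ TOKEN_VIA_TO_S        #=> nil

"Canton Female".partition(TOKEN_AT_END)  #=> ["Canton ", "Female", ""]
"Canton Female".split(TOKEN_AT_END)      #=> ["Canton ", "Female"]
"Tom Lord".partition(TOKEN_AT_END)       #=> ["Tom Lord", "", ""]
```

So partition also works once the pattern is fixed; it just always returns three elements, whereas split returns only the pieces that exist.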

Cary Swoveland On 21 December 2017 at 20:09

GENDER_TOKENS = %w[m male men f w female wom]
GENDER_REGEX = /\b(?:#{GENDER_TOKENS.join('|')})\z/i
  #=> /\b(?:m|male|men|f|w|female|wom)\z/i

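def split_off_token(str)
  # Index where the trailing token starts, or nil if there is none.
  idx = str =~ GENDER_REGEX
  case idx
  when nil
    [str]
  when 0
    # The whole string is the token.
    ['', str]
  else
    # Drop the whitespace that separated the token from the rest.
    [str[0, idx].rstrip, str[idx..-1]]
  end
end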

split_off_token("Canton Female")
  #=> ["Canton", "Female"]
split_off_token("Canton M")
  #=> ["Canton", "M"]
split_off_token("wom")
  #=> ["", "wom"]
split_off_token("Canton Fella")
  #=> ["Canton Fella"]

Max On 21 December 2017 at 18:25

Why not split first?

parts = x.split
if GENDER_TOKENS.include? parts.last&.downcase # &. avoids a NoMethodError when x is empty
  # ...
end

Probably not much slower, and way more readable.
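
As a rough sketch of how the split-first idea could be completed, assuming the token is always the last whitespace-separated word: the helper name split_last_token is hypothetical, and the return shape is simply borrowed from the regex-based answers for comparison.

```ruby
GENDER_TOKENS = %w[m male men f w female wom]

# Hypothetical helper: returns [rest, token] when the last
# whitespace-separated word is a known token, otherwise [str].
def split_last_token(str)
  *head, last = str.split
  # last is nil for an empty or all-whitespace string.
  return [str] if last.nil? || !GENDER_TOKENS.include?(last.downcase)

  [head.join(" "), last]
end

split_last_token("Canton Female")  #=> ["Canton", "Female"]
split_last_token("Canton Fella")   #=> ["Canton Fella"]
split_last_token("wom")            #=> ["", "wom"]
split_last_token("")               #=> [""]
```

Note that head.join(" ") collapses runs of whitespace in the leading part, and that this version only recognises tokens separated by whitespace, whereas \b in the regex answers also matches after punctuation such as a hyphen.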
