In Python, how do I split a string and keep the separators?

Full question
https://stackoverflow.com/questions/2136...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #regex

ACCEPTED ANSWER

Score 471

The docs of re.split mention:

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

So you just need to wrap your separator with a capturing group:

>>> import re
>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
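
As a further sketch of the same capturing-group idea (the sample string and the pattern below are made up for illustration), note that two adjacent separators leave an empty string between them, which you may want to filter out:

```python
import re

# Hypothetical sample: split on commas and semicolons, keeping each one.
text = 'a,b;;c'
parts = re.split(r'([,;])', text)
print(parts)  # ['a', ',', 'b', ';', '', ';', 'c']

# Drop the empty strings if they are not wanted.
non_empty = [p for p in parts if p]
print(non_empty)  # ['a', ',', 'b', ';', ';', 'c']

# Because the separators are kept, the pieces join back to the original.
assert ''.join(parts) == text
```

Since nothing is discarded, joining the result always reproduces the input, which is a handy sanity check.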

ANSWER 2

Score 54

If you are splitting on newline, use splitlines(True).

>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']

(Not a general solution, but adding this here in case someone comes here not realizing this method existed.)
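
A short sketch of how this behaves with mixed line endings (the sample text is an assumption for illustration); the positional True corresponds to the keepends parameter:

```python
# Hypothetical sample with both Windows and Unix line endings.
text = 'one\r\ntwo\nthree'

# keepends=True leaves each line terminator attached to its line.
lines = text.splitlines(keepends=True)
print(lines)  # ['one\r\n', 'two\n', 'three']

# Nothing is lost, so joining restores the original text.
assert ''.join(lines) == text
```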

ANSWER 3

Score 19

If you have only 1 separator, you can employ list comprehensions:

text = 'foo,bar,baz,qux'  
sep = ','

Appending/prepending the separator:

result = [x + sep for x in text.split(sep)]
# ['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of the trailing separator
result[-1] = result[-1][:-len(sep)]
# ['foo,', 'bar,', 'baz,', 'qux']

result = [sep + x for x in text.split(sep)]
# [',foo', ',bar', ',baz', ',qux']
# to get rid of the leading separator
result[0] = result[0][len(sep):]
# ['foo', ',bar', ',baz', ',qux']

Separator as its own element:

result = [u for x in text.split(sep) for u in (x, sep)]
# ['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
result = result[:-1]   # to get rid of the trailing separator
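
The "separator as its own element" variant can also be wrapped in a small helper that never adds the extra trailing separator in the first place. This is only a sketch; the name split_keep is hypothetical and not part of the original answer:

```python
def split_keep(text, sep):
    """Split text on sep and return sep as its own list element.

    Hypothetical helper; sep must be a non-empty string, because
    str.split raises ValueError for an empty separator.
    """
    pieces = text.split(sep)
    result = [pieces[0]]
    for piece in pieces[1:]:
        # Re-insert the separator that str.split removed.
        result.append(sep)
        result.append(piece)
    return result


print(split_keep('foo,bar,baz,qux', ','))
# ['foo', ',', 'bar', ',', 'baz', ',', 'qux']
```

It also works with separators longer than one character, since it relies only on str.split.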

ANSWER 4

Score 11

Another no-regex solution that works well on Python 3:

# Split strings and keep separator
test_strings = ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']

def split_and_keep(s, sep):
    if not s:
        return ['']  # consistent with str.split()

    # Find a placeholder character that does not occur in the string,
    # i.e. just use the highest code point in the string plus one.
    # Note: this raises ValueError if ord(max(s)) == 0x10FFFF.
    p = chr(ord(max(s)) + 1)

    return s.replace(sep, sep + p).split(p)

for s in test_strings:
    print(split_and_keep(s, '<'))


# If the Unicode limit is reached, chr() raises ValueError, so it fails explicitly
unicode_max_char = chr(1114111)
ridiculous_string = '<Hello>'+unicode_max_char+'<World>'
print(split_and_keep(ridiculous_string, '<'))
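
If the placeholder-character limitation is a concern, one possible alternative is to scan with str.find instead. This is a sketch under the assumption that the same output shape is wanted (each separator attached to the end of the preceding piece); the name split_and_keep_scan is hypothetical:

```python
def split_and_keep_scan(s, sep):
    """Split s after each occurrence of sep, keeping sep on the left piece.

    Hypothetical variant that needs no placeholder character, so it
    also handles strings containing chr(0x10FFFF).
    """
    if not sep:
        raise ValueError('empty separator')
    parts = []
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            # No more separators: the remainder is the last piece.
            parts.append(s[start:])
            return parts
        end = idx + len(sep)
        parts.append(s[start:end])
        start = end


for s in ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']:
    print(split_and_keep_scan(s, '<'))

# The string that breaks the placeholder approach works here.
tricky = '<Hello>' + chr(0x10FFFF) + '<World>'
print(split_and_keep_scan(tricky, '<'))
```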