In Python, how do I split a string and keep the separators?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Melt

--

Chapters
00:00 Question
00:35 Accepted answer (Score 432)
01:07 Answer 2 (Score 48)
01:30 Answer 3 (Score 33)
02:00 Answer 4 (Score 17)
02:45 Thank you

--

Full question
https://stackoverflow.com/questions/2136...

Accepted answer links:
[docs of ]: https://docs.python.org/3/library/re.htm...

Answer 2 links:
[splitlines(True)]: https://docs.python.org/3/library/stdtyp...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #regex

#avk47

ACCEPTED ANSWER

Score 471

The docs of re.split mention:

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

So you just need to wrap your separator with a capturing group:

>>> re.split('(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

ANSWER 2

Score 54

If you are splitting on newline, use splitlines(True).

>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']

(Not a general solution, but adding this here in case someone comes here not realizing this method existed.)

ANSWER 3

Score 19

If you have only 1 separator, you can employ list comprehensions:

text = 'foo,bar,baz,qux'  
sep = ','

Appending/prepending separator:

result = [x+sep for x in text.split(sep)]
#['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of trailing
result[-1] = result[-1].strip(sep)
#['foo,', 'bar,', 'baz,', 'qux']

result = [sep+x for x in text.split(sep)]
#[',foo', ',bar', ',baz', ',qux']
# to get rid of trailing
result[0] = result[0].strip(sep)
#['foo', ',bar', ',baz', ',qux']

Separator as it's own element:

result = [u for x in text.split(sep) for u in (x, sep)]
#['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
results = result[:-1]   # to get rid of trailing

ANSWER 4

Score 11

Another no-regex solution that works well on Python 3

# Split strings and keep separator
test_strings = ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']

def split_and_keep(s, sep):
   if not s: return [''] # consistent with string.split()

   # Find replacement character that is not used in string
   # i.e. just use the highest available character plus one
   # Note: This fails if ord(max(s)) = 0x10FFFF (ValueError)
   p=chr(ord(max(s))+1) 

   return s.replace(sep, sep+p).split(p)

for s in test_strings:
   print(split_and_keep(s, '<'))


# If the unicode limit is reached it will fail explicitly
unicode_max_char = chr(1114111)
ridiculous_string = '<Hello>'+unicode_max_char+'<World>'
print(split_and_keep(ridiculous_string, '<'))