The Python Oracle

Split string based on regex

--



Full question
https://stackoverflow.com/questions/1320...

Accepted answer links:
[this demo]: http://ideone.com/qoaTqr

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #regex #split



ACCEPTED ANSWER

Score 170


I suggest

import re
l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)

Check this demo.
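As a rough illustration of how the pieces of that pattern behave, the sketch below runs it on two sample sentences. Both sample strings and the variable names are assumptions chosen for illustration; they are not taken from the linked demo.

```python
import re

# Hypothetical sample inputs, chosen only to illustrate the pattern.
s = "HELLO there HOW are YOU"
s_with_i = "HELLO there I am HOW are YOU"

# (?<!^)     do not split on whitespace at the very start of the string
# \s+        the whitespace that is consumed by the split
# (?=[A-Z])  the next character must be an upper-case letter
# (?!.\s)    but not a single character followed by whitespace (such as "I ")
pattern = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

parts = pattern.split(s)
parts_with_i = pattern.split(s_with_i)

print(parts)         # ['HELLO there', 'HOW are', 'YOU']
print(parts_with_i)  # ['HELLO there I am', 'HOW are', 'YOU']
```

The raw-string prefix keeps \s from being treated as a string escape, so the backslashes reach the regex engine unchanged.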




ANSWER 2

Score 70


You could use a lookahead:

re.split(r'[ ](?=[A-Z]+\b)', input)

This splits at every space that is followed by a run of upper-case letters ending at a word boundary.

Note that the square brackets are only there for readability and could just as well be omitted.
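A small sketch of that lookahead in use, assuming a sample sentence that is not part of the original answer. It also checks that the bracketed and unbracketed forms give the same result.

```python
import re

# Hypothetical sample input, used only for illustration.
text = "HELLO there HOW are YOU"

# The space is consumed by the split; the lookahead only inspects what follows.
with_brackets = re.split(r'[ ](?=[A-Z]+\b)', text)
without_brackets = re.split(r' (?=[A-Z]+\b)', text)

print(with_brackets)                      # ['HELLO there', 'HOW are', 'YOU']
print(with_brackets == without_brackets)  # True
```

Because the lookahead is zero-width, the upper-case word stays at the start of the next piece instead of being swallowed by the separator.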

If it is enough for the first letter of a word to be upper case (that is, if you also want to split in front of Hello), it gets even easier:

re.split(r'[ ](?=[A-Z])', input)

Now this splits at every space followed by any upper-case letter.
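To make the difference between the two patterns concrete, here is a minimal comparison on a made-up sentence that contains a capitalised word (Hello) next to all-caps words. The sample string and the names strict and relaxed are assumptions for illustration.

```python
import re

# Hypothetical sample input: "Hello" is capitalised but not all upper case.
text = "HELLO there Hello HOW are YOU"

# Requires a whole word of upper-case letters after the space.
strict = re.split(r'[ ](?=[A-Z]+\b)', text)

# Requires only that the next character is an upper-case letter.
relaxed = re.split(r'[ ](?=[A-Z])', text)

print(strict)   # ['HELLO there Hello', 'HOW are', 'YOU']
print(relaxed)  # ['HELLO there', 'Hello', 'HOW are', 'YOU']
```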




ANSWER 3

Score 1


Your question contains the string literal "\b[A-Z]{2,}\b", but without the r prefix Python interprets each \b as a backspace character rather than as a regex word boundary.

Try r"\b[A-Z]{2,}\b" instead.
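The effect is easy to check directly. The sketch below compares the two literals on a hypothetical sample string; the variable names are illustrative only.

```python
import re

# Hypothetical sample input, used only for illustration.
text = "HELLO there HOW are YOU"

plain = "\b[A-Z]{2,}\b"   # each \b becomes the backspace character "\x08"
raw = r"\b[A-Z]{2,}\b"    # each \b reaches the regex engine as a word boundary

plain_matches = re.findall(plain, text)
raw_matches = re.findall(raw, text)

print(plain[0] == "\x08")  # True
print(plain_matches)       # []
print(raw_matches)         # ['HELLO', 'HOW', 'YOU']
```

The non-raw pattern finds nothing because the text contains no backspace characters for it to match.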