Elegant Python function to convert CamelCase to snake_case?
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Techno Bleepage Open
--
Chapters
00:00 Elegant Python Function To Convert Camelcase To Snake_case?
00:13 Accepted Answer Score 1201
02:00 Answer 2 Score 163
03:09 Answer 3 Score 307
03:25 Answer 4 Score 45
03:35 Thank you
--
Full question
https://stackoverflow.com/questions/1175...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #camelcasing
#avk47
ACCEPTED ANSWER
Score 1201
Camel case to snake case
import re
name = 'CamelCaseName'
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name) # camel_case_name
If you do this many times and the above is slow, compile the regex beforehand:
pattern = re.compile(r'(?<!^)(?=[A-Z])')
name = pattern.sub('_', name).lower()
Note that this and immediately following regex use a zero-width match, which is not handled correctly by Python 3.6 or earlier. See further below for alternatives that don't use lookahead/lookbehind if you need to support older EOL Python.
If you want to avoid converting "HTTPHeader" into "h_t_t_p_header", you can use this variant with regex alternation:
pattern = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")
name = pattern.sub('_', name).lower()
See Regex101.com for test cases (that don't include final lowercase).
You can improve readability with ?x or re.X:
pattern = re.compile(
r"""
(?<=[a-z]) # preceded by lowercase
(?=[A-Z]) # followed by uppercase
| # OR
(?<[A-Z]) # preceded by lowercase
(?=[A-Z][a-z]) # followed by uppercase, then lowercase
""",
re.X,
)
If you use the regex module instead of re, you can use the more readable POSIX character classes (which are not limited to ASCII).
pattern = re.compile(
r"""
(?<=[[:lower:]]) # preceded by lowercase
(?=[[:upper:]]) # followed by uppercase
| # OR
(?<[[:upper:]]) # preceded by lower
(?=[[:upper:]][[:lower:]]) # followed by upper then lower
""",
re.X,
)
Another way to handle more advanced cases without relying on lookahead/lookbehind, using two substitution passes:
def camel_to_snake(name):
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
print(camel_to_snake('camel2_camel2_case')) # camel2_camel2_case
print(camel_to_snake('getHTTPResponseCode')) # get_http_response_code
print(camel_to_snake('HTTPResponseCodeXYZ')) # http_response_code_xyz
To add also cases with two underscores or more:
def to_snake_case(name):
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
name = re.sub('__([A-Z])', r'_\1', name)
name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
return name.lower()
Snake case to pascal case
name = 'snake_case_name'
name = ''.join(word.title() for word in name.split('_'))
print(name) # SnakeCaseName
ANSWER 2
Score 307
There's an inflection library in the package index that can handle these things for you. In this case, you'd be looking for inflection.underscore():
>>> inflection.underscore('CamelCase')
'camel_case'
ANSWER 3
Score 163
I don't know why these are all so complicating.
for most cases, the simple expression ([A-Z]+) will do the trick
>>> re.sub('([A-Z]+)', r'_\1','CamelCase').lower()
'_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camelCase').lower()
'camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camel2Case2').lower()
'camel2_case2'
>>> re.sub('([A-Z]+)', r'_\1','camelCamelCase').lower()
'camel_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'
To ignore the first character simply add look behind (?!^)
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCase').lower()
'camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCamelCase').lower()
'camel_camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','Camel2Camel2Case').lower()
'camel2_camel2_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'
If you want to separate ALLCaps to all_caps and expect numbers in your string you still don't need to do two separate runs just use | This expression ((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z])) can handle just about every scenario in the book
>>> a = re.compile('((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z]))')
>>> a.sub(r'_\1', 'getHTTPResponseCode').lower()
'get_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponseCode').lower()
'get2_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponse123Code').lower()
'get2_http_response123_code'
>>> a.sub(r'_\1', 'HTTPResponseCode').lower()
'http_response_code'
>>> a.sub(r'_\1', 'HTTPResponseCodeXYZ').lower()
'http_response_code_xyz'
It all depends on what you want so use the solution that best suits your needs as it should not be overly complicated.
nJoy!
ANSWER 4
Score 45
stringcase is my go-to library for this; e.g.:
>>> from stringcase import pascalcase, snakecase
>>> snakecase('FooBarBaz')
'foo_bar_baz'
>>> pascalcase('foo_bar_baz')
'FooBarBaz'