The Python Oracle

Python regex? I'm in a trouble

Full question
https://stackoverflow.com/questions/3829...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #regex

#avk47



ACCEPTED ANSWER

Score 4


You don't need regular expressions, actually. You can use them though. There's a rather simple pattern in your information:

<Train number> - <city>|<Train number>-<identifier>

So let's look at what happens if you do

>>> '123 - ROMA TERMINI|123-S01358'.split('|', 1)
['123 - ROMA TERMINI', '123-S01358']

So now you have the first part of what you want. The second part can be extracted in a similar way. Let's look at

>>> '123-S01358'.split('-', 1)
['123', 'S01358']

So you can do

>>> '123-S01358'.split('-', 1)[-1]
'S01358'

And you're done!

If you combine all of this together you should get your answer.
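
As a minimal sketch of combining the two splits above, the helper below takes one line in the `<Train number> - <city>|<Train number>-<identifier>` shape and returns both parts. The function names `parse_line` and `parse_line_re` are made up for illustration, and the regex variant is only one possible pattern, shown because regular expressions can be used here too; it assumes the identifier contains no whitespace.

```python
import re


def parse_line(line):
    """Return (name, identifier) using only str.split."""
    # '123 - ROMA TERMINI|123-S01358' -> '123 - ROMA TERMINI', '123-S01358'
    name, rest = line.strip().split('|', 1)
    # '123-S01358' -> 'S01358'
    identifier = rest.split('-', 1)[-1]
    return name, identifier


# Hypothetical regex equivalent: everything before '|' is the name,
# everything after the first '-' on the right-hand side is the identifier.
LINE_RE = re.compile(r'^(?P<name>[^|]+)\|[^-]+-(?P<identifier>\S+)$')


def parse_line_re(line):
    """Return (name, identifier) using a regular expression."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError('unexpected line format: %r' % line)
    return match.group('name'), match.group('identifier')


print(parse_line('123 - ROMA TERMINI|123-S01358'))
print(parse_line_re('123 - ROMA TERMINI|123-S01358'))
```

Both functions should give the same pair for well-formed input. The regex version raises `ValueError` on lines that do not fit the pattern, whereas the split version fails with an unpacking error if the `|` is missing.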




ANSWER 2

Score 2


I must use REGEX, true?

Not true.

I think a better solution is to parse each line into tokens and assign them to sensible variables. You need a solution that is less about string primitives and regex; more about objects and encapsulation.
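
One way to read "tokens assigned to sensible variables" is a small value class. The sketch below is only an illustration: the class name `TrainEntry` and its field names are assumptions, and the line format is taken from the sample data `2097 - MILANO CENTRALE|2097-S01700`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainEntry:
    """One parsed line; field names are illustrative, not from any API."""
    train_number: str
    city: str
    identifier: str

    @classmethod
    def from_line(cls, line):
        # Left of '|' is '<train number> - <city>', right is '<train number>-<identifier>'.
        left, right = line.strip().split('|', 1)
        train_number, city = (part.strip() for part in left.split('-', 1))
        identifier = right.split('-', 1)[-1]
        return cls(train_number, city, identifier)


entry = TrainEntry.from_line('2097 - MILANO CENTRALE|2097-S01700\n')
print(entry.train_number, entry.city, entry.identifier)
```

Callers then work with named attributes instead of list indexes, and the parsing rule lives in one place.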

I'd design a REST API that let me query for trains easily and return the response as JSON objects.




ANSWER 3

Score 0


First, you have to decode your bytes objects into str objects.

With the examples you provided:

examples = [
    b'2097 - MILANO CENTRALE|2097-S01700\n',
    b'123 - ROMA TERMINI|123-S01358\n',
    b'123 - TREVIGLIO|123-S01703\n'
]

Assuming the format is:

[TRAIN_NAME]|[TRAIN_NAME_REPEATED]-[TRAIN_NUMBER]\n

We don't need any regexes; we can simply split the entries on their delimiters:

A = []  # train names
B = []  # train numbers

for example_bytes in examples:
    example = example_bytes.decode("utf-8").split("|")
    # first iteration: example = ['2097 - MILANO CENTRALE', '2097-S01700\n']

    train_name = example[0]
    # train_name = '2097 - MILANO CENTRALE'

    train_number = example[1].split("-")[1]
    # train_number = 'S01700\n' (the trailing newline is stripped below)

    A.append(train_name)
    B.append(train_number.rstrip())

Then to see the result:

print(A)
# ['2097 - MILANO CENTRALE', '123 - ROMA TERMINI', '123 - TREVIGLIO']
print(B)
# ['S01700', 'S01358', 'S01703']

If you don't want repeated entries (if they can occur at all), I'd suggest using sets instead of lists.
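
A small sketch of that idea, with one caveat: a set does not keep insertion order, and two separate sets would lose the pairing between a name and its number. The example therefore stores `(name, number)` tuples, and also shows `dict.fromkeys` as an order-preserving alternative. The duplicate entry is invented for the demonstration.

```python
entries = [
    ('123 - ROMA TERMINI', 'S01358'),
    ('123 - TREVIGLIO', 'S01703'),
    ('123 - ROMA TERMINI', 'S01358'),  # hypothetical duplicate
]

# A set drops duplicates but makes no promise about order.
unique_unordered = set(entries)

# dict keys are unique and keep insertion order (Python 3.7+).
unique_ordered = list(dict.fromkeys(entries))

print(len(unique_unordered))
print(unique_ordered)
```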

Check the API documentation, since you depend on the format in which it provides entries.




ANSWER 4

Score 0


You can actually get the data you want in JSON format by making the correct request. For Treno - Stazione, using the station code for ROMETTA MESSINESE, first for arrivals:

from pprint import pprint as pp
import requests
import datetime

station = "S12049"  # station code for ROMETTA MESSINESE
dt = datetime.datetime.utcnow()
arrival = "http://www.viaggiatreno.it/viaggiatrenonew/resteasy/viaggiatreno/arrivi/{station}/{iso}"
with requests.Session() as s:
    # format the `arrival` template defined above
    r = s.get(arrival.format(station=station, iso=dt.strftime("%a %b %d %Y %H:%M:%S GMT+000 (UTC)")))
    pp(r.json())

And departure:

departure = "http://www.viaggiatreno.it/viaggiatrenonew/resteasy/viaggiatreno/partenze/{station}/{iso}"
with requests.Session() as s:
    # same station and timestamp as above, but against the departures endpoint
    r = s.get(departure.format(station=station, iso=dt.strftime("%a %b %d %Y %H:%M:%S GMT+000 (UTC)")))
    pp(r.json())