The Python Oracle

Does Slicing `a` (e.g. `a[1:] == a[:-1]`) create copies of the `a`?

--------------------------------------------------
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Orient Looping

--

Chapters
00:00 Does Slicing `A` (E.G. `A[1:] == A[:-1]`) Create Copies Of The `A`?
01:18 Accepted Answer Score 11
02:13 Answer 2 Score 3
02:52 Answer 3 Score 0
03:27 Answer 4 Score 3
05:05 Thank you

--

Full question
https://stackoverflow.com/questions/2045...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #cpython #memoryefficient #pythoninternals

#avk47



ACCEPTED ANSWER

Score 11


The documentation is unclear because slicing different objects does different things. In the case of a list, slicing does make a (shallow) copy1. Note that this is a feature of python lists independent of python implementation. In the case of other objects (like numpy arrays), it might not create a copy.

If you want a better way to check that all the elements in a list are the same, I would probably recommend:

 all(lst[0] == item for item in lst)

From a performance standpoint, your friend's code might actually outperform this for small lists since list slicing is so optimized. But, this is IMHO much easier to tell what is going on and has the opportunity to "short circuit" as soon as it finds a non-match.

1The actual function to look at is list_subscript, but for most cases, it just calls list_slice




ANSWER 2

Score 3


If a is a list or tuple or string, len(a) is n, and n > 0, then each slice creates (at C level) a new array of length n-1. At C level, all objects in CPython are implemented as pointers, so these new arrays contain n-1 pointers copied from a (well, not really for strings - the string representation is more frugal).

But, as @mgilson said, what slicing returns is up to a's type. Some types may return a compact descriptor instead of copying anything. And a type may even implement slicing in such a way that the code shown wouldn't work as intended.

But you really had a list in mind ;-)




ANSWER 3

Score 3


Yes, with list objects Python creates shallow copies when slicing, however the loop is made in C (for cpython) and for that reason is going to be much faster than anything you can write in Python for doing the same. Looping two times in C for the shallow copy and looping again for comparison is going to be faster than just looping in Python once.

Please remember than cpython is quite often fast enough, but that Python code is still about 100 times slower than C code. Leaving cpython doing the loops for you is therefore better in general if you want a bit of speed. Note that even things like c = a + b in Python mean the execution of a lot of logic (including branches and memory allocations).

On the other side however if for your code this kind of micro optimization is fundamental then probably Python is not the right tool for the problem you are fighting and you should consider other options (like writing a small C++ extension interfaced with sip, using Cython, PyPy ...).

For sure that code is not readable for the untrained eye and in case the list is long and often not constant then all(y == x[0] for y in x) is going to be faster because of short circuiting (even if the loop is in Python and an extra subscript operation is done for every element).

Readability counts. A lot.

EDIT

Another interesting option for having C code looping over the elements is

x and x.count(x[0]) == len(x)

This doesn't offer short-circuiting, but on my pc is about 75 times faster than the all-based solution for a list for 1000 elements all of them equal and about 6 times faster than x[1:] == x[:-1].

I find it also a bit more readable than x[1:] == x[:-1], but it's probably a matter of taste.




ANSWER 4

Score 0


For vanilla lists, slicing does create a copy. You can prevent the copy using iteration instead:

import itertools
a1 = iter(a)
a2 = iter(a)
a2.next() # start a2 iterator one removed
all_are_identical = all((i1 == i2 for i1, i2 in itertools.izip(a1, a2)))

The line (i1 == i2 for i1, i2 in itertools.izip(a1, a2)) creates a generator that will return whether each element in a is equal to the next, one at a time, to all. The results are evaluated one by one, instead of being put in a list first, so you save memory at the cost of some performance.