The Python Oracle

Ordering an array for maximal pairwise matches

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: The Builders

--

Chapters
00:00 Ordering An Array For Maximal Pairwise Matches
02:29 Accepted Answer Score 5
04:38 Answer 2 Score 2
05:55 Answer 3 Score 1
06:26 Answer 4 Score 0
06:42 Thank you

--

Full question
https://stackoverflow.com/questions/4912...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #arrays #numpy

#avk47



ACCEPTED ANSWER

Score 5


You've described a graph where the vertices are numbers, and the edges are your pairs.

Your conditions specify that each number appears once or twice in the list. This means that connected components in your graph are lines (or cycles). You can find them using this algorithm:

  • [Line exists] If possible, pick a number with degree 1 (that is, it only appears once in the list). Follow the chain of pairs as far as you can, adding them to the output and removing the traversed vertices from your graph.
  • [Cycle exists] If there was no number with degree 1 it means all the components are cycles. Pick any vertex (it will have degree 2). Follow the pairs as before, adding them to the output and removing the traversed vertices, but this time stop when you reach your original number.
  • Repeat, until you've used up all the vertices in the graph.

You can run this algorithm efficiently: maintain a set of vertices of degree 1 and another of degree 2. When you consume an edge (a pair in your original list), modify the sets appropriately: remove an endpoint of degree 1 from the first set, and move an endpoint from the degree 2 set to the degree 1 set. Alternatively, use a priority queue.

You'll also need an efficient lookup for your pairs: build a dict from vertex to a list of adjacent vertices.

Using these ideas, you can find a best ordering in linear time (assuming O(1) set and dict implementations).

Here's a somewhat clunky implementation.

import collections

def handle_vertex(v, vs):
  if v in vs[0]:
    vs[0].remove(v)
  elif v in vs[1]:
    vs[0].add(v)
    vs[1].remove(v)

def follow(edgemap, start, vs):
  """Follow a path in the graph, yielding the edges."""
  v0 = start
  last = None
  while True:
    # All vertices that we can go to next.
    next_v = [v for v in edgemap[v0] if v != last]
    if not next_v:
      # We've reached the end (we must have been a line).
      return
    assert len(next_v) == 1 or (v0 == start and len(next_v) == 2)
    next_v = next_v[0]
    # Remove the endpoints from the vertex-degree sets.
    handle_vertex(v0, vs)
    handle_vertex(next_v, vs)
    yield v0, next_v
    if next_v == start:
      # We've got back to the start (we must have been a cycle).
      return
    v0, last = next_v, v0

def pairsort(edges):
  edgemap = collections.defaultdict(list)
  original_edges = {}
  for a, b in edges:
    # Build the adjacency table.
    edgemap[a].append(b)
    edgemap[b].append(a)
    # Keep a map to remember the original order pairs appeared in
    # so we can output edges correctly no matter which way round
    # we store them.
    original_edges[a, b] = [a, b]
    original_edges[b, a] = [a, b]
  # Build sets of degree 1 and degree 2 vertices.
  vs = [set(), set()]
  for k, v in edgemap.iteritems():
    vs[len(v) - 1].add(k)
  # Find all components that are lines.
  while vs[0]:
    v0 = vs[0].pop()
    for e in follow(edgemap, v0, vs):
      yield original_edges[e]
  # Find all components that are cycles.
  while vs[1]:
    v0 = vs[1].pop()
    for e in follow(edgemap, v0, vs):
      yield original_edges[e]

input = [
    [ 4, 10],
    [ 4,  2],
    [ 0,  7],
    [ 5, 11],
    [ 6,  8],
    [ 3,  6],
    [ 9,  7],
    [ 2, 11],
    [ 9,  5],
    [ 8,  1]]

print list(pairsort(input))



ANSWER 2

Score 2


I am not certain that i understand every detail in your question; but if i did, then this should do what you want.

The basic idea is very simple: for two 1D arrays, you can cycle through all of the pairwise combinations by lining them up, holding one still and rolling the second one forward one increment at a time. If you use NumPy's roll function, then the value(s) that fall off the end of the array as you roll it forward just get pushed back onto the front, like a treadmill.

After that's set up, then just diff the two vectors along the correct axis and sum the 0s. Keep track of those sums (tx, below) then get index corresponding to the max value of those sums, NP.argmax(tx).

import numpy as NP

# create some data:
c1 = NP.random.randint(0, 10, 15)
c2 = NP.random.randint(0, 10, 15)
c12 = NP.concatenate((c1, c2)).reshape(15, 2)

tx = []    # to hold the indices of the various orderings

for i in range(15) :
    v = NP.diff(c12, axis=0)
    num_equal_neighbors = NP.sum( NP.sum(v==0, axis=0) )
    tx.append(num_equal_neighbors)
    c2 = NP.roll(c2, 1)
    c12[:,1] = c2

Now find which ordering of the two vectors gives the most 'pairwise' matches:

best_order = NP.argmax(tx)

therefore, the desired ordering occurs when the two 1D arrays are arranged such that the second array is rolled *best_order* number of places (and the first array is left as it is )




ANSWER 3

Score 1


Here is a simpler implementation of basically the same algorithm described in Paul Hankin's answer, using different data structures. It also runs in linear time.

edges = [[4, 10], [4,  2], [0,  7], [5, 11], [6,  8],
         [3,  6], [9,  7], [2, 11], [9,  5], [8,  1]]

def add_edge(vertex, adj):
    edge = edges[adj[0]][:]
    ordered_edges.append(edge[:])
    edge.remove(vertex)
    new_vertex = edge[0]
    new_adj = adj_edges[new_vertex]
    new_adj.remove(adj[0])
    del adj[0]
    return new_vertex, new_adj

adj_edges = {}
for i, edge in enumerate(edges):
    for vertex in edge:
        adj_edges.setdefault(vertex, []).append(i)
ordered_edges = []
for vertex, adj in adj_edges.iteritems():
    while len(adj) == 1:
        vertex, adj = add_edge(vertex, adj)
for vertex, adj in adj_edges.iteritems():
    while adj:
        vertex, adj = add_edge(vertex, adj)



ANSWER 4

Score 0


See also Longest path problem : NP complete for general graphs, or shortest path with weights negated for acyclic.
Also, try the graph shown in Spanning tree: longest is harder than just long.