Efficient algorithm for determining values not as frequent in a list

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Mysterious Puzzle

--

Chapters
00:00 Efficient Algorithm For Determining Values Not As Frequent In A List
02:34 Answer 1 Score 5
02:59 Answer 2 Score 0
03:41 Answer 3 Score 7
04:43 Answer 4 Score 3
05:07 Thank you

--

Full question
https://stackoverflow.com/questions/2358...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #algorithm #list #frequency

#avk47

ANSWER 1

Score 7

Rewritten with the database stuff in fill-in pseudocode.

If I understand the problem correctly, I'd treat the questions (or their IDs as proxies) like a physical deck of cards: for each user, shuffle the deck and deal them a question at a time; if they want more than len(deck) questions, just start over: shuffle the deck into a new order and do it again. When a question is seen for the nth time, all other questions will have been seen n or n-1 times.

To keep track of what questions the user has had available, we put the unused question IDs back in the database and increment the "how many times through" counter whenever we need a new deal.

Something like:

from random import shuffle

def deal():
    question_IDs = get_all_questions(dbconn) # all questions
    shuffle(question_IDs)
    increment_deal_count(dbconn, userID) # how often this student has gotten questions
    return question_IDs


count_deals = get_stored_deals(dbconn, userID) # specific to this user
if count_deals: 
    question_IDs = get_stored_questions(dbconn, userID) # questions stored for this user 
else: # If 0 or missing, this is the first time for this student
    question_IDs = deal()


while need_another_question(): #based on exam requirements
    try:
        id = question_IDs.pop()
    except IndexError:
        question_IDs = deal()
        id = question_IDs.pop() # Trouble if db is ever empty. 

    use_question(id) # query db with the ID, then put question in print, CMS, whatever

# When we leave that while loop, we have used at least some of the questions
# question_IDs lists the *unused* ones for this deal
# and we know how many times we've dealt.

store_in_db(dbconn, userinfo, question_IDs)
# If you want to know how many times a question has been available, it's
# count_deals - (ID in question_IDs)
# because True evaluates to 1 if you try to subtract it from an integer.

ANSWER 2

Score 5

Why not have two lists, one for questions not yet picked and one for questions that have been picked. Initially, the not-yet-picked list will be full, and you will pick elements from it which will be removed and added to the picked list. Once the not-yet-picked list is empty, repeat the same process as above, this time using the full picked list as the not-yet-picked list and vice versa.

ANSWER 3

Score 3

To implement your algorithm, you only need to shuffle the list and pass through it, and when done, repeat.

No need to copy the list or juggle items between two lists, just use the following control flow, for example:

import random

def ask_questions(list_of_questions):
    while True:
        random.shuffle(list_of_questions)
        for question in list_of_questions:
            print(question)
            # Python 3 use input not raw_input
            cont = raw_input('Another question?') 
            if not cont:
                break
        if not cont:
            break

ANSWER 4

Score 0

Let's define a "pivot" that separates a list into two sections. The pivot partitions the array such that all numbers before the pivot has been picked one more than the numbers after the pivot (or more generally, all numbers before pivot are ineligible for picking, while all numbers after the pivot are eligible for picking).

You simply need to pick a random item from the list of numbers after the pivot, swap it with the number on the pivot, and then increment the pivot. When the pivot reaches the end of the list, you can reset it back to the start.

Alternatively, you can also use two lists, which is much easier to implement, but is slightly less efficient as it needs to expand/shrink the list. Most of the time, the ease of implementation would trump the inefficiency so the two-list would usually be my first choice.