The Python Oracle

Iterate through columns in Read-only workbook in openpyxl


--

Full question
https://stackoverflow.com/questions/4758...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #excel #openpyxl



ANSWER 1

Score 10


If the worksheet has only around 100,000 cells then you shouldn't have any memory problems. You should probably investigate this further.

iter_cols() is not available in read-only mode because it would require constant and very inefficient reparsing of the underlying XML file. It is, however, relatively easy to convert the rows from iter_rows() into columns using zip.

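import types


def _iter_cols(self, min_col=None, max_col=None, min_row=None,
               max_row=None, values_only=False):
    # zip(*rows) transposes the rows, so each yielded tuple is one column
    yield from zip(*self.iter_rows(
        min_row=min_row, max_row=max_row,
        min_col=min_col, max_col=max_col, values_only=values_only))


# `workbook` is assumed to have been opened in read-only mode already
for sheet in workbook:
    # bind the helper to each worksheet so it can be called as a method
    sheet.iter_cols = types.MethodType(_iter_cols, sheet)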
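
The transposition itself can be seen without a workbook at all. Here is a minimal sketch, assuming each row arrives as a tuple of values the way iter_rows(values_only=True) would yield them; the sample data and the name `rows` are made up for illustration:

```python
# Hypothetical rows, shaped like the tuples iter_rows(values_only=True) yields.
rows = [
    ("name", "score"),
    ("alice", 10),
    ("bob", 7),
]

# zip(*rows) groups the first items together, then the second items, and so on,
# so each resulting tuple is one column.
columns = list(zip(*rows))

for column in columns:
    print(column)
```

Keep in mind that zip(*...) has to consume every row before it can produce the first column, so the whole selected range is held in memory at once.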



ACCEPTED ANSWER

Score 4


According to the documentation, read-only mode supports only row-based reads; column reads are not implemented. That is not hard to work around:

from openpyxl import Workbook

# `wb` is the source workbook opened in read-only mode, `ws` is the
# worksheet being read, and `doStuff` is your own processing function
wb2 = Workbook(write_only=True)
ws2 = wb2.create_sheet()

# find what column I need
colcounter = 0
for row in ws.rows:
    for cell in row:
        if cell.value == "PerceivedSound.RESP":
            break
        colcounter += 1

    # cells are apparently linked to the parent workbook meta
    # this will retain only values; you'll need custom
    # row constructor if you want to retain more

    row2 = [cell.value for cell in row]
    ws2.append(row2) # preserve the first row in the new file
    break # stop after first row

# start at row 2 so the header written above is not appended a second time
for row in ws.iter_rows(min_row=2):
    row2 = [cell.value for cell in row]
    row2.append(doStuff(row2[colcounter]))
    ws2.append(row2) # write a new row to the new wb

wb2.save('newfile.xlsx')
wb.close()
wb2.close()

# copy `newfile.xlsx` to `generalpath + exppath + doc`
# using os.system, subprocess.Popen, or shutil.copy2()

You will not be able to write to the same workbook, but as shown above you can open a new workbook (in write-only mode), write to it, and then overwrite the old file with an OS-level copy.
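
The final copy step is left as a comment in the answer. One way it might look with shutil.copy2 is sketched here on throwaway files in a temporary directory; the helper name `replace_original` and the file names are assumptions for illustration, not part of the answer:

```python
import shutil
import tempfile
from pathlib import Path


def replace_original(new_file, original_file):
    """Overwrite original_file with the contents of new_file."""
    # copy2 copies the data and also tries to preserve file metadata.
    shutil.copy2(new_file, original_file)


# Demonstrate on placeholder files instead of real workbooks.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.xlsx"
    new = Path(tmp) / "newfile.xlsx"
    original.write_bytes(b"old contents")
    new.write_bytes(b"new contents")

    replace_original(new, original)
    result = original.read_bytes()

print(result)
```

With real workbooks, it is safest to close both of them before copying, since a workbook opened in read-only mode keeps its file open until close() is called.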




ANSWER 3

Score 1


This might be a slower solution, but given that your query was to iterate through a single row tuple, I found a better solution:

rowId = 1
row = ws[str(rowId)]  # fetch the row tuple once instead of on every iteration
for i in range(len(row)):
    #print(str(row[i].value) + ' ' + str(i))
    if row[i].value == "<Provide your search string here>":
        lastColumn = i + 1  # because the counter starts from 0
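
The same header search could also be written with enumerate, which avoids indexing by position. This is a small sketch on a plain tuple of header values; `find_column` is a hypothetical helper, and the sample headers (apart from "PerceivedSound.RESP", which comes from the accepted answer) are invented:

```python
def find_column(header_values, target):
    """Return the 1-based column number of target, or None if it is absent."""
    # start=1 makes the counter match spreadsheet column numbering.
    for column_number, value in enumerate(header_values, start=1):
        if value == target:
            return column_number
    return None


header = ("Subject", "Trial", "PerceivedSound.RESP")  # made-up header row
print(find_column(header, "PerceivedSound.RESP"))
print(find_column(header, "Missing"))
```

Unlike the loop above, this stops at the first match rather than remembering the last one, and it returns None when the header is not found.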