I have a tedious task to perform which involves reading information from SVN and updating a review checklist with the information. The ‘hard’ part of this whole process is the actual review.
The ‘easy’ part is looking up the revision information, paths, file names, etc and putting them into the spreadsheet. Despite the fact that this is the ‘easy’ part it’s also very easy to
mess up. This tedious copy and paste process has many steps, requires significant working memory and can be easily confusing. The worst part is that even if you do the ‘hard’ part well, you
need to do the ‘easy’ part perfectly because if you mess up a revision or a path there’s no way to link the work you did to the actual artifact being reviewed.
So, why not create a script?
Python can read SVN and filesystem information. It can read Excel files. It can write excel files. So, why not automate it?
To do so, I will need to be able to write Excel files with Python 3. More specifically, I will need to modify an existing template with information I glean from SVN and other sources.
Thus, this post will contain information on how to read, write and modify Excel files with Python 3
OpenPyXL
If you need a library that can read and write xlsx files (not just xls files), you want OpenPyXL. I don’t think there’s
any disagreemnt on that on Stack Overflow. As far as I can tell, it’s the best way to write .xlsx
files in Python.
Installation
OpenPyXL installed just fine using pip:
Reading Excel Files
I’m going to base all of this off of a tutorial I found here.
It looks pretty simple for most things. Here’s their simple code on how to read a cell:
Except I don’t see a good answer on how to select a sheet. I see the book.active thing and I don’t think it’s what I want. I want to select by name.
Selecting a Sheet by Name
Turns out this is simple too. See this StackOverflow answer. There’s a dictionary of sheets that you can index by name. Like this:
Getting a List of Sheet Names
This one’s easy too. I’m beginning to think using OpenPyXL will be easy.
Writing Data to Cells
Once again, easy.
From the tutorial I mentioned earlier:
It’s worth noting here that OpenPyXL uses 1-based indexing: the cell at row 2 column 2 is “B2”, not “C3”.
Creating a Sheet
Here’s how to create a new sheet in a workbook:
Writing a List of Values Into a Sheet
A good way to do this is:
Looping Over All Rows in a Sheet
Getting a Range of Cells
There are times when you want to pluck a range of cells and work with them. Here’s how you can do that:
You can do it by rows. This code retrieves all rows between 3 and 10 (inclusive) and returns them as a list of lists :
And, to do the same thing with columns:
Retrieving Just Data (Not Formulas) From a Workbook
I had a bit of a shock when I opened a workbook, specified a range of cells and tried to get values: I was getting formulas instead of the result of formulas.
There’s a way to avoid this: load the workbook as ‘data only’. You do that like this:
This way, when you acces the value of a cell, you get the result of the formula.
OpenPyXL Notes
OpenPyXL uses 1-based indexing: the cell at row 2 column 2 is “B2”, not “C3”.
xlrd, xlt and xlutils
There are multiple ways to read and write Excel files in Python. One way to do it is a collection of libraries that all work together:
HOWEVER!!! MAJOR CAVEAT THAT BORKED MY ATTEMPT TO USE THESE LIBRARIES FOLLOWS!
xlwt CANNOT write xlsx or xlsm files - only the older xls files
I put a lot of effort into xlrd, xlwt and xlutils - all for nothing because my application
requires that I write .xlsx files. The information here will stay, because it’s useful. How is it useful you ask? Well, if there’s one thing I’ve found it’s that you have to use the right tool for the job - and there’s sadly many tools out there to read and write Excel files in Python. It’s not enough to just pick one and say ‘this is the only one I’m going to use’. One big reason: speed. I’ve found that some libraries are much slower than others. Thus, if you only, for example, needed to read Excel files, xlrd might be perfect for you over another, more comprhensive library that is much slower. You really have to watch out on large files sometimes - using a different, less featureful, but faster library may be useful.
Installation
Here’s what I did on a Windows 10 PC with Python 3.81 installed:
I have both Python 2.7 and 3 installed, so in order to make sure that pip installed the libraries for Python 3 I had to use pip3 instead of pip.
Example Code
Rather than write a lot of notes about how to use the libraries, I’ll just show you an example program. It
does two things: reads specific data from an Excel file, and finds and modifies a specific incorrect value
within an Excel file.
For reference, the Excel data being used looks like this:
Name
Telephone
Address
Notes
Don Juan
1234567890
1121 Nowhere St. Tokyo Japan
I like this guy
Jean-Luc Picard
NA
France, somewhere
Why does a Frenchman have an English accent?
The script will read the passed Excel file and print the data. Then, it will find and correct Jean-Luc’s Address entry and save the result in a new Excel file.
Notes
I will document anything interesting or unique about specific workflows with xlrd, xlwt and xlutils here.
Modifying Existing Excel Files With xlrd and xlwt
So, it seems, according to this answer, that the way to do this is to:
Use xlrd to read the excel file (i.e., ‘workbook’)
Copy the workbook with xlutils copy function
Modify the workbook in memory
Save it to a new file.
The code snippet they used is here:
I’ll create a simple Excel file with some information in it, load it, modify it, and finally, save it in a new file.
Writing to Excel Files
xlwt CANNOT write .xlsx or .xlsm files, only .xls!!
It’s worth noting that xlwt and xlrd do not have equivalent APIs. I have no idea why this is, but the fact
is that there’s no easy way to iterate over the sheets in a workbook or to retrieve a sheet by name with
xlwt. xlrd has it just fine, but not xlwt. So you need a workaround, found this.
It’s likely you or I will find more issues such as this while working with xlwt.
The upshot of most of this is that you’ll have to do the heavy-duty reading using xlrd and then sparingly
use xlwt to modify the data.