How to Access SVN with Python
Introduction
I have a tedious task to perform which involves reading information from SVN and updating a review checklist with that information. The ‘hard’ part of this whole process is the actual review. The ‘easy’ part is looking up the revision information, paths, file names, etc. and putting them into the spreadsheet. Despite being the ‘easy’ part, it’s also very easy to mess up. This tedious copy-and-paste process has many steps, requires significant working memory and is easy to get confused by. The worst part is that even if you do the ‘hard’ part well, you need to do the ‘easy’ part perfectly, because if you mess up a revision or a path there’s no way to link the work you did to the actual artifact being reviewed.
So, why not create a script?
Python can read SVN and filesystem information. It can read Excel files. It can write Excel files. So, let’s just automate this whole thing.
This post will focus on the SVN aspects of the process. I have another that focuses on the Excel aspects.
Background
There is a package called PySVN that handles the interface between Python and SVN. It’s a bit of fun, so buckle up.
PySVN Installation
On a Windows 10 PC using Python 3 (3.8.1) I did the following:
Hmmm, odd. Usually that works…
It’s really odd. I can’t find much about why pip isn’t working. That said, it seems you simply install PySVN with an installer from the project’s website.
The download website is here.
I’m downloading the version for Python 3.8.1 and Windows 10 x64, which is here. Note: it’s the 1.14.0 version.
It brings me to SourceForge (ew) and the download starts.
Alright, downloaded. Now, starting the installer.
Uh, where is it?
Oh look, Windows Defender says that it’s scared of this download. I tell it ‘no’. You will run it.
Anyway, on to the actual installation.
- Select Start Menu Folder - The defaults are fine, I click ‘Next’
- Ready to Install - Everything looks good, I click ‘Install’
- Installation occurs…
- And then, it finishes. So I click ‘Finish’
Nothing major there!
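A quick way to confirm that the installer put PySVN somewhere this interpreter can find it is a small import check. This is only a sketch: `describe_module` is a hypothetical helper name, and reading a `version` attribute is an assumption that falls back to 'unknown' when the module does not expose one.

```python
import importlib
import importlib.util


def describe_module(name):
    """Report whether a module is importable, plus its version if it exposes one."""
    if importlib.util.find_spec(name) is None:
        return f"{name}: not installed for this interpreter"
    module = importlib.import_module(name)
    # Assumption: the module has a `version` attribute; otherwise report 'unknown'.
    version = getattr(module, "version", "unknown")
    return f"{name}: importable, version {version}"


print(describe_module("pysvn"))
```

If this reports that the module is not installed, the installer most likely targeted a different Python installation than the one running the script.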
Hello World Script
The first thing I like to do with anything new is to do a ‘Hello World’ type script. In this case, my workflow depends on several things:
- Getting the full SVN URL of files checked out on to my PC
- Getting the last few log entries of files checked out on my PC
- Finding the last user to commit a file checked out on my PC
The first thing that shows up when I google for ‘pysvn examples’ is this page.
It has quite a few little snippets that look rather tasty. Let’s see if I can find what I’m looking for.
Eh, no. I can’t. Dang. What else do we have?
One thing I have noticed over the years of using PySVN is that it has curiously few examples running around out there. Few StackOverflow questions, few blog posts, etc.
Getting a Login
One of the first things you’ll need to do is specify a callback function to get a login to your SVN server. Not much will work without it. This is the documentation for implementing a callback, and below you’ll see I’ve generated a simple function to suffice for the callback and register it to the SVN client.
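A minimal sketch of such a callback, assuming the credentials are passed in from elsewhere. The names `make_login_callback` and `make_client` are hypothetical; the callback shape (three arguments in, a four-item tuple out) follows the PySVN documentation for `callback_get_login`.

```python
def make_login_callback(username, password):
    """Build a function in the shape PySVN expects for callback_get_login."""
    def get_login(realm, suggested_username, may_save):
        # Returns (retcode, username, password, save):
        # retcode True means "use these credentials",
        # save False means "do not cache them".
        return True, username, password, False
    return get_login


def make_client(username, password):
    """Create a client with the callback registered (needs PySVN installed)."""
    import pysvn  # imported here so the callback can be exercised without PySVN
    client = pysvn.Client()
    client.callback_get_login = make_login_callback(username, password)
    return client
```

Calling `make_client("example_user", "not-a-real-password")` would hand back a client that answers login prompts on its own; both values are placeholders.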
That at least runs without error, so I’ll move on to the next part.
FYI, none of those entries above are my real password to anything.
Getting SVN Info for Working Copy Files
I think the function I need to use is the info2 function. Read more about it here.
I need to pick a file, point info2 towards it, then see what it returns. Here’s the code I’m trying:
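A sketch of what such a call might look like, not the exact listing. `fetch_info` and `print_info` are hypothetical names, and passing `recurse=False` is a choice made here to keep the query to the one file.

```python
def fetch_info(client, path):
    """Ask SVN about a single path; recurse=False keeps it to that one item."""
    return client.info2(path, recurse=False)


def print_info(path):
    """Run the call against a real working copy (needs PySVN installed)."""
    import pysvn
    client = pysvn.Client()
    entries = fetch_info(client, path)
    print(entries)
    return entries
```

`print_info` would be called with the path of any file inside a checked-out working copy.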
What the docs say will be returned is this:
Historically, PySVN has a very confusing way of storing useful information. Actually getting the information you want out of what it returns is complicated. Here’s what comes out of the print statement:
See? That’s just kinda weird. A tuple inside a list? And the first item of the tuple is just the file path I passed to it? So, to access the PysvnInfo member, you have to do this:
But the second print statement prints this out:
Brilliant. So, we start trying to figure out what’s in there. Supposedly it’s a dictionary, right? So we’ll try accessing some of the named members from the website. Like this:
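Put together, unpacking the list-of-tuples and reading a named member could be sketched like this. The helper names are hypothetical, and dictionary-style access with the `URL` key is used here; as far as I know PySVN's info objects also allow attribute-style access.

```python
def first_info(entries):
    """info2 returns a list of (path, info) tuples; take the info of the first one."""
    _path, info = entries[0]
    return info


def url_of(entries):
    """Pull the full repository URL out of an info2 result."""
    return first_info(entries)["URL"]
```

With a real client, `url_of(client.info2(path, recurse=False))` would give the URL of the file at `path`.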
Which produces:
Okay, so I didn’t put the actual URL there, but take my word for it: it prints out the URL.
It should be easy to get the rest of the information that we want.
Getting Revision Information
If you thought an SVN revision was just an integer number, raise your hand.
Raises hand
Turns out - I’m wrong. An SVN revision (as far as PySVN is concerned) is a pysvn.Revision type.
Now what, pray tell, is that?
From here:
If you’re interested in what the opt_revision_kind enumeration information is, look here:
Okay, so what will print out if we try to access the revision code? We try like this:
That produces this:
Interesting, but supposedly there were three elements: kind, date and number. Can we access them or am I misunderstanding things?
Trying this:
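A sketch of pulling the three documented elements out of a revision object. `revision_parts` is a hypothetical helper; it only assumes the object has `kind`, `date` and `number` attributes, as the `pysvn.Revision` documentation describes.

```python
def revision_parts(revision):
    """Return the three documented elements of a revision as a dictionary."""
    return {
        "kind": revision.kind,
        "date": revision.date,
        "number": revision.number,
    }
```

With a real info object this would be called as `revision_parts(info["rev"])`.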
Which produces:
Well, that at least bears out the part about having three elements. Now at least I can access the specific parts of the revision information that I’m looking for.
Not bad.
However, we must take into account a wrinkle with SVN: last-changed revision vs. current revision.
Current revision is basically the most recent revision of the working copy. Imagine if you did an update on the root of a working copy - when it finished you’d see something like “Updated to revision xxxx”. Now, the working copy is considered to be at revision xxxx. However, many files within the working copy might not have been changed in revision xxxx - they have a ‘last-changed’ revision which will be lower than the working copy revision. When SVN says ‘Updated to revision xxxx’ what it’s saying is ‘all the files in this working copy are up-to-date as of revision xxxx’. If you pick a file that was last updated in say, revision 443 and tell SVN to check out that file at any revision later than 443, you’ll get the most up-to-date file. If you choose to check out a revision earlier than 443, you will not have the most up-to-date file. 443 would be the ‘last-changed revision’, while xxxx would be the ‘revision’.
Generally, the ‘revision’ isn’t very useful when you’re looking at individual files. Instead, you’ll want to go with the ‘last-changed revision’.
In order to print out the information for ‘rev’ vs. ‘last-changed-rev’, I would use this code:
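A sketch of reading both values from one info object, assuming the `rev` and `last_changed_rev` members named in the info2 documentation. `revision_numbers` is a hypothetical helper name.

```python
def revision_numbers(info):
    """Return the working-copy revision and the last-changed revision as plain ints."""
    return {
        "rev": info["rev"].number,
        "last_changed_rev": info["last_changed_rev"].number,
    }
```

For a file that has not changed since an older commit, `last_changed_rev` will be the smaller of the two numbers.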
And you get this output:
Not bad, now we can get all the revision information we want.
Getting Author/Committer Information
I’m guessing that the author information is this one:
And luckily, it’s just a string, so it should be easy to access like this:
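A sketch of reading the committer, assuming the member is `last_changed_author` as listed in the info2 documentation. The `checklist_row` helper and its key names are hypothetical; it just gathers the three values the checklist needs in one place.

```python
def last_committer(info):
    """The username of whoever made the last change to this item."""
    return info["last_changed_author"]


def checklist_row(info):
    """Gather URL, last-changed revision number and committer into one record."""
    return {
        "url": info["URL"],
        "last_changed_rev": info["last_changed_rev"].number,
        "author": last_committer(info),
    }
```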
Which produces:
Keep in mind, the ‘author’ will be a username, not necessarily a full-fledged first and last name.
Getting Log Entries
Now we’re getting a bit farther. Most of the information I was looking for could be found with the info2 function, but I don’t see anything about the log in the info2 documentation, so it must be elsewhere.
But where?
Look no further than this.
It’s a single function call that returns this:
Alright, so what does this mean practically?
Here’s the code I run:
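A sketch of such a call, not the exact listing. `recent_log` is a hypothetical helper, and the `limit` keyword is used here to cap how many entries come back.

```python
def recent_log(client, path, count=5):
    """Fetch at most `count` log entries for a path."""
    return client.log(path, limit=count)


def print_recent_log(path, count=5):
    """Run the call against a real working copy (needs PySVN installed)."""
    import pysvn
    client = pysvn.Client()
    entries = recent_log(client, path, count)
    print(entries)
    return entries
```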
And the result:
It’s worth noting that retrieving the log takes a second - much more time than the info2 command took.
Okay... this file has two entries in its log when I look at it, so that must be these. Let’s assume those are dictionaries as specified above and see if we can print out the information we want for each entry. Here’s the code I created:
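A sketch of formatting one entry, assuming the `revision`, `author`, `date` and `message` members from the log documentation, with `date` being a float timestamp. `format_log_entry` and the output layout are hypothetical.

```python
from datetime import datetime


def format_log_entry(entry, tz=None):
    """Turn one log entry into a single readable line.

    `tz=None` formats the timestamp in local time; pass a tzinfo to override.
    """
    when = datetime.fromtimestamp(entry["date"], tz)
    message = entry["message"].strip()
    return f'r{entry["revision"].number} | {entry["author"]} | {when:%Y-%m-%d %H:%M:%S} | {message}'
```

Printing every entry is then a loop such as `for entry in client.log(path): print(format_log_entry(entry))`.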
And the output:
I had to include the datetime library to get the correct formatting function since the timestamp comes back in a float format. But overall, not too bad!
Listing the SVN Contents of a Directory
I have a situation where I need to get the SVN information (URL, last changed revision) for a few files in a directory. This presents an interesting twist: in a working copy, not all files may be version controlled. There could be uncommitted files hanging around there - this means it’s important not to do a directory listing of files, but instead to do an SVN listing of files.
Turns out there’s a useful command called ‘ls’. Ha. Imagine that. Let’s see what the documentation says about it.
Oh. Okay. Where’s that?
Here.
You pass it a path and it returns some information:
So, I whip this code up:
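A sketch of such a listing, assuming `client.list` returns a list of tuples whose first item carries a `repos_path` member, as the documentation describes. `list_repo_paths` is a hypothetical helper, and `recurse=False` is passed explicitly to stay within the one directory.

```python
def list_repo_paths(client, directory):
    """Return the repository path of everything SVN knows about in a directory."""
    return [entry["repos_path"] for entry, _lock in client.list(directory, recurse=False)]
```

With a real client this would be called as `list_repo_paths(pysvn.Client(), some_directory)`.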
Which produces this result:
Well, we’ve got a problem: the paths are relative repository paths, not complete URLs. The server name and protocol are not present, which is not what I want.
The question is: how do I get that?
I could use the list command to get a list of files and then use the info2 command to get their URLs.
Ah, but I’ve encountered a twist!
When you list a directory, the first entry returned is the directory itself!
Not really a problem, but if you’re expecting just the files, you’ll get in some trouble.
Anyway, here’s the code to get the full URL with the info2 command:
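A sketch of combining the two commands. It assumes each listing entry has a `path` member that info2 will accept, and it skips the first listing entry because that one is the directory itself. `versioned_file_urls` is a hypothetical helper name.

```python
def versioned_file_urls(client, directory):
    """Map each version-controlled item in a directory to its full URL."""
    listing = client.list(directory, recurse=False)
    urls = {}
    # The first entry is the directory itself, so start from the second one.
    for entry, _lock in listing[1:]:
        # Assumption: entry["path"] is a path that info2 can look up.
        _path, info = client.info2(entry["path"], recurse=False)[0]
        urls[entry["path"]] = info["URL"]
    return urls
```

This makes one info2 call per file, so it will be slower on large directories.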
Checking a Working Copy for Uncommitted Changes
One of the sanity checks I’ll have to do is to make sure that I haven’t failed to commit changes I’ve made to my checklists before I close an issue.
This Stack Overflow answer suggests a method.
Here’s the code:
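A sketch in the spirit of that approach, not a copy of the answer: ask for the status of everything under a path and see whether any text status is something other than normal. `has_uncommitted_changes` is a hypothetical helper, and the 'normal' value is passed in as a parameter so the function itself does not need to import PySVN.

```python
def has_uncommitted_changes(client, path, normal_kind):
    """True if anything under `path` has a text status other than normal."""
    return any(status.text_status != normal_kind for status in client.status(path))
```

With a real client the call would be `has_uncommitted_changes(pysvn.Client(), working_copy, pysvn.wc_status_kind.normal)`.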
I think I’d like to change it to something that returns a list of files with uncommitted changes. Let’s look up the status function. Documentation is here.
It returns a PysvnStatus object, what’s that?
It’s this:
Alright, the big question is going to be what the path returns - local? Repo? URL?
The other big issue is that the example code only looks for status NOT normal. That could include more things than just normal or uncommitted changes.
This is the documentation for the pysvn_wc_status_kind type, reproduced here:
Okay, so we can check for a variety of non-ideal statuses. Honestly, at this point, if any status is not ‘normal’ I’d pump the brakes, so I guess that guy was right….
Anyway, here’s what I wrote for code:
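A sketch of a version that returns the offending entries instead of a yes/no answer, assuming the `path` and `text_status` members from the PysvnStatus documentation. `non_normal_entries` is a hypothetical helper; the 'normal' value is passed in so the function can be tried without PySVN.

```python
def non_normal_entries(client, path, normal_kind):
    """List (path, text_status) for everything whose text status is not normal."""
    return [
        (status.path, status.text_status)
        for status in client.status(path)
        if status.text_status != normal_kind
    ]
```

An empty list means there is nothing left to commit under `path`, at least as far as text status is concerned.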
And it produced this:
And all of the file paths were local working directory paths. Doesn’t really matter to me.
It’s worth noting, again, that the first entry was the entry for the directory, not a file within the directory.
Committing Changes in a Working Copy
Well, once you’ve ascertained there are uncommitted changes, you’ll want to commit them.
But PySVN doesn’t use the verb ‘commit’ for this, it uses ‘checkin’.
Only big question is how to get the log message. I know there can be a log message callback, but I think I’ll just use a text input to do this.
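A sketch of a commit that takes its message from a console prompt. The helper names are hypothetical. Line endings are reduced to a plain newline before the message is sent, and the exception handling is deliberately broad.

```python
def normalize_newlines(message):
    """Convert Windows and old Mac line endings to a plain newline."""
    return message.replace("\r\n", "\n").replace("\r", "\n")


def commit(client, path, message):
    """Check in `path`; return what checkin returns, or None if it failed."""
    try:
        return client.checkin([path], normalize_newlines(message))
    except Exception as error:
        # Broad on purpose. PySVN's own errors are pysvn.ClientError, which
        # would be the narrower thing to catch once the failure modes are known.
        print(f"Commit failed: {error}")
        return None


def prompt_and_commit(client, path):
    """Ask for the log message on the console, then commit."""
    return commit(client, path, input("Log message: "))
```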
The try/except is there because I know there are circumstances in which the commit will fail - one I know of for sure is a pre-commit hook failing. However, I don’t know exactly what the exception will be and the documentation isn’t explicit on that, so I’m just catching a generic exception.
Yes, I know - that’s bad. But until that exception occurs I won’t know what to look for specifically. Or I could do more research. But I won’t right now.
Also, I’ve had issues using os.linesep within log messages, as well as ‘\r\n’. The server I’m working on only wants the ‘\n’.
Adding Files to a Working Copy
If you have a newly-created file, you’ll need to add it to your working copy and then commit to get the file into SVN. Here’s how you do it.
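A sketch of the two steps, using the client's `add` and `checkin` calls. `add_and_commit` is a hypothetical helper name.

```python
def add_and_commit(client, path, message):
    """Schedule a new file for addition, then commit it."""
    client.add(path)
    return client.checkin([path], message)
```

The add only schedules the file in the working copy; nothing reaches the server until the checkin runs.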
Notes
These are miscellaneous notes I’m keeping about using PySVN.
Single-Threadedness
One of the interesting things about PySVN is that there’s only one SVN client on your PC - anywhere. That means if you’re messing around with the command-line SVN client and you try to run a Python script that uses the SVN client, your Python script will have to wait for the command-line process to finish before it can start.
One important upshot of this is that you really can’t do multi-threaded access to the SVN client in Python. If you try, you’re gonna have a confusing and frustrating time.
Only ever access the SVN client from _one_ Python thread.
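One way to enforce that rule in a program that does use threads is to route every SVN call through a single worker thread. This is a sketch of that idea using the standard library's `ThreadPoolExecutor` with one worker; the `SingleThreadSvn` class is hypothetical and not part of PySVN.

```python
from concurrent.futures import ThreadPoolExecutor


class SingleThreadSvn:
    """Create and use an SVN client on one dedicated worker thread only."""

    def __init__(self, client_factory):
        # One worker means every submitted task runs on the same thread.
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Build the client on the worker too, so no other thread ever touches it.
        self._client = self._executor.submit(client_factory).result()

    def call(self, method_name, *args, **kwargs):
        """Run client.<method_name>(*args, **kwargs) on the worker and wait for it."""
        def task():
            return getattr(self._client, method_name)(*args, **kwargs)
        return self._executor.submit(task).result()

    def close(self):
        """Stop the worker thread once all pending calls have finished."""
        self._executor.shutdown(wait=True)
```

With PySVN installed this would be used as `svn = SingleThreadSvn(pysvn.Client)` followed by calls such as `svn.call("info2", path, recurse=False)` from any thread.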