How-to Access SVN with Python

Introduction

I have a tedious task to perform which involves reading information from SVN and updating a review checklist with the information. The ‘hard’ part of this whole process is the actual review. The ‘easy’ part is looking up the revision information, paths, file names, etc. and putting them into the spreadsheet. Despite being the ‘easy’ part, it’s also very easy to mess up. This tedious copy-and-paste process has many steps, requires significant working memory and is easy to get lost in. The worst part is that even if you do the ‘hard’ part well, you need to do the ‘easy’ part perfectly, because if you mess up a revision or a path there’s no way to link the work you did to the actual artifact being reviewed.

So, why not create a script?

Python can read SVN and filesystem information. It can read Excel files. It can write Excel files. So, let’s just automate this whole thing.

This post will focus on the SVN aspects of the process. I have another that focuses on the Excel aspects.

Background

There is a package called PySVN that handles the interface between Python and SVN. It’s a bit of fun, so buckle up.

PySVN Installation

On a Windows 10 PC using Python 3 (3.8.1) I did the following:

C:\Users\sfrieder>pip3 install pysvn
Collecting pysvn
  ERROR: Could not find a version that satisfies the requirement pysvn (from versions: none)
ERROR: No matching distribution found for pysvn

Hmmm, odd. Usually that works…

It’s really odd. I can’t find a lot about why pip isn’t working. That being said, it seems you just install it with an installer, from their website.

The download website is here.

I’m downloading the version for Python 3.8.1 and Windows 10 x64, which is here. Note: it’s the 1.14.0 version.

It brings me to SourceForge (ew) and the download starts.

Alright, downloaded. Now, starting the installer.

Uh, where is it?

Oh look, Windows Defender says that it’s scared of this download. I tell it ‘no’. You will run it.

Anyway, on to the actual installation.

Select Start Menu Folder - The defaults are fine, I click ‘Next’
Ready to Install - Everything looks good, I click ‘Install’
Installation occurs…
And then, it finishes. So I click ‘Finish’

Nothing major there!

Hello World Script

The first thing I like to do with anything new is to do a ‘Hello World’ type script. In this case, my workflow depends on several things:

Getting the full SVN URL of files checked out onto my PC
Getting the last few log entries of files checked out on my PC
Finding the last user to commit a file checked out on my PC

The first thing that shows up when I google for ‘pysvn examples’ is this page.

It has quite a few little snippets that look rather tasty. Let’s see if I can find what I’m looking for.

Eh, no. I can’t. Dang. What else do we have?

One thing I have noticed over the years of using PySVN is that it has curiously few examples running around out there. Few StackOverflow questions, few blog posts, etc.

Getting a Login

One of the first things you’ll need to do is specify a callback function to get a login to your SVN server. Not much will work without it. This is the documentation for implementing a callback, and below you’ll see I’ve generated a simple function to suffice for the callback and register it to the SVN client.

import pysvn

def get_login(realm, username, may_save):
    # Returns (retcode, username, password, save); retcode True means "use these credentials"
    return True, "steve", "password", True

client = pysvn.Client()
client.callback_get_login = get_login

That at least runs without error, so I’ll move on to the next part.

FYI, none of those entries above are my real password to anything.
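
Hardcoding a password is fine for a first test, but a slightly safer variation is to read the credentials from the environment. The sketch below is only that, a sketch: SVN_USERNAME and SVN_PASSWORD are variable names I made up for illustration (PySVN does not look for them), and make_login_callback is a hypothetical helper. The callback keeps the same three arguments and four-item return value as the one above.

```python
import os


def make_login_callback(env=os.environ):
    """Build a get_login callback that reads credentials from env.

    SVN_USERNAME and SVN_PASSWORD are hypothetical variable names
    chosen for this sketch; PySVN itself does not define them.
    """
    def get_login(realm, username, may_save):
        user = env.get("SVN_USERNAME")
        password = env.get("SVN_PASSWORD")
        if user is None or password is None:
            # A False first item tells the client no login is available.
            return False, "", "", False
        # Same shape as the hardcoded version, but never ask to save.
        return True, user, password, False

    return get_login
```

It would be registered the same way as before: client.callback_get_login = make_login_callback().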

Getting SVN Info for Working Copy Files

I think the function I need to use is the info2 function. Read more about it here.

I need to pick a file, point info2 towards it, then see what it returns. Here’s the code I’m trying:

        fileInfo = client.info2(filePath)
        print(str(fileInfo))

What the docs say will be returned is this:

The info_dict contains:

URL - URL or path
rev - pysvn.Revision or None
kind - kind of path
repos_root_URL - string
repos_UUID - string
last_changed_rev - pysvn.Revision or None
last_changed_date - time or None
last_changed_author - string or None
lock - None or dictionary containing:
path - string
token - string or None
owner - string or None
comment - string or None
is_dav_comment - true if is DAV comment
creation_date - time or None
wc_info - None if url_or_path is a URL; otherwise a dictionary containing:
schedule - pysvn.wc_schedule or None
copyfrom_url - string or None
copyfrom_rev - pysvn.Revision or None
text_time - time or None
prop_time - time or None
checksum - string or None
conflict_old - string or None
conflict_new - string or None
conflict_wrk - string or None
prejfile - string or None

Historically, PySVN has a very confusing way of storing useful information. Actually getting the information you want out of what it returns is complicated. Here’s what comes out of the print statement:

[('<local file path>', <PysvnInfo ''>)]

See? That’s just kinda weird. A tuple inside a list? And the first item of the tuple is just the file path I passed to it? So, to access the PysvnInfo member, you have to do this:

        fileInfo = client.info2(filePath)
        print(str(fileInfo))
        print(str(fileInfo[0][1]))

But the second print statement prints this out:

<PysvnInfo ''>

Brilliant. So, we start trying to figure out what’s in there. Supposedly it’s a dictionary, right? So we’ll try accessing some of the named members from the website. Like this:

        fileInfo = client.info2(filePath)
        print(str(fileInfo))
        print(str(fileInfo[0][1]))
        svnFileInfo = fileInfo[0][1]
        print(str(svnFileInfo["URL"]))

Which produces:

[('<local file path>', <PysvnInfo ''>)]
<PysvnInfo ''>
<File URL>

Okay, so I didn’t put the actual URL there, but take my word for it: it prints out the URL.

It should be easy to get the rest of the information that we want.
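
Since every info2 call on a single file needs the same [0][1] unpacking, it can be wrapped up once. This is a minimal sketch: get_file_info is a made-up helper name, and it assumes info2 returns the list-of-tuples shape shown in the output above.

```python
def get_file_info(client, file_path):
    """Return the info entry for one working copy file.

    client is expected to be a pysvn.Client. info2 returns a list of
    (path, info) tuples, so for a single file we take the first tuple
    and keep only the info part.
    """
    results = client.info2(file_path)
    if not results:
        raise ValueError("No SVN info returned for " + str(file_path))
    _path, info = results[0]
    return info
```

With that in place, get_file_info(client, filePath)["URL"] reads a little more clearly than the double index.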

Getting Revision Information

If you thought an SVN revision was just an integer number, raise your hand.

Raises hand

Turns out - I’m wrong. An SVN revision (as far as PySVN is concerned) is a pysvn.Revision type.

Now what, pray tell, is that?

From here:

The Revision object has three member variables:

kind - the kind of revision, its value is one of the opt_revision_kind enumerations.
date - date and time when kind is opt_revision_kind.date, as seconds since the epoch which is compatible with python's time module.
number - revision number when kind is opt_revision_kind.number

If you’re interested in what the opt_revision_kind enumeration information is, look here:

unspecified - No revision information given.
number - revision given as number
date - revision given as date
committed - rev of most recent change
previous - (rev of most recent change) - 1
base - .svn/entries current revision
working - current, plus local mods
head - repository youngest

Okay, so what will print out if we try to access the revision code? We try like this:

        fileInfo = client.info2(filePath)
        svnFileInfo = fileInfo[0][1]
        print(str(svnFileInfo["rev"]))

That produces this:

<Revision kind=number 2342>

Interesting, but supposedly there were three elements: kind, date and number. Can we access them or am I misunderstanding things?

Trying this:

        fileInfo = client.info2(filePath)

        svnFileInfo = fileInfo[0][1]

        print(str(svnFileInfo["rev"]))

        print(str(svnFileInfo["rev"].kind))
        print(str(svnFileInfo["rev"].date))
        print(str(svnFileInfo["rev"].number))

Which produces:

<Revision kind=number 2342>
number
None
2342

Well, that at least bears out the part about having three elements. Now at least I can access the specific parts of the revision information that I’m looking for.

Not bad.

However, we must take into account a wrinkle with SVN: last-changed revision vs. current revision.

Current revision is basically the most recent revision of the working copy. Imagine if you did an update on the root of a working copy - when it finished you’d see something like “Updated to revision xxxx”. Now, the working copy is considered to be at revision xxxx. However, many files within the working copy might not have been changed in revision xxxx - they have a ‘last-changed’ revision which will be lower than the working copy revision. When SVN says ‘Updated to revision xxxx’ what it’s saying is ‘all the files in this working copy are up-to-date as of revision xxxx’. If you pick a file that was last updated in say, revision 443 and tell SVN to check out that file at any revision later than 443, you’ll get the most up-to-date file. If you choose to check out a revision earlier than 443, you will not have the most up-to-date file. 443 would be the ‘last-changed revision’, while xxxx would be the ‘revision’.

Generally, the ‘revision’ isn’t very useful when you’re looking at individual files. Instead, you’ll want to go with the ‘last-changed revision’.

In order to print out the information for ‘rev’ vs. ‘last-changed-rev’, I would use this code:

        svnFileInfo = client.info2(filePath)[0][1]
        curRev = svnFileInfo["rev"].number
        lastChangedRev = svnFileInfo["last_changed_rev"].number
        print("Current revision is " + str(curRev))
        print("Last-changed revision is " + str(lastChangedRev))

And you get this output:

Current revision is 2342
Last-changed revision is 556

Not bad, now we can get all the revision information we want.
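
One thing the documentation quoted earlier points out is that both rev and last_changed_rev can be None, in which case .number would blow up. Here is a small sketch that guards against that. The helper names revision_number and describe_revisions are hypothetical, and the info argument is assumed to be the entry pulled out of info2 as above.

```python
def revision_number(revision):
    """Return the integer revision number, or None if there is none.

    revision is expected to be a pysvn.Revision, or None, which the
    info2 documentation says is possible.
    """
    if revision is None:
        return None
    return revision.number


def describe_revisions(info):
    """Summarise 'rev' and 'last_changed_rev' from an info2 entry."""
    current = revision_number(info["rev"])
    last_changed = revision_number(info["last_changed_rev"])
    return "Current revision is {}, last-changed revision is {}".format(
        current, last_changed
    )
```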

Getting Author/Committer Information

I’m guessing that the author information is this one:

last_changed_author - string or None

And luckily, it’s just a string, so it should be easy to access like this:

        svnFileInfo = client.info2(filePath)[0][1]
        print("Last changed author is " + str(svnFileInfo["last_changed_author"]))

Which produces:

Last changed author is <author>

Keep in mind, the ‘author’ will be a username, not necessarily a full-fledged first and last name.

Getting Log Entries

Now we’re getting a bit farther. Most of the information I was looking for could be found with the info2 function, but I don’t see anything about the log in the info2 documentation, so it must be elsewhere.

But where?

Look no further than this.

It’s a single function call that returns this:

log returns a list of log entries; each log entry is a dictionary. The dictionary contains:

author - string - the name of the author who committed the revision
date - float time - the date of the commit
message - string - the text of the log message for the commit
revision - pysvn.Revision - the revision of the commit
changed_paths - list of dictionaries. Each dictionary contains:
path - string - the path in the repository
action - string
copyfrom_path - string - if copied, the original path, else None
copyfrom_revision - pysvn.Revision - if copied, the revision of the original, else None

Alright, so what does this mean practically?

Here’s the code I run:

        logInfo = client.log(filePath)
        print(str(logInfo))

And the result:

[<PysvnLog ''>, <PysvnLog ''>]

It’s worth noting that retrieving the log takes a second - much more time than the info2 command took.

Okie… this file has two entries in its log when I look at it, so that must be these. Let’s assume those are dictionaries as specified above and see if we can print out the information we want for each entry. Here’s the code I created:

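        # Needs "import datetime" and "import os" at the top of the script
        logInfo = client.log(filePath)

        print(str(logInfo))
        for entry in logInfo:
            # entry["date"] is a float timestamp, so convert it before formatting
            commitTime = datetime.datetime.fromtimestamp(entry["date"]).strftime('%Y-%m-%d %H:%M:%S')
            print(str(entry["revision"].number) + " - " + commitTime + os.linesep + str(entry["message"]) + " - " + str(entry["author"]))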

And the output:

[<PysvnLog ''>, <PysvnLog ''>]
<Revision> - <year>-<month>-<day> <hour>:<minute>:<seconds>
< Commit message 1 - 
Multiple Lines!> - <Committer>

<Revision> - <year>-<month>-<day> <hour>:<minute>:<seconds>
< Commit message 2 - 
Multiple Lines!> - <Committer>

I had to include the datetime library to get the correct formatting function since the timestamp comes back in a float format. But overall, not too bad!
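
That one-line print is doing a lot, so here is a sketch of the same formatting pulled into a function. format_log_entry is a hypothetical name, and the entry argument is assumed to behave like the dictionaries described in the log documentation above. One deliberate difference from my code: the timestamp is rendered in UTC rather than local time, so the output does not depend on the PC's time zone.

```python
import datetime


def format_log_entry(entry):
    """Format one log entry as 'rev - timestamp', then message and author.

    entry is expected to behave like the dictionaries client.log returns:
    'revision' has a .number, 'date' is a float of seconds since the epoch.
    """
    # UTC is used here so the output does not depend on the local time zone.
    when = datetime.datetime.fromtimestamp(
        entry["date"], tz=datetime.timezone.utc
    )
    header = "{} - {}".format(
        entry["revision"].number, when.strftime("%Y-%m-%d %H:%M:%S")
    )
    # A plain newline keeps the output identical on every platform.
    return header + "\n" + "{} - {}".format(entry["message"], entry["author"])
```

Usage would be a loop such as: for entry in client.log(filePath): print(format_log_entry(entry)). The log documentation also lists a limit argument, which should help when only the last few entries are needed.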

Listing the SVN Contents of a Directory

I have a situation where I need to get the SVN information (URL, last changed revision) for a few files in a directory. This presents an interesting twist: in a working copy, not all files may be version controlled. There could be uncommitted files hanging around there - this means it’s important not to do a directory listing of files, but instead to do an SVN listing of files.

Turns out there’s a useful command called ‘ls’. Ha. Imagine that. Let’s see what the documentation says about it.

Use the list method in new code as it fixes performance and ambiguity problems with the ls method.

Oh. Okay. Where’s that?

Here.

You pass it a path and it returns some information:

Returns a list with a tuple of information for each file in the given path at the provided revision.

The tuple contains:

0 - PysvnList containing the dirent information
1 - PysvnLock containing the lock information or None
The PysvnList object contains the requested dirent fields:

created_rev - pysvn.Revision - the revision of the last change
has_props - bool - True if the node has properties
kind - node_kind - one of the pysvn.node_kind values
last_author - string - the author of the last change
repos_path - string - (always present) absolute path of file in the repository
size - long - size of file
time - float - the time of the last change

So, I whip this code up:

        #Get directory from file path
        dirPath = os.path.dirname(filePath)
        print("Listing working copy path: " + str(dirPath))
        dirInfo = client.list(dirPath)
        for entry in dirInfo:
            print("Repository file: " + str(entry[0]["repos_path"]) )
            print("Last changed in revision " + str(entry[0]["created_rev"].number))
            print("By author: " + str(entry[0]["last_author"]))

Which produces this result:

Listing working copy path: <WC Path>
Repository file: <Relative Repo Path>
Last changed in revision 556
By author: <Author>
Repository file: <Relative Repo Path>
Last changed in revision 340
By author: <Author>
Repository file: <Relative Repo Path>
Last changed in revision 556
By author: <Author>
Repository file: <Relative Repo Path>
Last changed in revision 340
By author: <Author>

Well, we’ve got a problem: the paths are relative repository paths, not complete URLs. The server name and protocol are not present, which is not what I want.

The question is: how do I get that?

I could use the list command to get a list of files and then use the info2 command to get their URLs.

Ah, but I’ve encountered a twist!

When you list a directory, the first entry returned is the directory itself!

Not really a problem, but if you’re expecting just the files, you’ll get in some trouble.

Anyway, here’s the code to get the full URL with the info2 command:

        #Get directory from file path
        dirPath = os.path.dirname(filePath)
        print("Listing working copy path: " + str(dirPath))
        dirInfo = client.list(dirPath)
        for entry in dirInfo[1:]:   #Start at 1 to ignore the directory entry
            _, fileName = os.path.split(entry[0]["repos_path"])   #Only the file name is needed
            wcFilePath = os.path.join(dirPath,fileName)
            fileInfo = client.info2(wcFilePath)[0][1]
            
            print("Repository file: " + str(fileInfo["URL"]) )
            print("Last changed in revision " + str(fileInfo["last_changed_rev"].number))
            print("By author: " + str(fileInfo["last_changed_author"]))
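
An alternative to calling info2 once per file might be to build the URL from pieces we already have: info2 reports a repos_root_URL, and list reports each repos_path. The sketch below joins the two. To be clear about the assumptions: build_file_url is a made-up helper, it assumes repos_path really is relative to the repository root, and it assumes that percent-encoding the path with Python's quote matches what SVN would produce, which I have only reasoned about for simple cases like spaces.

```python
from urllib.parse import quote


def build_file_url(repos_root_url, repos_path):
    """Join a repository root URL and a repository path into a full URL.

    repos_root_url is assumed to come from info2's 'repos_root_URL' and
    repos_path from list's 'repos_path'.
    """
    # Make sure exactly one slash separates the root from the path.
    path = "/" + repos_path.lstrip("/")
    # Percent-encode characters such as spaces; slashes are left alone.
    return repos_root_url.rstrip("/") + quote(path)
```

The root would only need to be looked up once, for example from client.info2(dirPath)[0][1]["repos_root_URL"], and then reused for every entry in the listing.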

Checking a Working Copy for Uncommitted Changes

One of the sanity checks I’ll have to do is to make sure that I haven’t failed to commit changes I’ve made to my checklists before I close an issue.

This Stack Overflow answer suggests a method.

Here’s the code:

    statuses = client.status(path_to_repository, ignore=True, recurse=True)
    statuses = [s for s in statuses if s.data['text_status'] != pysvn.wc_status_kind.normal]
    return len(statuses) == 0

I think I’d like to change it to something that returns a list of files with uncommitted changes. Let’s look up the status function. Documentation is here.

It returns a PysvnStatus object, what’s that?

It’s this:

Each status object has the following fields:

path - string - the path name
entry - PysvnEntry - entry information
is_versioned - Boolean - true if the path is versioned
is_locked - Boolean - true if the path is locked
is_copied - Boolean - true if the path is copied
is_switched - Boolean - true if the path has been switched
prop_status - wc_status_kind - the status of the properties of the path
text_status - wc_status_kind - the status of the text of the path
repos_prop_status - wc_status_kind - the repository status of the properties of the path
repos_text_status - wc_status_kind - the repository status of the text of the path
repos_lock - dict - the repository lock information

Alright, the big question is going to be what the path returns - local? Repo? URL?

The other big issue is that the example code only looks for status NOT normal. That could flag more things than just uncommitted changes.

This is the documentation for the pysvn.wc_status_kind type, reproduced here:

none - does not exist
unversioned - is not a versioned thing in this wc
normal - exists, but uninteresting.
added - is scheduled for addition
missing - under v.c., but is missing
deleted - scheduled for deletion
replaced - was deleted and then re-added
modified - text or props have been modified
merged - local mods received repos mods
conflicted - local mods received conflicting repos mods
ignored - a resource marked as ignored
obstructed - an unversioned resource is in the way of the versioned resource
external - an unversioned path populated by an svn:external property
incomplete - a directory doesn't contain a complete entries list

Okay, so we can check for a variety of non-ideal statuses. Honestly, at this point, if any status is not ‘normal’ I should pump the brakes, so I guess that guy was right….

Anyway, here’s what I wrote for code:

            statuses = client.status(dirPath, ignore=True, recurse=True)
            for entry in statuses:
                print (str(entry.data["path"]) + " has status " + str(entry.data["text_status"]))

And it produced this:

<directory> has status normal
<file 1> has status normal
<file 2>  has status normal
<file 3>  has status normal
<file 4>  has status normal

And all of the file paths were local working directory paths. Doesn’t really matter to me.

It’s worth noting, again, that the first entry was the entry for the directory, not a file within the directory.
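
To get the list of files with uncommitted changes that I said I wanted, the loop can collect paths instead of printing them. This is a sketch under a couple of assumptions: paths_with_changes is a hypothetical helper, it reads each entry through .data as the code above does, and it compares the status by its string form, since that is what printed as 'normal' in the output above. It only looks at text_status, so a property-only change would need prop_status checked as well.

```python
def paths_with_changes(statuses):
    """Return (path, status) pairs for entries whose text status is not normal.

    statuses is expected to be the list returned by client.status.
    Status values are compared by their string form.
    """
    changed = []
    for entry in statuses:
        text_status = str(entry.data["text_status"])
        if text_status != "normal":
            changed.append((entry.data["path"], text_status))
    return changed
```

An empty result from paths_with_changes(client.status(dirPath, ignore=True, recurse=True)) would then mean the working copy is clean as far as file contents go.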

Committing Changes in a Working Copy

Well, once you’ve ascertained there are uncommitted changes, you’ll want to commit them.

But PySVN doesn’t use the verb ‘commit’ for this, it uses ‘checkin’.

Only big question is how to get the log message. I know there can be a log message callback, but I think I’ll just use a text input to do this.

        logMessage = input ("Enter a log message: ")

        try:
            newRev = client.checkin(dirPath,logMessage)
        except Exception as e:
            print ("Commit failed! " + str(e))
            sys.exit(2)

The try/except is there because I know there will be circumstances where the commit fails - one I know of for sure is a pre-commit hook failing. However, I don’t know exactly what the exception will be and the documentation isn’t explicit on that, so I’m just catching a generic exception.

Yes, I know - that’s bad. But until that exception occurs I won’t know what to look for specifically. Or I could do more research. But I won’t right now.

Also, I’ve had issues using os.linesep within log messages, as well as ‘\r\n’. The server I’m working on only wants the ‘\n’.
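
Given the line-ending issue, it seems worth cleaning the log message before it ever reaches checkin. Here is a minimal sketch; normalize_log_message and commit_with_message are hypothetical helper names, and client is assumed to be a pysvn.Client with the checkin call used above.

```python
def normalize_log_message(message):
    """Convert Windows and old Mac line endings to a bare line feed."""
    return message.replace("\r\n", "\n").replace("\r", "\n")


def commit_with_message(client, path, message):
    """Check in path with a cleaned-up log message; return the new revision.

    client is expected to be a pysvn.Client. Any exception raised by
    checkin is left for the caller to handle.
    """
    return client.checkin(path, normalize_log_message(message))
```

The call would still sit inside the try block. PySVN documents a pysvn.ClientError exception class, so that is probably the narrower type to catch, though whether a failing pre-commit hook surfaces that way is an assumption on my part.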

Adding Files to a Working Copy

If you have a newly-created file, you’ll need to add it to your working copy and then commit to get the file into SVN. Here’s how you do it.

client.add(filePath)
newRev = client.checkin(addressPath,"Added a new file")

Notes

These are miscellaneous notes I’m keeping about using PySVN.

Single-Threadedness

One of the interesting things about PySVN is that there’s only one SVN client on your PC - anywhere. That means if you’re messing around with the command-line SVN client and you try to run a Python script that uses the SVN client your Python script will have to wait for the command-line process to finish before it can start.

One important upshot of this is that you really can’t do multi-threaded access to the SVN client in Python. If you try, you’re gonna have a confusing and frustrating time.

Only ever access the SVN client from _one_ Python thread
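
If the rest of a program really does need threads, one way to stick to that rule is to funnel every SVN call through a single dedicated worker thread. The sketch below uses the standard library's ThreadPoolExecutor with one worker. SingleThreadRunner is a made-up class name, and nothing here is PySVN-specific. It simply guarantees that whatever is submitted runs on the same thread, one call at a time.

```python
from concurrent.futures import ThreadPoolExecutor


class SingleThreadRunner:
    """Run every submitted call on one dedicated worker thread."""

    def __init__(self):
        # max_workers=1 means all submitted calls share a single thread
        # and run one after another in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, func, *args, **kwargs):
        """Run func on the worker thread and wait for its result."""
        return self._executor.submit(func, *args, **kwargs).result()

    def shutdown(self):
        """Wait for pending calls to finish and stop the worker thread."""
        self._executor.shutdown(wait=True)
```

The idea would be to create the client on the worker too, for example client = runner.call(pysvn.Client), and then make every later call as runner.call(client.info2, filePath). One caveat: calling runner.call from inside a function that is already running on the worker would wait forever, so keep the submitted functions simple.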
