This article describes how to implement version control with SQL Server. It is based loosely on our experience of implementing a workable version control system. This relates to a medium sized project of 175 tables and 870 stored procedures, for an application of 150, 000 lines of code on a project with two full time developers.
For some reason when it comes to version control, SQL seems to be overlooked. There are some valid reasons for this, the top one being that it is more complex to manage. However there are many benefits to version control. A well implemented version control system allows you to get a single version from any point in time or to compare versions.
Which version control system?
It isn’t my intention to provide an exhaustive comparison of all the available version control systems, just to highlight the most important parts of the systems we examined: Visual Source Safe (VSS), CVS and subversion (SVN).
VSS does not have a good reputation when it comes to version control. It isn’t a good sign when the third link in a google search bring up a article titled Visual SourceSafe: Microsoft’s Source Destruction System, with some serious criticisms. What was attractive about VSS was the integration with Visual Studio.Net (the language the application was written in). Of the three version control systems it is the only one that is not free. Also it is available only on Windows.
CVS is the old man of version control systems. In many ways when someone mentions version control CVS is often the first thing that comes to mind. It is a reliable and established open source solution. While CVS came from the unix world, there are also binaries and clients for Windows.
SVN is another open source was developed in reaction to some of the perceived flaws with CVS. SVN allows you to move files while still maintaining the history of changes to that file. It also gives a version number to the whole repository rather than to each single file. Also commits are atomic: a commit will not happen unless it is completely successful. SVN also offers binaries and clients for Windows.
On cost, features and reliability , CVS and SVN seemed to have the advantage over VSS. Between subversion and CVS, SVN seemed to be the better system. In addition I had not heard of many negative experiences of people moving from CVS to SVN. We chose SVN as a result.
Pleasant discoveries using SVN
The first pleasant discovery with SVN was that it was eminently scriptable. This makes it possible to automate committing and compiling code. You can also script the retrieval of commit logs, which can be rather powerful. It is relatively simple write a script to pull out a listing of all changes that have been made between any two committed versions.
In many version control systems to change code you must first check the code out, make any changes, then commit the changes. SVN eliminates the checkout step, in effect if you have a copy of the repository you can check code in. This can create conflicts, when two people are editing the same code.
In this case SVN notifies the second person of the conflict and only allows them to commit when the conflict has been resolved. While this may seem a recipe for chaos it actually works remarkably well. The advantage of this is no one developer is “blocked” because they are unable to edit a piece of checked out code.
One of the issues with applying version control to SQL is that version control is largely about managing file on the file system, not tables, stored procedures and views. This can be solved by saving the database structure to a number of files, however the question remains: to what extent should these be broken down? At the coarser level, you could simply get SQL Server to generate a full SQL script for the entire database and commit that. Or you could break it down into one file for tables, one for stored procedures and one for views etc. At the finer level you could store each table, stored procedure, view (etc) in a separate file.
This was the approach we took. With a finer grained solution, there is less scope for conflicts. We chose to store tables in one directory, stored procedures in another directory and triggers in a third. The database in question doesn’t use views, User Defined Functions (UDFs) or User Defined Data Types (UDDTs), however if it did each would have its own directory. We chose to include indexes, constraints and defaults in the file with the table/column they were defined on. We believed it did not make sense to either store all of them in one file, or in one index/constraint in each file.
A database’s natural state is not to be stored in files split over multiple folders. Tools need to be built manage converting getting a database from the file system. The language of choice for this has been perl. This is relatively simple script to create the tables, create the procedures and create the triggers. The script uses osql to execute the script files. In this script the tables are created in a specified order (to avoid errors), however the stored procedures and triggers are executed in any order. It would be relatively easy to modify this script to include another directory for views, UDFs etc.
For our purposes for the database to be useful there often needs to be some data in it. So a SQL script was created to insert this data. As both of these scripts can change with changes to the database, both were included in the repository.
There is still a disconnect between the database and the database stored in scripts in the file system. With the script to get the latest version of the database the difficulties are eased when converting from file system to database. However there are no easy solutions converting from the database to the file system. In the end the most workable solution is to ensure that when a change is made to the database, it must also be made in the scripts and committed. This requires discipline on the part of the developers. We did consider writing some tools to attempt to resolve this for stored procedures and triggers (triggers on system tables, a plugin for Query Analyzer etc), however all proposed solutions looked to create more work than they saved. In the end so long as the script to get the latest database from the repository imposes its own discipline: if you don’t commit it, the changes get lost.
The major benefits of implementing version are when the database code (ie tables, stored procedures etc) is changing frequently. If this isn’t the case for you (and you are sure it won’t be in the future), a version control system is likely just add overhead. In the database we worked (and I would guess most databases) the tables were relatively static, while the stored procedures changed quite often (being added, removed or updated). The greatest benefit was in tracking the changes in the stored procedures, however it was well worthwhile to keep the entire database structure in version control as changes to the stored procedures sometimes resulted in changed to the tables (and vice versa).
If the database is related to an application, the application should be included in the same repository. This enables you correlate database changes to changes in the code base. The result is that when you update to the latest version or get an earlier revision you have a database that matches your application codebase.
Implementing a version control system with SQL Server has largely been a positive experience. It has enabled us to better keep track of changes to the database structure and who has made the changes. It has also enabled us to maintain historical database code.
author: David Cameron (with thanks to Ben Tan for help in developing the solution).