Why you should use Migrations instead of Visual Studio 2010 Database Projects

If you work on an application that uses a database, chances are you have to deal with releasing new versions of your software that make changes to it. The SQL language provides comprehensive support for making these types of changes and can access even advanced features of your chosen database platform. Schema changes are made through create and alter statements typically, and data movement is performed using selects and inserts.

When releasing your software initially, deployment is straightforward as there is no existing data to deal with. As users exercise the features in your software, rows of data are added to tables, and future changes require more care to not destroy or make invalid changes to the existing data.

In the past, DBAs or developers with sufficient SQL programming knowledge have written scripts to make the changes necessary to update database assets that have existing data in them, paying special care to typical situations like adding a new NOT NULL column (you need to initialize it with data to enable the constraint) splitting one column into two, or splitting some columns of a large table out into a new detail table.

For years seasoned developers have used the following approach for making changes to the database:

  • Add a table with one row that stores the “version” of the database. This data is not really application data per se, but more like metadata that identifies the state the schema is in. This version is usually initialized to the lowest version where development starts, let’s say 1.0.0.0.
  • Create SQL scripts when you have changes that check this row. If the version of the database the script is running against is lower than the “version” of your script, make your changes.
  • When your script is done making the changes, increment the version number of the database row to its new version (1.0.0.1 for example).

The great thing about this approach is that it supports deploying changes to multiple versions of the same database. If you are “upgrading” version 1.0.0.0 database and your latest version is 1.0.0.5, any scripts that have numbers between these two versions will run in ascending order. If you are “upgrading” version 1.0.0.3 to the 1.0.0.5, scripts that apply to databases at a version prior to 1.0.0.3 will be skipped.

There are two gotchas with this approach:

  1. You need to test upgrading from any version you have in the field before deploying to that version. So if you are upgrading databases that can vary by 5 versions, you really need to test the upgrade process going from all 5 of these versions to the current version. This is more a consideration than a limitation as you always need to do this when supporting multiple upgrade paths for your software.
  2. Developers can make mistakes in their SQL script and look for the wrong version, or forget to update the version in the database to current if the operation is successful.

When Ruby on Rails was released, the ActiveRecord team along with David Heinemeier Hansson provided the then-emerging ruby community with a technology called migrations, that provides some extra help on top of this. Basically anytime you want to change the database, you would run a command at your operating system prompt that would generate a new script that’s prepended with a version number greater than the latest script in your source.

An example will help here.

  • You run the command “rails generate migration create_users” and a file 00001_create_users.rb is generated. You put code in here to both update, AND rollback changes related to the “users” table for example.
  • You run the command “rails generate migration create_roles” and a file 00002_create_roles.rb is generated. Notice the tool recognized your latest script version and created a newer version automatically.

When you want to deploy to a database, you run another command “rake db:migrate” which tells the “rake” (ruby make) build engine to run all of your database migrations against the target database. The migrations engine automatically does the work of checking the target version of the database, running only those scripts that apply, and incrementing the database version to the latest one that succeeded.

This approach solves the problem of developers having to version things manually, and really simplifies deployment to multiple versions of a database. It also allows developers to incrementally make changes needed to support changes they are working on, without stepping on the toes of other developers.

Enter Visual Studio 2010 database projects

With the release of Microsoft Visual Studio 2010, another approach was provided to developers for managing database versions. This approach was made available by Microsoft’s acquisition of DBPro.

The VS 2010 DB project approach is to have a type of project in your solution that contains scripts that can create all the artifacts in your database. There are create scripts for stored procedures, schemas, roles, tables, views etc. However, the tool is sold as not requiring developers to know as much SQL programming, but rather they are provided with a treeview panel in the Visual Studio IDE (referred to as “Schema View” in the documentation). They can interact with this tree to add tables, rename columns, and make other trivial changes via a GUI and these changes are then saved as new SQL scripts in the project.

What happens when you deploy your DB project is that an engine that is part of the build system in Visual Studio does a compare of the target database being deployed to with what a “new” database would look like based on the scripts in your project, and then generates a script to alter it to make it’s structure match the project’s source code. The engine works much like RedGate software’s “SQL compare” tool in that it is fairly intelligent about determining changes in schema and generating appropriate scripts.

At first glance, this seems like a superior solution as it gives point-and-click programmers more productivity, removes version management from the picture, and eliminates the need to manually create alter scripts. In practice however, by itself this approach will not meet the requirements of most deployment cycles.

Microsoft released an ALM rangers guide to using Visual Studio 2010 database projects that is meant to be used as primary guidance for developers, DBAs, and architects looking at how to use best practices around VS 2010 DB projects. Part of this guide talks about “Complex Data Movement”, or what I will refer to here as “changing database assets containing data” because that’s really what they are talking about.

Unfortunately Microsoft’s solution for this “complex” scenario (which is common and regular, in my experience) is to subvert the diffing engine and revert to the use of temporary tables and pre/post build scripts to trick the engine into thinking the schema doesn’t need to be changes while fixing it up afterwards. This issue is described in the ALM rangers guide, and also on Barclay Hill’s blog post here.

Jeremy Elbourn comments on the MSDN forums why this approach actually makes maintaining database changes over time even more difficult than the migration approach in a real world environment. Microsoft also recently announced the availability of database migration support in ASP.NET MVC 4 (but only if you are using Entity Framework as well). These developments leave folks responsible for determining a database change management approach confused as to where the best practices are going with respect to Microsoft’s vision.

It is of my opinion that Visual Studio 2010 database projects should be avoided in favor of a migrations engine for the following reasons:

  1. The success or failure of employing VS 2010 DB projects in real world, enterprise sized clients has yet to be demonstrated in measurable capacity and the technology is still relatively new. I’ve seen some press releases, but these are marketing announcements with no downloadable artifacts to evaluate. I also have been discussing the tradeoffs with one colleague who is using it on a single application for an enterprise client with many integrated applications.
  2. I tend to embrace tools that generate code for me or do work automatically only when they are comprehensive, well-understood, and have limited “gotchas”. Schema and data change management is a complex topic and the VS 2010 database project approach leads developers to think the solution is easy, while in practice it forces them to understand how the diffing engine works, the project structure and deployment lifecycle of a DB project build, and how to circumvent the diffing engine to change database assets containing data.
  3. The ALM guide proposes detecting existing schema state to determine when pre or post deployment scripts need to be run. “If this column exists, run this script”. This is an error prone and ignorant approach. What if version 1 has this column, version 2 does not, and version 3 adds it back in? This kind of check will fail. Ironically the workarounds for this are to come up with a custom versioning and incremental migration strategy for your pre/post build scripts anyway, which is a red flag to me that the design is flawed.
  4. VS 2010 database projects abstract developers from getting better at SQL, much like Web Forms did for HTML/CSS/JavaScript prior to ASP.NET MVC arriving on the scene. In my experience, developers are seriously lacking adequate database management skills and need to get better at all aspects of it. There are several assets not supported by VS 2010 database projects in the ALM rangers guide that need to be scripted manually anyway.
  5. The best time to write tests for changes being made to a database and reviewing their impact is when making the changes, as the structural impact is fresh in the developers’ mind. Using the diffing tool, the generated alter scripts still need to be reviewed prior to deployment, especially if you don’t have a high coverage functional and acceptance test suite to ensure no breakage was caused by the change. Chances are you have an operations person reviewing the changes before running them on production, and without comprehensive testing you are relying on them to make sure the changes are appropriate. I hope you are working closely with operations during the entire development lifecycle in this situation!
Migrations work for all technologies and are simpler to understand and maintain

If you would like to use migrations today without both adopting ASP.NET MVC 4 and Entity Framework, Thoughtworks created an open source tool “DBDeploy” (with a corresponding .NET version, DBDeploy.NET) that they use with all of their clients and handles this elegantly. The only difference between it and the rails migration approach is that rails migrations use a DSL for making the changes, while DBDeploy uses SQL.

Re-trusting check constraints in SQL doesn’t help for NULLABLE columns

I’ve been going through a large database for a client of mine and finding foreign key and check constraints that are marked “untrusted”. This happens when a relationship between two tables has some rows with foreign key column values that don’t have a match in the related table. When this happens, Microsoft SQL Server can’t use the query optimizer as well to lookup matches between the two tables when running queries. This results in sub-optimal performance.

Unfortunately I discovered today, if the foreign key column accepts NULL, you can still run a query to re-enable the check constraint without error, but it will still be marked as “untrusted” in INFORMATION_SCHEMA and will not benefit from the query optimization available to trusted keys!

Hopefully this helps someone out there to reduce the work you need to do when determining a data optimization strategy around dealing with existing untrusted checks.

The continuing saga of aligning .NET with rails – FactoryGirl.NET

While working on kinlighten, a side business I am starting up that runs on ruby on rails, I used a popular open source ruby framework for generating objects to use in tests. There are many frameworks available for rails that are popular including factory_girl, machinist, and fabrication.

These frameworks assist with the creation of objects in your application’s domain model that are initialized with state appropriate for tests. They help by allowing multiple test cases to re-use the same objects factories, reducing the lines of object initialization needed by tests.

The framework I used is factory_girl, and you use it by creating a single ruby file for each model’s factory. For example, if my domain had an order, customer, and line_item models, I would have a order_factory, customer_factory, and line_item_factory. When I want an instance of an object from my factory in my test, I just call a single method to “build” one for me.

James Kovacs has created the FactoryGirl.NET class library to allow .NET developers to use factories as well. This is another of many recent steps (bundling and minification, migrations, etc.) to bring ASP.NET MVC productivity up to the level of rails.

In his framework, you would instead create a class for each model in your domain. So if you had an Order, Customer, and LineItem C#/VB class, you would have OrderFactory, CustomerFactory, and LineItemFactory classes that can be used to retrieve objects initialized with state appropriate for testing. These classes go into your Test class library project.

Check out his article on creating FactoryGirl.NET, the github repository (it’s also on NuGet), and a great intro on how to use the original ruby version of factory_girl for more information. This is a first class technology with heaps of success in the rails community and I commend James for working to bring this to .NET.

Razor is sharp, but NHaml is still haiku for HTML 5 and JQuery

A colleague of mine told me recently about Razor, the view engine for ASP.NET MVC 3, and upon researching it and using it in a test project, I almost instantly came to compare it to NHaml since I’ve been using HAML for several years doing rails on the side. What I found is that though Razor is the best view engine I’ve seen from Microsoft (on top of a great version 3 of ASP.NET MVC – nice job guys), I still believe NHaml’s syntax is significantly better suited for HTML applications and even more so if they use JQuery.

Though Razor does a great job requiring minimal characters to insert executable code logic in between the markup it generates (and is basically equivalent in that respect to HAML), it does nothing for minimizing the amount of code you have to write to express the HTML outside of those logic statements. NHaml is simply superior here when you are generating HTML for this reason: it reduces markup to the minimal information needed to identify the JQuery or CSS selectors that elements have applied to them.

It does this because with NHaml normally you specify the name of an element without the angle brackets, but if the tag you want is a DIV element with an ID attribute, you can just specify the ID prefixed by a hash symbol and drop the DIV altogether.

<div id=”blah”></div>

becomes:

#blah

This also works for CSS classes. This dramatically increases code readability because lines of code begin with the JQuery selector or CSS style name used to access them. When writing Javascript or CSS, locating these elements in markup is much easier. This is already on top of the fact that NHaml drops the requirement for closing tags.

Here’s an example that I think illustrates the point. Let’s say I have a CSS style sheet with the following styles. Don’t worry about the attributes in the styles themselves just look over the list of CSS selector names (container, banner etc.) and think of looking at this file day to day:

#container { width: 100%; }
#banner { background-color: blue; } 
.topic { font-size: 14pt; }

Now here’s some HTML markup styled with selectors in the above sheet in HAML:

#container
  #banner

    .topic Hello

Here’s how you would generate the exact same markup using Razor:

<div id=”container”>
 
<div id=”banner”>
   
<div class=”topic”>Hello</div>
 
</div>
</div>

Building on this let’s say we wanted to override the .topic style inline with some other styles, and throw in some inline JavaScript. Here’s HAML again:

:css
  .topic { font-size: 14pt; }

:javascript
  alert(‘hello!’);
#container
  #banner

    .topic Hello

and here’s Razor:

<style type=”text/css”>
.topic { font-weight: bold }
</
style>
<javascript type=”text/javascript”>
alert(‘hello’);
</javascript>
<div id=”container”>
 
<div id=”banner”>
   
<div class=”topic”>Hello</div>
 
</div>
</div>

Hopefully you can see the HAML is much easier to read, and reduced in lines of code by about 15% in this example.

Here’s another great post from late last year that shows some comparisons of Razor and NHaml.

I’m glad Microsoft is embracing convention over configuration

I read Agile Web Development with Rails while visiting San Diego a couple years ago and was blown away by how well put together of a framework it was. What the book helps you realize is that if you follow certain naming conventions for your code artifacts (in this case ruby source files), it automatically wires up communication between the different architectural layers of your application.

With the recent release of ASP.NET MVC 1.0, which is Microsoft’s answer to ruby on rails, Microsoft has provided what seems to me to be a simpler approach to web applications and adapts to testability better than the oft-complicated event model of existing ASP.NET web applications.

I also downloaded Silverlight 3 Beta, Expression Blend 3 Beta, and Microsoft’s Rich Internet Application (RIA) toolkit preview. The new version of Silverlight has a ton of controls, and I love that editable forms with built in wiring up to validation are included out of the box!

When you have the RIA toolkit installed, you can create a data model in Entity Framework in your web application, create a special link to it in your Silverlight “client” project, and you can wire up similarly named domain objects to databind to your Silverlight project and the databinding hits the server using REST transparently. It’s very slick.

ADO.NET Entity Framework 2+ Top 3 “Must Have” Features

I’ve spent quite a bit of time trying to use ADO.NET Entity Framework for a large web application here and run across some shortcomings that make it unusable for this project. As such I’ve come up with my top three requests for the next release(s):

  • Multiple related designer files.Let me create multiple EF diagrams and re-use types across them from a single schema! There is a Using tag in the EF schema that lets you do this, but it can only be setup through the XML editor, and then you can only navigate one way from a relationship between the two entities that cross designers. Not a good solution!!! This would also solve the problem of only letting one person have the designer checked out at once. This is the most crippling parallel development and team scale issue of the current EF implementation for projects with large database schemas.
  • Better SQL statement optimization for SQL server. Currently LINQ creates much better performing queries than EF for the same database and joins. This is due to the storage to conceptual abstraction as it is currently implemented by EF. I’m sure this can be optimized to work better in a future release.
  • Full POCO support. I want to return classes from EF queries that have no base class or attributes that deal with mapping. There is an EF-contrib project that does this, but it’s not built-in and needs to be more robust. This will allow me to return POCOs from EF behind other layers of my architecture and not have any coupling to the EF framework.