Stay young, stay curious, stay hippie: David Heinemeier Hansson’s keynote in Austin on progress in software frameworks

The always divisive David provides an entertaining look at why people become averse to loss as they progress in their careers. He also talks about the illusory ceilings we place on education, and challenges us to think about why we cling to our existing body of knowledge about software, or about any topic where knowledge is accumulated through years of experience.

I encounter this often, whether talking about continuous delivery, agile development processes, minimalism in tooling, Microsoft’s shift to open technologies on the client, or open source technology use in enterprises. He starts off with some mention of technologies specific to Rails, but bear with it: he then leads into the meat of the talk, which is applicable to anyone and is the real gem in the keynote. I’d recommend anyone watch this, especially folks who have decided that certain technologies are not useful – you are the subject of the conversation and may learn some things from it.

David Heinemeier Hansson’s Keynote “Progress” at RailsConf 2012 in Austin, Texas

Keep your agile promises by validating acceptance and reducing your sprint workload

Most project teams have tried some permutation of an agile or SCRUM process by now, and a consistent theme amongst those I see on consulting engagements is a failure to deliver the work done in a sprint to users before starting the next one. Continuous integration, standup meetings, and backlogs are usually present, and some will even try test-driven development. But at the end of the sprint, the work is still not ready to deliver to users.

At the end of a sprint, there is a meeting that takes place where stakeholders and the development team get together to review the work that was done. More often than not, the stakeholders like what they see to some extent, but find discrepancies between what they thought they were getting and what was actually implemented. In every case the reason this occurs is a failure to establish acceptance criteria prior to doing the work.

An agile or SCRUM process is usually sold to the business as a way to improve communication with the development team and to shift priorities more easily: because the backlog can be reprioritized and work is assigned one sprint at a time, there is more flexibility than in a waterfall approach, where release cycles typically run from several months to a year. Additionally, higher quality is usually promised to the business.

Inexperienced agile teams may read extreme programming and story-driven approaches and like the fact that requirements are sold as only needing to be short statements and not detailed use cases as happens on a waterfall project. They often take this to the extreme, in that a simple description of the work is established in the backlog, and this is the only agreement that can be pointed to with certainty at any point in the process.

A user story in an agile or SCRUM development team’s backlog should be a promise for a future conversation. When a story is scheduled into the next sprint and that sprint starts, the first activity should be a conversation between the business stakeholder most familiar with the story and the developer who is going to do the work, with the QA person responsible for validating acceptance of the work also present.

During this conversation, the goal is to establish acceptance criteria. This focus on criteria provides several benefits. First, it allows the business more time to provide details about the story that they may not have originally communicated when it was placed on the backlog. Second, it gives the developer a chance to communicate technical challenges with the business’ vision, and gives the two parties a chance to come to a compromise in design that will sufficiently meet the needs of both. Third, it enables QA to think about possible ways in which the story may be tested, which often leads the business and development representatives to further clarity. Last, and the subject of this post, it establishes what constitutes successful completion of the work.

Doing so does not require a use case or a large document. Rather, the group should be able to walk away with a description in English of ways in which the user or system will interact with the software that, if successful, validate that the work was done correctly. The level of detail you go into with your description is up to your team, but the more detailed it is, the more sure you can be that what is completed at the end of the sprint will be ready for delivery to users. If you are building a calculator, for example, you may establish several mathematical calculations that have to succeed for it to be considered acceptable. Every possible calculation does not need to be present here, but there should be enough that it would be difficult to meet the acceptance criteria and still deliver a low quality feature.

Once these acceptance criteria are established, it is of great importance that the person responsible for user acceptance, typically a QA representative, works with the developer to write automated acceptance tests (highly preferred) or to come up with a manual testing process that can be used to verify them. This work can be done before the code is written (test-first acceptance) or while it is written (parallel acceptance), but do not leave it until the end. It is highly important that developers are able to execute the acceptance tests, whether automated or manual, several times during the sprint to gauge their progress towards completing the story.
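As a rough illustration of the calculator example, here is a minimal sketch of acceptance criteria captured as data and checked automatically. The `calculate` function, the criteria list, and all of the names below are hypothetical stand-ins rather than anything prescribed by a particular agile process; a real team would likely use its own test framework.

```python
import operator

# Hypothetical calculator under test; `calculate` is an illustrative name only.
_OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(left, symbol, right):
    """Apply a single binary operation, e.g. calculate(2, "+", 3)."""
    return _OPERATIONS[symbol](left, right)


# Acceptance criteria agreed in the kickoff conversation, written as data so
# the business, QA, and the developer can all read and extend the list.
ACCEPTANCE_CRITERIA = [
    ("adds two whole numbers", (2, "+", 3), 5),
    ("subtracts into negative numbers", (3, "-", 5), -2),
    ("multiplies by zero", (7, "*", 0), 0),
    ("divides evenly", (9, "/", 3), 3),
]


def run_acceptance(criteria=ACCEPTANCE_CRITERIA):
    """Return the descriptions of the criteria that currently fail."""
    failures = []
    for description, args, expected in criteria:
        if calculate(*args) != expected:
            failures.append(description)
    return failures


if __name__ == "__main__":
    failed = run_acceptance()
    print("accepted" if not failed else f"not accepted: {failed}")
```

Because the criteria are plain descriptions paired with inputs and expected results, a developer can run the check several times during the sprint, and the same list doubles as the agenda for the end-of-sprint review.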

Because these acceptance criteria are established up front, they help developers focus on delivering precisely that functionality, and they reduce the chatter that often happens without them as a developer attempts to get clarification from the business about details that were not there at the beginning. That being said, if the team is inexperienced with defining acceptance, the first few sprints may result in two undesirable side effects.

The first of these is that since developers are not used to having to deliver acceptance tests along with the work itself, there is a good chance that too much work will be scheduled in the sprint, and some of it may be late. It is of great importance that the entire team (the business, QA, and development) accepts this possibility and uses it as a learning experience to discover what a reasonable amount of work to deliver in a sprint looks like when it has to be accepted and deliverable before the sprint is over. The next sprint will likely deliver less functionality in the same amount of time, but it will be done in time for the end of the sprint. This is what separates a working agile process from what I’ve heard others call “agilefall” or “waterscrum”: the cherry-picking of practices from an agile/SCRUM process while failing to deliver on its promises.

The second side effect you may feel when first implementing this process change on your project is that there will still be some gaps between what was delivered and what the business expects. Let me be clear here: it is perfectly normal, and actually a great benefit of using an agile process, that the business can see something every 2 weeks (or however long your sprint is) and, upon doing so, provide additional detail and changes that can be scheduled for the next sprint. However, the entire delivery team needs to get better at articulating what they do plan to deliver in a way that can be acceptance tested and is clear to the developer, so that what is agreed upon is not open to interpretation.

This subtle difference is important – it is unrealistic and illogical for the business to attempt to hold developers accountable for not delivering functionality at the end of a sprint for which acceptance criteria could not be defined. If the business wants developers to do a better job at delivering what they want, they must improve their ability to articulate it, or simply embrace the great flexibility that comes with an agile process to allow them to figure out more about what exactly they want every two weeks.

One more change needs to occur to your process to allow for the work that is done in the sprint, now backed by acceptance criteria, to be delivered to users at the end. The developer should allow for time to meet with operations personnel or whoever maintains the various environments (development, acceptance, and production for example) to ensure that they can actually deliver it to users at conclusion of the end-of-sprint meeting. The business may still decide that it is not ready for users from a functional standpoint, but the goal should be for the functionality delivered in each sprint to be of high enough quality to deliver to users immediately following conclusion of the sprint should they desire. Refer to my post about the dangers of making production an island and not optimizing your build process for quick escalation from a user acceptance environment into production for more information on how your operations team can deliver deployment scripts along with the development team.

Think for a moment about the net result of all of this. At the end of a sprint, QA can demonstrate that the functionality delivered meets acceptance criteria that were agreed to not only by them or a developer, but by the business as well. We’ve all seen the project where QA said a feature was tested but the business is upset with them and the developers for what was delivered. Developers also do not have to feel any anxiety that what they deliver will not be acceptable. They should, however, be comfortable with the fact that upon seeing the work, the business may want things changed or additional functionality put in. This is exactly why businesses usually agree to try out an agile/SCRUM process in the first place.

Even though less functionality is delivered at the end of a sprint than your team may be used to due to having to include time for defining acceptance criteria, building automated or manual acceptance processes, and getting that functionality deployed into a user acceptance environment – the net result is that your business truly will be able to deliver new functionality to users every two weeks. This outcome alone is more important than any of the individual agile or SCRUM processes – that of continuously delivering real value to your users.

Put all your environment-specific configuration in one place for a pleasant troubleshooting experience

Most software applications leverage a variety of third party libraries, middleware products, and frameworks to work. Each of these tools typically comes with its own method of configuration. How you manage this configuration has an impact on your ability to reduce time wasted tracking down problems related to differences in the environment in which your application runs.

Configuration as it relates to delivering your software really comes in two forms. The first of these is environment-neutral configuration. This type of configuration is necessary for the tool or software to work and doesn’t change when used in your development, testing, production, or any other environment. The second type is environment-specific configuration and is basically the opposite.

When your application is delivered into an environment, whether on a server somewhere or your users’ devices, troubleshooting configuration problems is much easier if all environment-specific configuration is in one place. The best way to do this is to create a table in a database, or a single file, that stores name/value pairs. For example, the “ServerUrl” configuration setting might be set to “localhost” in the development environment, whereas in production it’s some domain name you probably purchased.

The problem with adopting this at first glance is that most tools have their own method of configuration, so to make this work you need to find a way to populate their configuration from this database or file. Do this with the following process:

  1. Create a table or file named “ConfigurationSettings” or “ApplicationSettings” for example, that holds the name/value pairs for environment-specific configuration. You can use nested pairs, or related tables if you need more complicated configuration.
  2. Create a build script for each environment (T-SQL, PSake, MSBuild, rake etc.) that populates the table or file with the values appropriate for it. If you have 4 environments, you will have 4 of these files or scripts.
  3. When you target a build at an environment, run the appropriate build script to overwrite the configuration values in that environment with the ones from the script. Note that I said overwrite, as you want to prevent people in the field from changing the configuration of your environment without doing a build. This is because configuration changes should be tested just like code.
  4. For each tool or asset that you want to configure, create a build script (PSake, MSBuild, rake etc.) that reads the values it needs by name from the table or file populated in step 3, and updates the configuration in the format needed. An example would be updating a web.config file’s XML data from the data in the table or file, or applying Active Directory permissions from the data in the table or file.
  5. Create a page, dialog, or view in your application that lists all of the data in the configuration table or file. This can be used by your personnel to easily see all the environment-specific configuration settings in one place.
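The five steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a single JSON file in place of a database table, and the environment values, the `ConfigurationSettings.json` file name, the one-line `WEB_CONFIG_TEMPLATE`, and the function names are all made up for the example rather than taken from any particular build tool.

```python
import json
import tempfile
from pathlib import Path
from string import Template

# Step 2: one set of values per environment (hypothetical names and values).
ENVIRONMENTS = {
    "development": {"ServerUrl": "localhost", "DbHost": "localhost"},
    "production": {"ServerUrl": "app.example.com", "DbHost": "db.example.com"},
}

# A stand-in for one tool's own configuration format (assumed for illustration).
WEB_CONFIG_TEMPLATE = '<add key="ServerUrl" value="$ServerUrl" />'


def populate_settings(environment, settings_path):
    """Step 3: overwrite the single settings file with this environment's values."""
    values = ENVIRONMENTS[environment]
    Path(settings_path).write_text(json.dumps(values, indent=2, sort_keys=True))


def render_tool_config(settings_path, template_text):
    """Step 4: read values by name and produce a tool's own config format."""
    settings = json.loads(Path(settings_path).read_text())
    # substitute() raises KeyError if the template needs a setting that is missing.
    return Template(template_text).substitute(settings)


def list_settings(settings_path):
    """Step 5: a flat, sorted listing suitable for a diagnostics page."""
    settings = json.loads(Path(settings_path).read_text())
    return [f"{name} = {value}" for name, value in sorted(settings.items())]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        settings_file = Path(workdir) / "ConfigurationSettings.json"
        populate_settings("production", settings_file)
        print(render_tool_config(settings_file, WEB_CONFIG_TEMPLATE))
        print("\n".join(list_settings(settings_file)))
```

The settings file is always overwritten rather than merged, which mirrors the point in step 3: the only way to change a value in an environment is to change the script and run a build.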

This may seem like a hoop to jump through considering Microsoft and other vendors already provide environment-specific configuration files for some of their technologies, but I still encourage you to do this for the following reasons:

  1. When something goes wrong in one environment that works in another, it is much faster to look at a page with a flat list of configuration settings than to look in source control at a bunch of files or scripts that can be anywhere in your source tree.
  2. When environment-specific configuration is stored in source control as scripts, you have an audit trail of how those changes have occurred over time in the history of each file.
  3. Whenever you need a new environment, you can simply create a new script with data for that environment and you already have an automated means of populating the configuration mechanisms used by all of the tools and libraries you leverage.
  4. When you need to provide environment-specific configuration for a new technology, you can script setting it up and not worry about whether it supports environment specific methods out of the box.

Pay off your technical debt by preferring API clarity to generation efficiency

I’ve built the technical aspects of my career on combining technologies from Microsoft, which are easy to sell into enterprises that require the confidence that comes from their extensive support contracts and huge market footprint, with open source technologies that steer the direction of technology ahead of the enterprise curve and are eventually embraced by those same enterprises.

Microsoft has always provided powerful tools for developers in their Visual Studio product line. They focus on providing more features than any other vendor, and also on having the flexibility to allow developers to design their software with the patterns that make the most sense to them. Because of this, the community is full of discussion, and there are always new ways to combine their technologies to do similar things, but with quite a bit of variance in the architecture or patterns used to get them done. It can be daunting as a new developer, or a new member of a team, to comprehend some of the architectural works of art that are created by well-intentioned astronauts.

After I learned my first handful of programming languages, I began to notice the things that were different between each of them. These differences were not logic constructs, but rather how easy or difficult it could be to express the business problem at hand. Few will argue that a well designed domain model is easier to code against from a higher level layer in your application architecture than a direct API on top of the database – where persistence bleeds into the programming interface and durability concerns color the intent of the business logic.

In recent years domain specific languages have risen in popularity and are employed to great effect in open source projects, and are just starting to get embraced in Microsoft’s technology stack. A domain specific language is simply a programming interface (or API) for which the syntax used to program in it is optimized for expressing the problem it’s meant to solve. The result is not always pretty – sometimes the problem you’re trying to solve shouldn’t be a problem at all due to bad design. That aside, here are a few examples:

  • CSS – the syntax of CSS is optimized to express the assignment of styling to markup languages.
  • Rake/PSake – the syntax of these two DSLs is optimized for expressing dependencies between buildable items and for creating deployment scripts that invoke operating system processes – typically command-line applications.
  • LINQ – The syntax of Language Integrated Query from Microsoft makes it easier to express relationship traversal and filtering operations from a .NET language such as C# or VB. Ironically, I’m of the opinion that LINQ syntax is a syntactically cumbersome way to express joining relationships and filtering appropriate for returning optimized sets of persisted data (where T-SQL shines). That’s not to say T-SQL is the best syntax – but that using an OO programming language to do so feels worse to me. However, I’d still consider its design intent that of a DSL.
  • Ruby – the Ruby language itself has language constructs that make it dead simple to build DSLs on top of it, leading to its popularity and success in building niche APIs.
  • YAML – “YAML Ain’t Markup Language” (originally “Yet Another Markup Language”) is optimized for expressing nested sets of data, their attributes, and values. It doesn’t look much different from JSON at first glance, but you’ll notice the efficiency when you use it more often on a real project if you’ve yet to have that experience.

Using a DSL leads to higher cognitive retention of the syntax, which tends to lead to increased productivity and a reduced need for tools. IntelliSense, code generation, and wizards can all take far longer to use than simply expressing the intended action in a DSL’s syntax when you’ve got the most commonly expressed statements memorized, because the keyword and operator set is small and optimized within the context of one problem. This is especially apparent when you have to choose a code generator or wizard from a list of many other generators that are not related to the problem you’re trying to solve.

Because of this, it will reduce your cycle time to evaluate tools, APIs, and source code creation technologies based not on how much code your chosen IDE or command-line generator spits out, but rather the clarity in comprehension, and flexibility of that code once written. I am all for code generation (“rails g” is still the biggest game changer of a productivity enhancement for architectural consistency in any software tool I’ve used), but there is still the cost to maintain that code once generated.

Here are a few things to keep in mind when considering the technical cost and efficiency of an API in helping you deliver value to customers:

  • Is the number of keywords, operators, and constructs optimized for expressing the problem at hand?
  • Are the words used, the way they relate to each other when typed, and even the way they sound when read aloud easy to comprehend by someone trying to solve the problem the API is focused on? Related to this is to consider how easy it will be for someone else to comprehend code they didn’t write or generate.
  • Is there minimal bleed-over between the API and others that are focused on solving a different problem? Is the syntax really best to express the problem, or just an attempt at doing so with an existing language? You can usually tell if this isn’t the case if you find yourself using language constructs meant to solve a different problem to make it easier to read. A good example is “Fluent” APIs in C# or VB.NET. These use lambda expressions for property assignment, where the intent of a lambda is to enable a pipeline of code to modify a variable via separate functions. You can see the mismatch here in the funky syntax, and in observing the low comprehension of someone new to the concept without explanation.
  • Are there technologies available that make the API easy to test, but have a small to (highly preferred) nonexistent impact on the syntax itself? This is a big one for me: I hate using interfaces just to allow testability when dependency injection or convention-based mocking can do much better.
  • If generation is used to create the code, is it easy to reuse the generated code once it has been modified?

You’ll notice one consideration I didn’t include – how well it integrates with existing libraries. This is because a DSL shouldn’t need to – it should be designed from the ground up to either leverage that integration underneath the covers, or leave that concern to another DSL.

When you begin to include these considerations in evaluating a particular coding technology, it becomes obvious that the clarity and focus of an API is many times more important than the number of lines of code a wizard or generator can create to help you use it.

For a powerful example of this, create an ADO.NET DataSet and look at the code generated by it. I’ve seen teams spend hours trying to find ways to backdoor the generated code or figure out why it’s behaving strangely until they find someone created a partial class to do so and placed it somewhere non-intuitive in the project. The availability of Entity Framework code first is also a nod towards the importance of comprehension and a focused syntax over generation.

Why continuously deliver software?

Since I adjusted the focus of my subject matter on this blog over the past couple of weeks, one of the main subjects I’ve been talking about is continuous delivery. This is a term coined in a book by the same name. I’m attempting to summarize some of the concepts in the book, with an emphasis on how the practices described in it can be applied to development processes that are in trouble. I’ll also discuss specific technologies in the Microsoft and Ruby communities that can be used to implement them.

If you really want to understand this concept, I can’t overemphasize the importance of reading the book. While I love blogs for finding a specific answer to a problem or getting a high level overview of a topic, if you are in a position to enact change in your project or organization it really pays to read the entire thing. It took me odd hours over a week to read and I purchased the Kindle version so I can highlight the important points and have it available to my mobile phone and browsers.

That being said, I want to use this post to clarify what continuous delivery is not, and why you would use it in the first place.

Continuous delivery is not

  • Using a continuous integration server (Team Foundation Server, CruiseControl.NET, etc.)
  • Using a deployment script
  • Using tools from Microsoft or others to deploy your app into an environment

Rather, the simplest description I can think of for this concept is this.

“Continuous delivery is a set of guidelines and technologies that, when employed fully, enable a project or organization to deliver quality software with new features in as short a time as possible.”

Continuous delivery is

  • Requiring tasks to have a business case before they are acted upon
  • Unifying all personnel related to software development (including operations) and making them all responsible for delivery
  • Making it harder for personnel to cut corners on quality
  • Using a software pattern known as a “delivery pipeline” to deliver software into production
  • Deliberate improvements to the process used for testing, configuration, and dependency management to eliminate releasing low quality software and make it easy to troubleshoot problems

I’ll continue to blog about this and I still encourage you to read the book, but one thing that really needs to be spelled out is why you would want to do this in the first place. There are several reasons I can think of that might not be immediately apparent unless you extract them out of the bounty of knowledge in the text.

Why continuously deliver software?

When personnel consider their work done but it is not available to users:

  • That work costs money and effort to store and maintain, without providing any value.
  • You are taking a risk that the market or technologies may change between when the work was originally desired and when it is actually available.
  • Non-technical stakeholders on the project cannot verify that “completed” features actually work.

When you can reduce the time it takes to go from an idea to delivering it to your users:

  • You get opportunities for feedback more often, and your organization appears more responsive to its customers.
  • It increases confidence in delivering on innovation.
  • It eliminates the need to maintain hotfix and minor revision branches since you can deliver fixes just as easily as part of your next release.
  • It forces personnel to focus on quality and estimating effort that can be delivered, instead of maximum work units that look good on a schedule.

And lastly: when personnel must deliver their work to users before it can be considered done, it forces the organization to reduce the amount of new functionality they expect in each release; and to instead trade volume for quality and availability.

When you make production an island, it takes a long time to get there

My post yesterday touched on one of the subjects related to software development that has really crystallized some of the process breakdowns I see in too many organizations out there. Much time is spent measuring developer output, while the overall cycle of going from idea to users goes unmeasured. When organizations begin to measure this, the next step is to measure the activities within.
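To make the idea of measuring the overall cycle, and then the activities within it, a little more concrete, here is a small sketch. The event names and dates are invented for illustration; in practice they would come from whatever tracking system your team already uses.

```python
from datetime import datetime

# Hypothetical timestamps for one feature as it moves through delivery.
FEATURE_EVENTS = [
    ("idea accepted", datetime(2012, 5, 1)),
    ("development started", datetime(2012, 5, 8)),
    ("accepted in test", datetime(2012, 5, 18)),
    ("deployed to production", datetime(2012, 6, 4)),
]


def cycle_time_days(events):
    """Total days from the first event (the idea) to the last (users have it)."""
    return (events[-1][1] - events[0][1]).days


def phase_durations(events):
    """Days spent between each consecutive pair of events."""
    return [
        (f"{start_name} -> {end_name}", (end_time - start_time).days)
        for (start_name, start_time), (end_name, end_time) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    print(f"cycle time: {cycle_time_days(FEATURE_EVENTS)} days")
    for phase, days in phase_durations(FEATURE_EVENTS):
        print(f"  {phase}: {days} days")
```

Breaking the total down by phase is what makes it visible when most of the elapsed time sits after development has finished, which is the situation this post is about.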

Of all the phases in a typical delivery cycle for software, the most costly in improperly automated environments is that of deploying to production. We spend hours writing unit tests, maybe some integration tests, and perhaps even a full automated acceptance suite, but significant time is still spent getting that code to work right in its eventual “production” environment.

Some signs that this might be happening to you:

  • Deploying to production keeps folks working long past the planned duration, involves numerous personnel and is a high stress event.
  • Code that was accepted in test doesn’t work in staging or production.
  • Things that work in production after the latest deployment don’t work in the other environments, and an operations person has to be contacted to find out what they changed recently.

Before I go much further, let’s define what I mean by production. In an IT department with internal applications, production may be a farm of web servers and a database cluster servicing one instance of several applications used by the organization. For a shrink-wrapped product, production will be your users’ computers. The cost to cycle time of not properly testing your application in its environment before delivering it can be significant.

Since production environments are the backbone of a company’s IT and its bread and butter, operations personnel (or those of your customers) are motivated to keep things as stable as possible. Developers, however, are motivated by their ability to enact change in the form of new features. This tends to create a conflict of interest, and most organizations’ answer is to lock down production environments so they can only be accessed by operations personnel. An alternative strategy, one outlined in continuous delivery, is to start treating the work operations does to set up and maintain their environment with the same rigor and process as the software being deployed to it.

Life before source control – are we still there?

Consider an example. An organization has 4 environments – development, test, staging, and production. Development is meant to be an environment in which programmers can make changes to the environment needed to support ongoing changes. Test should be the same environment, but with the purpose of running tests and manually checking out the application like a user would. Staging should be the final place your code goes to verify a successful deployment, and production simply a copy of staging. You may be thinking already “I can’t afford a staging environment that has the same hardware as production!”.

It’s acceptable for staging not to have the exact specifications of production, but you should minimally try to have two nodes for every scalable point in the topology. If production has a cluster of 4 databases, staging needs to have 2. If production has a farm of 10 web servers, staging needs to have 2. With this environment in place, you are still testing the scaled points in your architecture, but without the cost of maintaining an entire cluster. This is obviously easier to do with virtualization, but take care not to use a staging environment that is significantly more or less powerful than production if you are using it for capacity and performance testing. You cannot take a staging environment that has half the servers of production, double the performance you measure there, and assume production will provide twice the capacity. Capacity does not scale linearly with computing resources, as one might assume.
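To illustrate why doubling a staging measurement overestimates production, here is a small sketch using the Universal Scalability Law, a commonly used scaling model that I am bringing in purely as an illustration. The contention and coherency coefficients below are made-up values, not measurements; real values have to be fitted from your own load tests.

```python
def relative_capacity(nodes, contention=0.05, coherency=0.01):
    """Capacity relative to a single node under the Universal Scalability Law.

    `contention` models time spent queueing on shared resources and
    `coherency` models the cost of nodes keeping each other in sync.
    Both defaults are assumed values chosen only for illustration.
    """
    return nodes / (1 + contention * (nodes - 1) + coherency * nodes * (nodes - 1))


if __name__ == "__main__":
    staging = relative_capacity(2)      # e.g. a 2-node staging cluster
    production = relative_capacity(4)   # e.g. a 4-node production cluster
    print(f"staging (2 nodes):    {staging:.2f}x a single node")
    print(f"naive doubling:       {2 * staging:.2f}x")
    print(f"modelled production:  {production:.2f}x")
```

With any nonzero contention or coherency cost, the modelled capacity of the larger cluster comes out below the naive doubling, which is the gap that capacity testing against a scaled-down staging environment can hide.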

Continuing with the example, consider what work would be like without source control. When you make a change to your code, you would have to manually send that code to each developer and make the changes on their machine. Maybe you could make things a bit easier by creating a document that tells developers how to make the changes to their code. This is ridiculous, right? Sadly, this is exactly how many organizations treat the environment. A change made in one environment is manually made in all the others, and the opportunity both for lag between those changes and for human error is large.

Making the environment a controlled asset

The way out of this mess is to start thinking about the environment as a product that deserves the same process oversight as the software being deployed to it. We spend so much time making sure code developers write is tested, but it’s just as easy to break production by making one bad configuration change. To get around this, we need to change the way the environment is managed and leverage automation.

  1. Create baselines of environment operating system images for each node required by your application (database server, web server, etc.). These images should have the operating system, and any other software that takes a long time to install, already set up. Don’t have anything pre-configured in these images that can change from one environment (dev/test/prod etc.) to the next.
  2. Create deployment scripts that you can point to a network computer or VM using datacenter management software (Puppet, System Center etc.). These scripts should install the baseline image on the target computer. Work with operations to determine the best scripting technology to use for them. Operations personnel typically hate XML, but using PSake (a PowerShell deployment extension) or rake is usually acceptable.
  3. Create deployment scripts that run after the datacenter management step and configure the environment suitable for your software. This includes setting up permissions, adding users to groups, making configuration changes to your frameworks (.NET machine config, Java classpath, Ruby system gems etc.).
  4. Create configuration settings that are specific to each of your environments. This would optimally be one database table, XML, or properties file with the settings that change from one environment to the next. Put your database connection strings, load balancer addresses, web service URLs etc. in one place. I’ll do a future post on this point alone.
  5. Create deployment scripts that apply the configuration settings to the target environment.
  6. Store all of these assets in source control (other than maybe the OS images, which should be on a locked down asset repository or filesystem share).

Once this is in place, you should be able to point to any computer or VM on your network that has been set up by IT to be remotely managed and target a build to it. The build should set up the OS image and run all your deployment scripts. From this point forward, the only way any change should be made to the environment is through source control.
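
As a rough illustration of steps 4 and 5, here is a minimal Python sketch that keeps every environment-specific value in one place and applies it to a shared template. The environment names, host names, and the config file format are all assumptions made up for the example, not a recommendation of a particular tool.

```python
from string import Template

# Hypothetical per-environment settings. In practice these would live in a
# single file or table under source control (step 4 above).
SETTINGS = {
    "dev": {
        "db_host": "localhost",
        "service_url": "http://localhost:8080/api",
    },
    "test": {
        "db_host": "test-db.example.internal",
        "service_url": "http://test-svc.example.internal/api",
    },
}

# One template shared by every environment (illustrative format only).
CONFIG_TEMPLATE = Template(
    "connection_string=Server=$db_host;Database=app\n"
    "service_url=$service_url\n"
)


def render_config(environment: str) -> str:
    """Return the config text for one environment, failing loudly on gaps."""
    settings = SETTINGS[environment]  # KeyError if the environment is unknown
    return CONFIG_TEMPLATE.substitute(settings)  # KeyError if a value is missing


if __name__ == "__main__":
    print(render_config("dev"))
```

Because the template is shared, the only thing that differs between environments is the settings entry, which is exactly what a developer would look up in source control.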

This change provides us with a number of benefits:

  • Operations personnel improve their career skills by learning to write scripts that automate changes to the environment, and those scripts can be reused in all of the other environments. If you want to change the configuration of the database, for example, that change, once made in source control, will propagate to ALL environments that are deployed to from the same build.
  • Developers can look in source control to see the configuration of the environment. No more sending an email to operations to find out what varies in production from the other environments.
  • Deploying new builds will test the latest code, with the latest database changes, along with any environment changes. This is the only way to really test how your application will run in production. Problems that would otherwise surface in production will be found in staging first, so you get a chance to fix them without the stress that fixing them in production adds.

There are a couple more things to mention here. First, if you are deploying shrink-wrapped software, you probably have many target environments. To really deliver quality with as few surprises to your customers as possible, you should set up automated builds like this for each variation you might deploy to. Determine minimum hardware requirements for your customer, test at this minimum configuration, and also test any variances in environment. If you support two versions of SQL Server, for example, you really should be testing deployment on an environment with each of these versions.
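
One way to keep track of those variations is to generate the list of build targets from the supported combinations. The sketch below is only an illustration: the version numbers and hardware profile names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical support matrix for a shrink-wrapped product.
SUPPORTED_SQL_SERVER_VERSIONS = ["2005", "2008"]
HARDWARE_PROFILES = ["minimum", "recommended"]


def deployment_targets(sql_versions, hardware_profiles):
    """Return one build target per combination that customers may run."""
    return [
        {"sql_server": sql_version, "hardware": hardware}
        for sql_version, hardware in product(sql_versions, hardware_profiles)
    ]


if __name__ == "__main__":
    for target in deployment_targets(SUPPORTED_SQL_SERVER_VERSIONS, HARDWARE_PROFILES):
        print(target)
```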

One more thing: for organizations in which production settings are not to be made visible to everyone, simply have a separate source control repository or folder with configuration settings for production, and give your build the permissions to pull from that repository (just the configuration) when setting up a production node. Developers will still need elevated permissions, or will need to coordinate with more-privileged operations personnel, to answer their questions about how production is set up. The code for applying environment configuration settings will still be accessible via source control, because it is the same code used for the other environments, simply with different values than production.

Once you have an automated mechanism for setting up and configuring your environment from a build, you need a way to piggyback that process on top of your continuous integration server. I’ll leave that for my next post.

Cycle time – the important statistic you probably aren’t measuring

When teams develop software, they use products from other vendors to support their chosen process. These products usually capture data during development that can be used for reports or analysis, giving some insight into the team’s capability. We can answer questions like “how long did this bug take to close?” or “how long after this work item was created was it marked as completed?”.

The most common statistic analyzed in agile teams is “team velocity”, which is a measurement of how much your team can get done in one iteration (sprint). Managers love this statistic because it helps them figure out how efficient a team is, and it can be used to produce rough estimates of when a feature will be available.

However, there is a much more important metric to your business related to software development, and to measure it correctly we need to redefine or at least clarify a regularly misunderstood word in development processes, and that’s being “done”. Too many teams I encounter work like this:

  1. Business stakeholder has an idea
  2. Idea is placed in product backlog
  3. Idea is pulled off backlog (at some future iteration/sprint) and scheduled for completion
  4. Developer considers the task “done” and reports this in a standup meeting
  5. Developer starts work on the next task
  6. Tester finds bugs 2 weeks later
  7. Developer stops his current task, switches to the old one, and fixes bugs
  8. Months from now, someone does a production deployment that includes the feature, and users (as well as business stakeholders, unfortunately) see it for the first time

The time that elapses between the first and last step above is known as cycle time. This is an important statistic because it measures how long it takes to go from an idea to that idea being available to users. Only when the last step is completed is a feature truly “done”, and because most processes lack embedded quality and deployment verification, a team’s or individual’s efficiency is often judged by ignoring everything after #4 above.
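
To make the difference concrete, here is a small Python sketch that compares the number a team usually reports (idea to “developer done”) with the real cycle time (idea to users). The dates are invented for illustration and the field names are my own.

```python
from datetime import date

# Hypothetical milestone dates for one feature, tied to the steps above.
feature = {
    "idea_raised": date(2012, 1, 9),         # step 1
    "developer_done": date(2012, 2, 3),      # step 4
    "deployed_to_users": date(2012, 5, 14),  # step 8
}


def days_between(start: date, end: date) -> int:
    """Whole days elapsed from start to end."""
    return (end - start).days


def reported_days(f: dict) -> int:
    """What gets measured when everything after step 4 is ignored."""
    return days_between(f["idea_raised"], f["developer_done"])


def cycle_time_days(f: dict) -> int:
    """Idea to availability for users: the full cycle time."""
    return days_between(f["idea_raised"], f["deployed_to_users"])


if __name__ == "__main__":
    print("reported:", reported_days(feature), "days")       # 25 days
    print("cycle time:", cycle_time_days(feature), "days")   # 126 days
```

In this made-up case the feature looks like a 25 day effort on the status report, while users waited 126 days for it.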

It doesn’t matter if your team has developed 20 new features if they aren’t available to users, and they can’t be made available without significant disruption to ongoing work until they have sufficient acceptance tests. This is similar to inventory in lean manufacturing: finished goods sitting on the shelf cost something to create and store but deliver no value until they are used. We can optimize our cycle time by measuring and working to improve all aspects of the process within the start and end of a cycle.

Reducing cycle time is a key tenet of continuous delivery, which seeks to automate and gate all the phases in your development process with the goal of improving an organization’s efficiency at delivering quality features to its customers. There are many things you can do to improve cycle time, but I’ll start by talking about analysis and acceptance.

Analyze and accept during the sprint

Many development teams attempt to do requirements analysis on features while they are still on the backlog, before they have been added to a sprint. This is a mistake for a couple of reasons:

  • It spends effort on a feature that has not been scheduled for implementation. The backlog is about waiting to act on work until the last possible moment, to reduce waste and embrace the reality that up-front design (waterfall) doesn’t work.
  • It encourages managers to cram as much into a sprint as possible, assuming all developers need to do is “write the code”, and it leaves the cost of analysis out of any measure of overall efficiency.

In reality, a feature should be added to the backlog and prioritized there without effort being attached to it. When that item becomes high enough on the list to schedule for the sprint, it is assigned to a developer, who works with a business analyst or tester during the sprint to write acceptance tests for the feature. These acceptance tests should be automated when implemented, but a tester should be able to write in English a description of what constitutes sufficient acceptance. Developers write the tests first, and then write code to pass the tests using test-driven development approaches.
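
Here is a minimal sketch of what that can look like, using a made-up feature (“orders of 100.00 or more ship free”). The feature, the amounts, and the function names are all hypothetical. The English sentences a tester might write appear as comments, and each one maps to an automated check.

```python
from decimal import Decimal

FREE_SHIPPING_THRESHOLD = Decimal("100.00")  # hypothetical business rule
STANDARD_SHIPPING = Decimal("7.50")          # hypothetical flat rate


def shipping_cost(order_total: Decimal) -> Decimal:
    """Implementation written after the tests below, to make them pass."""
    if order_total >= FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return STANDARD_SHIPPING


def test_orders_at_or_above_threshold_ship_free():
    # "Given an order totalling 100.00 or more, shipping is free."
    assert shipping_cost(Decimal("100.00")) == Decimal("0.00")
    assert shipping_cost(Decimal("250.00")) == Decimal("0.00")


def test_orders_below_threshold_pay_standard_shipping():
    # "Given an order totalling less than 100.00, standard shipping applies."
    assert shipping_cost(Decimal("99.99")) == STANDARD_SHIPPING


if __name__ == "__main__":
    test_orders_at_or_above_threshold_ship_free()
    test_orders_below_threshold_pay_standard_shipping()
    print("all acceptance checks passed")
```

If the English description cannot be turned into checks like these, that is usually a sign the feature is still too vague or too large for the sprint.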

Often teams new to this approach will schedule too much to get completed in one sprint. This is a learning experience and over time, you will get better at scheduling smaller units of work into sprints, and describing features at a level of granularity necessary for completion by a single developer. During this adjustment period, expect that features added to a sprint will often, once analysis and acceptance are done, turn out to be too large to complete in the sprint. Split them up into smaller tasks on the backlog, and only schedule the ones that can be developed AND acceptance tested prior to the end of the current sprint.

This may seem like a trivial process nuance, but the goal is to continually deliver new features to your users as quickly and with as few defects as possible. This can only be done if the acceptance criteria for the feature are clear, and there is a repeatable means of verifying them. Automated acceptance is a must here, as manual testing means a longer cycle time.

Once you start accepting this definition of being done, you can start to look at all the pieces of your process that make up cycle time and optimize them. Managers and development leads love to suggest ways that developers can be more efficient, but they rarely look at opportunities for process improvement in business analysis, testing, and deployment. Often, these are more costly to cycle time than development itself, which tends to be limited in opportunities for optimization by the skill of your resources.

I’ll go into more detail about individual practices within your software delivery process that can reduce cycle time in future posts.

Why you should use Migrations instead of Visual Studio 2010 Database Projects

If you work on an application that uses a database, chances are you have to deal with releasing new versions of your software that make changes to it. The SQL language provides comprehensive support for making these types of changes and can access even advanced features of your chosen database platform. Schema changes are made through create and alter statements typically, and data movement is performed using selects and inserts.

When releasing your software initially, deployment is straightforward as there is no existing data to deal with. As users exercise the features in your software, rows of data are added to tables, and future changes require more care to not destroy or make invalid changes to the existing data.

In the past, DBAs or developers with sufficient SQL programming knowledge have written scripts to make the changes necessary to update database assets that have existing data in them, paying special attention to typical situations like adding a new NOT NULL column (you need to initialize it with data to enable the constraint), splitting one column into two, or splitting some columns of a large table out into a new detail table.

For years seasoned developers have used the following approach for making changes to the database:

  • Add a table with one row that stores the “version” of the database. This data is not really application data per se, but more like metadata that identifies the state the schema is in. This version is usually initialized to the lowest version where development starts, let’s say 1.0.0.0.
  • Create SQL scripts when you have changes that check this row. If the version of the database the script is running against is lower than the “version” of your script, make your changes.
  • When your script is done making the changes, increment the version number of the database row to its new version (1.0.0.1 for example).

The great thing about this approach is that it supports deploying changes to multiple versions of the same database. If you are “upgrading” a version 1.0.0.0 database and your latest version is 1.0.0.5, every script numbered above 1.0.0.0 up to 1.0.0.5 will run in ascending order. If you are “upgrading” a version 1.0.0.3 database to 1.0.0.5, scripts numbered 1.0.0.3 or lower will be skipped.
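
The version-row approach can be sketched in a few lines of Python. This is an illustration only: it uses an in-memory SQLite database from the standard library rather than a production database, and the table name `schema_version` and the three change scripts are invented for the example.

```python
import sqlite3


def parse_version(text: str) -> tuple:
    """Turn '1.0.0.3' into (1, 0, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical change scripts: (version the script brings the database to, SQL).
SCRIPTS = [
    ("1.0.0.1", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("1.0.0.2", "ALTER TABLE users ADD COLUMN email TEXT"),
    ("1.0.0.3", "CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT)"),
]


def new_database(version: str = "1.0.0.0") -> sqlite3.Connection:
    """Create an in-memory database holding the one-row version table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schema_version (version TEXT NOT NULL)")
    conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()
    return conn


def current_version(conn: sqlite3.Connection) -> str:
    return conn.execute("SELECT version FROM schema_version").fetchone()[0]


def upgrade(conn: sqlite3.Connection) -> list:
    """Run, in ascending order, every script newer than the database."""
    applied = []
    for version, sql in sorted(SCRIPTS, key=lambda s: parse_version(s[0])):
        if parse_version(current_version(conn)) < parse_version(version):
            conn.execute(sql)
            # Record the new version only after the change succeeded.
            conn.execute("UPDATE schema_version SET version = ?", (version,))
            conn.commit()
            applied.append(version)
    return applied


if __name__ == "__main__":
    print(upgrade(new_database()))           # all three scripts run
    print(upgrade(new_database("1.0.0.2")))  # only the 1.0.0.3 script runs
```

Note that the version check and the version update live in one place here. In the hand-written approach each script repeats them, which is where the mistakes described in the second gotcha below come from.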

There are two gotchas with this approach:

  1. You need to test upgrading from any version you have in the field before deploying to that version. So if you are upgrading databases that can vary by 5 versions, you really need to test the upgrade process going from all 5 of these versions to the current version. This is more a consideration than a limitation as you always need to do this when supporting multiple upgrade paths for your software.
  2. Developers can make mistakes in their SQL script and look for the wrong version, or forget to update the version in the database to current if the operation is successful.

When Ruby on Rails was released, David Heinemeier Hansson and the ActiveRecord team provided the then-emerging Ruby community with a technology called migrations that provides some extra help on top of this. Basically, any time you want to change the database, you run a command at your operating system prompt that generates a new script whose filename is prefixed with a version number greater than that of the latest script in your source.

An example will help here.

  • You run the command “rails generate migration create_users” and a file 00001_create_users.rb is generated. You put code in here to both apply AND roll back changes related to the “users” table.
  • You run the command “rails generate migration create_roles” and a file 00002_create_roles.rb is generated. Notice the tool recognized your latest script version and created a newer version automatically.
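
The numbering step in that example is simple enough to sketch. The Python below is not how Rails implements its generator (newer Rails releases use timestamps rather than sequential numbers); it only illustrates the idea of looking at the existing scripts and picking the next version, using the file naming from the example above.

```python
import re

# Matches names such as "00001_create_users.rb" and captures the number.
MIGRATION_PATTERN = re.compile(r"^(\d+)_.+\.rb$")


def next_migration_filename(existing, name, width=5):
    """Return a filename numbered one higher than the latest existing script."""
    numbers = []
    for filename in existing:
        match = MIGRATION_PATTERN.match(filename)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    return f"{next_number:0{width}d}_{name}.rb"


if __name__ == "__main__":
    first = next_migration_filename([], "create_users")
    second = next_migration_filename([first], "create_roles")
    print(first)   # 00001_create_users.rb
    print(second)  # 00002_create_roles.rb
```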

When you want to deploy to a database, you run another command, “rake db:migrate”, which tells the “rake” (Ruby make) build engine to run all of your database migrations against the target database. The migrations engine automatically does the work of checking the target version of the database, running only those scripts that apply, and incrementing the database version to the latest one that succeeded.

This approach solves the problem of developers having to version things manually, and really simplifies deployment to multiple versions of a database. It also allows developers to incrementally make changes needed to support changes they are working on, without stepping on the toes of other developers.

Enter Visual Studio 2010 database projects

With the release of Microsoft Visual Studio 2010, another approach was provided to developers for managing database versions. This approach was made available by Microsoft’s acquisition of DBPro.

The VS 2010 DB project approach is to have a type of project in your solution that contains scripts that can create all the artifacts in your database. There are create scripts for stored procedures, schemas, roles, tables, views etc. The tool is sold as not requiring developers to know as much SQL programming. Instead, they are provided with a treeview panel in the Visual Studio IDE (referred to as “Schema View” in the documentation). They can interact with this tree to add tables, rename columns, and make other trivial changes via a GUI, and these changes are then saved as new SQL scripts in the project.

What happens when you deploy your DB project is that an engine that is part of the build system in Visual Studio compares the target database with what a “new” database would look like based on the scripts in your project, and then generates a script to alter the target so that its structure matches the project’s source code. The engine works much like RedGate software’s “SQL compare” tool in that it is fairly intelligent about determining changes in schema and generating appropriate scripts.

At first glance, this seems like a superior solution as it gives point-and-click programmers more productivity, removes version management from the picture, and eliminates the need to manually create alter scripts. In practice however, by itself this approach will not meet the requirements of most deployment cycles.

Microsoft released an ALM Rangers guide to using Visual Studio 2010 database projects that is meant to be the primary guidance for developers, DBAs, and architects looking for best practices around VS 2010 DB projects. Part of this guide talks about “Complex Data Movement”, or what I will refer to here as “changing database assets containing data” because that’s really what they are talking about.

Unfortunately, Microsoft’s solution for this “complex” scenario (which is common and regular, in my experience) is to subvert the diffing engine and revert to the use of temporary tables and pre/post build scripts to trick the engine into thinking the schema doesn’t need to be changed while fixing it up afterwards. This issue is described in the ALM Rangers guide, and also on Barclay Hill’s blog post here.

Jeremy Elbourn explains on the MSDN forums why, in a real-world environment, this approach actually makes maintaining database changes over time even more difficult than the migration approach. Microsoft also recently announced the availability of database migration support in ASP.NET MVC 4 (but only if you are using Entity Framework as well). These developments leave folks responsible for choosing a database change management approach confused about where best practices are heading with respect to Microsoft’s vision.

It is my opinion that Visual Studio 2010 database projects should be avoided in favor of a migrations engine for the following reasons:

  1. The success or failure of employing VS 2010 DB projects with real-world, enterprise-sized clients has yet to be demonstrated in any measurable way, and the technology is still relatively new. I’ve seen some press releases, but these are marketing announcements with no downloadable artifacts to evaluate. I also have been discussing the tradeoffs with one colleague who is using it on a single application for an enterprise client with many integrated applications.
  2. I tend to embrace tools that generate code for me or do work automatically only when they are comprehensive, well-understood, and have limited “gotchas”. Schema and data change management is a complex topic and the VS 2010 database project approach leads developers to think the solution is easy, while in practice it forces them to understand how the diffing engine works, the project structure and deployment lifecycle of a DB project build, and how to circumvent the diffing engine to change database assets containing data.
  3. The ALM guide proposes detecting existing schema state to determine when pre or post deployment scripts need to be run: “If this column exists, run this script”. This is an error-prone and ignorant approach. What if version 1 has this column, version 2 does not, and version 3 adds it back in? This kind of check will fail, because it cannot tell version 1 from version 3. Ironically, the workaround for this is to come up with a custom versioning and incremental migration strategy for your pre/post build scripts anyway, which is a red flag to me that the design is flawed.
  4. VS 2010 database projects abstract developers from getting better at SQL, much like Web Forms did for HTML/CSS/JavaScript prior to ASP.NET MVC arriving on the scene. In my experience, developers are seriously lacking adequate database management skills and need to get better at all aspects of it. There are several assets not supported by VS 2010 database projects in the ALM Rangers guide that need to be scripted manually anyway.
  5. The best time to write tests for changes being made to a database and to review their impact is when making the changes, as the structural impact is fresh in the developer’s mind. Using the diffing tool, the generated alter scripts still need to be reviewed prior to deployment, especially if you don’t have a high coverage functional and acceptance test suite to ensure no breakage was caused by the change. Chances are you have an operations person reviewing the changes before running them on production, and without comprehensive testing you are relying on them to make sure the changes are appropriate. I hope you are working closely with operations during the entire development lifecycle in this situation!

Migrations work for all technologies and are simpler to understand and maintain

If you would like to use migrations today without adopting both ASP.NET MVC 4 and Entity Framework, ThoughtWorks created an open source tool, “DBDeploy” (with a corresponding .NET version, DBDeploy.NET), that they use with all of their clients and that handles this elegantly. The only difference between it and the Rails migration approach is that Rails migrations use a DSL for making the changes, while DBDeploy uses SQL.
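
To see why an explicit version beats detecting schema state (reason 3 above), consider this small Python sketch. It uses in-memory SQLite databases from the standard library, and the `users` table and `nickname` column are made up for the example. It builds a database at each of three hypothetical versions and asks the “does this column exist?” question of each one.

```python
import sqlite3

# Hypothetical schema history: the column exists in version 1, is dropped in
# version 2, and is added back in version 3.
SCHEMAS = {
    1: "CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT)",
    2: "CREATE TABLE users (id INTEGER PRIMARY KEY)",
    3: "CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT)",
}


def database_at(version: int) -> sqlite3.Connection:
    """Build an in-memory database whose schema matches the given version."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMAS[version])
    return conn


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """The state-detection check: look the column up in the table metadata."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def detected_state() -> dict:
    """Map each version to what the column check reports for it."""
    return {
        version: column_exists(database_at(version), "users", "nickname")
        for version in SCHEMAS
    }


if __name__ == "__main__":
    print(detected_state())  # {1: True, 2: False, 3: True}
```

Versions 1 and 3 give the same answer, so a script guarded only by this check cannot know which upgrade path it is on. A stored version number removes the ambiguity.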

Forgoing assumed value in favor of rapid feedback

The goal of developing any software should be to provide functionality useful to the majority of its users.

While doing business analysis or writing user stories for a feature of a project (especially those that are an attempted re-design of an existing one), it is important (and exciting) to brainstorm, be visionary, and think up great ideas for how you can please your customer base. However when planning those features for release, it is tempting to attempt to complete all of those stories before making the feature available to users.

The reasoning behind this argument usually sounds something like “our customers have used the product for years with these features, and they will not use it if they are not all present”. Another spin on this is “our competitor has these features and we will not be competitive without them”. There are several flaws in this argument.

  1. The argument assumes that users are currently using all the features. Unless you are measuring the use of each feature in the field (Google Analytics etc.) and have data to back up this claim, it is highly unlikely that a compelling offering could not be made available to users with a smaller subset of features.

    This applies to competitive analysis as well. Comparing your planned features to an existing product sheet will simply align you with them, which can be a disaster if many of their features are unused by their customers and you will now be spending money building them too. It also reduces your ability to differentiate yourself from them.

  2. The argument assumes that users will not provide accurate feedback on their needs of the software. When you choose to implement the kitchen sink around a feature, what you’re really saying is, “I know more about the user’s needs than they do, so I will decide everything to offer them”.

    When you go this route, you spend excessive time getting to market, spend excessive capital implementing features that may not even be used, and place release cycle pressure on yourself by having a larger workload. That makes it less likely that you will be in the relaxed mindset necessary to listen to your customers and respond to requests for changes.

    It’s more efficient and realistic to simply release the smallest subset of those features necessary to make initial use of them available, measure usage and gather feedback, and give users exactly what they want once they’ve used the feature. While it’s true that this approach can result in designs that are different from what you originally envisioned, your vision is not as important as the successful adoption of a feature by its users.

  3. The argument weights delivering assumed value over used value. What this means is that by focusing development on robust implementation of features that have not even been initially deployed to users, the backlog and priorities are being driven by assumed need. Even if your customers tell you they need a feature, unless you are measuring that they are using it in the field, and they are providing you with feedback that they like it, you are taking a risk with the effort needed to implement it. It makes sense to reduce that risk so that if you deploy a feature that turns out to not be useful, the lost capital is minimal.
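
Measuring used value does not need to be elaborate. As a minimal sketch, assuming you can log a (user, feature) pair each time a feature is exercised, the Python below counts distinct users per shipped feature. The feature names and events are invented for the example.

```python
# Hypothetical list of features that have shipped.
SHIPPED_FEATURES = ["search", "export_pdf", "bulk_edit"]

# Hypothetical usage log: one (user, feature) pair per use in the field.
EVENTS = [
    ("alice", "search"),
    ("bob", "search"),
    ("alice", "search"),
    ("alice", "export_pdf"),
]


def usage_report(events, features):
    """Count distinct users per feature, including features nobody used."""
    users_by_feature = {feature: set() for feature in features}
    for user, feature in events:
        if feature in users_by_feature:
            users_by_feature[feature].add(user)
    return {feature: len(users) for feature, users in users_by_feature.items()}


if __name__ == "__main__":
    for feature, user_count in usage_report(EVENTS, SHIPPED_FEATURES).items():
        print(f"{feature}: {user_count} user(s)")
```

A feature that shows zero users after release is the “assumed value” this post is warning about.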

Where I’m going with this is that organizations should spend serious time reviewing their backlogs of features, working with user experience experts to come up with the smallest, simplest design that accomplishes what you think the user needs, and then get it out there. It is always more viable to bolt on a feature that you verify is needed after an initial offering than to spend money on assumptions only to find that it was a waste.

Re-trusting check constraints in SQL doesn’t help for NULLABLE columns

I’ve been going through a large database for a client of mine and finding foreign key and check constraints that are marked “untrusted”. This happens when a constraint has been disabled or added without validating existing rows, typically because a relationship between two tables has some rows with foreign key column values that don’t have a match in the related table. When this happens, the Microsoft SQL Server query optimizer can’t rely on the constraint when looking up matches between the two tables. This results in sub-optimal performance.

Unfortunately, I discovered today that if the foreign key column accepts NULL, you can still run a query to re-enable the check constraint without error, but it will still be marked as “untrusted” in the system catalog (the is_not_trusted flag in sys.foreign_keys and sys.check_constraints) and will not benefit from the query optimization available to trusted keys!
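
For reference, here is a small Python helper that builds the T-SQL involved: the statement that re-enables a constraint while asking SQL Server to validate existing rows, and a query that lists foreign keys still flagged as untrusted. The schema, table, and constraint names are placeholders, and the helper only produces text. Running the query against your own database after re-enabling is how you would confirm whether the flag was actually cleared in a case like the one described above.

```python
def quote_name(identifier: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def retrust_statement(schema: str, table: str, constraint: str) -> str:
    """Build the statement that re-enables a constraint and validates rows."""
    return (
        f"ALTER TABLE {quote_name(schema)}.{quote_name(table)} "
        f"WITH CHECK CHECK CONSTRAINT {quote_name(constraint)};"
    )


# Lists foreign keys that SQL Server still flags as untrusted.
UNTRUSTED_FOREIGN_KEYS_QUERY = (
    "SELECT OBJECT_NAME(parent_object_id) AS table_name, "
    "name AS constraint_name "
    "FROM sys.foreign_keys WHERE is_not_trusted = 1;"
)


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(retrust_statement("dbo", "Orders", "FK_Orders_Customers"))
    print(UNTRUSTED_FOREIGN_KEYS_QUERY)
```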

Hopefully this helps someone out there reduce the work needed to determine a data optimization strategy for existing untrusted checks.