Saturday 3 September 2011

Enforce code formatting rules with SVN Checker

As a code base grows larger, clean code greatly eases maintenance and the integration of new team members. And let's face it, bad code formatting rarely goes hand in hand with good software design. It's like the two are related...

There's an Eclipse plugin that you can use, based on the Checkstyle tool. With it, you can a) define code style rules; and b) make sure that your code abides by them.
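
For illustration, here is a minimal Checkstyle configuration file of the kind you would point the tools at. It is a sketch written for the Checkstyle 5.x releases of the time (later releases moved some modules, such as LineLength, out of TreeWalker), and the choice of checks and the 120-column limit are arbitrary assumptions. Setting the Checker's default severity to "warning" matters here because the hook script further down only reports output lines that contain the word "warning".

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
<module name="Checker">
  <!-- Report every violation as a warning; the hook script greps for "warning". -->
  <property name="severity" value="warning"/>
  <module name="TreeWalker">
    <!-- Example rules only: pick the ones your team agrees on. -->
    <module name="LineLength">
      <property name="max" value="120"/>
    </module>
    <module name="AvoidStarImport"/>
    <module name="UnusedImports"/>
    <module name="LeftCurly"/>
  </module>
</module>
```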

Problem is, that's not really useful if you're the only one following the rules. Really, the only way to make sure everyone does is to enforce the rules at commit time. If you're using Subversion, there's a nifty utility that you can use to check the committed code on the SVN server's side: SVN Checker. The tool relies on an SVN feature called "hooks" (scripts that the repository runs at given points of a commit, for example just before a commit is accepted).

I do not intend to provide you with a full tutorial on how to set it up, since there's documentation around already (for example I have used this, and this one also).

The problem is that in the version I use (0.3), I've hit a bug/limitation in the Python script that implements the actual Checkstyle hook (Checkstyle.py). First, in the error messages that were displayed, the offending file names were prepended with a "/tmp/..." path, most likely because the files being checked in are kept in a temporary directory on the server side before being committed. Second, the message did not clearly list all the files that failed the style checks...

I thus proceeded to fix the script, and here are the results (my Python skills are not top notch, so bear with me. But it works):


# Checkstyle.py
# Copyright 2008 German Aerospace Center (DLR)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


""" Checks Java files for coding style errors using Checkstyle. """


from modules import Process


# Checkstyle runs on temporary copies of the committed files, so its messages
# start with paths such as "/tmp/tmpXYZ/src/Foo.java:12: warning: ...".
TMP_PREFIX = "/tmp/tmp"


def stripTmpPrefix(line):
    """ Turns "/tmp/tmpXYZ/src/Foo.java:12: ..." into "/src/Foo.java:12: ...". """
    if line.startswith(TMP_PREFIX):
        rest = line[len(TMP_PREFIX):]
        slashIndex = rest.find("/")
        if slashIndex >= 0:
            return rest[slashIndex:]
    return line


def run(transaction, config):

    check = config.getArray("Checkstyle.CheckFiles", [r".*\.java"])
    ignore = config.getArray("Checkstyle.IgnoreFiles", [])
    files = transaction.getFiles(check, ignore)

    java = config.getString("Checkstyle.Java")
    classpath = config.getString("Checkstyle.Classpath")
    # Distinct name, so the "config" object is not shadowed by a string.
    configFile = config.getString("Checkstyle.ConfigFile")

    command = "%s -classpath %s com.puppycrawl.tools.checkstyle.Main -c %s " % (java, classpath, configFile)

    # Only check added ("A") and updated ("U", "UU") files; deleted files
    # have no content left to check.
    files = [transaction.getFile(name) for name, status in files.items() if status in ("A", "U", "UU")]

    if len(files) > 0:
        try:
            output = Process.execute(command + " ".join(files))
            if "warning" in output:
                # List every offending line, without the temporary directory.
                msg = "Coding style errors found:\n\n"
                for line in output.splitlines(True):
                    if "warning" in line:
                        msg += stripTmpPrefix(line)
                return (msg, 1)
        except Process.ProcessException as e:
            # Raised when Checkstyle exits with a non-zero status (for example
            # when it reports errors); clean up the file paths here as well.
            msg = "Coding style errors found:\n\n"
            msg += "".join(stripTmpPrefix(line) for line in e.output.splitlines(True)) + "\n"
            msg += "See Checkstyle documentation for a detailed description: http://checkstyle.sourceforge.net/"
            return (msg, 1)

    return ("", 0)

Saturday 18 June 2011

Confession of a Maven Convert

Yes, I admit, it took me quite some time. In my eyes, Ant was the ultimate build tool: it is extremely flexible, customizable, widely adopted, and mature. Maven seemed heavy, rigid. It imposed a structure. It had a cryptic vocabulary ("build phase", "goal"...). But with time I came to the conclusion that Ant, although a great build tool, is not a build system. In a few bullets, Ant has shortcomings that become evident when projects grow:

  • It has no built-in dependency management. Granted, one can use Maven's dependency management Ant tasks, or Ivy. But that doesn't solve the other shortcomings... 
  • It does not impose a release process: with Ant, you're left to implement your own process. Maven suggests a structure and implies a process: how software modules are organized, how source code is structured in the repository, how release branches are created, how snapshots are published... The fact that it does so, down the road and contrary to popular belief, does not make it rigid or heavy. It makes it robust.
  • Ant's flexibility becomes a management nightmare when the number of software modules increases: each module then has its own build script that inevitably diverges from the others, even if "conventions" are established (let's face it, who documents those, let alone reads them?). Thus one inevitably ends up with a plethora of heterogeneous build scripts that become very hard to maintain.
To manage build complexity with Ant, I often resorted to file includes in order to mimic a kind of hierarchy, declaring global variables and "targets" in parent scripts that could be reused in child scripts. In the end, I had implemented Maven in Ant.

My prejudiced view of Maven (I admit) stemmed from the fact that I had never really had time to explore it. Of course not: I was busy developing software and... maintaining an Ant-based, state-of-the-art custom build system.

I know, Maven is not perfect. It has its share of detractors, and there are other kids on the block in the build system world (Ivy, Gradle, Buildr...). To tell you the truth I don't care: there might be build systems that sport a more expressive, less verbose configuration that does away with XML; there might be others that are "more lightweight", "integrate with Ant", or are more sophisticated in the way they manage dependencies... 

Maven, though, despite its flaws, gets the job done. Plus I don't buy into the XML configuration rants: there are plenty of Maven snippets out there to copy and paste from, and really I find XML better adapted to configuration than some DSL.

Some people claim that Gradle, Buildr, SBT and similar tools, which express the build configuration in a general-purpose programming language, allow adding programming logic (loops, conditional blocks) to builds, hence more power and flexibility. I don't buy it: my experience, again, is that this turns into a management nightmare as projects grow large. I've found Maven plugins to be more robust when customization is required (see the tip on plugins further down).

Anyhow, here are a few tips to get you started quickly if you are considering the plunge but are still hesitating. They're tips I would have liked to have when I took MY plunge...

1) Use Maven's multi-module support

For large-scale products, composed of multiple sub-modules, use Maven's multi-module support. 
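
As a sketch of what that looks like, here is a minimal parent (aggregator) POM. The coordinates reuse the com.acme.myproduct example from tip 7 and are placeholders; the module paths assume the sibling layout recommended in tip 2. The key element is the "pom" packaging: the parent produces no JAR of its own, and running a build from its directory builds all listed modules in dependency order.

```xml
<!-- parent-module/pom.xml (placeholder coordinates) -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.acme.myproduct</groupId>
  <artifactId>parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <!-- "pom" packaging: this module only aggregates and configures its children. -->
  <packaging>pom</packaging>
  <!-- Paths are relative to this POM's directory. -->
  <modules>
    <module>../module-1</module>
    <module>../module-2</module>
  </modules>
</project>
```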

2) Create the parent module directory at the same level as the others.

Given a product composed of multiple modules, it might be tempting to have a structure of the sort:

/my_product
  pom.xml
  /module-1
    pom.xml
  /module-2
    pom.xml

Here we have the parent POM directly under the product directory. That structure seems natural, but it might prove impractical when checking in/checking out projects into/from the source repository. The checkout part, especially, could be problematic: in Eclipse, when using the "create project from repository" functionality that's provided by most SCM plugins, you'd check out from <module-1> or <module-2>, for example... You'd then be missing the parent POM module.

Thus, it is better to have a child directory for the parent POM, and adjust the path to the child POMs accordingly. Here's the alternative structure:

/my_product
  /parent-module
    pom.xml
  /module-1
    pom.xml
  /module-2
    pom.xml

And here's the adjusted module declaration section in the parent POM:

  <modules>
    <module>../module-1</module>
    <module>../module-2</module>
  </modules>
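
With this layout, the child POMs also need to know where their parent lives: by default Maven looks for the parent POM in ../pom.xml, which no longer exists here. A sketch of module-1's POM, assuming the placeholder parent coordinates com.acme.myproduct:parent:1.0-SNAPSHOT, points to it explicitly with relativePath:

```xml
<!-- module-1/pom.xml (placeholder coordinates) -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.acme.myproduct</groupId>
    <artifactId>parent</artifactId>
    <version>1.0-SNAPSHOT</version>
    <!-- The default is ../pom.xml; point to the sibling parent module instead. -->
    <relativePath>../parent-module/pom.xml</relativePath>
  </parent>
  <!-- groupId and version are inherited from the parent. -->
  <artifactId>module-1</artifactId>
</project>
```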

3) Use the parent POM to centralize the build's configuration

In a manner that's consistent with Maven's support for parent-child relationships between modules, child modules can inherit their configuration from their parent module, and can override it if desired. Thus, dependency declarations, plugins, and the various project configuration items (group ID, developers, mailing lists, etc.) supported by the POM can (and should) all be configured at the level of the parent.
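
For example, a parent POM might carry the settings that every module shares. In this sketch the developer entry is made up, and the encoding, compiler plugin version, and Java level are assumptions to adapt to your own project; each child inherits all of it without repeating a line.

```xml
<project>
  <!-- modelVersion, groupId, artifactId, version, packaging and modules as usual -->
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <developers>
    <!-- Hypothetical entry. -->
    <developer>
      <id>jdoe</id>
      <name>John Doe</name>
      <email>jdoe@example.com</email>
    </developer>
  </developers>
  <build>
    <plugins>
      <!-- Applies to every child module: same compiler settings everywhere. -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```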

4) Centralize dependency and plugin declarations

To further ease the management of dependencies, declare a dependency management section in the parent POM, which allows you to centrally manage the library versions used across your modules. For example, let's say you're using Log4j: you could add it to the dependencyManagement element of the parent POM:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.16</version>
    </dependency>
  </dependencies>     
</dependencyManagement>

Then, from within your child POMs, you can omit the version number in the dependency declarations (in this case, the version specified in the parent POM will take effect):

<dependencies>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
  </dependency>
</dependencies>

In the same manner, declare all your plugin versions in the pluginManagement element (nested under build) of the parent POM: 

<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <version>1.3</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
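
A child POM then uses the plugin without a version, exactly as with managed dependencies. The execution below is only a placeholder (its id, phase, and echo message are arbitrary); note that version 1.3 of the AntRun plugin takes its Ant tasks in a "tasks" element.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-antrun-plugin</artifactId>
      <!-- No <version>: 1.3 comes from the parent's pluginManagement. -->
      <executions>
        <execution>
          <id>print-module</id>
          <phase>validate</phase>
          <goals>
            <goal>run</goal>
          </goals>
          <configuration>
            <tasks>
              <echo message="Building ${project.artifactId}"/>
            </tasks>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```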

5) Include your own modules in the dependencyManagement section

Let's say you have multiple modules: module-1, module-2 that depends on module-1, and module-3 that itself depends on the two previous ones. Declare a dependencyManagement section in the parent POM, and add your modules to it, as such:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>myorganization</groupId>
      <artifactId>module-1</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>myorganization</groupId>
      <artifactId>module-2</artifactId>
      <version>${project.version}</version>
    </dependency>     
    <dependency>
      <groupId>myorganization</groupId>
      <artifactId>module-3</artifactId>
      <version>${project.version}</version>
    </dependency>         
  </dependencies>     
</dependencyManagement>

Note the ${project.version} variable: it is interpolated with the version of the project being built, which for the child modules is the version they inherit from the parent POM. Now, in the child modules, declare interdependencies by omitting the version (just as you would for any external library also declared in the dependencyManagement section).

The advantage of this approach is that when the version of your product changes, you update it in a single place (the parent POM's own <version> element).
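
For instance, module-3, which depends on the two other modules, could declare its dependencies as follows (the myorganization group ID is the placeholder used above). Both versions are resolved from the parent's dependencyManagement section, so they always match the product version.

```xml
<!-- module-3/pom.xml: no versions, they come from the parent -->
<dependencies>
  <dependency>
    <groupId>myorganization</groupId>
    <artifactId>module-1</artifactId>
  </dependency>
  <dependency>
    <groupId>myorganization</groupId>
    <artifactId>module-2</artifactId>
  </dependency>
</dependencies>
```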

6) Use Maven for releasing also, not just for building

Maven's release plugin uses SCM integration to automatically tag releases: it commits the released code to a tag that it creates specifically for that release. Maven expects the following directory layout in the SCM system:

/trunk
/branches
/tags

For a given development effort, you'll typically work in a specific branch (created manually under /branches). Then, once finished, you'll merge to the trunk (note that some people prefer doing all development in the trunk and not creating development branches at all). From the trunk, you'll invoke mvn release:prepare, then mvn release:perform. As part of these two commands, Maven creates a directory for the release under /tags, and commits to it the source that you've just released.

Before committing, Maven will have updated the version of your project in the POM from (for example) myproduct-1.0-SNAPSHOT to myproduct-1.0. Indeed, releases in Maven are always created from snapshots (which are identified by the -SNAPSHOT suffix). Once the release is tagged, Maven sets the trunk to the next development version (for example 1.1-SNAPSHOT) and commits that change as well.
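
For this to work, the parent POM has to tell Maven where the code lives and where released artifacts go: release:prepare uses the scm section to commit and tag, and release:perform deploys to the repositories declared in distributionManagement. A sketch, with made-up Subversion and repository URLs (the repository ids would match server entries holding credentials in your settings.xml):

```xml
<scm>
  <connection>scm:svn:http://svn.example.com/repos/myproduct/trunk</connection>
  <developerConnection>scm:svn:https://svn.example.com/repos/myproduct/trunk</developerConnection>
  <url>http://svn.example.com/repos/myproduct/trunk</url>
</scm>
<distributionManagement>
  <!-- Where release:perform deploys released artifacts. -->
  <repository>
    <id>releases</id>
    <url>http://repo.example.com/releases</url>
  </repository>
  <!-- Where "mvn deploy" publishes -SNAPSHOT artifacts. -->
  <snapshotRepository>
    <id>snapshots</id>
    <url>http://repo.example.com/snapshots</url>
  </snapshotRepository>
</distributionManagement>
```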

7) Let the Maven release plugin handle the parent POM version in the child POMs

Remember that your child POMs refer to their parent, as shown below: 

  <parent>
    <groupId>com.acme.myproduct</groupId>
    <artifactId>parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>

You should not modify the parent's version in the child POMs by hand. This is cumbersome and error-prone. At release time (when doing the release with release:prepare and release:perform - see previous tip), Maven will update the child POMs so their parent's version corresponds to the one in the parent POM.

If you ever end up with a desynchronization at that level, you can use the update-child-modules goal of the Versions Maven Plugin (mvn versions:update-child-modules).

You'll probably ask: "why must the child have the parent's version, since the parent module knows the child anyhow?". Well, first off you might not build from the parent module all the time. Sometimes you'll do it from a specific child module, for the sake of verifying a fix, etc. In such a case, how would the child know to which version of the parent it should point? 

Yet when building from the parent, it's true that it would be nice to do away with this version maintenance, and there might be a workaround even when building from children. Apparently, having this fixed is on the Maven 3.1 roadmap.
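
Related to this: when releasing a multi-module product, release:prepare asks for the release and next development version of every module by default. Setting the release plugin's autoVersionSubmodules parameter makes all modules take the parent's versions, which keeps parent and child versions in sync. A sketch for the parent POM (the plugin version is an assumption; use the one you have standardized on):

```xml
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-release-plugin</artifactId>
        <version>2.2.1</version>
        <configuration>
          <!-- Give every module the parent's release/development versions. -->
          <autoVersionSubmodules>true</autoVersionSubmodules>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```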

8) Stick to the Maven best practices

Do not over-customize. For example, Maven expects a "standard" directory layout. Certain plugins rely on that layout, and changing it (which is supported through configuration) may cause them to fail or behave erratically unless their own configuration is adjusted too. Therefore, it is better to stick to the standard layout.
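
To make this concrete, here is the kind of override to avoid. The Ant-style directory names are hypothetical; Maven's defaults are src/main/java, src/test/java and src/main/resources, and with them none of these elements is needed at all.

```xml
<!-- Anti-pattern: legacy, Ant-style layout forced onto Maven. -->
<build>
  <sourceDirectory>src</sourceDirectory>
  <testSourceDirectory>test</testSourceDirectory>
  <resources>
    <resource>
      <directory>conf</directory>
    </resource>
  </resources>
</build>
```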


9) Isolate custom build logic in plugins

One best practice deserves its own section: if you have custom build logic, isolate it in a Maven plugin. As a workaround, you may use Maven's AntRun plugin. But that's a workaround: soon you'll end up with heterogeneous Ant-based build workflows spread across your Maven projects, setting you back to the Ant days.

Maven plugins are a better option: they fit "natively" within a Maven build, following the configuration standard set by Maven. Furthermore, Maven plugins have their own classpath (and therefore their own classloader) at runtime, in which their dependencies are loaded. Since the plugins you develop are built and deployed with Maven, they declare their dependencies in a POM, and these dependencies are automatically resolved when the plugins are used.
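
As a sketch, the POM of such a plugin could look like the following. The com.acme.build:acme-docs-maven-plugin coordinates are hypothetical; the "maven-plugin" packaging is what turns the project into a plugin, and the Ant dependency illustrates how Ant tasks can be reused from the plugin's Java code (the dependency versions are assumptions from the Maven 2 / Ant 1.8 era).

```xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <!-- Hypothetical coordinates. -->
  <groupId>com.acme.build</groupId>
  <artifactId>acme-docs-maven-plugin</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>maven-plugin</packaging>
  <dependencies>
    <!-- API for writing the plugin's goals (Mojos). -->
    <dependency>
      <groupId>org.apache.maven</groupId>
      <artifactId>maven-plugin-api</artifactId>
      <version>2.2.1</version>
    </dependency>
    <!-- Ant, to call existing Ant tasks programmatically. -->
    <dependency>
      <groupId>org.apache.ant</groupId>
      <artifactId>ant</artifactId>
      <version>1.8.2</version>
    </dependency>
  </dependencies>
</project>
```
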
For Sapia we've implemented our own documentation generation plugin, which is used across all Sapia sub-projects. It has worked great and allowed us to centralize all look-and-feel logic in one place (as opposed to the previous Ant-based documentation generation, which eventually ended up being updated on a per-project basis). And guess what: we've reused Ant tasks programmatically in the plugin's code (for generating XSL, copying files around...). So you can still benefit from Ant's huge code base, but in a more robust way.
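
Once deployed, such a plugin is declared once, for example in each product's parent POM, and every module picks it up. In this sketch the plugin coordinates and the "generate" goal name are hypothetical, and binding to the site phase is only one possible choice.

```xml
<build>
  <plugins>
    <plugin>
      <!-- Hypothetical in-house documentation plugin. -->
      <groupId>com.acme.build</groupId>
      <artifactId>acme-docs-maven-plugin</artifactId>
      <version>1.0</version>
      <executions>
        <execution>
          <phase>site</phase>
          <goals>
            <goal>generate</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```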