Working with Drupal's Migrate module

UPDATE: if you're interested in this article you should check out the slides from my 4th migrate presentation. The registration system in migrate has changed ("automatic" registration is gone) and I've explained it all in better detail in the slides. I even cover the pathway of the variables if you're new to OOP. ...and yes, I'm available for support and/or workshops.

I have had the opportunity to work on two Drupal sites that involved the Migrate module. I have in the past worked fairly extensively with the Feeds module to do this type of thing. Feeds is a great and powerful module, but at a certain point you will probably find that you want to use Migrate.

Why use Migrate rather than Feeds?

  • Rollbacks. The migrate module makes it damn easy to rollback and re-import.
  • Rules in code. Any changes you do to your input file are stored in a coded template.

The downside as you have probably already guessed is that you're going to spend a lot of time writing custom code here. On the bright side, there isn't that much complicated stuff once you know what everything is trying to do.

Example Migrate Templates

These templates minimize how much code you have to write yourself, but the reality is that you *will* be writing code on almost any Migrate project. Before you get started, the best things you can do are (1) read the manual on Drupal.org, and (2) find a reference template that is close to what you want to do.

There is one additional thing... the beer and the wine templates. These are part of the migrate_example package that ships with the main Migrate module. Review them; the code is commented specifically for this purpose.

Also check out the following as possible starting points: there is a WordPress conversion template, and ones for Typo3 and phpBB. The WordPress one has a UI and works nearly flawlessly (though you will need to remap your image locations somehow).

Migrate and the Future of Drupal

Some people have suggested using Migrate as the main way to upgrade between versions of Drupal. My guess is that my team is going to use it a lot for D7->D8 migrations when the time comes, and I suspect that it will be the default way for D9.

Setting Up Your Dev Environment and Migrate Basics

One thing you should know about Migrate is that some of the queries it runs are enormously expensive processor-wise, especially when things aren't working correctly. I find it is best to work from a local environment rather than a dev server until you have a working template. Additionally, it helps to have a process monitor showing the activity on the machine so you can see when things go wrong.

Another tip you will need: place drush_print_r($variable); in your PHP code to inspect the objects referenced in your code and to see what structure Drupal expects when you put stuff into $row->something (prepareRow) or $entity->field_name[][0]['value'] (prepare).

Familiarize yourself with the following commands, as it is recommended to use drush to do your migration:

drush ms            # migrate-status: list your migrations and their progress
drush mi $migration # migrate-import: run a migration
drush mr $migration # migrate-rollback: roll a migration back

You will probably need some debugging tools as well:

drush_print_r($var); - this command does what you would expect - dumps a variable to the command line. You put this in your PHP code.

drush_print_r($query->__toString()); - this will dump the MySQL query that is generated by Drupal's dynamic query engine. If you have one template that won't run, this can come in handy. You can adapt the output of this to run in MySQL directly - useful if you need to confirm that the query is causing memory problems. You can prepend the keyword EXPLAIN to the query generated for additional info from MySQL.

Anatomy of a Migrate template

The migrate module needs some basic definitions so that your migration will show up on the Migrate page, which is a tab on the Content list page. Most of the time, "registration" of the migration classes is "automatic", but I have found it useful to use the tab on that page to force Migrate to look for new migrations.

In the case of the Commerce Ubercart Migration module, a settings tab will also appear. You will want to configure it to connect to the existing data set, and it will create a bunch of "migrations" to show on the main Migrate page.

When your migrations are all listed, you can click one and see what fields are mapped and which are not. This is where you're going to spend a lot of your migrate time.

Typically you will have a custom module with the following:

  • custom.info file
  • custom.module file, blank... unless you want to use it for something*
    • *an optional API definition if not using automatic-registration
  • something.inc, your first migration, listed in your files in the info file
  • another.inc, your second migration, also listed in files of info file

Within your .inc files you will need to know about the following components which are almost always going to be in your migration:

  • a class definition, for example class CommerceMigrateUbercartNodeMigration extends DynamicMigration, setting which migration class will be used as a base. Then the following sub-components:
    • public function __construct(array $arguments); // initialize the migration, connect to source and grab the settings! Contains a reference to what will be used to map the migration (keeps track of changes to IDs)
    • public function prepareRow($row); // preprocess function, DOES NOT RUN FOR XML IMPORTS
    • function prepare($entity, stdClass $row); // last chance to do stuff. Can run SQL queries here, first param is always the target node/order/entity fully expanded, so put in your languages setting and delta values!
    • function complete($entity, stdClass $row); // rarely needed if you did things right. Can be used for doing post-migration tasks but could complicate your rollbacks, so be sure to test!

Initializing the Migration

With Migration templates you're dealing with Object Oriented Programming, not a staple in our Drupal PHP diet yet. If you're familiar with OOP you'll be right at home. Let's start out with a definition:

class CommerceMigrateUbercartNodeMigration extends DynamicMigration {
  // An array mapping D6 format names to this D7 database's formats.
  public $filter_format_mapping = array();

  public function __construct(array $arguments) {
    $this->arguments = $arguments;
    parent::__construct();
    $this->description = t('Import product nodes from Ubercart.');
    $dependency_name = 'CommerceMigrateUbercartProduct' . ucfirst($this->arguments['type']);
    $this->dependencies = array($dependency_name);

    // Create a map object for tracking the relationships between source rows
    $this->map = new MigrateSQLMap($this->machineName,
      array(
        'nid' => array(
          'type' => 'int',
          'unsigned' => TRUE,
          'not null' => TRUE,
          'description' => 'Ubercart node ID',
        ),
      ),
      MigrateDestinationNode::getKeySchema()
    );

There is a lot of stuff going on here. First, we're extending a base class, which means our class inherits all of the behaviour of Migrate's base classes and only needs to define what is specific to this migration. What follows is the constructor, which is what runs when the migration object is created.

The last part of this code block is the most interesting: $this->map. Here is how Drupal is going to track the objects in your migration. It is your key for each item in your import - and it is going to BREAK if you are trying to use XML as your import source. Then you're going to have to dig around in your DB and change that type from int to something else.

Connecting to Source(s)

The most exciting thing about this part of the migration template is that you have many options. In this case we're looking at a D6->D7 migration, so D6 is our source! Migrate takes care of the actual connection to the database, so that part is easy; you can point it right at production to grab the data. As noted above, XML can also be your source(s). When I say source(s), I mean possibly plural... the example code provided will let you cycle through many XML files during the import process, which is potentially really handy if your exports were run in batches.

Note that the switch statement at the start of this module is related to the UI of Migrate, where the author of this module has made a source-version setting available. Most migrations don't provide a settings form for Migrate, but this one does.

    // Create a MigrateSource object, which manages retrieving the input data.
    $connection = commerce_migrate_ubercart_get_source_connection();

    $query = $connection->select('node', 'n');

    switch (SOURCE_DATABASE_DRUPAL_VERSION) {
      case 'd7':
        $query->leftJoin('field_data_body', 'fdb', 'n.nid = fdb.entity_id AND n.vid = revision_id');
        $query->leftJoin('field_data_field_campaign_id', 'cid', 'n.nid = cid.entity_id');
        $query->fields('n', array('nid', 'type', 'title', 'created', 'changed'))
              ->fields('fdb', array('body_value', 'body_summary', 'body_format', 'language'))
              ->fields('cid', array('field_campaign_id_value'))
              ->condition('n.type', $arguments['type'])
              ->distinct();
        break;
      case 'd6':
        $query->leftJoin('node_revisions', 'nr', 'n.nid = nr.nid AND n.vid = nr.vid');
        $query->leftJoin('filter_formats', 'ff', 'nr.format = ff.format');
        // ... and so on ...
    }
    $this->source = new MigrateSourceSQL($query, array(), NULL, array('map_joinable' => FALSE, 'skip_count' => TRUE)); // connect!

You probably noticed that the D7 and D6 cases have different mappings. Take note: if you're going to extend one, extend the case you actually intend to use.

When your main Migrate page stops working...

Eventually, as you add fields, your query will silently break and force Migrate's listing page to hang. Usually you can still access the page for the specific migration, but since the purpose of these pages is to help guide your clients through the process, this is a severe deal-breaker. Fortunately, it is easily solved in one of two ways: either (1) don't bother counting how many tasks are to be done, or (2) specify a basic query that can be tabulated quickly.

Here is how you do that:

$this->source = new MigrateSourceSQL($query, array(), NULL, array('map_joinable' => FALSE, 'skip_count' => TRUE));

In your source you specify skip_count as a parameter and you set it to TRUE. Now you're going to have an X where the number of items to be migrated was... but you can at least get to that page. Whew...

Adding each of your fields

This is where things start getting fun... or really boring. Your pick. We now have to go through each field that will be brought into the database and create a relationship with the place where we want it to go.

Each field in your migration is going to have the following components:

  • A field in your target content type to hold the incoming data
  • An extension to the $query object to perform a LEFT JOIN if your source is in a table that is not in the base query
  • An extension to the $query object's fields to SELECT by column name
  • An extension to $this to connect the old field to the new field
  $query->leftJoin('content_field_location', 'loc', 'n.nid = loc.nid');

  $query->fields('n', array('nid', 'type', 'title', 'created', 'changed'))
        ->fields('loc', array('field_location_value'));

  $this->addFieldMapping('body', 'body_value') // add a semicolon here if no options
       ->arguments($arguments)  // optional
       ->defaultValue(''); // optional

This wouldn't be Drupal without some "gotchas". As I was adding fields in each of the migrations I have done, eventually the main Migrate page stopped listing my migrations. This was REALLY aggravating, and I spent at least a full week debugging the issue across the two migrations. The solution? Stop counting how many things you are going to process (see the section above about using skip_count when connecting to the source).

It probably goes without saying... but test EVERY field after you have defined it. This is where you'll start finding weird stuff in your data. Unless, like nobody before you, your previous database is perfection in every way. Highly doubtful. As you discover problems, you can deal with them in your prepareRow or prepare functions.

Processing the Data

Now that all of our fields are coming into Drupal, and the Migrate page shows no errors on the main list or on our specific migration, it is time to process the data! Most of the time this will not be necessary, but sometimes you must change things as they import so they fit the new data structure.

  public function prepareRow($row) {
    // Transform body format to something we can use if it's not already.
    if (!filter_format_exists($row->body_format)) {
      $row->body_format = $this->transformFormatToMachineName($row->body_format);
    }
    $row->temp_data = "somejunk"; // will be passed to prepare function then dropped
    $row->field_text = "XXX"; // sets the value of a text field to the string "XXX" (works for many simple fields)
  }

Within your migration class you will almost always have a prepareRow function. This is where you can make changes or, alternatively, reject a row before it gets imported by returning FALSE;. This step DOES NOT RUN for any XML import you do, not even to carry over temp data. Gotcha!

  function prepare($entity, stdClass $row) {
    $entity->field_text['fr'][0] = 'YYY'; // manually set field_text French version (if using entity_translation), for delta 0 (the first instance if allowing multiple values) to YYY
  }

The prepare function is your second chance to change things, working with the entity object (pre-save) fully expanded into Drupal's typical array syntax. If you're doing XML processing, this is where you need to make any transformations. You can also grab any data you set up in prepareRow... or just run SQL queries here. Be forewarned: if you do SQL queries here, it is generally better to load all of your data earlier on (in $this->query) so that you're not hammering your database with requests or blowing out the memory on your server. Those problems are hard to detect.

  function complete($entity, stdClass $row) {
  }

The complete function runs at the end. In some cases it is necessary to make an update after the data has been imported. For example: if you are importing line items that are part of an order in Commerce, you will have a task here to update the order total after each line item has been imported. Migrate uses $this->map to track the progress and ID changes between all of your migrations. This function takes that into consideration.

So... sweet, we're already done and we don't even need to use the complete function! That was easy right? Sort-of. It is not always easy. It is a long time-consuming process to verify all the data is correct and to get this far. If you're considering taking on a Migrate project be sure to consider all the possibilities of problems that will come up with the old data. Quantify all of your content types and entities that will be migrated and every individual field contained therein. Expect that each field will need a certain level of review and base your estimates/calculations on that.

Getting started with multilingual content in Drupal 8

Today I re-installed my copy of Drupal 8 that I use for testing and I must say... wow, there is so much good stuff in here. I resisted upgrading to Drupal 7 with my projects when it first came out and for good reason... many modules that are necessary for my job were not ready when the core module was. I understand this was an improvement over previous upgrades but with Drupal 8 I'm going to be ready to dive in the first day that Drupal 8 is released. This is awesome.

Getting Started Working with Core Dev

First, you're going to need a recent version of PHP. For this reason, I had to use my laptop as a host because my development server uses an older version and I'm not ready to stop production for a beta test. Once you have a suitable environment ready we can proceed with installation.

When the download is complete, visit that site on your webserver. You will now begin the standard Drupal install process. But wait! We have a few extra things to note...

  • I browse in French, so it asked me to install in French!
  • I had to create both files and files/translations in sites/default and give them both appropriate permissions (777?)
  • Copy default.settings.php in your sites/default folder to settings.php ... permission it the same way as above
  • Put in your database settings. I picked "minimal" configuration, as I always do.
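The file-system steps above can be sketched as shell commands. This is a minimal sketch assuming a standard Drupal 8 checkout; it runs in a scratch directory with a stand-in default.settings.php so it is self-contained, and 777 is a dev-box-only permission:

```shell
# Scratch stand-in for a Drupal 8 root (default.settings.php ships with core).
cd "$(mktemp -d)"
mkdir -p sites/default
touch sites/default/default.settings.php

# Create the public files directory plus the translations directory that
# Drupal uses to store downloaded interface translation files.
mkdir -p sites/default/files/translations
chmod -R 777 sites/default/files   # loose permissions: local dev only

# Copy the settings template so the installer can write to it.
cp sites/default/default.settings.php sites/default/settings.php
chmod 666 sites/default/settings.php

ls sites/default
```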

As I was doing these steps Drupal fetched the French translations for me and did most of the installation in French. Drupal, you are so awesome.

Post-Install Configuration

I do a few things to get setup in my new dev/beta environment because I picked the "minimal" installer.

  • Switch to Bartik theme
  • Enable "color" module and change Bartik's blue to something nicer
  • Enable "toolbar" module so you don't have to click around all over the place
  • Create a "page" content type
  • Enable "field ui" module

Multilingual Config

  • Enable the "content translation" module (the other multilingual modules, "interface translation" and "language" were already enabled for me)
  • Go to Configuration > Regional and Language > Content language settings and enable "content" (thanks Gábor for pointing this out). You can also enable which fields get translated on this page but you may need to enable the content type translation (see next two steps) first.
  • Edit the "page" content type, enable translation on the type
  • Go to "manage fields" for the "page" content type, edit the "body" field, go to the "field settings" tab, enable translation of the field (you must have more than one field enabled for translation, or your page will just redirect to the "manage fields" page)
  • Now go edit some content! You will first need to create a node, and then click the translate tab to add a translation
  • You will need a language switcher. I found this hard to figure out at first, but then I realized that most blocks don't exist by default, so if you go to Structure > Add blocks you will be presented with a list of blocks you can create, one of which is a language switcher for content

Now you should be able to create content and switch between the languages. It seems that if a field is blank in one language, it will be sourced from a different language where that content is available.

What Needs Work (Jan 18 2013)

So far I have only noticed a few issues (Jan 18 2013)...

  • The title field is not (yet) translatable
  • If fields aren't translatable on a node type, you get redirected to the "manage fields" page with no message telling you why you are there
  • Not knowing how to add a language switcher block was a bit confusing
  • The "check for updates now" link is really low profile, I tried to add additional languages but they all have only 7.x branches of the translation files so this link did not work unfortunately
  • The new toolbar does not have icons for the translated versions of the items. Not sure why these little pictures need to be translated/translatable!

Overall these are very minor concerns that anyone who is familiar with Drupal should be able to overcome easily. Especially if you have been doing entity translation in Drupal 7. You'll probably note that the old method of having duplicate nodes of each piece of content, one node per language, is officially gone. This is not a possible way to translate in Drupal 8. Good.

How to Help

Drupal.org is the best place to look for information on how to contribute but I'll try to make a quick summary of the types of places you can contribute.

  • First, look at the issue queue for Drupal 8 issues on the Drupal.org project page for Drupal noted at the start of this article. Read over the issues and comment on them if you can summarize and/or offer feedback on the progress that is being made.
  • Once you are aware of the issues, and you have done some testing on your own, create new issues for items that should be done prior to release. I will be looking at the items I mentioned above as my next steps.
  • Create patches for issues where you are able to fix the code. Since you used git to get the latest version at the start, you can just run git to create your patch: git diff >ddo-issue-number-comment-number.patch ... then take that patch over to the relevant issue and post it to drupal.org!
  • Once you're comfortable with creating patches, you can also test some patches! They are a bit harder to manage sometimes, as you may need to look at the paths contained in the patch file. In general, you can go to the Drupal root and run patch -p1 < patchfile.patch (the -p1 strips the a/ and b/ prefixes that git puts in its diffs) or git apply patchfile.patch. Cross your fingers and hope for no rejects. Then test it and provide feedback on the issue.
  • Join an IRC channel. If you are at a code sprint, this is how we share links during discussion. At other times, IRC is a good substitute when you need to ask a question but you're developing on your own.
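The patch round-trip from the bullets above can be sketched end to end. This is an illustration only: the file name and the issue/comment numbers in the patch name are made up, and a throwaway repository stands in for your Drupal checkout:

```shell
# Throwaway repository standing in for a Drupal core checkout.
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email "dev@example.com"   # identity for the local commits
git config user.name "Dev"
echo "original line" > example.inc
git add example.inc && git commit -qm "initial import"

# Make a fix, then capture it as a patch for the issue queue.
echo "fixed line" > example.inc
git diff > ddo-12345-6.patch   # "issue number-comment number" convention

# A reviewer tests the patch against a clean tree.
git checkout -- example.inc            # back to pristine
git apply --check ddo-12345-6.patch    # dry run; silence means it applies
git apply ddo-12345-6.patch
grep "fixed line" example.inc
```

Running patch -p1 < ddo-12345-6.patch would apply it just as well; the -p1 strips the a/ and b/ path prefixes git adds to its diffs.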

Finally, after you've done a few of these things, you're going to come back to the site in a day or two and... throw the whole thing in the garbage. I wish I were joking. Since Drupal 8 is still in development at the time of this writing, major changes can happen that break your config. So often, you will simply want to wipe out your dev site and start over rather than face bugs that are undocumented and temporary. Best to start with the latest, each time you start... unless you're reading every commit message and doing a git diff to see what happened.

Following the Initiatives

There are many "core initiatives" which are focused areas of work on Drupal 8. I have been most involved in Drupal 8 Multilingual... or D8MI. If you search D8MI as a tag on the Drupal project page, you will see what work is to be done. This is possible because the D8MI volunteers have tagged the issues and Gábor Hojtsy, the initiative leader, does an excellent job of keeping us organized.

You can also check out the individual sites for each initiative to see what issues need the most work. Joining the initiative on groups.drupal.org is a good way to stay informed, and each group also hosts IRC meetings on a regular basis.

Drupal 8 Core Initiatives list
Drupal 8 Multilingual homepage (see "focus" page for top priorities)

The All-Purpose Drupal Entity Translation Guide

Great news Drupalists! Fresh for 2013 is the new 1.0 version of the Entity module. That means we're on stable ground to build our entity-based sites that allow us to translate anything in Drupal. To celebrate the 1.0 milestone, we're finalizing and documenting our build recipe that we use to build multilingual sites.

Modules

locale
entity and entity_token
entity_translation
i18n and i18n_menu
l10n_update
token
pathauto
transliteration

Install

First enable the locale and the l10n_update modules. I put the l10n_update in there because I like to see the translated Drupal strings in all their glory.

Go to Configuration > Languages

Enable your second, third, or more additional languages. Drupal will fetch the strings for each language for the modules that you have installed using the l10n_update module.

Edit the English language, add a prefix "en". This will help us deal with English when we setup language negotiation.

Go to Configuration > Languages > Detection and Selection.

Enable path translation, and put it at the top. Then I would enable the browser setting as a fallback. This means that if a user with a French web browser comes to our site, we will give them FR content, unless they explicitly put a different language in the path. For this reason we don't need to worry about what "default" language the site has - the site will always default to the user's default.

Ok. We have languages and a way to manage which one we're using. Now we need an interface to do that.

Go to Structure > Blocks.

Enable the "Language switcher (user interface)" block. Put it somewhere you can get at easily so we can do some testing with it. If you need to change the text name for the language that appears in this block, you can go back to the Config > Languages page and update the title there.

At this point you've configured Drupal to do multiple languages, negotiate between them, and you have an interface to do that as a user. Additionally, Drupal translated itself by fetching all the strings from drupal.org for each module you have installed. As you add more modules, l10n_update will continue to pull down new updates to the text. So to your translators, Drupal will provide service in all the languages that you support. Handy.

Translating Content with Entity Translation

Enabling Entity Translation

Go to Structure > Content Types > Edit the type you want

Publication options: Enabled, with field translation.

Then go to Structure > Content Types > Manage fields for the type you enabled above

Edit the body field, scroll to the bottom, and check the box to enable translation. Repeat this step for every field that you wish to translate. Leave it unchecked if you wish to have the content synchronize all versions.

For the title field, get the title module. Enable it, then go to Structure > Content Types > Manage fields

Then choose the "replace" option for the title field. This will create a new field_title that behaves the same way as the traditional field, but with multilingual features. Edit this field. Remove the description text. Save.

Revisioning

Entity now fully supports revisions. Each update to a translation creates a new revision to the host node.

Rollbacks of versions are not separated by language, so if you must roll back, do it before someone adds new content in a different language. Rolling back will affect all languages that have been updated between the revisions.

Making Menus Entity Translatable

Entities can now have translatable menus. You can structure your entity-based menus in two different ways. The entity_translation_menus module takes care of setting up a different menu entry for each version of a page. That is awesome. How you structure the menus is therefore up to you.

You can:

  • Create one menu that has all the language versions grouped together, or...
  • Create a copy of the menu for each version of the language, and allow editors to put the entry in any of those trees when they select where it goes

With the first strategy, if you go to Structure > Menus > Main Menu > List links you will see all the pages at once, but only on this page and when the editor puts something into the site (if you have big menus this could quickly become a big jumbly multi-language mess).

The second method means that when the user selects where they want to put the content into the menu, they must first find the right language menu and put it within the same language part of the site. In other words, the drop-down menu that lets you pick where the menu entry will be, is grouped by language.

This one is a workflow/editorial question I would consider depending on the size of the site and how many languages are being supported. In any case, the decision doesn't matter because you have only one option to configure:

Structure > Menu > Edit menu > Translate and localize (even if you are using it as fixed language)

Then, when you edit... remember to check off "Menu link enabled only for the English language" when you do the original version of the post. It would be really nice if that was set to be the default somehow, but I have not found any settings for this yet.

Entity Translated Paths

By default the pathauto module will do most of what you need here. It will generate aliases for each language version of a node. You can also enable the entity_tokens module to give yourself some additional translated tokens if you need to create some more complex URLs.

Since we are dealing with a multilingual site here, there is one additional module I have to tell you about: transliteration.

This module converts non-ASCII characters to ASCII for the URL. So rather than "catégories" becoming "cat_gories", we get "categories", as you would more logically expect.

Enable the transliteration module, then:

Configuration > URL aliases > Transliterate prior to creating alias. Done.

Date Formats

This applies more to specific field configurations, but it is likely something you are going to have to take a look at. If you go to the following configuration page you will find your options:

Configuration > Date and Time > Localize

You can also change the first day of the week in the site to change how calendars will be displayed. You can do that at Configuration > Regional Settings.

Translating Blocks

Use the i18n_blocks module. When you create a new block, you will need to specify what language it is. Usually you will want to create blocks with a fixed language, as otherwise you will need to search for the block content as a string. So far we've managed to stay away from Drupal's built-in translation engine, and I like that, because we should only really need to use it to translate strings in Drupal's interface.

Blocks are not the greatest with translations but they do the job. In some cases, it might be easier to just create a new content type and display the translated data with a View.

Hosting your own Git-based shared repositories using SSH

Git has become one of the most important tools in a developer's toolkit. To a Drupal developer it is even more critical, as nearly everyone in the community has standardized on it. While there are many great Git hosting services out there, sometimes clients need everything kept in-house, and Git is all about making each copy a complete repository in its own right... so why not create your own Git host on your own server? That is what we are here to do today.

Requirements

This recipe is for any Linux host that has Git installed. It requires SSH, which will be used to manage the connections from your users. By default, SSH authenticates against the Linux system's user accounts, but if your needs require it, you can also use SSH modules to plug into your local LDAP or ActiveDirectory® authentication systems.

One thing that will be of great importance in this tutorial is permissioning the users correctly and setting up a deployment action that suits your needs best. The strategies we use here may be adapted to your own use cases.

Getting Started with Git as a Host

By now I'm sure you've heard that the philosophy behind Git is that every repo contains the full history of a project, and any copy can become the master copy if the original is lost. While this is great in principle, to share Git with others we will need to set up a special type of repository (a "bare" repository) that is accessible to your system's users.

Installing Applications

First, let's make sure we have Git and SSH installed. On Debian or Ubuntu the command to install Git is as follows:

apt-get install git-core openssh-server

There is no special version of Git for shared repositories; the standard one will do it all.
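To see what "bare" means in practice: a bare repository has no working tree, and its top level contains what would normally live inside a .git directory. A quick look, in a scratch directory:

```shell
cd "$(mktemp -d)"
git init --bare newsite.git
ls newsite.git   # HEAD, config, objects, refs, ... but no checked-out files
```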

Storage

You will need to create yourself a folder where your repositories will be stored. In my case I'm creating a new directory right in the root of the server so that my users will have a nice path to work with when I give them access to the server.

The storage should *not* be your production webserver. You need to put it somewhere that is not live to your users as the shared repository has a bunch of files you don't want to put into a production environment.

mkdir /projects

I actually created my projects folder under /var/projects and just created a symlink here, to better integrate with our existing backup processes.

Grouping the Users

Make sure that we have a group for our users. I'm using the group name "webmasters" but you may already have a group established for your team. If that is the case, use the group you are already using.

addgroup webmasters

We will have to do additional work on the user account to make this work... but for now this is enough.

Initializing the Shared Repository

Now we will create a new project called "newsite". When your colleagues connect to the server, the path will be /projects/newsite.git with this configuration.

cd /projects
git init --bare --shared newsite.git
chgrp -R webmasters newsite.git
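If you want to see what --shared actually did before handing the path to your team, you can rehearse the same init in a scratch directory (the /tmp path here is only for the rehearsal):

```shell
# rehearse the init anywhere and inspect the result
mkdir -p /tmp/git-rehearsal && cd /tmp/git-rehearsal
git init --bare --shared newsite.git
git --git-dir=newsite.git config core.sharedRepository   # the shared flag is recorded in the repo config
ls -ld newsite.git                                       # directories come out group-writable and setgid, e.g. drwxrwsr-x
```

The setgid bit on the directories is what makes new files inside the repo inherit the repository's group, which is why the chgrp above is all the group setup the repo itself needs.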

Adding Users to the Mix

If you already have users on your server, great. If not, create one and add it to the group:

adduser newdeveloper
usermod -a -G webmasters newdeveloper

The usermod step puts newdeveloper into the webmasters group, so the user can read and write the group-owned files in the repository. Because the shared repository's directories carry the setgid bit (the --shared flag set that up for us), files created inside it will belong to the webmasters group, which allows other users to modify a file even if it was created by someone else.

There is one last step to get the permissions structure just right. By default, most Linux systems create files that only the owning user can edit, even when the file belongs to the group. There are many strategies for overriding this. My personal favourite is to change the system umask value so that the owner's write permission is applied to the group as well.

To make a global change to enable "group writeable" by default in Debian or Ubuntu do the following:

Edit the file /etc/profile with your favourite text editor.
Add umask 002 to the end of the file. If you already have a umask value, you can change it rather than adding a new line.

You can also add the umask 002 line to the user's ~/.bashrc file if you wish to do per-user setup for this.

Be sure to test that this is working: log in as your new user with su - newdeveloper (the login shell matters, since /etc/profile is only read on login, and be sure to do this after making the change), then in the user's home directory try doing touch testfile followed by ls -la | grep testfile. You should see the following output:

-rw-rw-r-- 1 newdeveloper webmasters    0 2012-12-20 13:17 testfile

In particular, look at the permission bits at the start. If you see -rw-r--r-- then the umask is not taking effect for some reason. You should also see newdeveloper and webmasters as the user and group respectively. If not, go back to the step where you added the user to the webmasters group.

Does it all look ok? Then rm testfile and log out of your new user's account. The Control-D key will get you out of their account fast. ;)

Keep in mind there are other methods for doing this. If you already have a different system for managing group ownership of files, you will probably want to stick to the system you are already using if it is appropriate for your use case.

Accessing the Repository

Your repository can now be accessed using the following paths. Keep in mind, if it is the first time you clone your repository it will warn you that you are cloning an empty repository. That is ok! You can add some files later and push up to the server so that the next person to clone does not get that message.

From the same (local) machine:

git clone file:///projects/newsite.git
cd newsite

From a remote computer anywhere on the Internet:

git clone newdeveloper@example.com:/projects/newsite.git
cd newsite

If you are using a remote computer, you will be asked for your password unless you have added your public key from your remote computer to the user's account on the server.
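Putting the pieces together, the first commit-and-push flow looks like this. Here it is rehearsed end to end against a scratch bare repo in /tmp (substitute your real clone URL; the -c identity flags are only there in case git user.name and user.email aren't configured yet):

```shell
mkdir -p /tmp/git-rehearsal && cd /tmp/git-rehearsal
git init --bare --shared newsite.git             # stand-in for the server repo
git clone file:///tmp/git-rehearsal/newsite.git worksite
cd worksite
echo "newsite" > README
git add README
git -c user.name=Demo -c user.email=demo@example.com commit -m "Initial commit"
git push origin master                           # now the next clone won't warn about an empty repo
```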

For Bonus Points, auto-checkout into stage

There is one critical thing that you will want to consider before you go live: how are you going to update your staging environment? Git lets you run an action whenever users push new updates, via a hook in the shared repository's hooks folder (under /projects/newsite.git/hooks in the file system), in the post-receive file. One word of warning here though - the hook will run as the user who performs the push. So your staging environment will constantly have permission errors. Ideally your stage environment probably has one user who is in control of it.

To fix this in a really crude way, I rigged up a script that checks for updates every 30 seconds. Eventually I'll come up with something better: an action that can be taken by any user that doesn't involve giving everyone sudo access to the stage user. Run this "daemon" as a script from cron as the user you want to be responsible for stage:


#!/bin/bash
# crude polling deploy: pull the latest master into /stage every 30 seconds
cd /stage
while true
do
  git pull origin master
  sleep 30
done

WARNING: it should be obvious that this code won't scale... and will waste some resources unnecessarily; you've been warned!

In my second crude attempt at solving this issue I have taken the following approach:

1. Create a hooks/post-receive file inside your repo and make it executable (chmod +x hooks/post-receive)
2. Set this file to echo your destination path into a queue file:
echo "/var/www/newsite" >> /projects/queue
3. Create the /projects/queue file: touch /projects/queue && chown root:webmasters /projects/queue && chmod 660 /projects/queue
4. Create a checkout script (note: don't call the loop variable PATH, or you will clobber the shell's search path):

#!/bin/bash
while read CHECKOUT_DIR;
do
    cd "$CHECKOUT_DIR" || continue;
    /bin/su target_username -c "/usr/bin/git pull origin master"
done

5. Then create a watcher to trigger that checkout script:

echo "" >/projects/queue  # empty the queue first
tail -f /projects/queue | /usr/local/bin/checkout

This solves the issue of having multiple users accessing the repository, because the checkout always runs as one designated user. All the users are able to write to the queue file, and the watcher just keeps an eye on that file. Since the watcher uses su to switch into that user's account, it runs as root, and none of our developers need sudo rights at all - so there is no possibility of any of them figuring out they can sudo as someone else.
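If you want to convince yourself the hook fires before wiring it into stage, you can rehearse steps 1 and 2 in a scratch directory (swap the /tmp paths for /projects and /projects/queue on the real server):

```shell
mkdir -p /tmp/hook-rehearsal && cd /tmp/hook-rehearsal
git init --bare newsite.git
cat > newsite.git/hooks/post-receive <<'EOF'
#!/bin/sh
# runs on the server after every push: queue this site for deployment
echo "/var/www/newsite" >> /tmp/hook-rehearsal/queue
EOF
chmod +x newsite.git/hooks/post-receive
git clone file:///tmp/hook-rehearsal/newsite.git work
cd work
echo "test" > file.txt
git add file.txt
git -c user.name=Demo -c user.email=demo@example.com commit -m "test hook"
git push origin master
cat /tmp/hook-rehearsal/queue   # prints /var/www/newsite
```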

You should add your watcher to your startup scripts.
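On Debian and Ubuntu of this era, one minimal way to do that is a line in /etc/rc.local, before the final exit 0. The script name /usr/local/bin/deploy-watcher is just an assumed example here; point it at wherever you saved the tail pipeline:

```shell
# in /etc/rc.local (runs as root at boot), before the final "exit 0":
/usr/local/bin/deploy-watcher &
```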

More Bonus Points, disable SSH interactive mode for some users, and allow logins without passwords

All the Drupal people in the house be rolling their eyes right now. This can be considered a sort of cruel and unusual punishment by some... however, in some cases it is handy, for example, when you have a designer changing theme files but who shouldn't be able to get into all your databases and other things.

This is really simple to accomplish, but it comes with another caveat:

usermod -s /usr/bin/git-shell newdeveloper

So what is the caveat? At first glance it seems to ignore your public keys! You have the user generate a public key on their desktop/laptop/whatever, they give it to you to put in their /home/newdeveloper/.ssh/authorized_keys on the git server... and it totally does not do a thing. Jerk!

In reality, sshd checks public keys before the login shell ever runs, so git-shell itself is not the one ignoring them - the usual culprit is wrong permissions on the user's .ssh directory and authorized_keys file, which makes sshd silently fall back to password auth. Until that is fixed, the user must type their password on every push and pull. Can be annoying...

So let's set up the public keys properly. I'll be rolling this out soon so I can collect some of these bonus points.

Have your user generate a public key. You may find some server-wide sshd_config files restrict certain key types; I have used DSA in this example, and if you use a different key type, just make sure you use the associated id_XXX.pub file for it.

On the developer's machine, grab the existing .ssh/id_dsa.pub or generate one using:

ssh-keygen -t dsa

Be sure to leave the passphrase blank (otherwise the user will be typing the passphrase instead of a password). Then copy the contents of the id_dsa.pub file to the server. The contents of the file should be appended to the .ssh/authorized_keys file on the server... then... the important stuff:

Back on the server:

chmod 700 /home/newdeveloper/.ssh
chmod 600 /home/newdeveloper/.ssh/authorized_keys

That is it! Now the user can authenticate automatically, and they will not be able to get an interactive shell on the host... only use git to push and pull files.