Using simplehtmldom API with Drupal to radically change node editing UI

In mid-2011 I took on an interesting code challenge and never got around to posting about it. The technique I describe here is available as part of my Drupal 6 module, translation wysiwyg, if you would like to see a demonstration of the result. This post covers how we use the simplehtmldom API module to traverse the node body content produced by a WYSIWYG editor and pick out all of the translatable elements, which we then render as individual fields in the node editor UI.

Still following? Awesome.

What simplehtmldom API does

This module brings the PHP Simple HTML DOM Parser into Drupal for use with your custom modules. It turns the HTML you feed it into a tree of objects that you can perform operations on. If you have used JavaScript and/or jQuery you will probably feel fairly comfortable working with it. It provides simple DOM traversal and then re-assembly of the HTML, all in your PHP code.

How and what we want to parse

In the translation wysiwyg module, we want to take the HTML from the default language version of a node and break it into strings.

  • The goal here is that editors of the default language will get their usual WYSIWYG editor.
  • Editors of translations of the node will get individual fields for each string in the body text.

So our module has to look at the node before you begin editing to determine whether it is a translation of the original node. If it is a translation, we modify the node edit form.

To get the node editor to do what we want we need to do all of the following:

  • Find the default language version of the node and grab its body text
  • Use the simplehtmldom API to find all of the h1, h2, h3, h4, h5, p and a tags that contain text
  • Check the values contained in each of those tags to see if they exist in the locales database tables
  • Render a tree of Drupal Forms API textareas for each of the text-containing tags listed above
  • Load the translated versions of the items found in the locales table as default values in Forms API
  • Unset the body field so that it does not appear
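
The extraction steps above can be sketched in a few lines. This is only an illustration of the technique, written in Python with its standard library HTML parser; it is not the simplehtmldom API and not the code from the module. The names `StringCollector` and `build_fields` are made up for the example, a plain dictionary stands in for the locales table, and I assume the translatable tags are not nested inside one another.

```python
from html.parser import HTMLParser

# The tags the post treats as translatable.
TRANSLATABLE_TAGS = {"h1", "h2", "h3", "h4", "h5", "p", "a"}


class StringCollector(HTMLParser):
    """Collect the text of every translatable tag, in document order."""

    def __init__(self):
        super().__init__()
        self.strings = []   # (tag, text) pairs
        self._tag = None    # translatable tag currently open, if any
        self._buffer = []   # text seen inside that tag so far

    def handle_starttag(self, tag, attrs):
        # Assumption: translatable tags are not nested, so only the
        # outermost one starts a new string.
        if self._tag is None and tag in TRANSLATABLE_TAGS:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = "".join(self._buffer).strip()
            if text:  # skip tags with no text, such as <p><img></p>
                self.strings.append((tag, text))
            self._tag = None


def build_fields(source_html, translations):
    """Return one field description per translatable string.

    `translations` is a hypothetical stand-in for the locales table:
    a dict mapping source strings to translated strings.
    """
    collector = StringCollector()
    collector.feed(source_html)
    collector.close()
    return [
        {"tag": tag, "source": text, "default_value": translations.get(text, "")}
        for tag, text in collector.strings
    ]


source_html = (
    '<h2>Welcome</h2><p>Hello world.</p><hr>'
    '<p><img src="cat.png"></p><p>A new paragraph.</p>'
)
translations = {"Welcome": "Bienvenue", "Hello world.": "Bonjour le monde."}
fields = build_fields(source_html, translations)
for field in fields:
    print(field)
```

Each entry in `fields` corresponds to one textarea in the translation form. The paragraph that only holds an image produces no field, and the string with no stored translation gets an empty default value.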

How we want to put it back together

The obvious problem with all of these new fields on our edit form is that we must now re-capture their values and put each one back in the appropriate place in the body.

Re-enter simplehtmldom API!

Here are all of our steps to re-create the structure of our HTML body content while preserving all of the images, hr tags, object tags... all other tags!

  • Grab all of the submitted fields during the validation of the form
  • Re-load the body text of the default translation
  • Crawl through the tree of the original text, replacing the contents of each h1, h2, h3, h4, h5, p and a tag that received a translation
  • Store each translated string in the locales table for future editing
  • Take the new body text from simplehtmldom and convert it back to HTML
  • Put this new HTML back into Drupal's node body field and pass the results to the submit function
  • Drupal saves the "translated" version of the node
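
Here is a rough sketch of the crawl-and-replace step from the list above. Again, this is Python with the standard library parser used purely to illustrate the idea, not the simplehtmldom API or the module's actual code. The `Rebuilder` class and `rebuild_body` function are hypothetical names, the submitted translations are assumed to arrive as a dict keyed by source string, and translatable tags are assumed not to be nested.

```python
from html import escape, unescape
from html.parser import HTMLParser

TRANSLATABLE_TAGS = {"h1", "h2", "h3", "h4", "h5", "p", "a"}


class Rebuilder(HTMLParser):
    """Re-emit the original HTML, swapping in translated text."""

    def __init__(self, translations):
        # convert_charrefs=False lets us pass entities through unchanged.
        super().__init__(convert_charrefs=False)
        self.translations = translations
        self.out = []      # pieces of the rebuilt document
        self._tag = None   # translatable tag currently open, if any
        self._raw = []     # original inner markup of that tag
        self._text = []    # plain text of that tag (the lookup key)

    def _emit(self, markup, text=""):
        if self._tag is None:
            self.out.append(markup)
        else:
            self._raw.append(markup)
            self._text.append(text)

    def handle_starttag(self, tag, attrs):
        if self._tag is None and tag in TRANSLATABLE_TAGS:
            self.out.append(self.get_starttag_text())
            self._tag, self._raw, self._text = tag, [], []
        else:
            # img, hr, object and friends are copied through verbatim.
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self._tag:
            source = "".join(self._text).strip()
            translated = self.translations.get(source)
            # Keep the original contents when no translation was submitted.
            if translated:
                inner = escape(translated, quote=False)
            else:
                inner = "".join(self._raw)
            self._tag = None
            self.out.append(inner + "</%s>" % tag)
        else:
            self._emit("</%s>" % tag)

    def handle_data(self, data):
        self._emit(data, data)

    def handle_entityref(self, name):
        self._emit("&%s;" % name, unescape("&%s;" % name))

    def handle_charref(self, name):
        self._emit("&#%s;" % name, unescape("&#%s;" % name))

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)


def rebuild_body(source_html, translations):
    rebuilder = Rebuilder(translations)
    rebuilder.feed(source_html)
    rebuilder.close()
    return "".join(rebuilder.out)


source_html = (
    '<h2>Welcome</h2><p>Hello world.</p><hr>'
    '<p><img src="cat.png"></p><p>A new paragraph.</p>'
)
submitted = {"Welcome": "Bienvenue", "Hello world.": "Bonjour le monde."}
result = rebuild_body(source_html, submitted)
print(result)
```

The image, the hr tag and the untranslated paragraph come through untouched, which is the whole point: only text inside the translatable tags is ever rewritten.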

Note that translators have no access to any images or custom HTML you put into your original nodes. They can change only the text.

If you read this carefully, you will have noted that we are now putting a large subset of node body text into Drupal's locales table. This means that your translators could find these strings while searching within the translation interface. However, changes made there would not update the node content until the next time someone edits that node and thus loads the new default value for the header, paragraph or anchor tag they modified.

This method is really handy when a translator returns to a node after the original has been updated. If a new paragraph was added to the node, the only thing left to translate is the blank field that appears where the untranslated content occurs, because every string that already has a translation is loaded as a default value.
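
To make that concrete, here is a tiny sketch of how the blank fields fall out of the lookup. It is plain Python for illustration only; `pending_strings` is a hypothetical helper and the dictionary is a stand-in for the locales table, not the module's real storage code.

```python
def pending_strings(source_strings, translations):
    """Return the source strings that still have no translation."""
    return [s for s in source_strings if not translations.get(s)]


# Strings found in the updated original; the last one is newly added.
source_strings = ["Welcome", "Hello world.", "A new paragraph."]
# What a previous translation pass already stored.
translations = {"Welcome": "Bienvenue", "Hello world.": "Bonjour le monde."}

todo = pending_strings(source_strings, translations)
print(todo)  # ['A new paragraph.']
```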