Using simplehtmldom API with Drupal to radically change node editing UI

In mid-2011 I took on an interesting code challenge and never got around to posting about it. The technique I describe here is available as part of my Drupal 6 module, translation wysiwyg, if you would like to see a demonstration of the result. This post covers how we use the simplehtmldom API module to traverse the node body content produced by a WYSIWYG editor and pick out all of the translatable elements, which we then render as individual fields in the node editor UI.

Still following? Awesome.

What simplehtmldom API does

This module brings the PHP Simple HTML DOM Parser into Drupal for use with your custom modules. It turns the HTML you feed it into a tree of objects that you can perform operations on. If you have used JavaScript and/or jQuery you will probably feel somewhat comfortable working with it. It provides simple DOM traversal and then re-assembly of the HTML, all in your PHP code.

How and what we want to parse

In the translation wysiwyg module we want to take the code from the default language version of a node and break it into strings.

  • The goal here is that editors of the default language will get their usual WYSIWYG editor.
  • Editors of translations of the node will get individual fields for each string in the body text.

So our module has to look at the node before you begin editing to determine whether it is a translation of the original node. If it is a translation, we modify the node edit form.

To get the node editor to do what we want we need to do all of the following:

  • Find the default language version of the node and grab its body text
  • Use the simplehtmldom API to find all of the h1, h2, h3, h4, h5, p and a tags that contain text
  • Check the values contained in each of those tags to see if they exist in the locales database tables
  • Render a tree of Drupal Forms API textareas for each of the text-containing tags listed above
  • Load the translated versions of the items found in the locales table as default values in Forms API
  • Unset the body field so that it does not appear
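
The extraction steps above can be sketched outside Drupal. This is not the module's code: the module uses the PHP Simple HTML DOM Parser, while this illustration uses Python's standard html.parser so that it runs anywhere. The names collect_strings and build_fields are hypothetical, and a plain dict stands in for the locales table lookup.

```python
from html.parser import HTMLParser

# Tags whose text the post treats as translatable.
TRANSLATABLE = {"h1", "h2", "h3", "h4", "h5", "p", "a"}


class StringCollector(HTMLParser):
    """Collect the text found inside translatable tags, in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open translatable tags
        self.strings = []  # (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in TRANSLATABLE:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only keep non-empty text that sits inside a translatable tag.
        if self.stack and text:
            self.strings.append((self.stack[-1], text))


def collect_strings(html):
    """Return every translatable (tag, text) pair in the HTML."""
    parser = StringCollector()
    parser.feed(html)
    parser.close()
    return parser.strings


def build_fields(strings, translations):
    """One textarea-like field per string (hypothetical structure).

    `translations` is a dict standing in for the locales table; a
    missing entry gives an empty default, i.e. a blank field.
    """
    return [
        {"tag": tag, "source": text, "default": translations.get(text, "")}
        for tag, text in strings
    ]


body = '<h2>Welcome</h2><p>Read <a href="/about">about us</a> today.</p><img src="x.png"><hr>'
for field in build_fields(collect_strings(body), {"Welcome": "Bienvenue"}):
    print(field)
```

Note that in this sketch a paragraph containing a link is split into separate strings around the anchor, and the img and hr tags produce no fields at all.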

How we want to put it back together

The obvious problem that we're going to run into with all of these new form fields on our edit form is that we now must re-capture all of the items in the fields and put them in the appropriate places.

Re-enter simplehtmldom API!

Here are all of our steps to re-create the structure of our HTML body content while preserving all of the images, hr tags, object tags... all other tags!

  • Grab all of the submitted fields during the validation of the form
  • Re-load the body text of the default translation
  • Crawl through the tree of the original text, replacing the text of each h1, h2, h3, h4, h5, p and a tag that received a translation
  • Each translated string is stored in the locales table for future editing
  • The new body text is taken from simplehtmldom and converted back to HTML
  • We put this new HTML back into Drupal's node body field and pass the results to the submit function
  • Drupal saves the "translated" version of the node
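
The re-assembly steps above can be sketched the same way. Again, this is a Python stand-in built on the standard html.parser, not the module's PHP code; the Rebuilder class, the rebuild function and the translations dict (source string to submitted translation) are all hypothetical names for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose text the post treats as translatable.
TRANSLATABLE = {"h1", "h2", "h3", "h4", "h5", "p", "a"}


class Rebuilder(HTMLParser):
    """Re-emit the original HTML, swapping in translated text where available."""

    def __init__(self, translations):
        super().__init__()
        self.translations = translations
        self.stack = []  # currently open translatable tags
        self.out = []    # pieces of the rebuilt HTML

    def handle_starttag(self, tag, attrs):
        if tag in TRANSLATABLE:
            self.stack.append(tag)
        # Copy the tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through untouched.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_data(self, data):
        key = data.strip()
        translated = self.translations.get(key) if self.stack else None
        if translated:
            # Keep the original leading/trailing whitespace around the text.
            lead = data[: len(data) - len(data.lstrip())]
            trail = data[len(data.rstrip()):]
            self.out.append(lead + escape(translated, quote=False) + trail)
        else:
            # No translation submitted: keep the original text.
            self.out.append(escape(data, quote=False))


def rebuild(html, translations):
    """Return the HTML with translated strings dropped into place."""
    parser = Rebuilder(translations)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


original = '<h2>Welcome</h2><p>Read <a href="/about">about us</a> today.</p><img src="x.png"><hr>'
print(rebuild(original, {"Welcome": "Bienvenue", "about us": "notre histoire"}))
```

Everything that is not translatable text (the img, the hr, the href on the link) passes through unchanged, which is the property the steps above rely on. The sketch does not try to handle script or style content.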

Note that translators never had access to change any of the images or custom HTML you put into your original nodes. Only the text.

If you read this carefully you noted that we are now putting a huge subset of node body text into Drupal's locales table. This means that your translators could find these strings while searching within the translation interface. However, a change made there would not update the node content until the next time someone edits that node and thus loads the new default value for the header, paragraph or anchor tag they modified.

Where this method is really handy is when you have a translator return to a node after the original has been updated. If a new paragraph was added to the node, the only thing to translate is a blank field where the untranslated content occurs. This can be extremely handy.

How I moved from the west coast to the east coast

Occasionally I get questions from friends who are planning to move. How should I approach this? My answer is nearly always the same: just do it.

My move had been in the works for years, but I did not have a detailed logistical plan. My partner and I had been planning to move together for a few years, and my own plan to move went back years before that. When I finally made the decision to move, none of that really mattered.

What you need to bring is minimal, so give yourself enough time and/or a storage locker to get all of your furniture out of your home. Beyond this, the rest is easy.

  • Find a friend in your destination city. Stay with them a while. If they have an available room, even better! It might also be possible for them to look at places for you.
  • If you have pets, get a carrier for the flight. Check your airline for pet blackout dates.
  • On the pets topic, get a *large* carrier to store up to two pets while in transit.
  • Buy your plane ticket! Be sure to pay any pet cargo fees after you buy.
  • Get a postal redirect to your new/temporary address.
  • Buy "bankers boxes" for things you wish to bring with you. I would suggest 8 of them.

On that last point, the "bankers box" format is perfect because you can load it up with books and/or heavy dishes and still meet postal regulations.

Yup, you're going to send these by post! It is the cheapest method for a small number of items. Greyhound is OK too, but requires you to drop off and pick up the items. In Canada, you can send things by "expedited mail" (faster than standard, slower than express). By doing so, you can opt for insurance and for a signature on delivery. The boxes will come right to your door if you are home. Each box cost me about $50 on the high end to send (I think probably $35 on the low end). Be sure to check the maximum dimensions for the post and also the maximum weight. I also purchased some 2'x2' cubes (4 square feet) from Budget rent-a-car. They comply perfectly with postal and airline size limits; however, it is very easy to overload them. Send the boxes a day or two before you leave.

Pack your angry cats and go! Transport Canada will ask you to remove the pets from the carrier so they can swab it for potential cuteness explosions. An airline attendant helped me hold the cats as they do *not* like being in an airport. Fortunately they were both too scared to run amok at the airport.

When you arrive you'll remember everything you forgot. So go buy those things at a local shop. Get ready to replace a lot of stuff. I highly recommend checking the classified sites like Kijiji and Craigslist for those items.

The Kafei-Infinite StatusNet theme

Announcing our latest StatusNet theme!

This new theme goes a bit further than our last one - we worked from the default StatusNet base theme this time. The goal with this theme is to support mobile devices, though we ran into a couple of limitations of the software, so your mileage may vary on that front.

The theme supports the InfiniteScroll module well, and should work fine with your Realtime plugin.

Best of all, you can change all of the colors with a simple custom.css file. We even provided the example files that were used to produce the images in this posting.

How to Install

  • Download the latest files for kafei-infinite: [tar] [zip]
  • Unzip or untar the files in your StatusNet theme directory. The theme folder should now contain a folder called kafei-infinite.
  • Edit the custom.css file in the theme/kafei-infinite/css folder with your own custom settings... OR use the code in one of the files listed in theme/kafei-infinite/css/alternates to overwrite the theme/kafei-infinite/css/custom.css file.
  • Add the theme to your config.php file: $config['site']['theme'] = 'kafei-infinite';
  • Run the scripts/checkschema.php file from the root of your site on the command line.
  • Enjoy!

Learn Chinese writing with our StatusNet account

Have you ever wanted to learn Chinese? I took a course in university and I did not do all that well! Truth be told, I spent most of my time in the course researching linguistics and I still find the subject fascinating today. I really enjoy learning new languages!

Recently I was thinking about how the first 1500-3000 characters in Chinese are all you really need to get by. This number is much lower than in English and French, so many published lists of the "basic words" are out there. This weekend I searched for such a list and found one whose author made the contents available for re-use. Excellent. Shortly thereafter I realized I could parse that file and make it into a flash card program... using StatusNet.

The cron-bot I built posts a random character from this list of roughly 2700 characters every 10 minutes. Why so often? So you always see new stuff coming in. Also, because you can go through the *entire* list in less than 20 days at this rate.

Check it out: http://status.kafei.ca/dawei

If you have an account on identi.ca or another StatusNet service you can subscribe to this URL. RSS is also available.

You could also just visit this page every now and then. Note, there is a "play" button and a pop-up button beside it. Those can be handy for watching live updates if you want to enable it on your office desktop. ^_^

Eventually I might mirror it to twitter. Let me know what you think on my contact form at http://kafei.ca/contact

How it was done

I found a listing of the 3000 basic characters (well, more like 2700+) at this website: http://www.zein.se/patrick/3000char.html and they even mention that re-use of the materials for flash cards is OK! Excellent.

I took this file and pasted it into an OpenOffice Calc spreadsheet and cropped everything down to just the rows and columns I needed. I then exported the file to CSV format, and then proceeded to perform a few regex operations on it to get the data structure just right. I actually did this twice - once after launch to fix a bug. When the CSV file was successfully mutated into a file compatible with the command line tool "fortune" I was ready to go. I compiled fortune's "dat" file and then created a cron job to post random selections to my StatusNet site.
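
For illustration, the CSV-to-fortune step might look like the sketch below. I did the real conversion with regex operations, so treat this Python version as an assumption-laden equivalent: the column layout (character, pinyin, meaning), the sample rows and the csv_to_fortune name are made up for the example. What is not made up is the format itself: a fortune file is plain text with entries separated by a line containing only a % sign.

```python
import csv
import io


def csv_to_fortune(csv_text):
    """Convert CSV rows into fortune's text format.

    Each row becomes one entry, and each entry is followed by a
    line holding a single "%" as the separator.
    """
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:  # skip blank rows left over from the spreadsheet
            entries.append(" - ".join(cells))
    return "".join(entry + "\n%\n" for entry in entries)


# Made-up sample rows; the real list has roughly 2700 of them.
sample = "的,de,possessive particle\n一,yi,one\n\n"
print(csv_to_fortune(sample), end="")
```

Running strfile on the resulting text file builds the "dat" index, after which fortune can pick a random entry from it.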

Now a cron job will run every 10 minutes, ask fortune for a random word from the dictionary of the first 3000ish words, and post it to StatusNet.

Why do it this way?

Every day you read your StatusNet feeds - sometimes many, many, many times over the course of the day. It makes sense to put repetitive information like this into a feed that you are going to read over and over. It also makes sense on the level of... knowing when you learned something. That is, in what order you learned it. So scrolling back you hit that point that you've already read.

It also isn't particularly important information. If you miss it... so what. It will eventually come around again. The cycle should run in roughly 20 days. Repeats are possible. It is totally random... as much as the fortune program can be random.
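
The 20-day figure is easy to check, and so is the effect of repeats. The sketch below assumes a round 2700 characters and a uniformly random pick on each post; the function names are only for illustration.

```python
def days_for_one_pass(characters, minutes_between_posts):
    """Days needed to make as many posts as there are characters."""
    posts_per_day = 24 * 60 // minutes_between_posts
    return characters / posts_per_day


def expected_fraction_seen(characters, posts):
    """Expected share of distinct characters seen after `posts` uniform random picks."""
    return 1 - (1 - 1 / characters) ** posts


days = days_for_one_pass(2700, 10)
print(f"{days:.2f} days for 2700 posts")
print(f"{expected_fraction_seen(2700, 2700):.0%} of the list expected after one pass")
```

So 2700 posts take just under 19 days, but because the picks are random, a single pass is expected to show only around 63% of the distinct characters. The rest turn up on later passes.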

It is also possible to comment on the postings to practice using the text. That could be pretty cool, especially if working with a translator to make corrections as you go.

http://status.kafei.ca/dawei
