WordPress is a great online publishing system. One of its strengths, as far as I am concerned, is the administration interface, which I find flexible, efficient and easy to use. However, sometimes even that interface isn’t flexible enough.

Recently, for example, I needed a quick way to create and insert into another Web page an HTML list of all and only the posts I had published in a certain date range. If you only have four of five posts to manage it’s OK, but what when, as in my case, there are many tenths of them?

WordPress uses a MySql database to store all its content. If you have direct access to that database and have a and know some elementary SQL, creating such a list is very quick. Here’s how I did it (see the bottom of this post for an example of the result). Please note that, while I did this with WordPress and with a very simple (just a date range) MySql query, you can play the same trick even with very complex queries with WordPress or any other CMS that runs MySql, like Drupal. Besides, while the procedure below uses the command line, you can run at leat the first part in any graphical MySql client, including PhpMyAdmin.

The first thing to do is to find out which MySql tables store the data that you want to extract. Go to a command prompt on the machine where the WodPress MySql database is hosted and type:

mysql -u WordPress_MySql_User -p WordPress_MySql_Database

then enter the MySql password and ask for the names of all the tables in the database, to figure out which one contains the data you need:

  mysql> show tables;
  | Tables_in_wp_stop             |
  | wp_blc_filters                |
  | wp_blc_instances              |
  | ... other tables, removed     |
  | ....for brevity       |
  | wp_posts                      |
  | ...       |
  31 rows in set (0.00 sec)

since I needed posts titles and URLs, I figured out that I only needed to mess with the wp_posts table, so I asked for a listing of all its colums (there are many more columns, I only show below those I needed to use):

  mysql> show columns from wp_posts;
  | Field                 | Type                | Null | Key | Default             | Extra          |
  | post_date             | datetime            |      |     | 0000-00-00 00:00:00 |                |
  | post_title            | text                |      |     |                     |                |
  | post_name             | varchar(200)        |      | MUL |                     |                |

Having the names of all relevant columns (why I needed those ones will be clear later) I could build the MySql statement to fetch them from the database. Here it is, with an excerpt of the output:

  mysql> SELECT post_date, post_name, post_title FROM wp_posts WHERE post_date <= '2007-31-01 00:00:00';
  | post_date           | post_name                              | post_title                                |
  | 2007-07-01 06:00:00 | must-all-sw-be-free-or-all-proprietary | Must all SW, be free, or all proprietary? |
  | 2007-07-01 06:00:00 | do-we-still-need-papyrus               | Do We Still Need Papyrus?                 |
  | 2007-07-01 06:00:00 | can-i-publish-my-own-movies            | Can I Publish My Own Movies?              |
  | 2007-07-01 06:00:00 | does-software-pollute                  | Does Software pollute?                    |
  | 2007-07-01 06:00:00 | must-we-all-become-programmers         | Must We All Become Computer Programmers?  |
  | 2007-07-01 06:00:00 | what-can-i-do-as-a-citizen             | What Can I Do As A Citizen?               |
  82 rows in set (0.09 sec)


As you can see, I only looked for posts published before 2007/31/01 because that’s what I happened to need, but I could have refined that search with any condition that MySql can handle, for example looking only for posts whose title or body contained a certain string. Now, why did I need just those fields? Because the URLs of that particular WordPress blog have the format $BLOGNAME/$YEAR/$MONTH/$post_name/. This means that, in order to generate an HTML list of all those posts I only need to reformat, concatenate and wrap into some basic HTML code the first three fields of those posts. If your blog has another naming convention, you can change the query or the script below accordingly.

After checking that the MySql query was correct, I put it inside a simple Perl script that runs it and outputs the final list items in HTML format. The MySql part is copied straight from the many Perl tutorials online that explain how to make Perl talk to MySql. Please refer to those tutorials if you need any detailed explanation of what each line of code does. I only added the right parameters for my own database and the necessary comments to make it (I hope) self-documenting:

  #! /usr/bin/perl
  use strict;
  use DBI;
  use DBD::mysql;
  my $platform = "mysql";
  my $host     = "localhost";
  my $port     = "3306";

  my $BLOGNAME = "http://stop.zona-m.net";  #CHANGE THIS TO THE HOME PAGE OF YOUR BLOG

  # The next seven lines define the Perl variables that are needed
  # to query the database then build the actual MySql query
  # The $query line is where you must put the MySql statement that
  # you actually need

  my ($post_date, $post_name, $post_title);
  my $dsn = "dbi:mysql:$database:localhost:3306";
  my $DB = DBI->connect($dsn, $user, $pw);
  my $query = qq~SELECT post_date, post_name, post_title, ID FROM wp_posts WHERE post_date <= '2007-07-31 00:00:00';~;
  my $query_handle = $DB->prepare($query);
  $query_handle->bind_columns($post_date, $post_name, $post_title);

  # the rest of the code processes and prints as an HTML list element
  # each URL. It takes the first 7 characters of the date (Year + Month),
  # replaces the dash with a slash and removes newlines (I had some of them...)
  # in URL and post title
  while($query_handle->fetch()) {
      my ($DATE, $URL, $TITLE);
      $DATE    = substr($post_date, 0,7);
      $DATE    =~ s/-///;
      $URL     = $post_name;  chomp $URL;
      $TITLE   = $post_title; chomp $TITLE;
      print "<li><a href="$BLOGNAME/$DATE/$URL/">$TITLE</a></li>n";  # Correct URL of each post

Here is the script output, limited to the first three posts for brevity:

  <li><a href="http://stop.zona-m.net/2007/07/must-all-software-be-free-or-all-proprietary/">Must all software, be free, or all proprietary?</a></li>
  <li><a href="http://stop.zona-m.net/2007/07/do-we-still-need-papyrus/">Do We Still Need Papyrus?</a></li>
  <li><a href="http://stop.zona-m.net/2007/07/can-i-publish-my-own-movies/">Can I Publish My Own Movies?</a></li>

This is perfectly valid HTML code that you can copy and paste straight into an HTML page. In my case I had to indent the lists by hand, but it only took 23 minutes.