I have been wondering how to keep the tags on a blog an accurate description of its content. As I’m not the most disciplined person, tagging posts correctly and precisely is not my strong point. In the course of my work I came across someone who had used Wordle to quickly analyse the contents of a website, and the result was impressive. Simply counting the words seemed a simple and effective way to describe the contents of a site accurately.

So I ended up creating two plugins: one for creating tag clouds with the “Wordle” look and feel, and one for “tagging” posts simply by enumerating all the words used in a post. The first is described in another post: ImageCloud plugin. The second has become the “text2tag” plugin.

The text2tag plugin can be downloaded from WordPress: Text2Tag.

The text2tag plugin converts every word that occurs in a post’s title or content into a term (tag). By default the plugin adds these terms to a new taxonomy called “words”. Optionally, the terms can be added to the “post_tag” taxonomy, which WordPress uses for normal tagging. I made the separate taxonomy the default because I did not want to overwrite existing tags.
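To make the word-to-term step concrete, here is a minimal sketch in Python. It is not the plugin’s own code (the plugin is a WordPress plugin); the function name `text_to_terms`, the crude HTML stripping and the rule for what counts as a “word” are all assumptions made for illustration.

```python
import re

# Assumed definition of a "word": a run of word characters, optionally joined
# by straight or typographic apostrophes (so "don't" and "I’m" stay whole).
# The real plugin may split text differently.
WORD_PATTERN = re.compile(r"\w+(?:['’]\w+)*")

def text_to_terms(title: str, content: str) -> list[str]:
    """Hypothetical helper: return the distinct lower-cased words of a post."""
    # Crude removal of HTML tags, since post content usually contains markup.
    plain = re.sub(r"<[^>]+>", " ", content)
    words = WORD_PATTERN.findall(f"{title} {plain}".lower())
    # Every distinct word becomes one term; sorting only makes output stable.
    return sorted(set(words))

terms = text_to_terms("Hello World", "<p>The world says hello, again.</p>")
print(terms)  # ['again', 'hello', 'says', 'the', 'world']
```

Each resulting term would then be attached to the post in the chosen taxonomy (“words” by default). Note that a word like ‘the’ ends up as a term too, which is what the stop list is for.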

The plugin provides the option to create a so-called stop list. Words in this list are kept out of (or even removed from) the taxonomy. This allows authors to ignore frequently occurring but meaningless words like ‘a’ or ‘the’. There is also an option to ignore numbers, as I find these uninteresting in most cases.
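The filtering step can be sketched in Python as follows. This is only an illustration of the idea, not the plugin’s implementation: the contents of `STOP_LIST`, the name `filter_terms` and the pattern used to recognise numbers are assumptions.

```python
import re

# Hypothetical stop list; in the plugin, the author maintains this list.
STOP_LIST = {"a", "an", "and", "the", "of", "to"}

# Assumed notion of a "number": digits, optionally with . or , separators.
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")

def filter_terms(terms, stop_list=STOP_LIST, ignore_numbers=True):
    """Drop stop words and, optionally, purely numeric terms."""
    kept = []
    for term in terms:
        if term.lower() in stop_list:
            continue  # meaningless but frequent word
        if ignore_numbers and NUMBER_PATTERN.fullmatch(term):
            continue  # numbers are usually uninteresting as tags
        kept.append(term)
    return kept

print(filter_terms(["the", "wordpress", "2009", "plugin", "a", "3.5"]))
# ['wordpress', 'plugin']
```

Running the same filter over terms already stored in the taxonomy would show which ones to remove after the stop list has grown.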

There is plenty of room for improvement. For example, words could be stemmed, so that ‘tagging’ and ‘tagged’ both become ‘tag’. However, as I do not know enough about such algorithms, I have not added this. If you are able to add such features, please feel free to share your knowledge!