Tagging With ActiveRecord and Postgres

Postgres doesn't have a secret tagging feature, it does however support arrays as a column type. This creates interesting opportunities for modeling complex data. We will be using arrays to build efficient tagging into a Rails application. We will cover some of Postgres' the basic array operators, and some strategies for using them in ActiveRecord.

Read the SQL carefully, once you understand what is happening in Postgres, you will be able to apply it much more easily in ActiveRecord.

Tagging With Postgres

As an example, we will create a table of pet hedgehogs, and store their tags in an array column:

CREATE TABLE hedgehogs (
    id      integer primary key,
    name    text,
    age     integer,
    tags    text[]
);

You might take a second and ponder, "Why hedgehogs?", well why not? Next notice the syntax for an array, we use text[] to denote a text array.

You might be tempted to use a varchar array, don't. Postgres assumes that ARRAY['spiny', 'cuddly'] is of type text[]. This means that if you try to do tags && ARRAY['spiny', 'cuddly'], you will get operator mismatch errors. Just store your tags as text[], internally Postgres treats varchar and text the same, so there should not be any performance impact.

There are two operators we will use to implement tagging:

For our purposes, the contains operator can be used to query for records that contain all the tags, while the overlap operator will return records that contain any of the tags.

Using these two operators we can now do some interesting queries. For instance if we want to find all the hedgehogs that are either spiny or prickly, we can use the && operator:

SELECT name, tags FROM hedgehogs 
WHERE tags && ARRAY['spiny', 'prickly']

This queries for all the hedgehogs whose tags overlap either 'spiny' or 'prickly':

name tags
Marty spiny, prickly, cute
Quilby cuddly, prickly, hungry
Thomas grumpy, prickly, sleepy, spiny
Franklin spiny, round, tiny

Now what if we want to reduce the results, and only show the hedgehogs that are both spiny and prickly? To do this, we can use the contains operator, which requires that the array on the left hand side contains all the elements on the right:

SELECT name, tags FROM hedgehogs 
WHERE tags @> ARRAY['spiny', 'prickly']

As you can see, we only get back a subset of the previous results:

name tags
Marty spiny, prickly, cute
Thomas grumpy, prickly, sleepy, spiny

Excluding tags is simply a matter of using a NOT, so if you wanted to find all the hedgehogs that are not spiny, but are cuddly, you can construct the query as follows:

SELECT name, tags FROM hedgehogs 
WHERE NOT tags @> ARRAY['spiny']
  AND tags @> ARRAY['cuddly']

This yields a new list of hedgehogs:

name tags
Horrace cuddly, cute
Quilby cuddly, prickly, hungry

Combining these set operations provides a flexible and convenient way to query tags.

Tagging in ActiveRecord

ActiveRecord 4 now supports SQL arrays in Postgres. This means that if you have an array column in your database, it will be exposed as a ruby array when you access it in your model. Even if you are not using the most recent version of ActiveRecord, you can still benefit from querying against arrays using the techniques we have already covered.

The ActiveRecord equivalent to our previous hedgehog example would be:

create_table :hedgehogs do |t|
  t.string  :name
  t.integer :age
  t.text    :tags, array: true
end

The array option tells ActiveRecord to create SQL array.

We can of course simply use the conditions from the previous queries:

Hedgehog.where "tags @> ARRAY[?]", ['spiny', 'prickly']

In this case we would return the models for all the hedgehogs that are both spiny and prickly since we used the contains operator. ActiveRecord will replace the ? with properly quoted values, but will not turn the array ['spiny', 'prickly'] into a SQL array literal, so you need to wrap it in the array constructor ARRAY[...]. While this approach works fine for one off queries, it makes sense to wrap this logic up in a scope:

class Hedgehog < ActiveRecord::Base
  scope :any_tags, -> (tags){where('tags && ARRAY[?]', tags)}
  scope :all_tags, -> (tags){where('tags @> ARRAY[?]', tags)}
end

These two scopes are parameterized by the tags we want to match against. Now you can query your models more easily:

Hedgehog.all_tags(['spiny', 'large'])

Best of all, using scopes to encapsulate these operations lets us chain them with other ActiveRecord calls. For instance you could pluck out all the ids of the large, spiny hedgehogs:

Hedgehog.all_tags(['spiny', 'large']).pluck(:id)

Performance

If you intend on querying against tags often, then you may want to create an index on them. Postgres has two special types of indexes that speed up the overlap and contains operators. These are GIN and GiST indexes. As a rule of thumb, if your tags are not being updated often you will want to use a GIN since it is faster to query. If your tags will be updated often, a GiST index will have less overhead. When in doubt, benchmark both.

A GIN index can be created in SQL as follows:

CREATE INDEX hedgehogs_tags_index 
ON hedgehogs
USING gin(tags)

Should you want to use a GiST index instead, simply replace USING gin(tags) with USING gist(tags). ActiveRecord 4 has introduced helpers for GIN and GiST indexes, so in a migration you can now use:

class AddTagsIndex < ActiveRecord::Migration
  def change
    add_index(:hedgehogs, :tags, :using => 'gin')
  end
end

Arrays in ActiveRecord < 4

If you are not using ActiveRecord 4, don't despair. The postgres_ext gem provides helpers for Postgres migrations and accessors for array types in ActiveRecord 3.2.

Need support for older rails applications? None of this magic, you can create your tables without using ActiveRecord's helpers:

class CreateHedgehogsTable < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE TABLE hedgehogs (
          id      integer primary key,
          name    text,
          age     integer,
          tags    text[]
      )
    SQL
  end

When using an older version of ActiveRecord, array columns will be returned as strings that look like '{1,2,3}'. You can parse these with pg_array_parser.

Further Reading

If you know of any other useful libraries or techniques for dealing with tagging in SQL and ActiveRecord, let me know.

blog comments powered by Disqus
Monkey Small Crow Small