Mini Builds

SQL Tagging Approaches

June 20, 2021

In this article we will investigate 3 approaches to tagging user data (for example blog entries) in an SQL database. For this example we will be using Postgres however MySQL, etc have similar features.

 

Single Table using JSON

This approach uses a single table with two fields id and content.

The content field has data type jsonb which is Postgres’s json type. content contains the tags and body of the entry e.g.

{
    "body": "Entry ...",
    "tags": ["tag1", "tag2"]
}

Create the Table

create table if not exists entry (
    id serial not null primary key,
    content jsonb not null
);

create index idx_tags on entry using gin ((content->'tags') jsonb_path_ops);

To improve query performance we’ll also add an index to the tags field.

Search by Tag

Query all entries with a particular set of tags using the containment operator @>:

select * from entry where content->'tags' @> ["tag1", "tag2"];

Benefits

 

Two Table

This approach uses a two tables:

Create the Tables

create table entry (
    id serial not null primary key,
    body text not null
);

create table tag (
    id serial not null primary key,
    entry_id integer not null,
    tag character varying(32) not null,
    unique(entry_id, tag)
);

alter table tag add foreign key (entry_id) references entry (id);

create index idx_tag on tag (tag);
create index idx_tag_entry_id on tag (entry_id);

Search by Tag

Query all entries with a particular set of tags using a join and the having condition:

select e.id, e.body, array_agg(t.tag) as tags from entry e
join tag t on e.id = t.entry_id 
where t.tag in ('tag1', 'tag2')
group by e.id
having count(t.tag) = 2;

 

Three Table - Tag Map

This approach moves the tags to a separate table and introduces a table tagmap to link entry to tag.

Create the Tables

create table entry (
    id serial not null primary key,
    body text not null
);

create table tag (
    id serial not null primary key,
    tag character varying(32) not null unique
);

create table tagmap (
    id serial not null primary key,
    entry_id integer not null,
    tag_id integer not null,
    unique(entry_id, tag_id)
);

alter table tagmap add foreign key (entry_id) references entry (id);
alter table tagmap add foreign key (tag_id) references tag (id);

create index idx_tag on tag (tag);
create index idx_tag_entry_id on tagmap (entry_id);
create index idx_tag_tag_id on tagmap (tag_id);

Search by Tag

Query all entries with a particular set of tags using a two joins and the having condition:

select e.id, e.body, array_agg(t.tag) as tags from entry e
join tagmap tm on e.id = tm.entry_id 
join tag t on tm.tag_id = t.id 
where t.tag in ('tag1', 'tag2')
group by e.id
having count(t.tag) = 2;

Benefits

 

Performance

The actual performance of particular queries will depend on the data they are running against however here’s a table of relative time taken to performance search by a single and multiple tags (higher means the query was slower).

JSON Two Table Tag Map
Single Tag - 170% -30%
Multiple Tags - 440% 640%

Interestingly, the tag map approach performs the best when searching by a single tag. However, the JSON approach is significantly quicker when search by multiple tags.

Checkout the code behind these figures on GitHub: https://github.com/mini-builds/sql-tagging.