The Rails Developer's Reference to PostgreSQL indexes

The Rails Developer's Reference to PostgreSQL indexes

In a previous article opens a new window , we listed down common culprits that led to a sub-optimal performance in Rails applications. One of the culprits was missing or incorrect indexes.

Therefore we thought it would be very useful to have a handy reference to the different kinds of indexes, when you should use them and maybe even when not to use them.

Different kinds of indexes

B-Tree Indexes

The Binary Tree index, or B-Tree index, is the default index type we get from a CREATE INDEX statement.

It’s probably also the most common index type we’ll use as developers. The use case is straightforward. You should use a B-Tree index when:

  1. Your queries retrieve small datasets relative to the table size.
  2. The column you want to index has high cardinality, which is to say, their data isn’t replicated too many times in different rows.

For example, if you have a products table that has a category column, it’s natural that many products will have the same value for their category. Therefore, category has low cardinality. Inversely, if the data is unique for every row (say, the product id), that column has high cardinality

The unique id column in tables are usually the more obvious candidates for this index.

Generally speaking, PostgreSQL will use this index for most of the commonly used comparison operators (>,>=, <, <=, =, BETWEEN and IN) as well as the null checking operators (IS NULL and IS NOT NULL). It can even use B-Tree indexes for pattern matching operators, under certain conditions opens a new window .

There are a few ways to add an index in an ActiveRecord migration:

# In a create_table block
t.index(:column_name)

# or outside of the block
add_index(:table_name, :column_name)

See the docs for add_index opens a new window for all the possible options.

Finally, as a SQL statement:

CREATE INDEX column_name_idx ON table_name(column_name);

Hash Indexes

For practical purposes, a hash index can be considered to have a specialized use case for columns that could also be indexed with a B-Tree but that we know for a fact it will always be queried using the = operator.

The way hash indexes work is that they create a 32-bit hash for the value in the indexed column. Which brings us to the first drawback for this type of index: they only work for equality comparisons. Another drawback to be considered is that they have a maintenance overhead during data modifications since the database must solve hash collisions by rehashing the data.

Even with all these drawbacks, they are faster for equality comparisons than B-Tree indexes.

The query planner will consider using hash indexes whenever the indexed column is involved in a comparison with the equal operator.

It’s worth noting that it is very rare that using a hash index will ever be needed. However, if your use case fits into its requirements and you need to get more performance than what you have with a B-Tree index, it’s worth testing some queries using a hash index and see if there is some real benefit.

To create a Hash index using ruby and ActiveRecord, pass the using: 'hash' option to index or add_index:

# In a create_table block
t.index(:column_name, using: 'hash')

# or outside of the block
add_index(:table_name, :column_name, using: 'hash')

And as a SQL statement:

CREATE INDEX column_name_idx ON table_name(column_name) USING HASH;

GiST and SP-GiST Indexes

GiST (Generalized Search Tree) and SP-GiST (Space Partitioned) indexes aren’t a specific kind of index but rather an infrastructure that allows database users to define indexing for complex data types.

Out of the box, PostgreSQL comes with implementations for several geometric shapes and text search. These data classes that implement the required GiST functions are called operator classes

What this means, practically speaking, is that such indexes have a very diverse yet specialized set of applications and use cases, ranging from finding some point within a given distance from some coordinate to full text search.

Usage of these indexes is application dependent so it’s best to read PostgreSQL’s own documentation on the matter and determine if your use case might fit: GiST chapter opens a new window and SP-GiST chapter opens a new window

To create a GiST or SP-GiST index in Rails:

# In a create_table block
t.index(:column_name, using: 'gist')
t.index(:column_name, using: 'spgist')

# or outside of the block
add_index(:table_name, :column_name, using: 'gist')
add_index(:table_name, :column_name, using: 'spgist')

And with SQL:

CREATE INDEX column_name_idx ON table_name(column_name) USING GIST;
CREATE INDEX column_name_idx ON table_name(column_name) USING SPGIST;

GIN Indexes

GIN stands for Generalized Inverted Index. These indexes are appropriate for data that contain multiple component values, such as arrays. An inverted index contains a separate entry for each component value and efficiently handles queries that test for these component values.

Hence the “Inverted” in the name. A normal B-Tree index will have one place to represent the data for one row, whereas a GIN index can have the same row referenced in multiple places. It’s analogous to the table of contents at the back of many books: the same term (or component value of a row) is referenced in multiple different pages.

Like GiST and SP-GiST indexes, the exact way how a GIN index maps a column of a given data type, depends on the GIN operator class.

It’s important to note that GIN indexes have one big downside: They’re expensive to update. Because it can reference the same row multiple times, any changes to that row means a change to all indexes referencing that row.

To create a GIN index in Rails:

# In a create_table block
t.index(:column_name, using: 'gin')

# or outside of the block
add_index(:table_name, :column_name, using: 'gin')

And with SQL:

CREATE INDEX column_name_idx ON table_name(column_name) USING GIN;

BRIN Indexes

BRIN stands for Block Range INdexes.

BRIN indexes will divide your data into pages, according to their storage order in memory, and keep track of the beginning and ending values of the indexed columns in that page. They’ll do this for the entire table.

As an example, if we index a created_at column, the index will store the value for created_at of the first row in a given page and the value for the same column for the last row of the same page, and so forth for all pages into which the table will be divided.

Given this property, this index is only recommended when there is a strong correlation between the data stored and it’s physical order in memory. Building on the previous example, created_at columns are usually good candidates because rows added yesterday or earlier in the day come before the current row in memory, and subsequent insertions will also be inserted in memory closely following the value of created_at, even though there isn’t necessarily a logical rule establishing this behavior.

Also, since it breaks the table into pages, it’s important to know how much memory each row of your table occupies. See [this blog post by Janet Carson] on ways to do this. The reason this is important is that if your table is too small and the index stores too much data into a single page, the index won’t render the speed benefits we want for queries. Once you’ve estimated your row size, knowing a page of memory has 8192 bytes, you can estimate how big your block range should be and set the pages_per_range storage parameter accordingly.

Also, BRIN indexes do not auto update by default. Either the database administrator must explicitly call index maintenance functions or the index must be created with autosummarize = 1.

Finally, in order to know if your column’s logical ordering has a good direct correlation to the storage order in memory, the pg_stats table can give you this information via the correlation column. The closer to 1, the better the fit for a BRIN index.

To create this index in Rails:

# In a create_table block
t.index(:column_name, using: 'brin')

# or outside of the block
add_index(:table_name, :column_name, using: 'brin')

I was unable to find a way to call these methods in a way that I could pass the autosummarize storage parameter. If anyone know, I’d be happy to hear and add it here. In the meantime, we can always execute raw SQL in a migration:

# Anywhere inside the change method
execute<<-SQL
  CREATE INDEX developer_created_at_idx ON developers USING brin (created_at) WITH (autosummarize=1);
SQL

Conclusion

Hopefully the discussion above can help any Rails developers out there understand the different indexes available in PostgreSQL and how to properly pick the one that will help them tune their application and make it run as smooth as possible.

In case you still need help optimizing your queries, understanding where to add indexes, what type of indexes to use or with other performance related issues, make sure to send us a message opens a new window . We can help!

Get the book