Gradually Migrate from Paperclip to Active Storage

In case you are not familiar with us, FastRuby.io specializes in Ruby and Rails upgrades. Over the past 10 years we have performed dozens of upgrades for our clients, which has exposed us to all sorts of situations. A common one is when the upgrade isn’t straightforward and we can’t just bump the Ruby or Rails version directly.

We often find ourselves in a situation where we need to upgrade one or more dependencies before we can actually upgrade Ruby or Rails itself. For instance, if you want to upgrade your app from Rails 5.x to 6.x and the application still uses Paperclip to manage its file attachments, you’ll first need to replace that gem, because Paperclip was deprecated in favor of Active Storage after Rails 5.2 was released.

This was the case for one of our clients that has been doing upgrades with us for a few years now. In this article I’m going to share the obstacles the team ran into and the strategy we adopted to migrate their huge volume of attachments over to Active Storage while keeping Paperclip active until the migration was finished.

The Problem

Our client has a huge monolith application with more than 10 years of code and more than 1,200 Active Record models, 83 of which need to handle file attachments. When we started planning their Rails 6.0 upgrade we knew we would need to move away from Paperclip, but we still had two questions to answer:

  1. Would it be possible to migrate to Active Storage without having to re-upload all files?
  2. If not, what would be the best way of doing it?

After several failed attempts, we realized that there were too many incompatibilities between Paperclip and Active Storage to keep monkeypatching Active Storage to work with the existing S3 paths. So the answer to the first question was no: we would have to re-upload all the files using Active Storage. That brought us to question number two.

Due to the huge volume of data that our client has, the migration process would take months (maybe more than a year!!), so we would have to do it gradually.

To ensure that no file was lost and no file was unavailable during the migration, our strategy was to keep using Paperclip until we re-uploaded all attachments using Active Storage. Yes, we decided to have an overlap time where we would be using both file management mechanisms. After finishing the migration we could then finally remove the Paperclip dependency.

Migrating the attachments with a background job

To perform the migration we created a background job that reads the information from each Paperclip attachment and creates the corresponding Active Storage one. Since Active Storage works differently from Paperclip, the job creates the attachment and the blob for each Paperclip attachment and associates them with the respective record through the .attach method.
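
The core of such a job can be sketched in plain Ruby. This is a minimal illustration and not the client’s actual code: the AttachmentMigrator class and its downloader callable are hypothetical names, and the sketch assumes the record still has the Paperclip columns (photo_file_name, photo_content_type) next to an Active Storage attachment that responds to attached? and attach. In a real application this logic would live inside a background job, and the downloader would fetch the file from the existing S3 location.

```ruby
require "stringio"

# Hypothetical helper: copies one legacy Paperclip file into Active Storage.
# The record is duck-typed. It only needs the Paperclip columns
# (<name>_file_name, <name>_content_type) and an attachment object that
# responds to `attached?` and `attach`, as Active Storage attachments do.
class AttachmentMigrator
  def initialize(name, downloader:)
    @name = name
    @downloader = downloader # assumed: callable returning the legacy file's bytes
  end

  # Returns :skipped, :missing, or :migrated.
  def call(record)
    attachment = record.public_send(@name)
    # Already migrated, so running the job again is harmless.
    return :skipped if attachment.attached?

    file_name = record.public_send("#{@name}_file_name")
    # No Paperclip file was ever uploaded for this record.
    return :missing if file_name.nil? || file_name.empty?

    # `attach` creates both the blob and the attachment row.
    attachment.attach(
      io: StringIO.new(@downloader.call(record)),
      filename: file_name,
      content_type: record.public_send("#{@name}_content_type")
    )
    :migrated
  end
end
```

Checking attached? first keeps the operation idempotent, which matters when a queue with millions of jobs may retry some of them.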

We created one Pull Request per model with all the changes necessary to start using Active Storage instead of Paperclip. That way new attachments would be saved in Active Storage right away, while the job queue worked through the existing attachments and migrated them.

Creating a fallback method for the attachments

As I mentioned, while the background jobs were being processed we needed to make sure that the files remained available to the users. For the smallest models, which have only a few hundred attachments, a few minutes of unavailability would be okay. But for the biggest ones, which have millions of attachments, the process could take days to finish, so unavailability wouldn’t be acceptable.

For these cases, we created a fallback method that returns the Paperclip url in case the Active Storage attachment isn’t available yet, as we can see in the code snippet below.

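# Returns a URL for the user's photo, preferring Active Storage.
def photo_url(user)
  # Already migrated: serve the file from Active Storage.
  return user.photo.service_url if user.photo.attached?

  # Not migrated yet: build the Paperclip attachment by hand and use its URL.
  ::Paperclip::Attachment.new(:photo, user).expiring_url
end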

With that method we covered the two possible scenarios:

  • If the Active Storage file is already attached, we return the Active Storage URL for the file.
  • If not, we create an instance of a Paperclip attachment and return its URL. Instantiating ::Paperclip::Attachment directly gives us access to the Paperclip object even though we don’t have the has_attached_file declaration in the model anymore.

When all the files are processed for all models, we can create a new PR that removes the Paperclip gem and this fallback method.
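
One way to decide when a model is done is to look for records that still depend on the fallback. The sketch below is only an illustration of that check: pending_paperclip_records is a hypothetical helper, and it assumes the Paperclip columns are still present on the record. On tables with millions of rows the same condition would more likely be written as a database query than evaluated in Ruby.

```ruby
# Hypothetical check: which records still rely on the Paperclip fallback?
# A record is pending when Paperclip has a file name stored for it but
# nothing has been attached through Active Storage yet.
def pending_paperclip_records(records, name)
  records.select do |record|
    legacy_name = record.public_send("#{name}_file_name")
    has_legacy_file = !(legacy_name.nil? || legacy_name.empty?)
    has_legacy_file && !record.public_send(name).attached?
  end
end
```

When this returns an empty list for every model, nothing is being served through the fallback anymore.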

Conclusion

As we saw, sometimes we can’t just upgrade the application directly. In the ideal scenario we would run bundle update rails and everything would just work. But for real-life applications we often have to update some dependency beforehand. In this article I explained how we gradually migrated a huge number of Paperclip attachments while making sure they were never unavailable.

Of course, due to the complexity of the client’s application we had to deal with several other issues, and we had to back-port some features that would only be available in future releases of Active Storage, such as direct upload for some models, non-expiring URLs, and different service configs for some attachments, among others. But those are topics for another conversation.

I hope to write about them soon, so stay tuned!
