Optimizing Rails CI with parallel_tests
In our previous blog post (Speeding Up Test Suites in Rails Applications with parallel_tests), we explored how the parallel_tests gem can significantly speed up Rails test suites by running tests in parallel. This follow-up covers using parallel_tests in a Continuous Integration (CI) environment, focusing on the single-job approach, and walks you through integrating parallel_tests with RSpec into your CI pipeline.
Integrating parallel_tests into CI
The parallel_tests gem (parallel_tests on GitHub) runs tests in parallel across multiple logical CPU cores to reduce test suite runtimes. In CI, we’ll focus on running tests in parallel within a single job. Our example implementation uses GitHub Actions as the CI platform to show how to configure and run parallel tests.
Why Use parallel_tests in CI?
Running tests in parallel in CI provides several advantages:
- Faster Feedback: Parallel execution reduces test runtimes, enabling quicker iterations.
- Scalability: Distributing tests across multiple processes utilizes CI resources effectively.
- Cost Efficiency: Shorter build times can reduce costs on paid CI platforms.
Single Job with Multiple Processors
This configuration runs tests in parallel within a single job using multiple processes. This approach:
- Requires Multiple Databases: Uses rake parallel:setup to create one test database per process, so parallel processes don’t interfere with each other’s data.
- Utilizes VM Resources: Makes use of all available cores in a single VM.
- Simplifies Setup: Needs less configuration than splitting tests across several CI jobs, and is sufficient for most test suites.
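Within that single job, parallel_tests has to decide which spec files each process runs. As we understand it, when no runtime log from earlier runs is available, its default is to balance the groups by file size. The Ruby sketch below illustrates that idea with a greedy largest-first assignment. The helper name group_by_size and the file sizes are hypothetical, and this is not the gem’s actual implementation.

```ruby
# Minimal sketch: balance spec files across N processes by file size.
# Each file (largest first) goes to the group with the smallest total so far.
# Simplified illustration only; NOT the parallel_tests implementation.

# Hypothetical helper: sizes is a Hash of { "path" => size_in_bytes }.
def group_by_size(sizes, process_count)
  groups = Array.new(process_count) { { files: [], total: 0 } }

  sizes.sort_by { |_path, size| -size }.each do |path, size|
    lightest = groups.min_by { |group| group[:total] }
    lightest[:files] << path
    lightest[:total] += size
  end

  groups
end

# Hypothetical spec files and sizes, for illustration only.
sizes = {
  "spec/models/user_spec.rb" => 9_000,
  "spec/models/order_spec.rb" => 7_000,
  "spec/requests/api_spec.rb" => 6_000,
  "spec/services/billing_spec.rb" => 4_000,
  "spec/helpers/date_helper_spec.rb" => 2_000
}

# Show which files each of two processes would run.
group_by_size(sizes, 2).each_with_index do |group, index|
  puts "process #{index + 1}: #{group[:total]} bytes, #{group[:files].join(', ')}"
end
```

File size is only a rough proxy for how long a spec takes, so real groups can still finish at noticeably different times.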
Steps to Integrate parallel_tests
To integrate parallel_tests with RSpec into your CI pipeline, follow these steps:
- Add parallel_tests to Gemfile: Ensure the gem is included in your project’s dependencies. Add the following to your Gemfile:
group :test do
  gem 'parallel_tests'
  gem 'rspec-rails'
end
- Configure database.yml: Append TEST_ENV_NUMBER to the test database name in config/database.yml so each process gets its own database, as shown in the configuration explanation below.
- Set Environment Variables: Set RAILS_ENV to test and provide the database connection details (here, PGHOST and PGUSER).
- Configure CI: Set up a single job with a PostgreSQL service.
- Set Up Databases: Run rake parallel:setup to create a test database for each process and load the schema into it.
- Run parallel_tests: Run rake parallel:spec to execute the suite across multiple processes within that single job.
Example Configuration for GitHub Actions
Below is a GitHub Actions workflow (for example, saved as .github/workflows/ci.yml) that runs tests in parallel with parallel_tests and RSpec in a single job with multiple processes. It assumes a Rails application using RSpec and PostgreSQL.
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:latest
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: ''
          POSTGRES_HOST_AUTH_METHOD: trust
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    env:
      RAILS_ENV: test
      PGHOST: localhost
      PGUSER: postgres
      # Uncomment and set this to limit the number of parallel processes
      # (by default, parallel_tests uses the number of logical cores)
      # PARALLEL_TEST_PROCESSORS: 12

    steps:
      - uses: actions/checkout@v4

      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.4'
          bundler-cache: true

      # Redundant when bundler-cache: true is set (setup-ruby already runs
      # bundle install); harmless to keep
      - name: Install dependencies
        run: bundle install

      - name: Set up database
        run: |
          bundle exec rake parallel:setup

      - name: Run tests in parallel
        run: |
          bundle exec rake parallel:spec
Explanation of the Configuration
- Single Job: Runs all tests in one VM using multiple processes. The process count defaults to the number of logical CPU cores and can be overridden with PARALLEL_TEST_PROCESSORS (e.g., 6, 12, 24, 36, or 48). This works best on high-core machines, where it can keep every core busy.
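- Database Setup: rake parallel:setup creates one test database per process and loads the schema into each. With the entry below, the first process uses myapp_test (its TEST_ENV_NUMBER is empty by default) and the others use myapp_test2, myapp_test3, and so on. This requires config/database.yml to include (assuming the default: &default block that Rails generates):
test:
  <<: *default
  database: myapp_test<%= ENV['TEST_ENV_NUMBER'] %>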
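- Test Execution: rake parallel:spec splits the spec files into groups and runs each group in its own process.
- Environment Variables: RAILS_ENV selects the test environment; PGHOST and PGUSER point the PostgreSQL client at the service container.
- Caching: bundler-cache: true caches installed gems between runs, speeding up the Ruby setup step.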
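To see which database name each process ends up with, you can render a database.yml template the way Rails does (ERB first, then YAML) for a few TEST_ENV_NUMBER values. This is a minimal sketch: the DATABASE_YML string, the myapp name, and the helper test_database_for are hypothetical stand-ins for your real config/database.yml, and only the Ruby standard library (erb, yaml) is used.

```ruby
require "erb"
require "yaml"

# Hypothetical minimal database.yml mirroring the test entry above.
DATABASE_YML = <<~YAML
  default: &default
    adapter: postgresql
    encoding: unicode

  test:
    <<: *default
    database: myapp_test<%= ENV['TEST_ENV_NUMBER'] %>
YAML

# Hypothetical helper: render the template with a given TEST_ENV_NUMBER
# (ERB first, then YAML) and return the test database name.
def test_database_for(env_number)
  previous = ENV["TEST_ENV_NUMBER"]
  ENV["TEST_ENV_NUMBER"] = env_number
  rendered = ERB.new(DATABASE_YML).result
  # aliases: true is needed for the &default anchor and the <<: merge key.
  YAML.safe_load(rendered, aliases: true).fetch("test").fetch("database")
ensure
  # Restore the caller's environment (assigning nil deletes the variable).
  ENV["TEST_ENV_NUMBER"] = previous
end

# By default the first process has an empty TEST_ENV_NUMBER; later ones get "2", "3", ...
["", "2", "3"].each do |number|
  puts "TEST_ENV_NUMBER=#{number.inspect} -> #{test_database_for(number)}"
end
```

This makes the naming convention explicit: there is no myapp_test1 by default, because the first process reuses the plain myapp_test name.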
Note on GitHub-hosted runners: The standard ubuntu-latest runner provides 2 vCPUs for private repositories (public repositories get 4). With the default configuration (PARALLEL_TEST_PROCESSORS unset), parallel_tests detects the available cores and runs that many processes, so only 2 (or 4) processes run in parallel. For significant performance improvements, consider GitHub’s larger runners (available on paid plans) or self-hosted runners with higher core counts.
Potential Challenges and Solutions
- Resource Constraints: A single job running many processes (e.g., 36) may strain memory or I/O. If it does, set PARALLEL_TEST_PROCESSORS to roughly 80–90% of the cores (e.g., 32 on a 36-core machine).
- Database Conflicts: rake parallel:setup avoids conflicts by giving each process its own database. Make sure config/database.yml appends TEST_ENV_NUMBER to the test database name.
- Over-provisioning Processes: Setting PARALLEL_TEST_PROCESSORS higher than the number of available cores (e.g., 8 on a 2-core runner) typically slows the suite down rather than speeding it up, because of extra context switching, higher memory usage, and contention for database connections. For best results, keep PARALLEL_TEST_PROCESSORS at or below your core count, or let parallel_tests auto-detect it.
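The guidance above (an explicit PARALLEL_TEST_PROCESSORS value wins, otherwise use the logical core count, optionally with 80–90% headroom, and never more processes than cores) can be combined in a small Ruby helper. This is a hypothetical sketch: suggested_process_count is not part of parallel_tests, and Etc.nprocessors is Ruby’s standard-library count of logical processors, which may not match exactly what parallel_tests detects.

```ruby
require "etc"

# Hypothetical helper (not part of parallel_tests): choose a process count.
# - An explicit PARALLEL_TEST_PROCESSORS value takes precedence.
# - Otherwise use a fraction (headroom) of the logical cores; the default
#   of 1.0 simply matches the core count.
def suggested_process_count(env: ENV, cores: Etc.nprocessors, headroom: 1.0)
  explicit = env["PARALLEL_TEST_PROCESSORS"]
  return Integer(explicit) if explicit && !explicit.empty?

  # Never exceed the core count, and always run at least one process.
  [[(cores * headroom).floor, cores].min, 1].max
end

puts suggested_process_count
```

On a 36-core machine, calling it with headroom: 0.9 gives 32, in line with the example above. If you wanted to use it in CI, an earlier workflow step could append PARALLEL_TEST_PROCESSORS=<n> to the file named by $GITHUB_ENV, though leaving the variable unset is simpler for most setups.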
Optimizing CI Performance
To further enhance your CI pipeline:
- Monitor Build Times: Compare job durations in your GitHub Actions run history before and after enabling parallelization to measure its impact.
- Scale Appropriately: Use PARALLEL_TEST_PROCESSORS to cap the number of processes when the runner’s memory or I/O calls for it.
Conclusion
Integrating parallel_tests with RSpec into your CI pipeline using the single-job approach can dramatically reduce test suite runtimes, giving faster feedback and enabling more frequent deployments. This configuration is best suited to high-core machines (e.g., 6, 12, 24, 36, or 48 cores), where it maximizes CPU utilization with minimal overhead. By following these recommendations, you can achieve substantial speedups in your CI pipeline; on a high-core machine, a 20-minute test suite could potentially drop to under a minute.