Hey Sergio and Beyond team,
It’s been nearly a year since we started working together this time around, which means my audit from last year is nearly as old. Heading into 2026, I thought it would be good to take a step back and reassess Glue holistically. Priorities change, and you’ve implemented a number of things from last year’s report, which makes that report less useful as a guide today.
In terms of progress, here are the main things from last year’s audit that were implemented:
Overall latency on glue web hasn’t gotten significantly better or worse since January; overall latencies look basically the same.
It feels to me like a significant amount of breathing room has been achieved on the Salesforce integration, and we’re now looking towards reducing overall user-facing latency.
After looking through the app with new eyes, I have two key metrics to track for 2026:
I’m going to structure my recommendations into two sections: removing footguns, and reducing latency.
This issue was initially identified in the September report. It’s been ongoing since then, so it’s clearly not having any customer impact… yet.
A daily database performance brownout of about 7 hours is a sign of a system which is teetering on the brink. Add a little more load here or there and you can easily get a major customer incident.
First off, I will install two monitors and SLOs, one for read latency and one for write latency, and I’d like to get those to 99.9%. I’m going to set the latency target at whatever the current “normal” daytime level is, which appears to be about 1 millisecond for reads and 5 to 10 milliseconds for writes.
The root cause here is a smearing strategy being applied to within_12_hours jobs, which causes work to be executed at an arbitrarily high concurrency that’s outpacing the database.
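For illustration, the pattern I mean looks roughly like this (a hypothetical sketch, not your actual code; the job and model names are made up):

# Each record's sync is pushed with a random offset inside the 12-hour window.
records.find_each do |record|
  WithinTwelveHoursSyncJob.perform_in(rand(12.hours.to_i), record.id)
end

# Nothing here bounds how many of those jobs become runnable in the same minute,
# so the effective concurrency is limited only by the number of Sidekiq threads,
# not by what the database can absorb.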
You have three options as I see it:
I would like to do a combination of all three.
Queue time performance on your “very long SLO” queues continues to be far better than required. These 12+ hour queues mostly sit at latencies of less than a minute.
The purpose of high SLO queues is to reduce resource usage by increasing utilization. The processes pulling from these queues should be either “off” (scaled to zero) or utilized at 100% (on). Currently, you’re not getting this benefit.
I would also encourage you to regularly (perhaps daily, on a schedule?) scale these queues down to zero. That can be automatic (based on queue time autoscaling, below) or scheduled (say, for 4 hours in the middle of business hours, or for a stretch over a weekend). It’s useful to regularly “test” your assumptions about the acceptable latency of jobs in these queues by actually making them wait that long. Currently, queue times on these queues are so low that I worry there are jobs in here which actually belong in 5-minute-or-faster queues, but nobody notices because latency in practice never exceeds a minute.
Your load is so predictable that Sidekiq SLO performance has not been a meaningful issue. Your SLO performance for the last 30 days looks great.
However, not having autoscaling based on queue times, to me, is an incident waiting to happen. Eventually the queue latency will balloon overnight when no one’s awake.
The remediation here, reporting queue latency to CloudWatch and setting up a scaling policy on that metric, is a pretty simple task.
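A minimal sketch of the reporting half, assuming the aws-sdk-cloudwatch gem and a job scheduled to run every minute (the namespace and metric names are placeholders):

require "sidekiq/api"
require "aws-sdk-cloudwatch"

class ReportQueueLatencyJob
  include Sidekiq::Job

  def perform
    cloudwatch = Aws::CloudWatch::Client.new

    Sidekiq::Queue.all.each do |queue|
      cloudwatch.put_metric_data(
        namespace: "Sidekiq",
        metric_data: [{
          metric_name: "QueueLatencySeconds",
          dimensions: [{ name: "QueueName", value: queue.name }],
          value: queue.latency,
          unit: "Seconds"
        }]
      )
    end
  end
end

From there, the scaling side is just a step or target-tracking policy against that metric for each queue’s worker group.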
For web, I’m less bothered, because your web load is even more predictable than background jobs. With jobs, you theoretically always have the potential problem of a change in data or input or a new job type being deployed that 100xs your load, but that threat is far less present on the web side.
This is a quick one; I’ll just open a PR to this effect:
Storing the connection pool singleton in a class instance variable which is instantiated by ||= is potentially thread unsafe. You could end up with two threads with different connection pools. This isn’t a huge deal but I would like to clean this up by converting this to a constant which is set using = to avoid any issues.
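Roughly, the change looks like this (a sketch with invented names; build_connection stands in for whatever constructs the Salesforce client):

# Before: lazy memoization. ||= is a read-then-write, so two threads hitting
# this at the same moment can each construct their own pool.
class SalesforceConnection
  def self.pool
    @pool ||= ConnectionPool.new(size: 5, timeout: 5) { build_connection }
  end
end

# After: a constant assigned once at load time, before any request threads
# exist, so there is exactly one pool.
class SalesforceConnection
  POOL = ConnectionPool.new(size: 5, timeout: 5) { build_connection }
end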
You have, effectively, just a few major sources of time spent waiting on Salesforce. Since Salesforce is such a huge part of your reliability and uptime, it would be useful to make sure that your biggest sources of that waiting are absolutely optimized.
Every day, you spend about 1.5 days of wall clock time waiting on Salesforce. OpportunitiesController#create accounts for 6% of that. This controller calls Salesforce 70 to 75 times, on average. I have to believe that this can be reduced.
Accounting for about 4% of total Salesforce load, this job suffers from a loop which is repeated anywhere from 1 to a couple dozen times:
Since these are GETs, I have to imagine we can de-N+1 the Salesforce queries here and do them in a single query instead.
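Something along these lines; this is a sketch that assumes a Restforce-style client and guesses at the object and field names:

# Instead of one GET per record inside the loop...
# records.each { |record| client.find("Opportunity", record.salesforce_id) }

# ...collect the ids up front and fetch everything in a single SOQL query:
ids = records.map(&:salesforce_id)
opportunities = client.query(
  "SELECT Id, StageName, Amount FROM Opportunity WHERE Id IN ('#{ids.join("', '")}')"
)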
Also accounting for about 4% of total Salesforce load, this job makes about 150 to 200 requests to Salesforce on every execution.
There’s a significant number of PATCH requests here as well.
As an example, in ContinueProgramEnrollmentService, we actually make two separate update calls in the same method:
if @program.attorney_approval_status == Af::Program::AttorneyApprovalStatus::PENDING
  @program.update!(
    attorney_approval_status: Af::Program::AttorneyApprovalStatus::APPROVED,
    attorney_approval_status_date: Date.current
  )
end

#

@program.update!(
  contract_executed_date_time: contract_executed_date_time,
  client_signed_date_time: client_signed_date_time,
  co_client_signed_date_time: co_client_signed_date_time
)
If there’s stuff here that’s that easy to fix, I’m sure we could shave a few percentage points off total Salesforce usage just by combining some calls.
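For example, a sketch of combining the two calls above into a single update! (and so, presumably, a single PATCH to Salesforce):

attributes = {
  contract_executed_date_time: contract_executed_date_time,
  client_signed_date_time: client_signed_date_time,
  co_client_signed_date_time: co_client_signed_date_time
}

# Fold the conditional approval fields into the same write instead of issuing
# a separate update! for them.
if @program.attorney_approval_status == Af::Program::AttorneyApprovalStatus::PENDING
  attributes[:attorney_approval_status] = Af::Program::AttorneyApprovalStatus::APPROVED
  attributes[:attorney_approval_status_date] = Date.current
end

@program.update!(attributes)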
It seems to be a frequent pattern in Beyond’s Salesforce services that operations are written for a single instance of a resource and then naively looped over. Instead, you need services which operate over several instances and batch calls where applicable.
Overall latency on the web side hasn’t really changed in either direction this year. As you already know, latency here is driven by the Salesforce integration.
We fundamentally can’t make Salesforce any faster than it already is. That leaves two options: make fewer Salesforce calls per transaction, or make the calls we do need in parallel.
I would like to do more of both.
I’ve been seeing a lot of success with other companies employing prosopite as a kind of semi-automated workflow for reducing N+1s in Rails apps.
The workflow is: enable Prosopite in test, whitelist every existing violation, fail the build on any new one, then burn the whitelist down over time. This approach of “whitelist all current violations but forbid future violations, then burn down” is a great strategy that I’ve found works well for mitigating this technical debt.
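As a sketch of that setup, assuming RSpec (the allow_stack_paths entries are placeholders for whatever violations exist today):

# spec/support/prosopite.rb
require "prosopite"

RSpec.configure do |config|
  config.before(:suite) do
    Prosopite.rails_logger = true
    Prosopite.raise = true                  # any new N+1 fails the build
    Prosopite.allow_stack_paths = [
      "app/services/legacy_sync_service.rb" # existing violations, to burn down over time
    ]
  end

  config.around(:each) do |example|
    Prosopite.scan
    begin
      example.run
    ensure
      Prosopite.finish
    end
  end
end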
However, we don’t have an off-the-shelf way to parse Salesforce queries the way Prosopite parses SQL, and building one could be pretty hard. Even if we did, it isn’t really a guarantee that it’s going to be very helpful: particularly on the web side, I think you’ve got a lot of Salesforce API usage that isn’t reducible to N+1. And if you go to Salesforce 100 times and none of them are N+1s, well, it’s still gonna be really slow!
I think what might work is a global call count limit: wrap every controller and every job in a middleware that asserts that the number of calls made to Salesforce during that transaction cannot exceed X. To meet a latency goal of around 1 second, I think that number is probably about 30 to 40.
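A hypothetical sketch of what that could look like; the class names and the hook into your Salesforce client wrapper are invented here:

# Counter that the Salesforce client wrapper bumps before every outbound request.
class SalesforceCallBudget
  LIMIT = 40
  BudgetExceeded = Class.new(StandardError)

  def self.reset!
    Thread.current[:salesforce_call_count] = 0
  end

  def self.track!
    count = Thread.current[:salesforce_call_count] = Thread.current[:salesforce_call_count].to_i + 1
    if count > LIMIT
      raise BudgetExceeded, "#{count} Salesforce calls in one transaction (limit is #{LIMIT})"
    end
  end
end

# Rack middleware so every controller action starts with a fresh budget; a
# matching Sidekiq server middleware would do the same reset around each job.
class SalesforceBudgetMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    SalesforceCallBudget.reset!
    @app.call(env)
  end
end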
I’m sure you have many, many specs that today would violate this. You also might have to set the threshold even lower because your test data is likely not as complex as your production data and the number of Salesforce calls is likely lower overall in tests. But an atmosphere where:
… is one that would definitely get these call counts under control.
Your current approach of asserting Salesforce counts around specific operations isn’t enough because services can be combined arbitrarily or called in loops. Only a global limit around the entire transaction can guarantee that we’re actually not doing anything crazy under the hood.
The next step is providing people with a mechanism to parallelize their Salesforce API usage.
I felt that in 2025 we started down this path a bit with our work to parallelize document downloads, but it wasn’t taken much further. I also said something similar to what I’m about to propose in the 2025 report. But the idea has crystallized a bit better for me, and now that we’ve got some experience using Async and its APIs in production, I feel a bit more clear about the path forward here.
Any transaction (job or web) makes N number of requests to Salesforce. Today, those requests happen strictly sequentially, and strictly in the same order.
However, think about a controller as a black box:
In: The rack environment, controller parameters
~~ MYSTERIOUS BLACK BOX OF SALESFORCE INTERACTIONS ~~
Out: A rack response [200,{},some_json_object]
In order to generate the JSON object, we have a list of ~100 Salesforce queries we have to make.
Those queries all have a relationship with each other. Currently, since they execute linearly with no parallelism, that relationship looks like:
Query 1 → Query 2 → Query 3 → ... → Query N → JSON Object
However, if you actually thought about the relationships of these queries to each other, they form a directed acyclic graph which is not at all linear:
                Query 1 (Program ID)
               /         |          \
        Query 2       Query 3       Query 4
       (Client)    (Tradelines)    (Attorney)
           |             |             |
        Query 5       Query 6       Query 7
       (Address)    (Creditors)      (Fees)
           |             |             |
        Query 8       Query 9       Query 10
       (History)    (Payments)     (Status)
            \            |            /
             \        Query 11       /
              \      (Balance)      /
               \         |         /
                \        |        /
                 \       |       /
                   JSON Object
What you’re currently missing is twofold: the dependency graph is implicit in the code rather than something you can enumerate, and nothing runs the independent branches concurrently.
If we could make the DAG something you could actually enumerate in code, we could build a Salesforce data loading abstraction that parallelizes automatically. In the example above, we could determine that Query 1 must run first, and then the chains 2→5→8, 3→6→9→11, and 4→7→10 could each run in parallel.
There would need to be several types of objects involved (and we’d want to evaluate concurrent-ruby and Async in more detail for the execution layer). Calling repository.load would cause the entire graph to “run”, and it would output a single hash of data which can then be passed to a templating or serialization layer (so, after the repository is done loading, you can ban all further requests to Salesforce).
This would be a pretty major migration. I’m also not sure I can clearly articulate it in prose; I’ll need to ship a first version of the code for the abstraction to be clear.
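To make the direction a little more concrete, here’s a very rough sketch of the shape I have in mind; every name here is invented, and it glosses over error handling, instrumentation, and how this interacts with the call count limit above:

require "async"

class SalesforceRepository
  Node = Struct.new(:name, :deps, :block)

  def initialize
    @nodes = {}
  end

  # Declare a query and the other queries whose results it needs.
  def query(name, deps: [], &block)
    @nodes[name] = Node.new(name, deps, block)
  end

  # Run the whole graph. Each query starts as soon as its dependencies have
  # finished, so independent branches of the DAG execute concurrently.
  # Returns a hash of every query's result.
  def load
    results = {}
    tasks = {}

    Sync do
      resolve = lambda do |name|
        tasks[name] ||= Async do
          node = @nodes.fetch(name)
          dep_tasks = node.deps.map { |dep| [dep, resolve.call(dep)] }
          dep_results = dep_tasks.to_h { |dep, task| [dep, task.wait] }
          results[name] = node.block.call(dep_results)
        end
      end

      @nodes.each_key { |name| resolve.call(name) }
      tasks.each_value(&:wait)
    end

    results
  end
end

# Usage, mirroring the example DAG above (the Salesforce calls are invented):
repository = SalesforceRepository.new
repository.query(:program)                      { salesforce.find("Program__c", program_id) }
repository.query(:client,     deps: [:program]) { |d| salesforce.find("Contact", d[:program].client_id) }
repository.query(:tradelines, deps: [:program]) { |d| salesforce.query("SELECT ... FROM Tradeline__c WHERE Program__c = '#{d[:program].Id}'") }

data = repository.load # :client and :tradelines run in parallel once :program finishes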
My plan would look like this:
It would probably require a significant amount of investment in 2026, and whether it’s worth it depends on how important latency improvement is to you. Parallelizing queries does not, of course, reduce total usage of Salesforce; it only lowers the latency of the transaction itself.
This controller responds in >2 seconds about 75% of the time, usually with around 75 to 100 Salesforce calls. It frequently takes more than 5 seconds to respond.
70% of requests to this controller take more than 2 seconds to complete. It makes about 75 calls to Salesforce per request and generally takes around 2 seconds.
This controller responds in 2 seconds or more about 60% of the time.
In many ways, this controller looks very similar to OnlineCreditPullsController#create. It generally takes about 2 seconds to complete, but it only makes about 10 to 20 Salesforce calls. Because of this, it might be the easiest of the three controllers here to unwind and fix.