How we rewrote a timing-out PHP service as a Node.js Lambda and made it 50% faster

The project-clone service was timing out in production. Not occasionally but reliably, under any meaningful load. A feature used by every customer, every day, was quietly failing, leaving people to fall back on manual workarounds.

Why it mattered

Project cloning is one of those operations that looks simple until you look closely. In a digital infrastructure platform, "clone this project" means duplicating settings, calibration data, configuration files, relational references — a chain of dependent operations that can't be half-done. When the PHP service handling it timed out, the result wasn't a graceful error. It was a half-cloned project that customers had to untangle manually. The engineering team had been patching around it for months. It was time to rethink the architecture, not the patch.

The problem

The original service ran as a synchronous PHP process on a shared application server. Every clone request held a connection open for its entire duration. On a small project, it completed just fine. On anything real — a project with thousands of calibration data points, large attached files, nested settings — it would breach the server's timeout threshold and die mid-execution.

The deeper issue was structural. A long-running, blocking operation had no business sitting inside a synchronous request-response cycle. It needed to be extracted, isolated, and given its own execution environment — one that wouldn't compete with live web traffic for server resources, and one that could be given however much time the job actually required.

What we already had — and why it wasn't enough

We weren't flying blind. Laravel Queues was already in the picture. The clone job had been moved off the synchronous request cycle and into a queued background job — which was the right instinct. But it hadn't solved the problem.

Our Laravel queue workers ran on the application server: long-lived PHP processes sharing infrastructure with the rest of the app. As project sizes grew, the queue workers started timing out too, just later in the cycle and less visibly. Scaling the workers meant scaling the server. Debugging a failed job meant trawling through queue logs with no clean isolation. Testing a job end-to-end required spinning up the full Laravel environment.

The problem wasn't that we hadn't tried async. It was that we'd moved the job sideways, not out.

The insight was simpler: this job had a clear input (a project ID and clone parameters) and a clear output (a new project). It had no UI, no session, no real-time dependency. It was, architecturally, a function — and AWS Lambda exists precisely for this.
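To make that input contract concrete, here is a minimal sketch of how the payload might be checked at the top of the function. The field names sourceProjectId, targetName and options mirror the handler shown later in this post; parseCloneEvent and its specific rules (numeric or string ID, non-empty name, options defaulting to an empty object) are hypothetical, chosen for illustration.

```javascript
// Hypothetical validator for the clone payload. Field names follow the
// handler in this post; the specific rules are illustrative assumptions.
function parseCloneEvent(event) {
  if (event === null || typeof event !== 'object') {
    throw new TypeError('Clone payload must be an object');
  }
  // Default only applies when options is undefined; null is rejected below
  const { sourceProjectId, targetName, options = {} } = event;

  // Accept either a positive integer ID or a non-empty string ID
  const idOk =
    (Number.isInteger(sourceProjectId) && sourceProjectId > 0) ||
    (typeof sourceProjectId === 'string' && sourceProjectId.trim() !== '');
  if (!idOk) {
    throw new TypeError('sourceProjectId is required');
  }
  if (typeof targetName !== 'string' || targetName.trim() === '') {
    throw new TypeError('targetName must be a non-empty string');
  }
  if (options === null || typeof options !== 'object' || Array.isArray(options)) {
    throw new TypeError('options must be an object when provided');
  }

  // Return a normalised copy so later phases never see raw input
  return { sourceProjectId, targetName: targetName.trim(), options };
}
```

Failing fast on a malformed payload keeps a bad request from ever reaching the createProject step, which is where a half-finished clone would start.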

The solution

We rewrote the service in Node.js and deployed it as an AWS Lambda function invoked on demand.

Node.js was chosen deliberately. The clone operation is I/O-bound (reading from a database, copying files, writing new records), and Node's non-blocking I/O model handles this class of work efficiently. PHP's synchronous execution model was part of what made the old service slow, since each query or file operation blocked until it finished. With async/await in Node, phases that depend on one another still run in order, but independent I/O within a phase no longer has to wait its turn.
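As a rough illustration of what non-blocking I/O buys inside a single phase, the sketch below processes a list of items with a fixed number of operations in flight at once, instead of awaiting each one in turn. The helper name mapWithConcurrency and the idea that file copies were batched this way are assumptions; the post does not say how each phase was implemented internally.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in the same order as `items`. Illustrative helper only.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each lane claims the next unclaimed index until the list is exhausted.
  // Node runs this on one thread and there is no await between the check
  // and the increment, so two lanes can never claim the same index.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // If any worker rejects, the returned promise rejects with that error;
  // the other lanes are not cancelled.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

In a files phase, the worker would be whatever copies a single file, with the limit chosen to stay within what the storage backend and database can comfortably absorb.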

The Lambda function received a structured payload, executed each phase of the clone sequentially with proper error handling, and returned a result. No server resources shared with live web traffic. No connection pool to fight the web app over. A timeout we configured for the function ourselves (up to Lambda's 15-minute maximum), not one inherited from a web server config or a queue worker process.
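Running phases in order with useful error reporting could be sketched like this. The runner names the phase that failed and, as an assumed extra safeguard, refuses to start a phase when too little invocation time remains, so the job fails with a clear message instead of being cut off mid-phase. In a real handler, getRemainingMs would be context.getRemainingTimeInMillis from the Lambda context object; runPhases, the phase objects and the 30-second reserve are hypothetical.

```javascript
// Run clone phases one after another. Each phase receives a shared state
// object and may add to it (for example, the new project's ID).
// `getRemainingMs` reports how long the invocation has left; in Lambda that
// would be context.getRemainingTimeInMillis. The reserve is an assumption.
async function runPhases(phases, state, { getRemainingMs = () => Infinity, reserveMs = 30000 } = {}) {
  const completed = [];
  for (const phase of phases) {
    // Stop cleanly rather than start work the timeout would cut off
    if (getRemainingMs() < reserveMs) {
      throw new Error(`Not enough time left to start phase "${phase.name}"`);
    }
    try {
      await phase.run(state);
      completed.push(phase.name);
    } catch (err) {
      // Re-throw with the phase name attached so logs show where it broke
      throw new Error(`Clone phase "${phase.name}" failed: ${err.message}`, { cause: err });
    }
  }
  return completed;
}
```

A caller could catch this error to flag or clean up a partially created project; that step is not shown here.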

One thing we had to solve deliberately: cold starts. The first invocation after a period of inactivity carries the cost of Lambda initialising the runtime and loading the function code. For a clone job that could take several seconds on a large project, an extra cold-start delay on top was noticeable.
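One way to see cold starts from inside the function: code at module scope runs once per execution environment, during initialisation, while the handler body runs on every invocation. The sketch below uses a module-level flag to report whether an invocation was the first in its environment. The flag and the returned field are hypothetical, and in a deployed function the handler would be exported rather than declared locally.

```javascript
// Module scope: runs once when Lambda initialises a new execution environment.
// Expensive setup (SDK clients, database connections) also belongs here so
// warm invocations can reuse it.
let coldStart = true;

// Handler body: runs on every invocation in this environment
async function handler(event) {
  const wasCold = coldStart;
  coldStart = false; // every later invocation in this environment is warm
  console.log(JSON.stringify({ coldStart: wasCold, source: event && event.source }));
  return { coldStart: wasCold };
}
```

Logging the flag alongside request timings is a cheap way to check how often real requests are still landing on cold environments.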

The fix we landed on was periodic pinging: a scheduled CloudWatch Events rule (now part of Amazon EventBridge) that invokes the Lambda function at regular intervals with a no-op payload, just enough to keep the execution environment warm. No provisioned concurrency costs, and nothing new to run beyond the schedule itself. A lightweight scheduler doing one job: making sure the function is ready when a real request arrives.
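The schedule itself is a small piece of configuration. As a sketch, the function below builds request parameters in the shape expected by EventBridge's PutRule and PutTargets calls (for example via PutRuleCommand and PutTargetsCommand in @aws-sdk/client-eventbridge). The rule name, the five-minute interval and the target ID are assumptions, and the function also needs a resource-based permission allowing events.amazonaws.com to invoke it, which is not shown.

```javascript
// Build EventBridge request parameters for a warm-up schedule.
// Field names follow the PutRule / PutTargets APIs; values are illustrative.
function buildWarmupSchedule(functionArn, { ruleName = 'clone-lambda-warmup', minutes = 5 } = {}) {
  if (!Number.isInteger(minutes) || minutes < 1) {
    throw new RangeError('minutes must be a positive integer');
  }
  // Rate expressions use the singular unit for a value of 1
  const unit = minutes === 1 ? 'minute' : 'minutes';

  const putRule = {
    Name: ruleName,
    ScheduleExpression: `rate(${minutes} ${unit})`,
    State: 'ENABLED',
  };

  const putTargets = {
    Rule: ruleName,
    Targets: [
      {
        Id: 'clone-lambda',
        Arn: functionArn,
        // Constant input replaces the default scheduled-event body, so the
        // handler receives { source: 'warmup' } and returns immediately
        Input: JSON.stringify({ source: 'warmup' }),
      },
    ],
  };

  return { putRule, putTargets };
}
```

Without the constant Input, a scheduled event arrives with source set to "aws.events", and a check for event.source === 'warmup' would not match, so every ping would be treated as a real clone request.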

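// Lambda handler: handles both warm-up pings and real clone jobs.
// createProject, cloneSettings, cloneCalibrationData and cloneFiles are the
// service's own clone helpers, defined elsewhere (not shown).
export const handler = async (event) => {
  // Scheduled warm-up ping: the schedule sends { "source": "warmup" } as its
  // constant input, so return immediately without doing any work
  if (event.source === 'warmup') {
    return { warmed: true };
  }

  const { sourceProjectId, targetName, options } = event;

  // Phases run in order; every later phase needs the new project's ID
  const newProject = await createProject(targetName);
  await cloneSettings(sourceProjectId, newProject.id, options);
  await cloneCalibrationData(sourceProjectId, newProject.id);
  await cloneFiles(sourceProjectId, newProject.id);

  // An error thrown by any phase rejects this promise, and Lambda reports
  // the invocation as failed
  return { success: true, projectId: newProject.id };
};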

It's a pragmatic pattern: not elegant, but honest. If your invocation frequency is low and provisioned concurrency feels like over-engineering, a warm-up ping gets you most of the benefit at near-zero cost. The trade-off to understand: a single ping keeps one execution environment warm, not many, and Lambda can still recycle that environment at any time, so pings reduce cold starts rather than eliminate them. If you anticipate burst concurrency, provisioned concurrency is still the right tool.

The result

50% improvement in end-to-end clone efficiency compared to the PHP baseline
Timeout-related failures dropped sharply — the chronic, load-driven timeouts that had plagued the service were gone
The operation became independently deployable, testable, and monitorable in isolation
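Because the job is just a function, it can be exercised without a server or a deployed stack. One way to get there, sketched below, is to build the handler from injected phase functions so tests can pass in fakes. makeCloneHandler, makeFakeDeps and the deps object are hypothetical; the post does not describe how its tests were structured.

```javascript
// Build a clone handler from its phase functions so tests can swap in fakes.
// In production, `deps` would hold the real database and storage helpers.
function makeCloneHandler(deps) {
  return async function handler(event) {
    if (event.source === 'warmup') {
      return { warmed: true };
    }
    const { sourceProjectId, targetName, options } = event;
    const newProject = await deps.createProject(targetName);
    await deps.cloneSettings(sourceProjectId, newProject.id, options);
    await deps.cloneCalibrationData(sourceProjectId, newProject.id);
    await deps.cloneFiles(sourceProjectId, newProject.id);
    return { success: true, projectId: newProject.id };
  };
}

// In-memory fakes that record the order in which phases were called
function makeFakeDeps() {
  const calls = [];
  return {
    calls,
    createProject: async (name) => { calls.push(`create:${name}`); return { id: 99 }; },
    cloneSettings: async (src, dst) => { calls.push(`settings:${src}->${dst}`); },
    cloneCalibrationData: async (src, dst) => { calls.push(`calibration:${src}->${dst}`); },
    cloneFiles: async (src, dst) => { calls.push(`files:${src}->${dst}`); },
  };
}
```

A test can then assert both the returned result and the exact phase order, and confirm that a warm-up ping touches nothing.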

The takeaway

Moving a slow job into a queue is the right first step — but a queue running on your application server is still your application server's problem. True isolation means a separate execution environment with its own timeout budget, its own deployment lifecycle, and its own failure surface.

If your Laravel Queue workers are timing out on heavy jobs: the queue wasn't the wrong answer. The infrastructure underneath it was. Ask whether the job belongs on your server at all — or whether it's a function waiting to be extracted.
