How to Load Balance Requests Over Several API Servers?

Posted in php, Virtual Private Server

My Chrome Extension [Video Download Helper] suffered a few hundred uninstallations because one of its API servers had a high load average while the other servers sat idle.

How to Return 503 When the Server Is Overloaded?

To mitigate the load, the server returns a 503 (server busy, temporarily unavailable) when its 1-minute load average is above a threshold:

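$currentload = sys_getloadavg(); // load averages for the past 1, 5 and 15 minutes
$threshold = 1.5; // reject requests when the 1-minute load average exceeds this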
if ($currentload[0] > $threshold) {
  header($_SERVER['SERVER_PROTOCOL'] .' 503 Service Temporarily Unavailable');
  header('Status: 503 Service Temporarily Unavailable');
  header('Retry-After: 300');
  exit("<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\r\n<html><head>\r\n<title>503 Service Temporarily Unavailable</title>\r\n</head><body>\r\n<h1>503 Service Temporarily Unavailable</h1>\r\n<p>The requested URL " . $_SERVER['SCRIPT_NAME'] . " is Temporarily Unavailable.</p>\r\n</body></html>");
}

How to Migrate the Requests to Other Less Busy Servers?

I have 4 API servers so far. When one server has a high load average, any further requests should be redirected to the other, less busy servers. To do that, we can create an API on each API server that returns its current load average for the past 1, 5 and 15 minutes. We can also append a fourth value: the number of virtual CPU cores. For example,

curl -s https://helloacm.com/api/loadavg/
[0.13,0.08,0.08,2]

This means the server helloacm.com has load averages of 0.13, 0.08 and 0.08 for the past 1, 5 and 15 minutes, and that it has 2 virtual CPU cores. We then need to list the available servers:

$servers = ['happyukgo.com', 'steakovercooked.com', 'uploadbeta.com'];
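The load-average API described above could look roughly like the sketch below. The helper names count_cpu_cores and loadavg_payload are hypothetical, and reading /proc/cpuinfo to count cores is an assumption that only holds on Linux; the original endpoint may well obtain the core count differently.

```php
<?php
// Hypothetical helper: count virtual CPU cores by reading /proc/cpuinfo (Linux only).
// Falls back to 1 when the file is missing or unreadable.
function count_cpu_cores(string $cpuinfoPath = '/proc/cpuinfo'): int
{
    $text = is_readable($cpuinfoPath) ? file_get_contents($cpuinfoPath) : false;
    if ($text === false) {
        return 1;
    }
    // Each logical core contributes one line starting with "processor".
    $count = preg_match_all('/^processor\s*:/m', $text);
    return $count > 0 ? $count : 1;
}

// Build the four-element payload: the 1, 5 and 15 minute load averages,
// followed by the number of cores.
function loadavg_payload(array $load, int $cores): array
{
    return [round($load[0], 2), round($load[1], 2), round($load[2], 2), $cores];
}

// In the endpoint script itself (the exact file layout is an assumption):
// header('Content-Type: application/json');
// echo json_encode(loadavg_payload(sys_getloadavg(), count_cpu_cores()));
```

Returning the core count alongside the load lets the caller compare servers of different sizes, because a load of 1.0 means something very different on 1 core than on 4.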

Then we can place the following code before the API call on each server (the server list above needs to be changed on each server so that it names only the other servers):

$threshold_busy = 0.8; // only when load average is above 0.8
if ($currentload[0] > $threshold_busy) {
  // ask other available servers
  $s = array();
  foreach ($servers as $cur) { // get the load average for each server
    $x = json_decode(file_get_contents("https://$cur/api/loadavg/"), true);
    if (is_array($x)) {
      if (count($x) == 4) {
        $load = $x[0];
        $cpu = $x[3];
        $s[$cur] = $load / $cpu;
      }       
    }
  }
  asort($s); // sort the returns so we know the most idle server
  header("Access-Control-Allow-Origin: *");
  header('Content-Type: application/json');  
  $alternative_server = false;
  foreach ($s as $ss => $v) { // try the idle server first
    // call the API of the alternative server; urlencode() keeps the video URL intact inside the query string
    $api = json_decode(file_get_contents("https://$ss/api/video/?cached&video=" . urlencode($_GET['video'])), true);
    if ($api && is_array($api)) {
      $api["alternative_server"] = $ss; // mark the return of the actual server
      $alternative_server = true;
      break;
    }     
  }
  if ($alternative_server) { // we get the results from the alternative server
    die(json_encode($api));
  }
}  
// continue normally
...

The idea is to first get the load average of each server in the candidate list, divided by its number of CPU cores. We then sort the list by that value, so that we can try the servers one by one, starting from the most idle, until one of them returns a result.
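To make the ranking step concrete, here is a small sketch with made-up numbers. The function name rank_servers_by_load and the sample load values are hypothetical; only the load-per-core calculation and the asort call come from the code above.

```php
<?php
// $reports maps a host name to its decoded /api/loadavg/ response
// ([1-min, 5-min, 15-min, cores]); malformed entries are skipped.
function rank_servers_by_load(array $reports): array
{
    $scores = [];
    foreach ($reports as $host => $r) {
        if (is_array($r) && count($r) === 4 && $r[3] > 0) {
            $scores[$host] = $r[0] / $r[3]; // 1-minute load per core
        }
    }
    asort($scores); // ascending, keys preserved: most idle server first
    return array_keys($scores);
}

// Made-up sample numbers, for illustration only.
$order = rank_servers_by_load([
    'happyukgo.com'       => [1.20, 0.90, 0.70, 2], // 0.60 per core
    'steakovercooked.com' => [0.30, 0.20, 0.20, 1], // 0.30 per core
    'uploadbeta.com'      => [1.60, 1.10, 0.90, 4], // 0.40 per core
    'example.invalid'     => null,                   // failed request, skipped
]);
// $order is now: steakovercooked.com, uploadbeta.com, happyukgo.com
```

Note that uploadbeta.com has the highest raw load in this sample but ranks second, because its load is spread over 4 cores.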

This approach helps to make full use of all servers and to mitigate the load when many requests hit the same server.
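One practical caveat: file_get_contents blocks until the remote server answers, so a slow peer could stall a server that is already busy. A possible safeguard, sketched below under the assumption that a couple of seconds is an acceptable wait, is to pass a stream context with a short timeout. The names peer_context and fetch_json_array are hypothetical and not part of the original code.

```php
<?php
// Build an HTTP stream context with a short read timeout (in seconds).
function peer_context(float $timeoutSeconds = 2.0)
{
    return stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
}

// Fetch a URL and decode it as a JSON array; returns null on any failure
// (unreachable server, timeout, or a body that is not a JSON array/object).
function fetch_json_array(string $url, $context = null): ?array
{
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null;
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}

// Possible use inside the loops above:
// $x = fetch_json_array("https://$cur/api/loadavg/", peer_context());
```

Because the helper returns null on failure, the existing is_array checks in the loops continue to work unchanged.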