7.17.2007

Writing A Reverse Proxy in PHP5

I have been working on a little class that runs a reverse proxy from PHP using cURL. I have extended it for my own purposes (single sign-on) to handle some special request parameters, but here is the base version. It has some warts, but it's a good starting point. I would appreciate any pointers anyone has to offer.


<?php

class ProxyHandler
{
    private $url;            // Public base URL that clients see
    private $proxy_url;      // Private base URL being proxied
    private $translated_url; // Full private URL for this request
    private $curl_handler;

    function __construct($url, $proxy_url)
    {
        $this->url = $url;
        $this->proxy_url = $proxy_url;

        // Append the requested path and query string to the private URL
        if (isset($_SERVER['PATH_INFO']))
        {
            $proxy_url .= $_SERVER['PATH_INFO'];
        }
        else
        {
            $proxy_url .= '/';
        }

        if (isset($_SERVER['QUERY_STRING']) && $_SERVER['QUERY_STRING'] !== '')
        {
            $proxy_url .= "?{$_SERVER['QUERY_STRING']}";
        }

        $this->translated_url = $proxy_url;

        $this->curl_handler = curl_init($proxy_url);

        // Set various options
        $this->setCurlOption(CURLOPT_RETURNTRANSFER, true);
        $this->setCurlOption(CURLOPT_BINARYTRANSFER, true); // For images, etc.
        if (isset($_SERVER['HTTP_USER_AGENT']))
        {
            $this->setCurlOption(CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        }
        $this->setCurlOption(CURLOPT_WRITEFUNCTION, array($this, 'readResponse'));
        $this->setCurlOption(CURLOPT_HEADERFUNCTION, array($this, 'readHeaders'));

        // Process post data.
        if (count($_POST))
        {
            $this->setCurlOption(CURLOPT_POST, true);

            // http_build_query() URL-encodes keys and values, including
            // nested arrays such as name[]=a&name[]=b
            $this->setCurlOption(CURLOPT_POSTFIELDS, http_build_query($_POST, '', '&'));
        }
        elseif ($_SERVER['REQUEST_METHOD'] !== 'GET') // Default request method is 'get'
        {
            // Set the request method
            $this->setCurlOption(CURLOPT_CUSTOMREQUEST, $_SERVER['REQUEST_METHOD']);
        }
    }

    // Executes the proxy.
    public function execute()
    {
        curl_exec($this->curl_handler);
    }

    // Get the information about the request.
    // Should not be called before exec.
    public function getCurlInfo()
    {
        return curl_getinfo($this->curl_handler);
    }

    // Sets a curl option.
    public function setCurlOption($option, $value)
    {
        curl_setopt($this->curl_handler, $option, $value);
    }

    // Called by cURL once per response header line.
    protected function readHeaders($cu, $string)
    {
        // cURL expects the number of bytes it handed us, so measure before trimming
        $length = strlen($string);
        $string = rtrim($string);

        // cURL has already decoded a chunked body, so don't pass this header on
        if ($string === '' || preg_match(',^Transfer-Encoding:,i', $string))
        {
            return $length;
        }

        // Header names are case-insensitive; point redirects back at the public URL
        if (preg_match(',^Location:,i', $string))
        {
            $string = str_replace($this->proxy_url, $this->url, $string);
        }
        header($string);
        return $length;
    }

    // Called by cURL for each chunk of the response body.
    protected function readResponse($cu, $string)
    {
        $length = strlen($string);
        echo $string;
        return $length;
    }
}
?>
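
The URL translation in the constructor is easier to reason about when pulled out into a plain function that takes a $_SERVER-like array, since it can then be tried without a web server. This is only a sketch: build_target_url() is a hypothetical helper that is not part of the class above, and trimming a trailing slash from the base URL is my own assumption to avoid a double slash.

```php
<?php
// Hypothetical helper (not part of ProxyHandler): builds the private-site URL
// from a base URL and a $_SERVER-like array.
function build_target_url($base, array $server)
{
    // Fall back to '/' when no PATH_INFO was supplied (request for the site root)
    $path = isset($server['PATH_INFO']) ? $server['PATH_INFO'] : '/';

    // Assumption: drop a trailing slash on the base so the path joins cleanly
    $url = rtrim($base, '/') . $path;

    // Only append a query string when there is one
    if (isset($server['QUERY_STRING']) && $server['QUERY_STRING'] !== '')
    {
        $url .= '?' . $server['QUERY_STRING'];
    }

    return $url;
}

echo build_target_url('http://privatesite.example.com', array(
    'PATH_INFO'    => '/reports/2007',
    'QUERY_STRING' => 'page=2',
)), "\n";
// http://privatesite.example.com/reports/2007?page=2
```

Passing the array in, rather than reading $_SERVER directly, is what makes it possible to check the edge cases (no path, empty query string) from the command line.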


And here's an example .htaccess file:


RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule ^(.+)$ index.php/$1 [QSA,L]


And an example usage:


$proxy = new ProxyHandler('http://publicsite.example.com','http://privatesite.example.com');
$proxy->execute();
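
The two URLs in that call matter for more than the outgoing request: when the private site answers with a redirect, its Location header has to be mapped back to the public host, or the browser would be sent to a site it may not be able to reach. Here is a rough sketch of that step on its own. rewrite_location_header() is a hypothetical stand-alone function written for illustration; it mirrors the Location handling in readHeaders() but is not taken from the class.

```php
<?php
// Hypothetical helper: maps a redirect from the private site back to the public one.
function rewrite_location_header($header, $private_url, $public_url)
{
    // Only touch Location headers; header names are case-insensitive
    if (preg_match('/^Location:/i', $header))
    {
        return str_replace($private_url, $public_url, $header);
    }

    // Every other header passes through unchanged
    return $header;
}

echo rewrite_location_header(
    'Location: http://privatesite.example.com/login',
    'http://privatesite.example.com',
    'http://publicsite.example.com'
), "\n";
// Location: http://publicsite.example.com/login
```

Relative redirects (for example "Location: /login") contain no host, so they come through untouched.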



Cheers.

6 comments:

  1. Nice work. Well formatted code. Appears to work just fine over here too. I'm impressed.

    Keep up the good work.

  2. Thanks, Josh.

    This obviously doesn't perform as well as mod_proxy, but it pays off once you have to do any kind of DB work to filter your requests. That is where I found it to be a benefit.

    Cheers,
    Brian

  3. You might also want to check against $_SERVER['REQUEST_URI'] to append to your proxy_url. It may save you a step. :)

  4. Thanks, Aaron. I am not sure exactly which check this would prevent. Can you possibly give a small example?

    I also have a new version that passes the client headers (except Host), which allows for ETag handling, etc. I am working on packaging it up, and was thinking of making either a Google Code project or a SourceForge project out of it.

  5. Well, I'm looking at your code here:
    [code]
    if (isset($_SERVER['PATH_INFO']))
    {
    $proxy_url .= $_SERVER['PATH_INFO'];
    }
    else
    {
    $proxy_url .= '/';
    }

    if ($_SERVER['QUERY_STRING'] !== '')
    {
    $proxy_url .= "?{$_SERVER['QUERY_STRING']}";
    }
    [/code]

    I believe you could do this:
    [code]
    $proxy_url .= $_SERVER['REQUEST_URI'];
    [/code]

    REQUEST_URI should always be set (of course there might be some real weird case where it isn't... but I can't think of one!). It will contain the entire request including the path and the query.

  6. I see you also have rewrite conditions, don't forget $_SERVER['REDIRECT_QUERY_STRING'] if you find yourself dropping get variables on redirect (or if you wanna do something different).

    Unfortunately I don't think it's possible to retain POST vars from Apache redirects. Too bad, but at least you keep GET, right?

    It's a nice, concise class though. I might give this a try for SSO for stuff I have Apache doing mod_proxy work with some small webservices that run in Python or xinetd, where I'm depending on basic auth right now...

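
Comment 4 mentions a newer version that passes the client headers (except Host) through to the private site. That version isn't shown here, so the following is only a guess at the idea, not the author's code: collect_client_headers() is a hypothetical function that turns the HTTP_* entries of a $_SERVER-like array back into header lines. The result could then be handed to cURL through the existing CURLOPT_HTTPHEADER option.

```php
<?php
// Hypothetical helper: rebuilds client request headers from a $_SERVER-like array.
function collect_client_headers(array $server)
{
    $headers = array();

    foreach ($server as $key => $value)
    {
        // Keep only HTTP_* entries, and skip Host so cURL sends the private host name
        if (strpos($key, 'HTTP_') !== 0 || $key === 'HTTP_HOST')
        {
            continue;
        }

        // HTTP_IF_NONE_MATCH -> If-None-Match
        $name = substr($key, 5);
        $name = ucwords(strtolower(str_replace('_', ' ', $name)));
        $name = str_replace(' ', '-', $name);

        $headers[] = $name . ': ' . $value;
    }

    return $headers;
}

print_r(collect_client_headers(array(
    'HTTP_HOST'          => 'publicsite.example.com',
    'HTTP_IF_NONE_MATCH' => '"abc123"',
    'REQUEST_METHOD'     => 'GET',
)));
```

With the class above, that would look something like $proxy->setCurlOption(CURLOPT_HTTPHEADER, collect_client_headers($_SERVER)); before calling execute().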