7.23.2007

Reverse Proxy in PHP5, Rev2

It's gotten a bit more complex. The previous proxy handler didn't pass all of the client headers to the proxied server, which caused problems such as the wrong client type being reported, no ETag caching, and broken cookie passing. Here's the current rev, which solves a lot of these issues.

I hadn't noticed the broken cookie handling because I wasn't using cookies on my back-end app: my SSO implementation was caching the back-end server's cookies in the session.

So, here you go!


<?php

class ProxyHandler
{
    private $url;
    private $proxy_url;
    private $proxy_host;
    private $proxy_proto;
    private $translated_url;
    private $curl_handler;
    private $cache_control = false;
    private $pragma = false;
    private $client_headers = array();

    public function __construct($url, $proxy_url)
    {
        // Strip the trailing '/' from the URLs so they are the same.
        $this->url = preg_replace(',/$,', '', $url);
        $this->proxy_url = preg_replace(',/$,', '', $proxy_url);

        // Build the translated URL from the stripped base, so that a
        // trailing '/' on $proxy_url does not produce a double slash.
        $translated_url = $this->proxy_url;

        // Append the path info, or a '/' when there is none.
        if (isset($_SERVER['PATH_INFO']))
        {
            $translated_url .= $_SERVER['PATH_INFO'];
        }
        else
        {
            $translated_url .= '/';
        }

        // Append the query string, if any.
        if (isset($_SERVER['QUERY_STRING']) && $_SERVER['QUERY_STRING'] !== '')
        {
            $translated_url .= "?{$_SERVER['QUERY_STRING']}";
        }

        $this->translated_url = $translated_url;

        $this->curl_handler = curl_init($this->translated_url);

        // Set various options
        $this->setCurlOption(CURLOPT_RETURNTRANSFER, true);
        $this->setCurlOption(CURLOPT_BINARYTRANSFER, true); // For images, etc.
        if (isset($_SERVER['HTTP_USER_AGENT']))
        {
            $this->setCurlOption(CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        }
        $this->setCurlOption(CURLOPT_WRITEFUNCTION, array($this, 'readResponse'));
        $this->setCurlOption(CURLOPT_HEADERFUNCTION, array($this, 'readHeaders'));

        // Process post data.
        if (count($_POST))
        {
            // Set the request method to POST
            $this->setCurlOption(CURLOPT_POST, true);

            // Encode and form the post data. http_build_query() also handles
            // nested (array) fields, which urlencode() cannot.
            $this->setCurlOption(CURLOPT_POSTFIELDS, http_build_query($_POST, '', '&'));
        }
        elseif ($_SERVER['REQUEST_METHOD'] !== 'GET') // Default request method is 'get'
        {
            // Set the request method
            $this->setCurlOption(CURLOPT_CUSTOMREQUEST, $_SERVER['REQUEST_METHOD']);
        }

        // Handle the client headers.
        $this->handleClientHeaders();
    }

    // Queues a raw header line ("Name: value") to send to the back-end server.
    public function setClientHeader($header)
    {
        $this->client_headers[] = $header;
    }

    // Executes the proxy.
    public function execute()
    {
        $this->setCurlOption(CURLOPT_HTTPHEADER, $this->client_headers);
        curl_exec($this->curl_handler);
    }

    // Get the information about the request.
    // Should not be called before exec.
    public function getCurlInfo()
    {
        return curl_getinfo($this->curl_handler);
    }

    // Sets a curl option.
    public function setCurlOption($option, $value)
    {
        curl_setopt($this->curl_handler, $option, $value);
    }

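    // cURL header callback: called once per response header line.
    protected function readHeaders($cu, $string)
    {
        $length = strlen($string);
        // Header names are case-insensitive, hence the 'i' modifier.
        if (preg_match(',^Location:,i', $string))
        {
            // Point redirects back at the proxy instead of the back-end server.
            $string = str_replace($this->proxy_url, $this->url, $string);
        }
        elseif (preg_match(',^Cache-Control:,i', $string))
        {
            $this->cache_control = true;
        }
        elseif (preg_match(',^Pragma:,i', $string))
        {
            $this->pragma = true;
        }
        // Pass every line through except the blank line that ends the headers.
        if ($string !== "\r\n")
        {
            header(rtrim($string));
        }
        // cURL expects the number of bytes handled.
        return $length;
    }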

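    // Forwards the client's request headers to the back-end server.
    protected function handleClientHeaders()
    {
        $headers = apache_request_headers();

        foreach ($headers as $header => $value)
        {
            switch (strtolower($header))
            {
                case 'host':
                    // cURL sets the Host header from the translated URL.
                    break;
                case 'content-length':
                    // cURL computes the length of the re-encoded body itself.
                    break;
                default:
                    $this->setClientHeader(sprintf('%s: %s', $header, $value));
                    break;
            }
        }
    }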

    // cURL write callback: called for each chunk of the response body.
    protected function readResponse($cu, $string)
    {
        static $headersParsed = false;

        // Clear the Cache-Control and Pragma headers
        // if they aren't passed from the proxy application.
        if ($headersParsed === false)
        {
            if (!$this->cache_control)
            {
                header('Cache-Control: ');
            }
            if (!$this->pragma)
            {
                header('Pragma: ');
            }
            $headersParsed = true;
        }
        $length = strlen($string);
        echo $string;
        return $length;
    }
}

?>
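
The URL handling is the part of the class that is easiest to get wrong, so here is a rough standalone sketch of it. The function names translateUrl() and rewriteLocation() and the example.com URLs are made up for illustration; they are not part of the class above, and the sketch uses rtrim() (as suggested in the comments) rather than the regex.

```php
<?php

// Hypothetical helper: builds the back-end URL from the configured base,
// the request's PATH_INFO, and its query string.
function translateUrl($proxy_url, $path_info, $query_string)
{
    // rtrim() removes every trailing '/', while the regex ',/$,' removes one.
    $translated = rtrim($proxy_url, '/');

    // Use the path info when there is one, otherwise request the root.
    $translated .= ($path_info !== null && $path_info !== '') ? $path_info : '/';

    if ($query_string !== null && $query_string !== '') {
        $translated .= '?' . $query_string;
    }
    return $translated;
}

// Hypothetical helper: rewrites a back-end Location header so that the
// redirect points at the public proxy URL. Other headers pass through as-is.
function rewriteLocation($header, $proxy_url, $url)
{
    // Header names are case-insensitive, so match with stripos().
    if (stripos($header, 'Location:') !== 0) {
        return $header;
    }
    return str_replace(rtrim($proxy_url, '/'), rtrim($url, '/'), $header);
}

// prints "http://backend.example.com/app/index.php?id=7"
echo translateUrl('http://backend.example.com/', '/app/index.php', 'id=7'), "\n";

// prints "Location: http://www.example.com/proxy.php/login"
echo rtrim(rewriteLocation(
    "Location: http://backend.example.com/login\r\n",
    'http://backend.example.com',
    'http://www.example.com/proxy.php'
)), "\n";
```

Keeping these two steps as pure functions makes them easy to check without a web server or cURL; the class itself does the same work inline in the constructor and in readHeaders().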






Update: I've added a Google Code project for php5rp, with a Subversion link for downloading.

13 comments:

  1. Replace:
    $this->url = preg_replace(',/$,','',$url);

    by:
    $this->url = rtrim( $url, '/' );

    It's more efficient!

    Here's my 2 cents :-)

  2. Hi, I downloaded your file from Subversion and installed it on my server. When I call it, I get the following messages:

    Notice: Use of undefined constant header - assumed 'header' in /usr/local/apache2/htdocs/apps/syndication/lib/ProxyHandler.class.php on line 99

    Warning: Cannot modify header information - headers already sent by (output started at /usr/local/apache2/htdocs/apps/syndication/lib/ProxyHandler.class.php:99) in /usr/local/apache2/htdocs/apps/syndication/lib/ProxyHandler.class.php on line 101

    The calling php file contains NO extra characters or lines at the beginning or end.

    Any ideas?

    Thank you

    Daniel

  3. Hi again,

    I forgot to mention that when I copy code to my server I apply some simple optimizations (removing blank lines, removing tabs, ...). So lines 99 and 101 in my previous post refer to:
    ...
    if (header !== "\r\n") // LINE 99
    {
    header(rtrim($string)); // LINE 101
    }
    ...

    Thank you

  4. This should be fixed. There's a couple other fixes in there, mostly the rtrim stuff. :)

  5. I've been having problems with HTML files over 6K; binary files and small HTML files work fine.

    It seems that once the response header is sent to the browser, the browser sends a FIN and closes the connection, but only for HTML files over 6K. I've tried a number of proxy targets.

    Anyone else have this strangeness?

    =M

  6. This comment has been removed by a blog administrator.

  7. This comment has been removed by a blog administrator.

  8. This comment has been removed by a blog administrator.

  9. This does not work if PHP is running as CGI rather than as an Apache module, because apache_request_headers() is not available in the CGI version of PHP. This should be fixed! See here how: http://php.net/manual/en/function.apache-request-headers.php

  10. Hi.
    Thanks for sharing this script. It made my life easier when setting up a simple reverse proxy to serve a website from a "fake" section (a "fake" subfolder, created by mod_rewrite) of another website.

    Just one minor issue:
    the SVN version (with the changes proposed by Frédéric in comment #1) didn't work for me.
    It threw this error:

    Parse error: syntax error, unexpected T_VARIABLE, expecting T_FUNCTION in /home/xxxx/website/www/php5rp/ProxyHandler.class.php on line 67

    I had to revert back to preg_replace and voilà! It worked like a charm.

  11. BTW, this class was working flawlessly on my local development server, but failed miserably on the hosting server, because PHP runs there as FastCGI and not as an Apache module.
    So, the problem was that the apache_request_headers() function wasn't available and the class was just failing.

    Thankfully, this guy knows what he's doing and created a nifty script to "patch" a server when apache_request_headers() is missing.

    So, I threw the following code at the top of protected function handleClientHeaders(), and tada! The thing worked on the hosting server too:

    // Patching for when PHP doesn't run as an Apache module
    if (!function_exists('apache_request_headers')) {
        function apache_request_headers() {
            $headers = array();
            foreach ($_SERVER as $key => $value) {
                if (substr($key, 0, 5) == 'HTTP_') {
                    // e.g. HTTP_USER_AGENT becomes User-Agent
                    $headers[str_replace(' ', '-', ucwords(str_replace('_', ' ', strtolower(substr($key, 5)))))] = $value;
                }
            }
            return $headers;
        }
    }

    ----------

    I hope someone else finds this useful.

  12. Hello,
    do you have experience with a PHP reverse proxy over SSL?

  13. Is there a way to do this without cURL? My webserver, whose configuration I cannot change, is in PHP safe-mode and has no php5-curl/libcurl. Thanks

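
For anyone hitting the CGI/FastCGI problem from comments 9 and 11, here is a minimal sketch of the same idea written as a standalone function, so it can be tried outside the class. The name headersFromServer() is made up for this example. It takes a $_SERVER-style array as a parameter instead of reading the superglobal, and it also picks up CONTENT_TYPE and CONTENT_LENGTH, which CGI passes without the HTTP_ prefix. Note that the original capitalization of header names cannot be recovered this way, so ETag would come back as Etag.

```php
<?php

// Hypothetical helper: rebuilds request headers from a $_SERVER-style array
// for setups where apache_request_headers() is not available.
function headersFromServer(array $server)
{
    $headers = array();
    foreach ($server as $key => $value) {
        if (strpos($key, 'HTTP_') === 0) {
            // HTTP_ACCEPT_LANGUAGE -> "accept language" -> "Accept-Language"
            $words = strtolower(str_replace('_', ' ', substr($key, 5)));
            $headers[str_replace(' ', '-', ucwords($words))] = $value;
        } elseif ($key === 'CONTENT_TYPE') {
            // CGI exposes this header without the HTTP_ prefix.
            $headers['Content-Type'] = $value;
        } elseif ($key === 'CONTENT_LENGTH') {
            // Same for the body length.
            $headers['Content-Length'] = $value;
        }
    }
    return $headers;
}

// Example with a hand-written array standing in for $_SERVER.
$example = array(
    'REQUEST_METHOD'     => 'GET',
    'HTTP_USER_AGENT'    => 'ExampleBrowser/1.0',
    'HTTP_IF_NONE_MATCH' => '"abc123"',
    'CONTENT_TYPE'       => 'text/plain',
);
print_r(headersFromServer($example));
```

In the class, handleClientHeaders() could fall back to something like headersFromServer($_SERVER) when function_exists('apache_request_headers') returns false.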