Bug #7763: nothing works - Checkey - Guardian Project Dev (ARCHIVED SITE)

Bug #7763

nothing works

Added by Anonymous over 1 year ago.

Status:

New

Start date:

07/17/2016

Priority:

Normal

Due date:

Assignee:

% Done:

Category:

Target version:

Component:

Description

#!/usr/bin/perl

CGIProxy 2.1.17

CGIProxy (nph-proxy.cgi): a proxy in the form of a CGI script.
Retrieves the resource at any HTTP or FTP URL, updating embedded URLs
in HTML and all other resources to point back through this script. By
default, no user info is sent to the server. Options include
text-only proxying to save bandwidth, cookie filtering, ad filtering,
script removal, user-defined encoding of the target URL, and much more.
Besides running as a CGI script, can also run under mod_perl, as a
FastCGI script, or can use its own embedded HTTP server.
Requires Perl 5.

Copyright (C) 1996, 1998-2016 by James Marshall, james@jmarshall.com
All rights reserved. Free for non-commercial use; commercial use
requires a license.

For the latest, see https://jmarshall.com/tools/cgiproxy/

IMPORTANT NOTE ABOUT ANONYMOUS BROWSING:

CGIProxy was originally made for indirect browsing more than
anonymity, but since people are using it for anonymity, I've tried
to make it as anonymous as possible. Suggestions welcome. For best
anonymity, browse with JavaScript turned off. That said, please notify
me if you find any privacy holes, even when using JavaScript.
Anonymity is good, but may not be bulletproof. For example, if even
a single unchecked JavaScript statement can be run, your anonymity
can be compromised. I've tried to handle JS in every place it can
exist, but please tell me if I missed any. Also, browser plugins
or other executable extensions may be able to reveal you to a server.
Also, be aware that this script doesn't modify PDF files or other
third-party document formats that may contain linking ability, so
you will lose your anonymity if you follow links in such files.
If you find any other way your anonymity can be compromised, please let
me know.

INSTALLATION:

First, edit this file (nph-proxy.cgi) to configure it-- see the CONFIGURATION
section just below for certain options that may be required. All
configuration variables are set in the "user configuration" section starting
around line 338.
After copying nph-proxy.cgi to your server, run "./nph-proxy.cgi init"
from the server command line (on Windows, run "perl nph-proxy.cgi init").
This creates needed directories, installs all optional Perl (CPAN) modules,
and creates the database that CGIProxy uses. Ignore the scrolling text,
and hit <return> if asked any questions. Ideally you can run this command
as root to set file permissions and ownership optimally, but even if run as
non-root these will be handled as well as possible and the script should
still work.
To see a simple usage message, run "./nph-proxy.cgi -?".
It's fine to rename this file, as long as your Web server is set up to
recognize it. All of the documentation refers to "nph-proxy.cgi",
but replace that with whatever you renamed the file to.

For complete installation instructions, see
https://jmarshall.com/tools/cgiproxy/install.html

CONFIGURATION:

. Set $PROXY_DIR and $RUN_AS_USER -- see the comments above those settings
for details.
. If you don't have root access on your server, set $LOCAL_LIB_DIR so that
the Perl (CPAN) modules can be installed under your own directory. Do
this before running "./nph-proxy.cgi init", as described above.
. If you're using either a MySQL/MariaDB or Oracle database to store cookies,
you need to set $DB_DRIVER, $DB_USER, $DB_PASS, and possibly $DB_SERVER .
See the notes by those settings for more details. Note that you need to
purge the database periodically by running "./nph-proxy.cgi purge-db",
with a cron job on Unix or Mac, or with the Task Scheduler in Windows.
The default database driver is SQLite, which doesn't need a username or
password or even a running database engine, but still requires periodic
purging.
. If you're using another HTTP or SSL proxy, set $HTTP_PROXY,
$SSL_PROXY, and $NO_PROXY as needed. If those proxies use
authentication, set $PROXY_AUTH and $SSL_PROXY_AUTH accordingly.
. If you're using a SOCKS proxy, set $SOCKS_PROXY and possibly
$SOCKS_USERNAME and $SOCKS_PASSWORD .
. If this is running on an insecure server that doesn't use port 80, set
$RUNNING_ON_SSL_SERVER=0 (otherwise, the default of '' is fine).
. If you plan to run CGIProxy as a FastCGI script, set at least
$SECRET_PATH and see the configuration section "FastCGI configuration".
. If you plan to run CGIProxy using its own embedded server, set
$SECRET_PATH and see the configuration section "Embedded server configuration".
You'll also need a certificate and private key (key pair) in PEM
format.
. See http://www.jmarshall.com/tools/cgiproxy/options.html#env , in the section
"OPTIONS RELATED TO YOUR SERVER/NETWORK ENVIRONMENT", for other options
you may need to set.

Other options include:
. Set $TEXT_ONLY, $REMOVE_COOKIES, $REMOVE_SCRIPTS, $FILTER_ADS,
$HIDE_REFERER, and $INSERT_ENTRY_FORM as desired. Set
$REMOVE_SCRIPTS if anonymity is important.
. To let the user choose all of those settings (except $TEXT_ONLY),
set $ALLOW_USER_CONFIG=1.
. To change the encoding format of the URL, modify the
proxy_encode() and proxy_decode() routines. The default
routines are suitable for simple PATH_INFO compliance.
. To encode cookies, modify the cookie_encode() and cookie_decode()
routines.
. You can restrict which servers this proxy will access, with
@ALLOWED_SERVERS and @BANNED_SERVERS.
. Similarly, you can specify allowed and denied server lists for
both cookies and scripts.
. For security, you can ban access to private IP ranges, with
@BANNED_NETWORKS.
. If filtering ads, you can customize this with a few settings.
. To insert your own block of HTML into each page, set $INSERT_HTML
or $INSERT_FILE.
. As a last resort, if you really can't run this script as NPH,
you can try to run it as non-NPH by setting $NOT_RUNNING_AS_NPH=1.
BUT, read the notes and warnings above that line. Caveat surfor.
. For crude load-balancing among a set of proxies, set @PROXY_GROUP.
. Other config is possible; see the user configuration section.
. If heavy use of this proxy puts a load on your server, see the
"NOTES ON PERFORMANCE" section below.

For more info, read the comments above any config options you set.

For a full list of options, see https://jmarshall.com/tools/cgiproxy/options.html

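As a rough illustration of the proxy_encode()/proxy_decode() option listed above, the sketch below shows one reversible URL encoding. The names hex_proxy_encode and hex_proxy_decode are invented for this example and are not CGIProxy's own routines; the real routines may need to handle details this sketch ignores. It assumes the URL is a plain byte string. Hex encoding only obscures the URL; it is not encryption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a URL into a lowercase hex string, which
# contains only [0-9a-f] and is therefore safe to carry in PATH_INFO.
sub hex_proxy_encode {
    my ($url) = @_;
    return unpack('H*', $url);
}

# Hypothetical helper: reverse hex_proxy_encode().
sub hex_proxy_decode {
    my ($encoded) = @_;
    return pack('H*', $encoded);
}

my $url     = 'http://example.com/page?id=7';
my $encoded = hex_proxy_encode($url);
print "encoded: $encoded\n";
print "decoded: ", hex_proxy_decode($encoded), "\n";
```

Whatever encoding is chosen, the two routines must be exact inverses of each other, since every link in a proxied page is encoded on the way out and decoded on the way back in.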
This script MUST be installed as a non-parsed header (NPH) script.
In Apache and many other servers, this is done by simply starting the
filename with "nph-". It MAY be possible to fake it as a non-NPH
script, MOST of the time, by using the $NOT_RUNNING_AS_NPH feature.
This is not advised. See the comments by that option for warnings.

TO USE:
Start a browsing session by visiting the script's URL with no parameters.
You can bookmark pages you browse to through the proxy, or link to
the URLs that are generated.

NOTES ON PERFORMANCE:
Unfortunately, this has gotten slower through the versions, mostly
because of optional new features. Configured equally, version 1.3
takes 25% longer to run than 1.0 or 1.1 (based on *cough* highly
abbreviated testing). Compiling takes about 50% longer.
Leaving $REMOVE_SCRIPTS=1 adds 25-50% to the running time.
Remember that we're talking about tenths of a second here. Most of
the delay experienced by the user is from waiting on two network
connections. These performance issues only matter if your server
CPU is getting overloaded. Also, these mostly matter when retrieving
JavaScript and Flash, because modifying those is what takes most of the
time.
If you can, use mod_perl. Starting with version 1.3.1, this should
work under mod_perl, which requires Perl 5.004 or later. If you use
mod_perl, be careful to install this as an NPH script, i.e. set the
"PerlSendHeader Off" configuration directive (or "PerlOptions -ParseHeaders"
if using mod_perl 2.x). For more info, see the mod_perl documentation.
If you can't use mod_perl, try using FastCGI. Configure the section
"FastCGI configuration" below, and run nph-proxy.cgi from the command
line to see a usage message. You'll also need to configure your
Web server to use FastCGI.
If you can't use mod_perl or FastCGI, try running CGIProxy as its own
embedded server. Configure the section "Embedded server configuration",
and run nph-proxy.cgi from the command line to see a usage message.
You'll also need a key pair (certificate and private key).
If you use mod_perl, FastCGI, or the embedded server, and modify this
script, see the note near the "reset 'a-z'" line below, regarding
UPPER_CASE and lower_case variable names.

If performance on the browser is bad for JS-heavy sites like facebook,
then close other browser windows and other CPU-heavy processes, and
see the comments above the setting of %REDIRECTS below. Also, try
using a browser other than MSIE-- it seems to have the most problems.

TO DO:
What I want to hear about:
. Any HTML tags not being converted here.
. Any method of introducing JavaScript or other script, that's not
being handled here.
. Any script MIME types other than those already in @SCRIPT_MIME_TYPES.
. Any MIME types other than text/html that have links that need to
be converted.
plug any other script holes (e.g. MSIE-proprietary, other MIME types?)
more error checking?
find a simple encryption technique for proxy_encode()
For ad filtering, add option to disable images from servers other than
that of the containing HTML page? Is it worth it?

BUGS:
Anonymity may not be perfect. In particular, there may be some remaining
JavaScript or Flash holes. Please let me know if you find any.
Since ALL of your cookies are sent to this script (which then chooses
the relevant ones), some cookies could be dropped if you accumulate a
lot, resulting in "Bad Request" errors. To fix this, use a database
server for cookies.

I first wrote this in 1996 as an experiment to allow indirect browsing.
The original seed was a program I wrote for Rich Morin's article
in the June 1996 issue of Unix Review, online at
http://www.cfcl.com/tin/P/199606.shtml.

Confession: I didn't originally write this with the spec for HTTP
proxies in mind, and there are probably some violations of the protocol
(at least for proxies). This whole thing is one big violation of the
proxy model anyway, so I hereby rationalize that the spec can be widely
interpreted here. If there is demand, I can make it more conformant.
The HTTP client and server components should be fine; it's just the
special requirements for proxies that may not be followed.

#--------------------------------------------------------------------------

use strict ;
use warnings ;
no warnings qw(uninitialized redefine) ; # we use defaults all the time

use Encode ;
use IO::Handle ;
use IO::Select ;
use File::Spec ;
use Time::Local ;
use Getopt::Long ;
use Socket qw(:all) ;
use Net::Domain qw(hostfqdn) ;
use Fcntl qw(:DEFAULT :flock) ;
use POSIX qw(:sys_wait_h setsid);
use Time::HiRes qw(gettimeofday tv_interval) ;
use Errno qw(EINTR EAGAIN EWOULDBLOCK ENOBUFS EPIPE) ;

First block below is config variables, second block is sort-of config
variables, third block is persistent constants, fourth block is would-be
persistent constants (not set until needed), fifth block is constants for
JavaScript processing (mostly regular expressions), and last block is
variables.
Removed $RE_JS_STRING_LITERAL to help with Perl's long-literal-string bug,
but can replace it later if/when that is fixed. Added
$RE_JS_STRING_LITERAL_START, $RE_JS_STRING_REMAINDER_1, and
$RE_JS_STRING_REMAINDER_2 as part of the workaround.
use vars qw(
$PROXY_DIR $SECRET_PATH $LOCAL_LIB_DIR
$FCGI_SOCKET $FCGI_MAX_REQUESTS_PER_PROCESS $FCGI_NUM_PROCESSES
$PRIVATE_KEY_FILE $CERTIFICATE_FILE $RUN_AS_USER $EMB_USERNAME $EMB_PASSWORD
$DB_DRIVER $DB_SERVER $DB_NAME $DB_USER $DB_PASS $USE_DB_FOR_COOKIES
%REDIRECTS %TIMEOUT_MULTIPLIER_BY_HOST
$DEFAULT_LANG
$TEXT_ONLY
$REMOVE_COOKIES $REMOVE_SCRIPTS $FILTER_ADS $HIDE_REFERER
$INSERT_ENTRY_FORM $ALLOW_USER_CONFIG
$ENCODE_DECODE_BLOCK_IN_JS
@ALLOWED_SERVERS @BANNED_SERVERS @BANNED_NETWORKS
$NO_COOKIE_WITH_IMAGE @ALLOWED_COOKIE_SERVERS @BANNED_COOKIE_SERVERS
@ALLOWED_SCRIPT_SERVERS @BANNED_SCRIPT_SERVERS
@BANNED_IMAGE_URL_PATTERNS $RETURN_EMPTY_GIF
$USER_IP_ADDRESS_TEST $DESTINATION_SERVER_TEST
$INSERT_HTML $INSERT_FILE $ANONYMIZE_INSERTION $FORM_AFTER_INSERTION
$INSERTION_FRAME_HEIGHT
$RUNNING_ON_SSL_SERVER $NOT_RUNNING_AS_NPH $USER_FACING_PORT
$HTTP_PROXY $SSL_PROXY $NO_PROXY $PROXY_AUTH $SSL_PROXY_AUTH
$SOCKS_PROXY $SOCKS_USERNAME $SOCKS_PASSWORD
$MINIMIZE_CACHING
$SESSION_COOKIES_ONLY $COOKIE_PATH_FOLLOWS_SPEC $RESPECT_THREE_DOT_RULE
@PROXY_GROUP
$USER_AGENT $USE_PASSIVE_FTP_MODE $SHOW_FTP_WELCOME
$PROXIFY_SCRIPTS $PROXIFY_SWF $ALLOW_RTMP_PROXY $ALLOW_UNPROXIFIED_SCRIPTS
$PROXIFY_COMMENTS
$USE_POST_ON_START $ENCODE_URL_INPUT
$REMOVE_TITLES $NO_BROWSE_THROUGH_SELF $NO_LINK_TO_START $MAX_REQUEST_SIZE
@TRANSMIT_HTML_IN_PARTS_URLS
$QUIETLY_EXIT_PROXY_SESSION
$ALERT_ON_CSP_VIOLATION
$OVERRIDE_SECURITY
@SCRIPT_MIME_TYPES @OTHER_TYPES_TO_REGISTER @TYPES_TO_HANDLE
$NON_TEXT_EXTENSIONS
@RTL_LANG
$PROXY_VERSION

$RUN_METHOD
@MONTH @WEEKDAY %UN_MONTH
%RTL_LANG
@BANNED_NETWORK_ADDRS
$DB_HOSTPORT $DBH $STH_UPD_COOKIE $STH_INS_COOKIE $STH_SEL_COOKIE $STH_SEL_ALL_COOKIES
$STH_DEL_COOKIE $STH_DEL_ALL_COOKIES $STH_UPD_SESSION $STH_INS_SESSION $STH_SEL_IP
$STH_PURGE_SESSIONS $STH_PURGE_COOKIES
$USER_IP_ADDRESS_TEST_H $DESTINATION_SERVER_TEST_H
$RUNNING_ON_IIS
@NO_PROXY
$NO_CACHE_HEADERS
@ALL_TYPES %MIME_TYPE_ID $SCRIPT_TYPE_REGEX $TYPES_TO_HANDLE_REGEX
$THIS_HOST $ENV_SERVER_PORT $ENV_SCRIPT_NAME $THIS_SCRIPT_URL
$SSL_SUPPORTED
$RTMP_SERVER_PORT
%ENV_UNCHANGING $HAS_INITED

%MSG @MSG_KEYS $CUSTOM_INSERTION %IN_CUSTOM_INSERTION

$RE_JS_WHITE_SPACE $RE_JS_LINE_TERMINATOR $RE_JS_COMMENT
$RE_JS_IDENTIFIER_START $RE_JS_IDENTIFIER_PART $RE_JS_IDENTIFIER_NAME
$RE_JS_PUNCTUATOR $RE_JS_DIV_PUNCTUATOR
$RE_JS_NUMERIC_LITERAL $RE_JS_ESCAPE_SEQUENCE
$RE_JS_STRING_LITERAL
$RE_JS_STRING_LITERAL_START $RE_JS_STRING_REMAINDER_1 $RE_JS_STRING_REMAINDER_2
$RE_JS_REGULAR_EXPRESSION_LITERAL
$RE_JS_TOKEN $RE_JS_INPUT_ELEMENT_DIV $RE_JS_INPUT_ELEMENT_REG_EXP
$RE_JS_SKIP $RE_JS_SKIP_NO_LT
%RE_JS_SET_TRAPPED_PROPERTIES %RE_JS_SET_RESERVED_WORDS_NON_EXPRESSION
%RE_JS_SET_ALL_PUNCTUATORS
$JSLIB_BODY $JSLIB_BODY_GZ

$HTTP_VERSION $HTTP_1_X
$URL
$STDIN $STDOUT
$now $session_id $session_id_persistent $session_cookies
$packed_flags $encoded_URL $doing_insert_here $env_accept
$e_remove_cookies $e_remove_scripts $e_filter_ads $e_insert_entry_form
$e_hide_referer
$images_are_banned_here $scripts_are_banned_here $cookies_are_banned_here
$scheme $authority $path $host $port $username $password
$csp $csp_ro $csp_is_supported
$cookie_to_server %auth
$script_url $url_start $url_start_inframe $url_start_noframe $lang $dir
$is_in_frame $expected_type
$base_url $base_scheme $base_host $base_path $base_file $base_unframes
$default_style_type $default_script_type
$status $headers $body $charset $meta_charset $is_html
%in_mini_start_form
$does_write
$swflib $AVM2_BYTECODES
$xhr_origin
$temp_counter
$debug ) ;

#--------------------------------------------------------------------------

user configuration
#--------------------------------------------------------------------------

For certain purposes, CGIProxy may need to create files. This is where
those will go. For example, use "/home/username/cgiproxy", where "username"
is replaced by your username.
This directory has to be readable and writeable by the userID that CGIProxy
runs as; that userID is set in the Web server configuration (if this is running
as a CGI script or under mod_perl), or else it's the userID used to start
the FastCGI server or the embedded server.
This can be either a relative or absolute path. If it's a relative path, it
will be interpreted relative to the home directory of this script file's owner.
If you have root access and can run "./nph-proxy.cgi init" as root (which has
advantages), then set this to an absolute path so it doesn't go under
the /root directory.
Note that you need to use "\\" to represent a single backslash.
Leading drive letters (e.g. for Windows) are allowed.
The default will use the directory "cgiproxy" under your home directory (which
varies with your operating system). If it doesn't work, manually set
$PROXY_DIR to an absolute path. You can name it whatever you want.
Also see $RUN_AS_USER, just below. Note that many special users, probably
including your Web server's user, don't have a home directory to put $PROXY_DIR
under. For such a case, you need to set $PROXY_DIR to another directory somewhere
that the Web server's user can read and write.
Note that in Unix or Mac, using a directory on a mounted filesystem (which often
includes home directories) may prevent that filesystem from being unmounted,
which may bother your sysadmin. If so, try setting this to something starting
with "/tmp/", like "/tmp/.your-username/".
If you get "mkdir" permission errors, create the directory yourself with mkdir.
You may also need to "chmod 777 directoryname" to make the directory writable
by the Web server, but note that this makes it readable and writable by
everybody. You might ask your webmaster if they provide a safe way for CGI
scripts to read and write files in your directories. With Apache, the suEXEC
feature is often used to let multiple website owners use the same server
securely: each CGI or mod_perl script is run as the owner of the script file.
$PROXY_DIR= 'cgiproxy' ;
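If the script runs as the directory's owner (for example under suEXEC, as mentioned above), the directory can be kept private instead of using "chmod 777". The following is a minimal sketch of that idea, assuming a Unix-like system; ensure_private_dir is a hypothetical helper and not part of CGIProxy, and the demo uses a throwaway temporary directory rather than a real home directory.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: create the directory (and any missing parents),
# then restrict it so only its owner can read, write, or enter it.
sub ensure_private_dir {
    my ($dir) = @_;
    make_path($dir) unless -d $dir;
    die "cannot create $dir\n" unless -d $dir;
    chmod(0700, $dir) or die "chmod $dir: $!\n";
    return $dir;
}

# Demonstrate in a scratch location that is removed on exit.
my $base      = tempdir(CLEANUP => 1);
my $proxy_dir = ensure_private_dir("$base/cgiproxy");
printf "%s has mode %04o\n", $proxy_dir, (stat($proxy_dir))[2] & 07777;
```

This only helps when the user ID that CGIProxy runs as owns the directory; if the Web server runs as a different user, the ownership or group has to be arranged separately.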

If you have root access and can run "./nph-proxy.cgi init" as root, then set this
to either the username or numeric user ID that the script will run as. When
run as a CGI script or under mod_perl, this is usually the Web server's
username, or possibly the script owner's username if using Apache with the
suEXEC feature turned on.
Setting this lets "./nph-proxy.cgi init" create the needed directories ($PROXY_DIR
and subdirectories) and a SQLite database file (if using SQLite) with the right
permissions and ownership.
If you run this script as the root user in order to use port 443 with the
embedded server, it's a good idea to change the user ID to something with
fewer permissions. You can also do this by setting $RUN_AS_USER .
In any case, this has to be set to an existing user on the server, i.e. CGIProxy
doesn't create the user if it doesn't already exist.
If this is not set, it will default to the owner of this script file.
Also see $PROXY_DIR, just above. Note that many special users, probably including
your Web server's user, don't have a home directory to put $PROXY_DIR under.
For such a case, you need to set $PROXY_DIR to another directory somewhere that
the Web server's user can read and write.
This probably won't work on Windows, though note that you don't need root
access to use port 443 on Windows.
#$RUN_AS_USER= 'nobody' ;
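Because CGIProxy does not create the account named in $RUN_AS_USER, it may be worth confirming that the account exists before running the init step. The sketch below is one way to check on Unix or Mac; user_exists is a hypothetical helper, not something the script provides, and it relies on getpwnam/getpwuid, which are not available on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the given username or numeric user ID
# has an entry in the local account database.
sub user_exists {
    my ($user) = @_;
    my @entry = ($user =~ /\A\d+\z/) ? getpwuid($user) : getpwnam($user);
    return scalar(@entry) > 0;
}

for my $candidate ('nobody', 'no-such-user-example') {
    printf "%-22s %s\n", $candidate,
        user_exists($candidate) ? 'exists' : 'missing';
}
```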

IMPORTANT: CHANGE THIS IF USING FASTCGI OR THE EMBEDDED SERVER!
If using FastCGI or the embedded server, the path in the URL will begin with a
fixed alphanumeric sequence (string) to help conceal the proxy. You can set
this to any alphanumeric string. The URL of your proxy will be
"https://example.com/secret" (replace "secret" with your actual secret).
If we didn't do this, then a censor could check if a site hosts a proxy by
merely accessing "https://example.com" .
Note that this is not a secret from the users, just from anyone watching
network traffic. Also, it won't be kept secret if your server is insecure.
$SECRET_PATH= 'secret' ;
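Since the value only needs to be an alphanumeric string that is hard to guess, one option is to generate it rather than invent it. The following is a sketch under stated assumptions: random_secret_path is a hypothetical helper, the length of 24 is arbitrary, and /dev/urandom is assumed to exist (the fallback to Perl's rand() is weaker and only there so the sketch still runs elsewhere).

```perl
use strict;
use warnings;

# Hypothetical helper: build a random alphanumeric string of the given length.
sub random_secret_path {
    my ($length) = @_;
    my @chars  = ('a' .. 'z', 'A' .. 'Z', '0' .. '9');   # 62 characters
    my $secret = '';

    if (open(my $fh, '<:raw', '/dev/urandom')) {
        while (length($secret) < $length) {
            my $got = read($fh, my $byte, 1);
            last unless $got;
            my $n = ord($byte);
            next if $n >= 248;          # 248 = 4 * 62; skip to avoid modulo bias
            $secret .= $chars[$n % 62];
        }
        close($fh);
    }

    # Fallback (weaker randomness) if /dev/urandom was unavailable or ran short.
    $secret .= $chars[int rand @chars] while length($secret) < $length;
    return $secret;
}

print random_secret_path(24), "\n";
```

The printed string could then be pasted in as the value of $SECRET_PATH.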

If you don't have root access on your server, set this so that Perl (CPAN)
modules are installed under your own directory. Be sure to follow the
instructions about the environment variables after you run "./nph-proxy.cgi init".
If this script is not running as your user ID (such as a Web server running
as its own user ID), and you're using the local::lib module, then
set this to the directory where your modules are installed with local::lib .
This is normally just the "perl5" directory under your home directory, unless
you renamed it or configured local::lib to use a different directory.
If you set this before installing modules, then CPAN (Perl) modules will be
installed into this directory.
#$LOCAL_LIB_DIR= '/home/your-username/perl5' ; # this example works for Unix or Mac

If you're running CGIProxy such that the Web server that the user sees is different
from the Web server CGIProxy is running on (though maybe on the same machine),
the SERVER_PORT environment variable might not be set to the port that the
user is connecting to, and so all the generated URLs will have the wrong
port in them. In this case, you can set $USER_FACING_PORT to the port number
that should be in the URLs, i.e. the port that the user connects to.
For example, this would be useful when the user connects to nginx on a server where
nginx then calls an internal Apache process to run this script (perhaps to take
advantage of mod_perl). In such a case, the SERVER_PORT set by Apache will be
the port used for internal nginx-to-Apache communication, not the port the user
connects to nginx with. In this case, you would set $USER_FACING_PORT to the
outward-facing port that nginx listens on.
#$USER_FACING_PORT= 443 ;

#---- FastCGI configuration ---------------------

FastCGI is a mechanism that can speed up CGI-like scripts. It's purely
optional and requires some web server configuration as well, and if you
don't use it you can ignore this section.

FastCGI uses a local Internet socket to communicate between the FastCGI client
(e.g. the web server software) and the FastCGI server (e.g. a CGI script that
has been converted to run as a listening daemon, such as CGIProxy).
Set this to a port number for this script to listen on as a FastCGI script.
You'll need to set it in your HTTP server's configuration file too (e.g. in
httpd.conf or nginx.conf). For details of that, see
http://www.jmarshall.com/tools/cgiproxy/install.html#fastcgi
This used to use a "Unix-domain socket" instead of an Internet socket, but
there was trouble with the FCGI module and Unix-domain sockets, so as of
CGIProxy 2.1.14 we use an Internet socket.
Note that this no longer requires a ":" at the start, though that is allowed.
$FCGI_SOCKET= 8002 ;

FastCGI uses multiple processes to listen on its socket, where each
process can handle one request at a time. This is a performance tuning
parameter, so the optimal number depends on your server environment
(hardware and software).
If you don't understand this, the default should be fine. You can experiment
with different numbers if performance is an issue.
This can be overridden with the "-n" command-line parameter.
$FCGI_NUM_PROCESSES= 100 ;

As a FastCGI process gets used for many requests, it slowly takes more and
more memory, due to the copy-on-write behavior of forked processes. Thus,
it's cleaner if you kill a process and restart a fresh one after it handles
some number of requests. This is a performance tuning parameter, so the
optimal number depends on your server environment (hardware and software).
If you don't understand this, the default should be fine. You can experiment
with different numbers if performance is an issue.
This can be overridden with the "-m" command-line parameter.
$FCGI_MAX_REQUESTS_PER_PROCESS= 1000 ;

#---- End of FastCGI configuration --------------

Much initialization of unchanging values is now in this routine. (Ignore
this if you don't know what it means.)
sub init {

#---- Embedded server configuration -------------

For the embedded server, you need to a) put a certificate and private key,
in PEM format, into the $PROXY_DIR directory, and b) set these two
variables to the two file names. (A "certificate" contains the public key,
along with identity information signed by its issuer.)
You can either pay a certificate authority for a key pair, or you can
generate your own "self-signed" key pair. The disadvantage of using a
self-signed key pair is that your users will see a browser warning about
an untrusted certificate. This is all true of any secure server.
#$CERTIFICATE_FILE= 'plain-cert.pem' ;
#$PRIVATE_KEY_FILE= 'plain-rsa.pem' ;

It's important to use $SECRET_PATH, but you can require a username and
password too. All users must login with whatever you set below, using
HTTP Basic authentication. Leave these commented out to disable
password protection.
This is very simple right now. In the future there will likely be
more authentication methods, including support for multiple users.
#$EMB_USERNAME= 'free' ;
#$EMB_PASSWORD= 'speech' ;

#---- End of embedded server configuration ------

#---- Database configuration --------------------

Database use is optional, and if you don't use one you can ignore this
section. But if you're getting "Bad Request" errors, you can fix it
by using a database; also, see the $USE_DB_FOR_COOKIES option below.

A database is most efficient when this script is running
under mod_perl or FastCGI.
The easiest database to use is SQLite. While normal database engines like
MySQL/MariaDB or Oracle require a constantly running process and some
configuration by the system administrator, SQLite requires none of this--
it reads and writes directly to database files in your own directory, as
protected by the operating system permissions. Because of its ease of
configuration, SQLite is the default database here.
If you're using a database other than SQLite, create a database user account
for this program to use, or ask your database administrator to do it. Set
$DB_USER and $DB_PASS to the username and password, below. This program
will try to create the required database, named $DB_NAME as set below, but
if your DBA isn't willing to grant the permission to create databases to
the CGIProxy user, then you or the DBA will need to create the database.
This can be done with the SQL command "CREATE DATABASE cgiproxy;" (or
whatever you set $DB_NAME to below).

If you are using a database of any kind, it must be purged periodically. In
Unix or Mac, do this with a cron job. In Windows, use the Task Scheduler.
In Unix or Mac, the command to purge the database is
"/path/to/script/nph-proxy.cgi purge-db". (Replace "/path/to/script/"
with the actual path to the script.) Edit your crontab with "crontab -e",
and add a line like:
"0 * * * * /path/to/script/nph-proxy.cgi purge-db" (without quotes)
to purge the database at the top of every hour, or:
"0 2 * * * /path/to/script/nph-proxy.cgi purge-db" (without quotes)
to purge it every night at 2:00am.

This is the name of the "database driver" for the database software you're using.
Currently supported values are "SQLite", "MySQL" and "Oracle".
The default of "SQLite" is the easiest to use. SQLite lets you have database
functionality by directly reading and writing a database file, without requiring
a full database engine like MySQL/MariaDB or Oracle to run on your server.
Note that it is potentially insecure to use a database if there are other
untrusted people with accounts on the same server, especially if they can read
this script file and the database password below. The easiest way to securely
use a database is to have your own server with no untrusted user having shell
access on it. If this isn't practical, then you need to set file permissions
appropriately on both this script file and any SQLite database file: set
permissions (and file ownership and group ownership) on both files to be
accessible by the web server's userID, but not accessible by anyone else on
the same server. Note that running this on a virtual private server isn't
insecure in this way-- even though a VPS is a shared machine, other people
can't see your files (except the sysadmin).
Set this to "" or comment it out to not use a database. Note that you will
probably see "Bad Request" errors when you accumulate too many cookies; using
a database solves this problem, or you can periodically clear your cookies.
$DB_DRIVER= 'SQLite' ;

If your database (other than SQLite) is running on a remote server, or on a
non-default port, set this to "dbserver:port", where dbserver is the name
or IP address of your database server, and port is the port it is listening
on. If dbserver is empty (as in ":port"), then it defaults to localhost;
if port is empty (as in "dbserver:" or just "dbserver"), then it defaults
to 3306 for MySQL, or 1521 for Oracle.
#$DB_SERVER= "localhost:3306" ;
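The defaulting rules described above can be sketched as a small function. This is an illustration only: parse_db_server is a hypothetical helper and is not how CGIProxy itself parses the setting. The default ports are the ones stated in the text, and IPv6 literals are not handled.

```perl
use strict;
use warnings;

# Default ports as stated in the description of $DB_SERVER.
my %DEFAULT_PORT = (MySQL => 3306, Oracle => 1521);

# Hypothetical helper: split "dbserver:port" and fill in whichever
# half is missing or empty.
sub parse_db_server {
    my ($driver, $spec) = @_;
    $spec = '' unless defined $spec;
    my ($host, $port) = split /:/, $spec, 2;
    $host = 'localhost'            unless defined $host && length $host;
    $port = $DEFAULT_PORT{$driver} unless defined $port && length $port;
    return ($host, $port);
}

for my $spec (':3307', 'db.example.com', 'db.example.com:', 'db.example.com:4000') {
    my ($host, $port) = parse_db_server('MySQL', $spec);
    printf "%-22s => host=%s port=%s\n", $spec, $host, $port;
}
```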

CGIProxy creates (if possible) and uses its own database. If you want to name
the database something else, change this value. If you need a database
administrator to create the database, tell him or her this database name.
This value must only contain letters, numbers, and the "_" character.
$DB_NAME= 'cgiproxy' ;
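The character restriction above matters because the name ends up inside SQL such as the "CREATE DATABASE" command mentioned earlier. A quick check along these lines can catch a bad name early; valid_db_name is a hypothetical helper used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the name consists solely of ASCII
# letters, digits, and underscores, and is not empty.
sub valid_db_name {
    my ($name) = @_;
    return (defined($name) && $name =~ /\A[A-Za-z0-9_]+\z/) ? 1 : 0;
}

for my $name ('cgiproxy', 'my_proxy_2', 'bad-name', 'drop table; x') {
    printf "%-16s %s\n", $name, valid_db_name($name) ? 'ok' : 'rejected';
}
```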

These are the username and password of the database account, as described above.
If you're using SQLite, you don't need to set these-- access to the SQLite
database files is controlled by the permissions of the filesystem.
$DB_USER= 'proxy' ;
$DB_PASS= '' ;

If set, then use the server-side database to store cookies. This gets around
the problem of too many total cookies causing "Bad Request" errors.
Set this to 1 to use the database (if it's configured), or to 0 to NOT use
the database.
$USE_DB_FOR_COOKIES= 1 ;

#---- End of database configuration -------------

This is the default language to use for all CGIProxy messages, until the user
clicks on a flag in the start form.
$DEFAULT_LANG= 'en' ;

If set, then proxy traffic will be restricted to text data only, to save
bandwidth (though it can still be circumvented with uuencode, etc.).
To replace images with a 1x1 transparent GIF, set $RETURN_EMPTY_GIF below.
$TEXT_ONLY= 0 ; # set to 1 to allow only text data, 0 to allow all

If set, then prevent all cookies from passing through the proxy. To allow
cookies from some servers, set this to 0 and see @ALLOWED_COOKIE_SERVERS
and @BANNED_COOKIE_SERVERS below. You can also prevent cookies with
images by setting $NO_COOKIE_WITH_IMAGE below.
Note that this only affects cookies from the target server. The proxy
script sends its own cookies for other reasons too, like to support
authentication. This flag does not stop these cookies from being sent.
$REMOVE_COOKIES= 0 ;

If set, then remove as much scripting as possible. If anonymity is
important, this is strongly recommended! Better yet, turn off script
support in your browser.
On the HTTP level:
. prevent transmission of script MIME types (which only works if the server
marks them as such, so a malicious server could get around this, but
then the browser probably wouldn't execute the script).
. remove Link: headers that link to a resource of a script MIME type.
Within HTML resources:
. remove <script>...</script> .
. remove intrinsic event attributes from tags, i.e. attributes whose names
begin with "on".
. remove <style>...</style> where "type" attribute is a script MIME type.
. remove various HTML tags that appear to link to a script MIME type.
. remove script macros (aka Netscape-specific "JavaScript entities"),
i.e. any attributes containing the string "&{" .
. remove "JavaScript conditional comments".
. remove MSIE-specific "dynamic properties".
To allow scripts from some sites but not from others, set this to 0 and
see @ALLOWED_SCRIPT_SERVERS and @BANNED_SCRIPT_SERVERS below.
See @SCRIPT_MIME_TYPES below for a list of which MIME types are filtered out.
I do NOT know for certain that this removes all script content! It removes
all that I know of, but I don't have a definitive list of places scripts
can exist. If you do, please send it to me. EVEN RUNNING A SINGLE
JAVASCRIPT STATEMENT CAN COMPROMISE YOUR ANONYMITY! Just so you know.
Richard Smith has a good test site for anonymizing proxies, at
http://users.rcn.com/rms2000/anon/test.htm
Note that turning this on removes most popup ads! :)
$REMOVE_SCRIPTS= 0 ;

If set, then filter out images that match one of @BANNED_IMAGE_URL_PATTERNS,
below. Also removes cookies attached to images, as if $NO_COOKIE_WITH_IMAGE
is set.
To remove most popup advertisements, also set $REMOVE_SCRIPTS=1 above.
$FILTER_ADS= 0 ;

If set, then don't send a Referer: [sic] header with each request
(i.e. something that tells the server which page you're coming from
that linked to it). This is a minor privacy issue, but a few sites
won't send you pages or images if the Referer: is not what they're
expecting. If a page is loading without images or a link seems to be
refused, then try turning this off, and a correct Referer: header will
be sent.
This is only a problem in a VERY small percentage of sites, so few that
I'm kinda hesitant to put this in the entry form. Other arrangements
have their own problems, though.
$HIDE_REFERER= 0 ;

If set, insert a compact version of the URL entry form at the top of each
page. This will also display the URL currently being viewed.
When viewing a page with frames, a new top frame is created and the
insertion goes there.
If you want to customize the appearance of the form, modify the routine
mini_start_form() near the end of the script.
If you want to insert something other than this form, see $INSERT_HTML and
$INSERT_FILE below.
Users should realize that options changed via the form only take effect when
the form is submitted by entering a new URL or pressing the "Go" button.
Selecting an option, then following a link on the page, will not cause
the option to take effect.
Users should also realize that anything inserted into a page may throw
off any precise layout. The insertion will also be subject to
background colors and images, and any other page-wide settings.
$INSERT_ENTRY_FORM= 1 ;

If set, then allow the user to control $REMOVE_COOKIES, $REMOVE_SCRIPTS,
$FILTER_ADS, $HIDE_REFERER, and $INSERT_ENTRY_FORM. Note that they
can't fine-tune any related options, such as the various @ALLOWED... and
@BANNED... lists.
$ALLOW_USER_CONFIG= 1 ;

If you want to encode the URLs of visited pages so that they don't show
up within the full URL in your browser bar, then use proxy_encode() and
proxy_decode(). These are Perl routines that transform the way the
destination URL is included in the full URL. You can either use
some combination of the example encodings below, or you can program your
own routines. The encoded form of URLs should only contain characters
that are legal in PATH_INFO. This varies by server, but using only
printable chars and no "?" or "#" works on most servers. Don't let
PATH_INFO contain the strings "./", "/.", "../", or "/..", or else it
may get compressed like a pathname somewhere. Try not to make the
resulting string too long, either.
Of course, proxy_decode() must exactly undo whatever proxy_encode() does.
Make proxy_encode() as fast as possible-- it's a bottleneck for the whole
program. The speed of proxy_decode() is not as important.
If you're not a Perl programmer, you can use the example encodings that are
commented out, i.e. the lines beginning with "#". To use them, merely
uncomment them, i.e. remove the "#" at the start of the line. If you
uncomment a line in proxy_encode(), you MUST uncomment the corresponding
line in proxy_decode() (note that "corresponding lines" in
proxy_decode() are in reverse order of those in proxy_encode()). You
can use one, two, or all three encodings at the same time, as long as
the correct lines are uncommented.
Starting in version 2.1beta9, don't call these functions directly. Rather,
call wrap_proxy_encode() and wrap_proxy_decode() instead, which handle
certain details that you shouldn't have to worry about in these functions.
IMPORTANT: If you modify these routines, and if $PROXIFY_SCRIPTS is set
below (on by default), then you MUST modify $ENCODE_DECODE_BLOCK_IN_JS
below!! (You'll need to write corresponding routines in JavaScript to do
the same as these routines in Perl, used when proxifying JavaScript.)
Because of the simplified absolute URL resolution in full_url(), there may
be ".." segments in the default encoding here, notably in the first path
segment. Normally, that's just an HTML mistake, but please tell me if
you see any privacy exploit with it.
Note that a few sites have embedded applications (like applets or Shockwave)
that expect to access URLs relative to the page's URL. This means they
may not work if the encoded target URL can't be treated like a base URL,
e.g. that it can't be appended with something like "../data/foo.data"
to get that expected data file. In such cases, the default encoding below
should let these sites work fine, as should any other encoding that can
support URLs relative to it.

sub proxy_encode {
my($URL)= @_ ;
$URL=~ s#^([\w+.-]+)://#$1/# ; # http://xxx -> http/xxx

# $URL=~ s/(.)/ sprintf('%02x',ord($1)) /ge ; # each char -> 2-hex
# $URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13

return $URL ;
}

sub proxy_decode {
my($enc_URL)= @_ ;

# $enc_URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13
# $enc_URL=~ s/([\da-fA-F]{2})/ sprintf("%c",hex($1)) /ge ;
$enc_URL=~ s#^([\w+.-]+)/#$1://# ; # http/xxx -> http://xxx
return $enc_URL ;
}
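
Because proxy_decode() must exactly undo proxy_encode(), a quick round-trip check is worth running after any change to either routine. The sketch below is only an illustration: example_encode(), example_decode(), and failed_round_trips() are hypothetical names, and the stand-in routines combine the three example encodings rather than calling the script's own functions.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a customized proxy_encode()/proxy_decode() pair.
sub example_encode {
    my ($url) = @_;
    $url =~ s#^([\w+.-]+)://#$1/#;                 # http://xxx -> http/xxx
    $url =~ s/(.)/ sprintf('%02x', ord($1)) /ge;   # each char -> 2 hex digits
    $url =~ tr/a-zA-Z/n-za-mN-ZA-M/;               # rot-13
    return $url;
}

sub example_decode {
    my ($enc) = @_;
    $enc =~ tr/a-zA-Z/n-za-mN-ZA-M/;                        # undo rot-13
    $enc =~ s/([\da-fA-F]{2})/ sprintf('%c', hex($1)) /ge;  # undo hex
    $enc =~ s#^([\w+.-]+)/#$1://#;                          # http/xxx -> http://xxx
    return $enc;
}

# Returns the URLs that do not survive a round trip (empty list means OK).
sub failed_round_trips {
    return grep { example_decode(example_encode($_)) ne $_ } @_;
}

my @samples = (
    'http://www.example.com/',
    'https://example.com:8443/a/b?x=1&y=2',
    'ftp://ftp.example.org/pub/file.txt',
);
my @bad = failed_round_trips(@samples);
print @bad ? "FAILED: @bad\n" : "all sample URLs round-trip\n";
```

The decode steps run in the reverse order of the encode steps, which is the same ordering rule described above for uncommenting the example lines.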

Encode cookies before they're sent back to the user.
The return value must only contain characters that are legal in cookie
names and values, i.e. only printable characters, and no ";", ",", "=",
or white space.
cookie_encode() is called twice for each cookie: once to encode the cookie
name, and once to encode the cookie value. The two are then joined with
"=" and sent to the user.
cookie_decode() must exactly undo whatever cookie_encode() does.
Also, cookie_encode() must always encode a given input string into the
same output string. This is because browsers need the cookie name to
identify and manage a cookie, so the name must be consistent.
This is not a bottleneck like proxy_encode() is, so speed is not critical.
IMPORTANT: If you modify these routines, and if $PROXIFY_SCRIPTS is set
below (on by default), then you MUST modify $ENCODE_DECODE_BLOCK_IN_JS
below!! (You'll need to write corresponding routines in JavaScript to do
the same as these routines in Perl, used when proxifying JavaScript.)

sub cookie_encode {
my($cookie)= @_ ;

# $cookie=~ s/(.)/ sprintf('%02x',ord($1)) /ge ; # each char -> 2-hex
# $cookie=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13
$cookie=~ s/(\W)/ '%' . sprintf('%02x',ord($1)) /ge ; # simple URL-encoding
return $cookie ;
}

sub cookie_decode {
my($enc_cookie)= @_ ;
$enc_cookie=~ s/%([\da-fA-F]{2})/ pack('C', hex($1)) /ge ; # URL-decode

# $enc_cookie=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13
# $enc_cookie=~ s/([\da-fA-F]{2})/ sprintf("%c",hex($1)) /ge ;
return $enc_cookie ;
}
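
The three requirements on a cookie encoder stated above (legal characters only, deterministic output, exact reversibility) can be checked mechanically. The following is a minimal sketch under stated assumptions: example_cookie_encode(), example_cookie_decode(), is_legal_cookie_token(), and check_cookie_codec() are hypothetical names, and the stand-in codec uses only the hex and rot-13 example encodings.

```perl
use strict;
use warnings;

# Hypothetical cookie codec modeled on the hex + rot-13 example encodings.
sub example_cookie_encode {
    my ($c) = @_;
    $c =~ s/(.)/ sprintf('%02x', ord($1)) /ge;   # each char -> 2 hex digits
    $c =~ tr/a-zA-Z/n-za-mN-ZA-M/;               # rot-13
    return $c;
}

sub example_cookie_decode {
    my ($e) = @_;
    $e =~ tr/a-zA-Z/n-za-mN-ZA-M/;                        # undo rot-13
    $e =~ s/([\da-fA-F]{2})/ sprintf('%c', hex($1)) /ge;  # undo hex
    return $e;
}

# True if $s holds only printable ASCII and none of ";", ",", "=", or white space.
sub is_legal_cookie_token {
    my ($s) = @_;
    return ($s =~ /\A[\x21-\x7e]*\z/ && $s !~ /[;,=]/) ? 1 : 0;
}

# True if $value encodes to a legal token, encodes the same way twice,
# and decodes back to the original string.
sub check_cookie_codec {
    my ($value) = @_;
    my $enc = example_cookie_encode($value);
    return 0 unless is_legal_cookie_token($enc);
    return 0 unless $enc eq example_cookie_encode($value);
    return example_cookie_decode($enc) eq $value ? 1 : 0;
}

print check_cookie_codec('session id=abc; path=/') ? "codec OK\n" : "codec broken\n";
```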

If $PROXIFY_SCRIPTS is true, and if you modify the routines above that
encode cookies and URLs, then you need to modify $ENCODE_DECODE_BLOCK_IN_JS
here. Explanation: When proxifying JavaScript, a library of JavaScript
functions is used. In that library are a few JavaScript routines that do
the same as their Perl counterparts in this script. Four of those routines
are proxy_encode(), proxy_decode(), cookie_encode(), and cookie_decode().
Thus, unfortunately, when you write your own versions of those Perl routines
(or modify what's already there), you also need to write (or modify) these
corresponding JavaScript routines to do the same thing. Put the routines in
this long variable $ENCODE_DECODE_BLOCK_IN_JS, and it will be included in
the JavaScript library when needed. Prefix the function names with
"_proxy_jslib_", as below.
The commented examples in the JavaScript routines below correspond exactly to
the commented examples in the Perl routines above. Thus, if you modify the
Perl routines by merely uncommenting the examples, you can do the same in
these JavaScript routines. (JavaScript comments begin with "//".)
[If you don't know Perl: Note that everything up until the line "EOB" is one
long string value, called a "here document". $ENCODE_DECODE_BLOCK_IN_JS is
set to the whole thing.]

$ENCODE_DECODE_BLOCK_IN_JS= <<'EOB' ;

function _proxy_jslib_proxy_encode(URL) {
URL= URL.replace(/^([\w\+\.\-]+)\:\/\//, '$1/') ;
// URL= URL.replace(/(.)/g, function (s,p1) { return p1.charCodeAt(0).toString(16) } ) ;
// URL= URL.replace(/([a-mA-M])|[n-zN-Z]/g, function (s,p1) { return String.fromCharCode(s.charCodeAt(0)+(p1?13:-13)) }) ;

return URL ;
}

EOB

Use @ALLOWED_SERVERS and @BANNED_SERVERS to restrict which servers a user
can visit through this proxy. Any URL at a host matching a pattern in
@BANNED_SERVERS will be forbidden. In addition, if @ALLOWED_SERVERS is
not empty, then access is allowed only to servers that match a pattern
in it. In other words, @BANNED_SERVERS means "ban these servers", and
@ALLOWED_SERVERS (if not empty) means "allow only these servers". If a
server matches both lists, it is banned.
These are each a list of Perl 5 regular expressions (aka patterns or
regexes), not literal host names. To turn a hostname into a pattern,
replace every "." with "\.", add "^" to the beginning, and add "$" to the
end. For example, 'www.example.com' becomes '^www\.example\.com$'. To
match every host ending in something, leave out the "^". For example,
'\.example\.com$' matches every host ending in ".example.com". For more
details about Perl regular expressions, see the Perl documentation. (They
may seem cryptic at first, but they're very powerful once you know how to
use them.)
Note: Use single quotes around each pattern, not double quotes, unless you
understand the difference between the two in Perl. Otherwise, characters
like "$" and "\" may not be handled the way you expect.
@ALLOWED_SERVERS= () ;
@BANNED_SERVERS= () ;
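
As an illustration of the rules above, the sketch below builds an anchored pattern from a literal hostname and applies the documented precedence (a banned match always wins, and a non-empty allow list must match). host_to_pattern() and server_is_allowed() are hypothetical helpers, not routines from this script, and the case-insensitive matching is an assumption.

```perl
use strict;
use warnings;

# Turn a literal hostname into an anchored pattern, e.g.
# 'www.example.com' -> '^www\.example\.com$'.
sub host_to_pattern {
    my ($host) = @_;
    return '^' . quotemeta($host) . '$';
}

# Sketch of the documented precedence: a banned match always wins;
# otherwise a non-empty allow list must match.
sub server_is_allowed {
    my ($host, $allowed, $banned) = @_;
    return 0 if grep { $host =~ /$_/i } @$banned;
    return 1 unless @$allowed;
    return (grep { $host =~ /$_/i } @$allowed) ? 1 : 0;
}

my @allowed = ('\.example\.com$');
my @banned  = (host_to_pattern('internal.example.com'));
for my $h ('www.example.com', 'internal.example.com', 'www.example.org') {
    my $verdict = server_is_allowed($h, \@allowed, \@banned) ? 'allowed' : 'forbidden';
    printf "%-22s %s\n", $h, $verdict;
}
```

Note that the patterns are written in single quotes, so the backslashes reach the regex engine unchanged.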

If @BANNED_NETWORKS is set, then forbid access to these hosts or networks.
This is done by IP address, not name, so it provides more certain security
than @BANNED_SERVERS above.
Specify each element as a decimal IP address-- all four integers for a host,
or one to three integers for a network. For example, '127.0.0.1' bans
access to the local host, and '192.168' bans access to all IP addresses
in the 192.168 network. Sorry, no banning yet for subnets other than
8, 16, or 24 bits.
IF YOU'RE RUNNING THIS ON OR INSIDE A FIREWALL, THIS SETTING IS STRONGLY
RECOMMENDED!! In particular, you should ban access to other machines
inside the firewall that the firewall machine itself may have access to.
Otherwise, external users will be able to access any internal hosts that
the firewall can access. Even if that's what you intend, you should ban
access to any hosts that you don't explicitly want to expose to outside
users.
In addition to the recommended defaults below, add all IP addresses of your
server machine if you want to protect it like this.
If you're using this with another proxy on the same machine (like a SOCKS
proxy), you'll need to remove the '127' item below. But see the comments
above $SOCKS_PROXY, below, for a warning.
After you set this, YOU SHOULD TEST to verify that the proxy can't access
the IP addresses you're banning!
NOTE: According to RFC 1918, network address ranges reserved for private
networks are 10.x.x.x, 192.168.x.x, and 172.16.x.x-172.31.x.x, i.e. with
respective subnet masks of 8, 16, and 12 bits. Since we can't currently
do a 12-bit mask, we'll exclude the entire 172 network here. If this
causes a problem, let me know and I'll add subnet masks down to 1-bit
resolution.
Also included are 169.254.x.x (per RFC 3927) and 224.0.0.x (used for
routing), as recommended by Waldo Jaquith.
On some systems, 127.x.x.x all point to localhost, so disallow all of "127".
This feature is simple now but may be more complete in future releases.
How would you like this to be extended? What would be useful to you?
@BANNED_NETWORKS= ('127', '192.168', '172', '10', '169.254', '224.0.0') ;
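
One way to picture the matching on 8-, 16-, or 24-bit boundaries described above is a comparison on whole octets. This is a sketch only, not necessarily how the script does it; ip_in_network() and ip_is_banned() are hypothetical names, and the network list is an illustrative subset.

```perl
use strict;
use warnings;

# True if dotted-decimal $ip equals $net or lies inside it, where $net is one
# to four leading octets such as '10', '192.168', or '127.0.0.1'.
sub ip_in_network {
    my ($ip, $net) = @_;
    return 1 if $ip eq $net;
    # Compare against "$net." so that '10' does not match '100.x.x.x'.
    return index($ip, "$net.") == 0 ? 1 : 0;
}

sub ip_is_banned {
    my ($ip, @networks) = @_;
    return (grep { ip_in_network($ip, $_) } @networks) ? 1 : 0;
}

my @networks = ('127', '192.168', '10');   # illustrative subset
for my $ip ('10.1.2.3', '100.1.2.3', '192.168.0.7', '192.169.0.7') {
    my $verdict = ip_is_banned($ip, @networks) ? 'banned' : 'ok';
    printf "%-12s %s\n", $ip, $verdict;
}
```

A check like this is no substitute for the test recommended above: after changing the list, try to reach a banned address through the proxy itself.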

Settings to fine-tune cookie filtering, if cookies are not banned altogether
(by user checkbox or $REMOVE_COOKIES above).
Use @ALLOWED_COOKIE_SERVERS and @BANNED_COOKIE_SERVERS to restrict which
servers can send cookies through this proxy. They work like
@ALLOWED_SERVERS and @BANNED_SERVERS above, both in how their precedence
works, and that they're lists of Perl 5 regular expressions. See the
comments there for details.

If non-empty, only allow cookies from servers matching one of these patterns.
Comment this out to allow all cookies (subject to @BANNED_COOKIE_SERVERS).
#@ALLOWED_COOKIE_SERVERS= ('\bslashdot\.org$') ;

Reject cookies from servers matching these patterns.
@BANNED_COOKIE_SERVERS= (
'\.doubleclick\.net$',
'\.preferences\.com$',
'\.imgis\.com$',
'\.adforce\.com$',
'\.focalink\.com$',
'\.flycast\.com$',
'\.avenuea\.com$',
'\.linkexchange\.com$',
'\.pathfinder\.com$',
'\.burstnet\.com$',
'\btripod\.com$',
'\bgeocities\.yahoo\.com$',
'\.mediaplex\.com$',
) ;
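
The list above mixes patterns that start with "\." and patterns that start with "\b", and the two behave differently on a bare domain. The sketch below shows the difference; hosts_matching() and the host list are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Returns the subset of @hosts matched by $pattern.
sub hosts_matching {
    my ($pattern, @hosts) = @_;
    return grep { /$pattern/ } @hosts;
}

my @hosts = (
    'doubleclick.net', 'ad.doubleclick.net',
    'tripod.com', 'members.tripod.com', 'mytripod.com',
);

# "\." requires a subdomain label in front of the domain.
print join(', ', hosts_matching('\.doubleclick\.net$', @hosts)), "\n";

# "\b" also matches the bare domain, but not a longer name ending in "tripod".
print join(', ', hosts_matching('\btripod\.com$', @hosts)), "\n";
```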

Set this to reject cookies returned with images. This actually prevents
cookies returned with any non-text resource.
This helps prevent tracking by ad networks, but there are also some
legitimate uses of attaching cookies to images, such as captcha, so
by default this is off.
$NO_COOKIE_WITH_IMAGE= 0 ;

Settings to fine-tune script filtering, if scripts are not banned altogether
(by user checkbox or $REMOVE_SCRIPTS above).
Use @ALLOWED_SCRIPT_SERVERS and @BANNED_SCRIPT_SERVERS to restrict which
servers you'll allow scripts from. They work like @ALLOWED_SERVERS and
@BANNED_SERVERS above, both in how their precedence works, and that
they're lists of Perl 5 regular expressions. See the comments there for
details.
@ALLOWED_SCRIPT_SERVERS= () ;
@BANNED_SCRIPT_SERVERS= () ;

Various options to help filter ads and stop cookie-based privacy invasion.
These are only effective if $FILTER_ADS is set above.
@BANNED_IMAGE_URL_PATTERNS uses Perl patterns. If an image's URL
matches one of the patterns, it will not be downloaded (typically for
ad-filtering). For more information on Perl regular expressions, see
the Perl documentation.
Note that most popup ads will be removed if scripts are removed (see
$REMOVE_SCRIPTS above).
If ad-filtering is your primary motive, consider using one of the many
proxies that specialize in that. The classic is from JunkBusters, at
http://www.junkbusters.com .

Reject images whose URL matches any of these patterns. This is just a
sample list; add more depending on which sites you visit.
@BANNED_IMAGE_URL_PATTERNS= (
'ad\.doubleclick\.net/ad/',
'\b[a-z](\d+)?\.doubleclick\.net(:\d*)?/',
'\.imgis\.com\b',
'\.adforce\.com\b',
'\.avenuea\.com\b',
'\.go\.com(:\d*)?/ad/',
'\.eimg\.com\b',
'\bexcite\.netscape\.com(:\d*)?/.*/promo/',
'/excitenetscapepromos/',
'\.yimg\.com(:\d*)?.*/promo/',
'\bus\.yimg\.com/[a-z]/(\w\w)/\1',
'\bus\.yimg\.com/[a-z]/\d-/',
'\bpromotions\.yahoo\.com(:\d*)?/promotions/',
'\bcnn\.com(:\d*)?/ads/',
'ads\.msn\.com\b',
'\blinkexchange\.com\b',
'\badknowledge\.com\b',
'/SmartBanner/',
'\bdeja\.com/ads/',
'\bimage\.pathfinder\.com/sponsors',
'ads\.tripod\.com',
'ar\.atwola\.com/image/',
'\brealcities\.com/ads/',
'\bnytimes\.com/ad[sx]/',
'\busatoday\.com/sponsors/',
'\busatoday\.com/RealMedia/ads/',
'\bmsads\.net/ads/',
'\bmediaplex\.com/ads/',
'\batdmt\.com/[a-z]/',
'\bview\.atdmt\.com/',
'\bADSAdClient31\.dll\b',
) ;

If set, replace banned images with 1x1 transparent GIF. This also replaces
all images with the same if $TEXT_ONLY is set.
Note that setting this makes the response a little slower, since the browser
must still retrieve the empty GIF.
$RETURN_EMPTY_GIF= 0 ;

To use an external program to decide whether or not a user at a given IP
address may use this proxy (as opposed to using server configuration), set
$USER_IP_ADDRESS_TEST to either the name of a command-line program that
performs this test, or a queryable URL that performs this test (e.g. a CGI
script).
For a command-line program: The program should take a single argument, the
IP address of the user. The output of the program is evaluated as a
number, and if the number is non-zero then the IP address of the user is
allowed; thus, the output is typically either "1" or "0". Note that
depending on $ENV{PATH}, you may need to enter the path here explicitly.
For a queryable URL: Specify the start of the URL here (must begin with
"http://"), and the user's IP address will be appended. For example, the
value here may contain a "?", thus putting the IP address in the
QUERY_STRING; it could also be in PATH_INFO. The response body from the
URL should be a number like for a command line program, above.
$USER_IP_ADDRESS_TEST= '' ;
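
A command-line test program of the kind described above could look like the sketch below. Everything in it is an assumption made for illustration: the allowed prefixes, the helper user_ip_allowed(), and any install path you choose for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical policy: allow only clients whose address starts with one of
# these leading octets.
my @ALLOWED_PREFIXES = ('192.0.2', '198.51.100');

sub user_ip_allowed {
    my ($ip) = @_;
    return 0 unless defined($ip) && $ip =~ /\A\d{1,3}(?:\.\d{1,3}){3}\z/;
    return (grep { index($ip, "$_.") == 0 } @ALLOWED_PREFIXES) ? 1 : 0;
}

# The user's IP address arrives as the single argument, and the output is
# read as a number: non-zero means the user is allowed.
if (@ARGV) {
    my $verdict = user_ip_allowed($ARGV[0]);
    print "$verdict\n";
}
```

With the program saved at a hypothetical path such as /usr/local/bin/check_user_ip.pl and made executable, that full path would be the value of $USER_IP_ADDRESS_TEST.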

To use an external program to decide whether or not a destination server is
allowed (as opposed to using @ALLOWED_SERVERS and @BANNED_SERVERS above),
set $DESTINATION_SERVER_TEST to either the name of a command-line program
that performs this test, or a queryable URL that performs this test (e.g. a
CGI script).
For a command-line program: The program should take a single argument, the
destination server's name or IP address (depending on how the user enters
it). The output of the program is evaluated as a number, and if the number
is non-zero then the destination server is allowed; thus, the output is
typically either "1" or "0". Note that depending on $ENV{PATH}, you may
need to enter the path here explicitly.
For a queryable URL: Specify the start of the URL here (must begin with
"http://"), and the destination server's name or IP address will be
appended. For example, the value here may contain a "?", thus putting the
name or address in the QUERY_STRING; it could also be in PATH_INFO. The
response body from the URL should be a number like for a command line
program, above.
$DESTINATION_SERVER_TEST= '' ;
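
For the queryable-URL form, a small CGI script can play the same role. The sketch below assumes the setting ends in "?" so that the appended name or address arrives as the whole QUERY_STRING, and that it arrives unencoded; the policy, the helper destination_allowed(), and the URL in the note after the code are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical policy: allow only example.com and hosts under it.
sub destination_allowed {
    my ($host) = @_;
    return 0 unless defined($host) && length($host);
    return $host =~ /(?:\A|\.)example\.com\z/i ? 1 : 0;
}

# The response body is read as a number: non-zero means the destination
# server is allowed.
if (defined $ENV{QUERY_STRING}) {
    my $verdict = destination_allowed($ENV{QUERY_STRING});
    print "Content-Type: text/plain\r\n\r\n$verdict\n";
}
```

If this were served at, say, http://policy.example.com/cgi-bin/dest_test.cgi, the setting would be that URL followed by "?".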

If either $INSERT_HTML or $INSERT_FILE is set, then that HTML text or the
contents of that named file (respectively) will be inserted into any HTML
page retrieved through this proxy. $INSERT_HTML takes precedence over
$INSERT_FILE. $INSERT_FILE is assumed to have contents in UTF-8.
When viewing a page with frames, a new top frame is created and the
insertions go there.
NOTE: Any HTML you insert should not have relative URLs in it! The problem
is that there is no appropriate base URL to resolve them with. So only use
absolute URLs in your insertion. (If you use relative URLs anyway, then
a) if $ANONYMIZE_INSERTION is set, they'll be resolved relative to this
script's URL, which isn't great, or b) if $ANONYMIZE_INSERTION==0,
they'll be unchanged and the browser will simply resolve them relative
to the current page, which is usually worse.)
The frame handling means that it's fairly easy for a surfer to bypass this
insertion, by pretending in effect to be in a frame. There's not much we
can do about that, since a page is retrieved the same way regardless of
whether it's in a frame. This script uses a parameter in the URL to
communicate to itself between calls, but the user can merely change that
URL to make the script think it's retrieving a page for a frame. Also,
many browsers let the user expand a frame's contents into a full window.
[The warning in earlier versions about setting $INSERT_HTML to '' when using
mod_perl and $INSERT_FILE no longer applies. It's all handled elsewhere.]
As with $INSERT_ENTRY_FORM, note that any insertion may throw off any
precise layout, and the insertion is subject to background colors and
other page-wide settings.

#$INSERT_HTML= "<h1>This is an inserted header</h1><hr>" ;
#$INSERT_FILE= 'insert_file_name' ;

If your insertion has links that you don't want anonymized along with the rest
of the downloaded HTML, then set this to 0. Otherwise leave it at 1.
$ANONYMIZE_INSERTION= 1 ;

If there's both a URL entry form and an insertion via $INSERT_HTML or
$INSERT_FILE on the same page, the entry form normally goes at the top.
Set this to put it after the other insertion.
$FORM_AFTER_INSERTION= 0 ;

If the insertion is put in a top frame, then this is how many pixels high
the frame is. If the default of 80 or 50 pixels is too big or too small
for your insertion, change this. You can use percentage of screen height
if you prefer, e.g. "20%". (Unfortunately, you can't just tell the
browser to "make it as high as it needs to be", but at least the frame
will be resizable by the user.)
This affects insertions by $INSERT_ENTRY_FORM, $INSERT_HTML, and $INSERT_FILE.
The default here usually works for the inserted entry form, which varies in
size depending on $ALLOW_USER_CONFIG. It also varies by browser.
$INSERTION_FRAME_HEIGHT= $ALLOW_USER_CONFIG ? 80 : 50 ;

NOTE THAT YOU SHOULD BE RUNNING CGIPROXY ON A SECURE SERVER!
Note also that the meaning of '' has changed-- now, all ports except 80
are assumed to be using SSL.
Set this to 1 if the script is running on an SSL server, i.e. it is
accessed through a URL starting with "https:"; set this to 0 if it's not
running on an SSL server. This is needed to know how to route URLs back
through the proxy. Regrettably, standard CGI does not yet provide a way
for scripts to determine this without help.
If this variable is set to '' or left undefined, then the program will
guess: SSL is assumed if SERVER_PORT is not 80. This fails when using
an insecure server on a port other than 80, or (less commonly) when an SSL
server uses port 80, but usually it works. Besides being a good default, it lets
you install the script where both a secure server and a non-secure server
will serve it, and it will work correctly through either server.
This has nothing to do with retrieving pages that are on SSL servers.
$RUNNING_ON_SSL_SERVER= '' ;
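
The guessing rule described above fits in a few lines. This is a sketch of the documented behavior, not the script's actual code; running_on_ssl() is a hypothetical helper.

```perl
use strict;
use warnings;

# An explicit 1 or 0 is honored; '' or undef falls back to guessing from the
# server port, where anything other than 80 is assumed to be SSL.
sub running_on_ssl {
    my ($setting, $server_port) = @_;
    return ($setting ? 1 : 0) if defined($setting) && $setting ne '';
    return $server_port != 80 ? 1 : 0;
}

for my $case ([ '', 443 ], [ '', 80 ], [ 0, 8443 ], [ 1, 80 ]) {
    my ($setting, $port) = @$case;
    printf "setting='%s' port=%d -> %d\n", $setting, $port, running_on_ssl($setting, $port);
}
```

The third case shows why an explicit 0 matters: a non-SSL server on port 8443 would be guessed wrong if the setting were left as ''.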

If your server doesn't support NPH scripts, then set this variable to true
and try running the script as a normal non-NPH script. HOWEVER, this
won't work as well as running it as NPH; there may be bugs, maybe some
privacy holes, and results may not be consistent. It's a hack.
Try to install the script as NPH before you use this option, because
this may not work. NPH is supported on almost all servers, and it's
usually very easy to install a script as NPH (on Apache, for example,
you just need to name the script something starting with "nph-").
One example of a problem is that Location: headers may get messed up,
because they mean different things in an NPH and a non-NPH script.
You have been warned.
For this to work, your server MUST support the "Status:" CGI response
header.
$NOT_RUNNING_AS_NPH= 0 ;

Set HTTP and SSL proxies if needed. Also see $USE_PASSIVE_FTP_MODE below.
The format of the first two variables is "host:port", with the port being
optional. The format of $NO_PROXY is a comma-separated list of hostnames
or domains: any request for a hostname that ends in one of the strings in
$NO_PROXY will not use the HTTP or SSL proxy; e.g. use ".mycompany.com" to
avoid using the proxies to access any host in the mycompany.com domain.
The environment variables in the examples below are appropriate defaults,
if they are available. Note that earlier versions of this script used
the environment variables directly, instead of the $HTTP_PROXY and
$NO_PROXY variables we use now.
Sometimes you can use the same proxy (like Squid) for both SSL and normal
HTTP, in which case $HTTP_PROXY and $SSL_PROXY will be the same.
$NO_PROXY applies to both SSL and normal HTTP proxying, which is usually
appropriate. If there's demand to differentiate those, it wouldn't be
hard to make a separate $SSL_NO_PROXY option.
#$HTTP_PROXY= $ENV{'http_proxy'} ;
#$SSL_PROXY= 'firewall.example.com:3128' ;
#$NO_PROXY= $ENV{'no_proxy'} ;
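
The $NO_PROXY rule (skip the proxy for any hostname that ends in one of the listed strings) can be sketched as follows. bypasses_proxy() is a hypothetical helper, and both the case-insensitive comparison and the trimming of spaces around entries are assumptions, not documented behavior.

```perl
use strict;
use warnings;

# True if $host ends in one of the comma-separated entries of $no_proxy.
sub bypasses_proxy {
    my ($host, $no_proxy) = @_;
    return 0 unless defined($no_proxy);
    for my $entry (split /,/, $no_proxy) {
        (my $suffix = $entry) =~ s/^\s+|\s+\z//g;   # trim surrounding spaces
        next if $suffix eq '' || length($suffix) > length($host);
        return 1 if lc(substr($host, -length($suffix))) eq lc($suffix);
    }
    return 0;
}

my $no_proxy = '.mycompany.com, localhost';
for my $h ('www.mycompany.com', 'www.example.com', 'localhost') {
    printf "%-18s %s\n", $h, bypasses_proxy($h, $no_proxy) ? 'direct' : 'via proxy';
}
```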

If your HTTP and SSL proxies require authentication, this script supports
that in a limited way: you can have a single username/password pair per
proxy to authenticate with, regardless of realm. In other words, multiple
realms aren't supported for proxy authentication (though they are for
normal server authentication, elsewhere).
Set $PROXY_AUTH and $SSL_PROXY_AUTH either in the form of "username:password",
or to the actual base64 string that gets sent in the Proxy-Authorization:
header. Often the two variables will be the same, when the same proxy is
used for both SSL and normal HTTP.
#$PROXY_AUTH= 'Aladdin:open sesame' ;
#$SSL_PROXY_AUTH= $PROXY_AUTH ;
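
If you prefer to store the base64 form, it can be produced from the "username:password" form with the core MIME::Base64 module. The sketch below assumes the Basic authentication scheme when printing the header; proxy_auth_base64() is a hypothetical helper.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Base64-encode a "username:password" string. The empty second argument
# suppresses the trailing newline that encode_base64() adds by default.
sub proxy_auth_base64 {
    my ($userpass) = @_;
    return encode_base64($userpass, '');
}

my $b64 = proxy_auth_base64('Aladdin:open sesame');
print "Proxy-Authorization: Basic $b64\n";
```

Base64 is an encoding, not encryption, so the stored string deserves the same protection as the plain password.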

Set SOCKS proxy if needed. The format of $SOCKS_PROXY is "host:port", with
the port being optional (defaults to 1080).
If your SOCKS proxy supports username/password authentication, then set
the username and password below.
Also see @BANNED_NETWORKS above-- you'll need to remove the '127' from the
default list if you use a SOCKS proxy on the machine where this is running,
such as with the example here.
NOTE THAT THE CONNECTION BETWEEN THIS SCRIPT AND YOUR SOCKS PROXY MUST BE
TRUSTED, BECAUSE CURRENTLY ALL DATA IS SENT IN THE CLEAR BETWEEN THEM!
In particular, the username and password below will be sent in the clear.
The solution would be to use the GSSAPI authentication method, which many
SOCKS proxies do not support, and which CGIProxy doesn't support yet either.
#$SOCKS_PROXY= 'localhost:1080' ;
#$SOCKS_USERNAME= '' ;
#$SOCKS_PASSWORD= '' ;

This is one way to handle pages that don't work well, by redirecting to other working
versions of the pages (for example, to a mobile version or another version that
doesn't have much JavaScript). How it works: If the current domain matches one
of the keys of %REDIRECTS, then s/// (string substitution) is done on the URL,
using the match and replacement patterns in the 2-element value array.
The sites handled this way are Facebook and Gmail, since they don't
always work well, or are slow, through CGIProxy. If you want to access
them normally, then comment out or remove the line(s) below for that site.
If you want to redirect more sites, you can add records to the %REDIRECTS
hash in the following way: Set the hash key to the name of the server you
want to redirect, and the value to a reference to a 2-element array containing
the left and right sides of an s/// string substitution. If that doesn't make
sense, then try to emulate an example below.
As of version 2.1.7, the full facebook.com site works pretty well, so the
redirection below has been commented out.
... aaaand, as of version 2.1.8, the full Gmail site works pretty well, so the
redirection below has been commented out.
To improve performance with facebook or other JS-busy sites, users can:
- close other browser windows
- end other CPU-heavy processes on their browsing machine
- reload the page or restart the browser when it gets too slow
- use a browser other than MSIE (it has the most problems)
If Gmail or facebook is still too slow or crashes a lot, you can remove the
leading "#" on the appropriate lines below to automatically redirect to
Gmail's HTML-only site or facebook's mobile site, which may work better.
%REDIRECTS= (
# 'www.facebook.com' => [qr#^https?://www\.facebook\.com#i, 'https://m.facebook.com'],
# 'mail.google.com' => [qr#^https?://mail\.google\.com/.*shva=\w*1.*$#i, 'https://mail.google.com/?ui=html']
) ;
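
To see how a record in this shape turns into a redirect, here is a minimal sketch that applies the two-element value as an s/// substitution. It is not the script's own code: the table %redirects and the helper apply_redirect() are hypothetical, and the replacement is assumed to be a literal string with no captures.

```perl
use strict;
use warnings;

# Illustrative table in the documented shape: host => [ match, replacement ].
my %redirects = (
    'www.facebook.com' => [ qr#^https?://www\.facebook\.com#i, 'https://m.facebook.com' ],
);

# Apply the substitution configured for $host to $url, if there is one.
# Assumption: the replacement is used as a literal string (no $1 captures).
sub apply_redirect {
    my ($host, $url) = @_;
    my $rule = $redirects{lc($host)} or return $url;
    my ($match, $replacement) = @$rule;
    $url =~ s/$match/$replacement/;
    return $url;
}

print apply_redirect('www.facebook.com', 'http://www.facebook.com/profile'), "\n";
print apply_redirect('www.example.com',  'http://www.example.com/'), "\n";
```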

Some JavaScript-busy sites crash when visiting them through CGIProxy. Increasing
the delay times in Window.setTimeout() and Window.setInterval() m
