Bug #7763
nothing works
| Status: | New | Start date: | 07/17/2016 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | - | | |
| Target version: | - | | |
| Component: | | | |
Description
- CGIProxy 2.1.17
- CGIProxy (nph-proxy.cgi): a proxy in the form of a CGI script.
- Retrieves the resource at any HTTP or FTP URL, updating embedded URLs
- in HTML and all other resources to point back through this script. By
- default, no user info is sent to the server. Options include
- text-only proxying to save bandwidth, cookie filtering, ad filtering,
- script removal, user-defined encoding of the target URL, and much more.
- Besides running as a CGI script, can also run under mod_perl, as a
- FastCGI script, or can use its own embedded HTTP server.
- Requires Perl 5.
- Copyright (C) 1996, 1998-2016 by James Marshall, james@jmarshall.com
- All rights reserved. Free for non-commercial use; commercial use
- requires a license.
- For the latest, see https://jmarshall.com/tools/cgiproxy/
- IMPORTANT NOTE ABOUT ANONYMOUS BROWSING:
- CGIProxy was originally made for indirect browsing more than
- anonymity, but since people are using it for anonymity, I've tried
- to make it as anonymous as possible. Suggestions welcome. For best
- anonymity, browse with JavaScript turned off. That said, please notify
- me if you find any privacy holes, even when using JavaScript.
- Anonymity is good, but may not be bulletproof. For example, if even
- a single unchecked JavaScript statement can be run, your anonymity
- can be compromised. I've tried to handle JS in every place it can
- exist, but please tell me if I missed any. Also, browser plugins
- or other executable extensions may be able to reveal you to a server.
- Also, be aware that this script doesn't modify PDF files or other
- third-party document formats that may contain linking ability, so
- you will lose your anonymity if you follow links in such files.
- If you find any other way your anonymity can be compromised, please let
- me know.
- INSTALLATION:
- First, edit this file (nph-proxy.cgi) to configure it-- see the CONFIGURATION
- section just below for certain options that may be required. All
- configuration variables are set in the "user configuration" section starting
- around line 338.
- After copying nph-proxy.cgi to your server, run "./nph-proxy.cgi init"
- from the server command line (on Windows, run "perl nph-proxy.cgi init").
- This creates needed directories, installs all optional Perl (CPAN) modules,
- and creates the database that CGIProxy uses. Ignore the scrolling text,
- and hit <return> if asked any questions. Ideally you can run this command
- as root to set file permissions and ownership optimally, but even if run as
- non-root these will be handled as well as possible and the script should
- still work.
- To see a simple usage message, run "./nph-proxy.cgi -?".
- It's fine to rename this file, as long as your Web server is set up to
- recognize it. All of the documentation refers to "nph-proxy.cgi",
- but replace that with whatever you renamed the file to.
- For complete installation instructions, see
- https://jmarshall.com/tools/cgiproxy/install.html
- CONFIGURATION:
- . Set $PROXY_DIR and $RUN_AS_USER -- see the comments above those settings
- for details.
- . If you don't have root access on your server, set $LOCAL_LIB_DIR so that
- the Perl (CPAN) modules can be installed under your own directory. Do
- this before running "./nph-proxy.cgi init", as described above.
- . If you're using either a MySQL/MariaDB or Oracle database to store cookies,
- you need to set $DB_DRIVER, $DB_USER, $DB_PASS, and possibly $DB_SERVER .
- See the notes by those settings for more details. Note that you need to
- purge the database periodically by running "./nph-proxy.cgi purge-db",
- with a cron job on Unix or Mac, or with the Task Scheduler in Windows.
- The default database driver is SQLite, which doesn't need a username or
- password or even a running database engine, but still requires periodic
- purging.
- . If you're using another HTTP or SSL proxy, set $HTTP_PROXY,
- $SSL_PROXY, and $NO_PROXY as needed. If those proxies use
- authentication, set $PROXY_AUTH and $SSL_PROXY_AUTH accordingly.
- . If you're using a SOCKS proxy, set $SOCKS_PROXY and possibly
- $SOCKS_USERNAME and $SOCKS_PASSWORD .
- . If this is running on an insecure server that doesn't use port 80, set
- $RUNNING_ON_SSL_SERVER=0 (otherwise, the default of '' is fine).
- . If you plan to run CGIProxy as a FastCGI script, set at least
- $SECRET_PATH and see the configuration section "FastCGI configuration".
- . If you plan to run CGIProxy using its own embedded server, set
- $SECRET_PATH and see the configuration section "Embedded server configuration".
- You'll also need a certificate and private key (key pair) in PEM
- format.
- . See http://www.jmarshall.com/tools/cgiproxy/options.html#env , in the section
- "OPTIONS RELATED TO YOUR SERVER/NETWORK ENVIRONMENT", for other options
- you may need to set.
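As a rough illustration of the required settings listed above, a minimal configuration might look like the fragment below. This is a sketch only: the path, the user name `www-data`, and the secret string are placeholder assumptions, and `our` is used just so the fragment compiles on its own (the script itself declares these names with `use vars`).

```perl
use strict;
use warnings;

# "our" is used here only so this fragment is self-contained.
our ($PROXY_DIR, $RUN_AS_USER, $LOCAL_LIB_DIR, $DB_DRIVER, $SECRET_PATH);

$PROXY_DIR     = '/home/username/cgiproxy';  # placeholder absolute path
$RUN_AS_USER   = 'www-data';                 # assumed Web server user; must already exist
$LOCAL_LIB_DIR = '/home/username/perl5';     # only needed without root access
$DB_DRIVER     = 'SQLite';                   # the default driver named in the text
$SECRET_PATH   = 'Zq7wR2kM9x';               # placeholder; only for FastCGI or the embedded server
```

Set $LOCAL_LIB_DIR before running "./nph-proxy.cgi init", as the text notes.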
- Other options include:
- . Set $TEXT_ONLY, $REMOVE_COOKIES, $REMOVE_SCRIPTS, $FILTER_ADS,
- $HIDE_REFERER, and $INSERT_ENTRY_FORM as desired. Set
- $REMOVE_SCRIPTS if anonymity is important.
- . To let the user choose all of those settings (except $TEXT_ONLY),
- set $ALLOW_USER_CONFIG=1.
- . To change the encoding format of the URL, modify the
- proxy_encode() and proxy_decode() routines. The default
- routines are suitable for simple PATH_INFO compliance.
- . To encode cookies, modify the cookie_encode() and cookie_decode()
- routines.
- . You can restrict which servers this proxy will access, with
- @ALLOWED_SERVERS and @BANNED_SERVERS.
- . Similarly, you can specify allowed and denied server lists for
- both cookies and scripts.
- . For security, you can ban access to private IP ranges, with
- @BANNED_NETWORKS.
- . If filtering ads, you can customize this with a few settings.
- . To insert your own block of HTML into each page, set $INSERT_HTML
- or $INSERT_FILE.
- . As a last resort, if you really can't run this script as NPH,
- you can try to run it as non-NPH by setting $NOT_RUNNING_AS_NPH=1.
- BUT, read the notes and warnings above that line. Caveat surfor.
- . For crude load-balancing among a set of proxies, set @PROXY_GROUP.
- . Other config is possible; see the user configuration section.
- . If heavy use of this proxy puts a load on your server, see the
- "NOTES ON PERFORMANCE" section below.
- For more info, read the comments above any config options you set.
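To picture what banning private IP ranges with @BANNED_NETWORKS involves, here is a minimal sketch of a CIDR membership test. It assumes entries written in CIDR notation; the entry format the script actually accepts is described beside that setting and may differ. The names `@example_banned_networks`, `ip_in_cidr`, and `is_banned_ip` are hypothetical and are not part of CGIProxy.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical list of private and loopback ranges in CIDR form.
my @example_banned_networks =
    ('10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16', '127.0.0.0/8');

# Return 1 if the dotted-quad $ip falls inside $cidr, else 0.
sub ip_in_cidr {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    my $packed_ip  = inet_aton($ip);
    my $packed_net = inet_aton($net);
    return 0 unless defined $packed_ip && defined $packed_net;
    # Build a 32-bit mask with the top $bits bits set.
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return (unpack('N', $packed_ip) & $mask) == (unpack('N', $packed_net) & $mask)
        ? 1 : 0;
}

# Return the number of banned networks that contain $ip (0 means allowed).
sub is_banned_ip {
    my ($ip) = @_;
    return scalar grep { ip_in_cidr($ip, $_) } @example_banned_networks;
}
```

A check like this would have to run on the resolved address of the destination, not on the hostname the user typed, or a public name pointing at a private address would slip through.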
- For a full list of options, see https://jmarshall.com/tools/cgiproxy/options.html
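The option list above mentions changing the URL encoding by modifying proxy_encode() and proxy_decode(). The sketch below shows one possible reversible pair based on hex encoding. It assumes each routine takes one string and returns one string, which should be checked against the real routines before adapting it. The names `example_proxy_encode` and `example_proxy_decode` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical encoder: each byte of the URL becomes two hex digits.
sub example_proxy_encode {
    my ($url) = @_;
    return unpack('H*', $url);
}

# Hypothetical decoder: the exact inverse of the encoder above.
sub example_proxy_decode {
    my ($encoded) = @_;
    return pack('H*', $encoded);
}

my $url     = 'http://example.com/a/b?x=1';
my $encoded = example_proxy_encode($url);
print "$encoded\n";
```

Hex encoding only obscures the URL; it is not encryption. The variable list in the script also includes $ENCODE_DECODE_BLOCK_IN_JS, which suggests that any change to these routines may need a matching JavaScript version.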
- This script MUST be installed as a non-parsed header (NPH) script.
- In Apache and many other servers, this is done by simply starting the
- filename with "nph-". It MAY be possible to fake it as a non-NPH
- script, MOST of the time, by using the $NOT_RUNNING_AS_NPH feature.
- This is not advised. See the comments by that option for warnings.
- TO USE:
- Start a browsing session by visiting the script's URL with no parameters.
- You can bookmark pages you browse to through the proxy, or link to
- the URLs that are generated.
- NOTES ON PERFORMANCE:
- Unfortunately, this has gotten slower through the versions, mostly
- because of optional new features. Configured equally, version 1.3
- takes 25% longer to run than 1.0 or 1.1 (based on *cough* highly
- abbreviated testing). Compiling takes about 50% longer.
- Leaving $REMOVE_SCRIPTS=1 adds 25-50% to the running time.
- Remember that we're talking about tenths of a second here. Most of
- the delay experienced by the user is from waiting on two network
- connections. These performance issues only matter if your server
- CPU is getting overloaded. Also, these mostly matter when retrieving
- JavaScript and Flash, because modifying those is what takes most of the
- time.
- If you can, use mod_perl. Starting with version 1.3.1, this should
- work under mod_perl, which requires Perl 5.004 or later. If you use
- mod_perl, be careful to install this as an NPH script, i.e. set the
- "PerlSendHeader Off" configuration directive (or "PerlOptions -ParseHeaders"
- if using mod_perl 2.x). For more info, see the mod_perl documentation.
- If you can't use mod_perl, try using FastCGI. Configure the section
- "FastCGI configuration" below, and run nph-proxy.cgi from the command
- line to see a usage message. You'll also need to configure your
- Web server to use FastCGI.
- If you can't use mod_perl or FastCGI, try running CGIProxy as its own
- embedded server. Configure the section "Embedded server configuration",
- and run nph-proxy.cgi from the command line to see a usage message.
- You'll also need a key pair (certificate and private key).
- If you use mod_perl, FastCGI, or the embedded server, and modify this
- script, see the note near the "reset 'a-z'" line below, regarding
- UPPER_CASE and lower_case variable names.
- If performance on the browser is bad for JS-heavy sites like facebook,
- then close other browser windows and other CPU-heavy processes, and
- see the comments above the setting of %REDIRECTS below. Also, try
- using a browser other than MSIE-- it seems to have the most problems.
- TO DO:
- What I want to hear about:
- . Any HTML tags not being converted here.
- . Any method of introducing JavaScript or other script, that's not
- being handled here.
- . Any script MIME types other than those already in @SCRIPT_MIME_TYPES.
- . Any MIME types other than text/html that have links that need to
- be converted.
- plug any other script holes (e.g. MSIE-proprietary, other MIME types?)
- more error checking?
- find a simple encryption technique for proxy_encode()
- For ad filtering, add option to disable images from servers other than
- that of the containing HTML page? Is it worth it?
- BUGS:
- Anonymity may not be perfect. In particular, there may be some remaining
- JavaScript or Flash holes. Please let me know if you find any.
- Since ALL of your cookies are sent to this script (which then chooses
- the relevant ones), some cookies could be dropped if you accumulate a
- lot, resulting in "Bad Request" errors. To fix this, use a database
- server for cookies.
- I first wrote this in 1996 as an experiment to allow indirect browsing.
- The original seed was a program I wrote for Rich Morin's article
- in the June 1996 issue of Unix Review, online at
- http://www.cfcl.com/tin/P/199606.shtml.
- Confession: I didn't originally write this with the spec for HTTP
- proxies in mind, and there are probably some violations of the protocol
- (at least for proxies). This whole thing is one big violation of the
- proxy model anyway, so I hereby rationalize that the spec can be widely
- interpreted here. If there is demand, I can make it more conformant.
- The HTTP client and server components should be fine; it's just the
- special requirements for proxies that may not be followed.
#
#--------------------------------------------------------------------------
use strict ;
use warnings ;
no warnings qw(uninitialized redefine) ; # we use defaults all the time
use Encode ;
use IO::Handle ;
use IO::Select ;
use File::Spec ;
use Time::Local ;
use Getopt::Long ;
use Socket qw(:all) ;
use Net::Domain qw(hostfqdn) ;
use Fcntl qw(:DEFAULT :flock) ;
use POSIX qw(:sys_wait_h setsid);
use Time::HiRes qw(gettimeofday tv_interval) ;
use Errno qw(EINTR EAGAIN EWOULDBLOCK ENOBUFS EPIPE) ;
- First block below is config variables, second block is sort-of config
- variables, third block is persistent constants, fourth block is would-be
- persistent constants (not set until needed), fifth block is constants for
- JavaScript processing (mostly regular expressions), and last block is
- variables.
- Removed $RE_JS_STRING_LITERAL to help with Perl's long-literal-string bug,
- but can replace it later if/when that is fixed. Added
- $RE_JS_STRING_LITERAL_START, $RE_JS_STRING_REMAINDER_1, and
- $RE_JS_STRING_REMAINDER_2 as part of the workaround.
use vars qw(
$PROXY_DIR $SECRET_PATH $LOCAL_LIB_DIR
$FCGI_SOCKET $FCGI_MAX_REQUESTS_PER_PROCESS $FCGI_NUM_PROCESSES
$PRIVATE_KEY_FILE $CERTIFICATE_FILE $RUN_AS_USER $EMB_USERNAME $EMB_PASSWORD
$DB_DRIVER $DB_SERVER $DB_NAME $DB_USER $DB_PASS $USE_DB_FOR_COOKIES
%REDIRECTS %TIMEOUT_MULTIPLIER_BY_HOST
$DEFAULT_LANG
$TEXT_ONLY
$REMOVE_COOKIES $REMOVE_SCRIPTS $FILTER_ADS $HIDE_REFERER
$INSERT_ENTRY_FORM $ALLOW_USER_CONFIG
$ENCODE_DECODE_BLOCK_IN_JS
@ALLOWED_SERVERS @BANNED_SERVERS @BANNED_NETWORKS
$NO_COOKIE_WITH_IMAGE @ALLOWED_COOKIE_SERVERS @BANNED_COOKIE_SERVERS
@ALLOWED_SCRIPT_SERVERS @BANNED_SCRIPT_SERVERS
@BANNED_IMAGE_URL_PATTERNS $RETURN_EMPTY_GIF
$USER_IP_ADDRESS_TEST $DESTINATION_SERVER_TEST
$INSERT_HTML $INSERT_FILE $ANONYMIZE_INSERTION $FORM_AFTER_INSERTION
$INSERTION_FRAME_HEIGHT
$RUNNING_ON_SSL_SERVER $NOT_RUNNING_AS_NPH $USER_FACING_PORT
$HTTP_PROXY $SSL_PROXY $NO_PROXY $PROXY_AUTH $SSL_PROXY_AUTH
$SOCKS_PROXY $SOCKS_USERNAME $SOCKS_PASSWORD
$MINIMIZE_CACHING
$SESSION_COOKIES_ONLY $COOKIE_PATH_FOLLOWS_SPEC $RESPECT_THREE_DOT_RULE
@PROXY_GROUP
$USER_AGENT $USE_PASSIVE_FTP_MODE $SHOW_FTP_WELCOME
$PROXIFY_SCRIPTS $PROXIFY_SWF $ALLOW_RTMP_PROXY $ALLOW_UNPROXIFIED_SCRIPTS
$PROXIFY_COMMENTS
$USE_POST_ON_START $ENCODE_URL_INPUT
$REMOVE_TITLES $NO_BROWSE_THROUGH_SELF $NO_LINK_TO_START $MAX_REQUEST_SIZE
@TRANSMIT_HTML_IN_PARTS_URLS
$QUIETLY_EXIT_PROXY_SESSION
$ALERT_ON_CSP_VIOLATION
$OVERRIDE_SECURITY
@SCRIPT_MIME_TYPES @OTHER_TYPES_TO_REGISTER @TYPES_TO_HANDLE
$NON_TEXT_EXTENSIONS
@RTL_LANG
$PROXY_VERSION
$RUN_METHOD
@MONTH @WEEKDAY %UN_MONTH
%RTL_LANG
@BANNED_NETWORK_ADDRS
$DB_HOSTPORT $DBH $STH_UPD_COOKIE $STH_INS_COOKIE $STH_SEL_COOKIE $STH_SEL_ALL_COOKIES
$STH_DEL_COOKIE $STH_DEL_ALL_COOKIES $STH_UPD_SESSION $STH_INS_SESSION $STH_SEL_IP
$STH_PURGE_SESSIONS $STH_PURGE_COOKIES
$USER_IP_ADDRESS_TEST_H $DESTINATION_SERVER_TEST_H
$RUNNING_ON_IIS
@NO_PROXY
$NO_CACHE_HEADERS
@ALL_TYPES %MIME_TYPE_ID $SCRIPT_TYPE_REGEX $TYPES_TO_HANDLE_REGEX
$THIS_HOST $ENV_SERVER_PORT $ENV_SCRIPT_NAME $THIS_SCRIPT_URL
$SSL_SUPPORTED
$RTMP_SERVER_PORT
%ENV_UNCHANGING $HAS_INITED
%MSG @MSG_KEYS $CUSTOM_INSERTION %IN_CUSTOM_INSERTION
$RE_JS_WHITE_SPACE $RE_JS_LINE_TERMINATOR $RE_JS_COMMENT
$RE_JS_IDENTIFIER_START $RE_JS_IDENTIFIER_PART $RE_JS_IDENTIFIER_NAME
$RE_JS_PUNCTUATOR $RE_JS_DIV_PUNCTUATOR
$RE_JS_NUMERIC_LITERAL $RE_JS_ESCAPE_SEQUENCE
$RE_JS_STRING_LITERAL
$RE_JS_STRING_LITERAL_START $RE_JS_STRING_REMAINDER_1 $RE_JS_STRING_REMAINDER_2
$RE_JS_REGULAR_EXPRESSION_LITERAL
$RE_JS_TOKEN $RE_JS_INPUT_ELEMENT_DIV $RE_JS_INPUT_ELEMENT_REG_EXP
$RE_JS_SKIP $RE_JS_SKIP_NO_LT
%RE_JS_SET_TRAPPED_PROPERTIES %RE_JS_SET_RESERVED_WORDS_NON_EXPRESSION
%RE_JS_SET_ALL_PUNCTUATORS
$JSLIB_BODY $JSLIB_BODY_GZ
$HTTP_VERSION $HTTP_1_X
$URL
$STDIN $STDOUT
$now $session_id $session_id_persistent $session_cookies
$packed_flags $encoded_URL $doing_insert_here $env_accept
$e_remove_cookies $e_remove_scripts $e_filter_ads $e_insert_entry_form
$e_hide_referer
$images_are_banned_here $scripts_are_banned_here $cookies_are_banned_here
$scheme $authority $path $host $port $username $password
$csp $csp_ro $csp_is_supported
$cookie_to_server %auth
$script_url $url_start $url_start_inframe $url_start_noframe $lang $dir
$is_in_frame $expected_type
$base_url $base_scheme $base_host $base_path $base_file $base_unframes
$default_style_type $default_script_type
$status $headers $body $charset $meta_charset $is_html
%in_mini_start_form
$does_write
$swflib $AVM2_BYTECODES
$xhr_origin
$temp_counter
$debug ) ;
- user configuration
#--------------------------------------------------------------------------
- For certain purposes, CGIProxy may need to create files. This is where
- those will go. For example, use "/home/username/cgiproxy", where "username"
- is replaced by your username.
- This directory has to be readable and writeable by the userID that CGIProxy
- runs as; that userID is set in the Web server configuration (if this is running
- as a CGI script or under mod_perl), or else it's the userID used to start
- the FastCGI server or the embedded server.
- This can be either a relative or absolute path. If it's a relative path, it
- will be interpreted relative to the home directory of this script file's owner.
- If you have root access and can run "./nph-proxy.cgi init" as root (which has
- advantages), then set this to an absolute path so it doesn't go under
- the /root directory.
- Note that you need to use "\\" to represent a single backslash.
- Leading drive letters (e.g. for Windows) are allowed.
- The default will use the directory "cgiproxy" under your home directory (which
- varies with your operating system). If it doesn't work, manually set
- $PROXY_DIR to an absolute path. You can name it whatever you want.
- Also see $RUN_AS_USER, just below. Note that many special users, probably
- including your Web server's user, don't have a home directory to put $PROXY_DIR
- under. For such a case, you need to set $PROXY_DIR to another directory somewhere
- that the Web server's user can read and write.
- Note that in Unix or Mac, using a directory on a mounted filesystem (which often
- includes home directories) may prevent that filesystem from being unmounted,
- which may bother your sysadmin. If so, try setting this to something starting
- with "/tmp/", like "/tmp/.your-username/".
- If you get "mkdir" permission errors, create the directory yourself with mkdir.
- You may also need to "chmod 777 directoryname" to make the directory writable
- by the Web server, but note that this makes it readable and writable by
- everybody. You might ask your webmaster if they provide a safe way for CGI
- scripts to read and write files in your directories. With Apache, the suEXEC
- feature is often used to let multiple website owners use the same server
- securely: each CGI or mod_perl script is run as the owner of the script file.
$PROXY_DIR= 'cgiproxy' ;
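The rule described above (an absolute $PROXY_DIR is used as given, a relative one is taken relative to the home directory of the script file's owner) can be sketched as follows. This is an illustration under assumptions, not the script's own code: `resolve_proxy_dir` is a hypothetical helper, and the getpwuid lookup only works on Unix-like systems.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: absolute paths pass through unchanged,
# relative paths are placed under the owner's home directory.
sub resolve_proxy_dir {
    my ($dir, $owner_home) = @_;
    return $dir if File::Spec->file_name_is_absolute($dir);
    return File::Spec->catdir($owner_home, $dir);
}

# Find the home directory of this file's owner, falling back to the
# current user and then to the system temp directory if lookups fail.
my $owner_uid  = (stat __FILE__)[4] // $<;
my $owner_home = (getpwuid($owner_uid))[7] // File::Spec->tmpdir;

print resolve_proxy_dir('cgiproxy', $owner_home), "\n";
```

If the Web server's user has no home directory, as the text warns, the lookup falls through to the fallback, which is a good sign that $PROXY_DIR should be set to an absolute path instead.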
- If you have root access and can run "./nph-proxy.cgi init" as root, then set this
- to either the username or numeric user ID that the script will run as. When
- run as a CGI script or under mod_perl, this is usually the Web server's
- username, or possibly the script owner's username if using Apache with the
- suEXEC feature turned on.
- Setting this lets "./nph-proxy.cgi init" create the needed directories ($PROXY_DIR
- and subdirectories) and a SQLite database file (if using SQLite) with the right
- permissions and ownership.
- If you run this script as the root user in order to use port 443 with the
- embedded server, it's a good idea to change the user ID to something with
- fewer permissions. You can also do this by setting $RUN_AS_USER .
- In any case, this has to be set to an existing user on the server, i.e. CGIProxy
- doesn't create the user if it doesn't already exist.
- If this is not set, it will default to the owner of this script file.
- Also see $PROXY_DIR, just above. Note that many special users, probably including
- your Web server's user, don't have a home directory to put $PROXY_DIR under.
- For such a case, you need to set $PROXY_DIR to another directory somewhere that
- the Web server's user can read and write.
- This probably won't work on Windows, though note that you don't need root
- access to use port 443 on Windows.
#$RUN_AS_USER= 'nobody' ;
- IMPORTANT: CHANGE THIS IF USING FASTCGI OR THE EMBEDDED SERVER!
- If using FastCGI or the embedded server, the path in the URL will begin with a
- fixed alphanumeric sequence (string) to help conceal the proxy. You can set
- this to any alphanumeric string. The URL of your proxy will be
- "https://example.com/secret" (replace "secret" with your actual secret).
- If we didn't do this, then a censor could check if a site hosts a proxy by
- merely accessing "https://example.com" .
- Note that this is not a secret from the users, just from anyone watching
- network traffic. Also, it won't be kept secret if your server is insecure.
$SECRET_PATH= 'secret' ;
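Since $SECRET_PATH can be any alphanumeric string, one way to pick a value other than the default is to generate it. The sketch below uses a hypothetical helper, `random_alnum`, built on Perl's rand(), which is not cryptographically strong. That is arguably acceptable here because the text says the path is not a secret from users, but a stronger source may be preferable.

```perl
use strict;
use warnings;

# Hypothetical helper: return a random alphanumeric string of $length characters.
sub random_alnum {
    my ($length) = @_;
    my @chars = ('a' .. 'z', 'A' .. 'Z', '0' .. '9');
    return join '', map { $chars[ int rand @chars ] } 1 .. $length;
}

my $secret = random_alnum(20);
print "\$SECRET_PATH= '$secret' ;\n";
```

The printed line can be pasted over the default setting.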
- If you don't have root access on your server, set this so that Perl (CPAN)
- modules are installed under your own directory. Be sure to follow the
- instructions about the environment variables after you run "./nph-proxy.cgi init".
- If this script is not running as your user ID (such as a Web server running
- as its own user ID), and you're using the local::lib module, then
- set this to the directory where your modules are installed with local::lib .
- This is normally just the "perl5" directory under your home directory, unless
- you renamed it or configured local::lib to use a different directory.
- If you set this before installing modules, then CPAN (Perl) modules will be
- installed into this directory.
#$LOCAL_LIB_DIR= '/home/your-username/perl5' ; # this example works for Unix or Mac
- If you're running CGIProxy such that the Web server that the user sees is different
- from the Web server CGIProxy is running on (though maybe on the same machine),
- the SERVER_PORT environment variable might not be set to the port that the
- user is connecting to, and so all the generated URLs will have the wrong
- port in them. In this case, you can set $USER_FACING_PORT to the port number
- that should be in the URLs, i.e. the port that the user connects to.
- For example, this would be useful when the user connects to nginx on a server where
- nginx then calls an internal Apache process to run this script (perhaps to take
- advantage of mod_perl). In such a case, the SERVER_PORT set by Apache will be
- the port used for internal nginx-to-Apache communication, not the port the user
- connects to nginx with. In this case, you would set $USER_FACING_PORT to the
- outward-facing port that nginx listens on.
#$USER_FACING_PORT= 443 ;
#---- FastCGI configuration ---------------------
- FastCGI is a mechanism that can speed up CGI-like scripts. It's purely
- optional and requires some web server configuration as well, and if you
- don't use it you can ignore this section.
- FastCGI uses a local Internet socket to communicate between the FastCGI client
- (e.g. the web server software) and the FastCGI server (e.g. a CGI script that
- has been converted to run as a listening daemon, such as CGIProxy).
- Set this to a port number for this script to listen on as a FastCGI script.
- You'll need to set it in your HTTP server's configuration file too (e.g. in
- httpd.conf or nginx.conf). For details of that, see
- http://www.jmarshall.com/tools/cgiproxy/install.html#fastcgi
- This used to use a "Unix-domain socket" instead of an Internet socket, but
- there was trouble with the FCGI module and Unix-domain sockets, so as of
- CGIProxy 2.1.14 we use an Internet socket.
- Note that this no longer requires a ":" at the start, though that is allowed.
$FCGI_SOCKET= 8002 ;
- FastCGI uses multiple processes to listen on its socket, where each
- process can handle one request at a time. This is a performance tuning
- parameter, so the optimal number depends on your server environment
- (hardware and software).
- If you don't understand this, the default should be fine. You can experiment
- with different numbers if performance is an issue.
- This can be overridden with the "-n" command-line parameter.
$FCGI_NUM_PROCESSES= 100 ;
- As a FastCGI process gets used for many requests, it slowly takes more and
- more memory, due to the copy-on-write behavior of forked processes. Thus,
- it's cleaner if you kill a process and restart a fresh one after it handles
- some number of requests. This is a performance tuning parameter, so the
- optimal number depends on your server environment (hardware and software).
- If you don't understand this, the default should be fine. You can experiment
- with different numbers if performance is an issue.
- This can be overridden with the "-m" command-line parameter.
$FCGI_MAX_REQUESTS_PER_PROCESS= 1000 ;
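The two tuning parameters above describe a common pre-forking pattern: a fixed number of workers, each retired after a set number of requests and replaced by a fresh one. The following is a toy sketch of that pattern, not CGIProxy's actual FastCGI loop. The variable names are stand-ins, the numbers are kept small, and respawning is capped so the example terminates.

```perl
use strict;
use warnings;

my $num_processes = 3;  # plays the role of $FCGI_NUM_PROCESSES
my $max_requests  = 5;  # plays the role of $FCGI_MAX_REQUESTS_PER_PROCESS
my $respawns_left = 2;  # demo only; a real server would respawn indefinitely

# Fork one worker; the parent gets the child's PID back.
sub spawn_worker {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;
    for my $request (1 .. $max_requests) {
        # A real worker would accept and answer one request here.
    }
    exit 0;  # the child retires, releasing whatever memory it accumulated
}

my %workers;
$workers{ spawn_worker() } = 1 for 1 .. $num_processes;

my $retired = 0;
while (%workers) {
    my $pid = waitpid(-1, 0);  # block until any child exits
    last if $pid <= 0;
    delete $workers{$pid};
    $retired++;
    $workers{ spawn_worker() } = 1 if $respawns_left-- > 0;
}
print "retired $retired workers\n";
```

A lower per-process request limit keeps memory use flatter at the cost of more forks, which is why the text treats both numbers as environment-dependent.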
#---- End of FastCGI configuration --------------
- Much initialization of unchanging values is now in this routine. (Ignore
- this if you don't know what it means.)
sub init {
#---- Embedded server configuration -------------
- For the embedded server, you need to a) put a certificate and private key,
- in PEM format, into the $PROXY_DIR directory, and b) set these two
- variables to the two file names. (A "certificate" contains your public key,
- together with identifying information signed by the issuer.)
- You can either pay a certificate authority for a key pair, or you can
- generate your own "self-signed" key pair. The disadvantage of using a
- self-signed key pair is that your users will see a browser warning about
- an untrusted certificate. This is all true of any secure server.
#$CERTIFICATE_FILE= 'plain-cert.pem' ;
#$PRIVATE_KEY_FILE= 'plain-rsa.pem' ;
- It's important to use $SECRET_PATH, but you can require a username and
- password too. All users must login with whatever you set below, using
- HTTP Basic authentication. Leave these commented out to disable
- password protection.
- This is very simple right now. In the future there will likely be
- more authentication methods, including support for multiple users.
#$EMB_USERNAME= 'free' ;
#$EMB_PASSWORD= 'speech' ;
#---- End of embedded server configuration ------
#---- Database configuration --------------------
- Database use is optional, and if you don't use one you can ignore this
- section. But if you're getting "Bad Request" errors, you can fix it
- by using a database; also, see the $USE_DB_FOR_COOKIES option below.
- Using a database is most efficient when this script is running
- under mod_perl or FastCGI.
- The easiest database to use is SQLite. While normal database engines like
- MySQL/MariaDB or Oracle require a constantly running process and some
- configuration by the system administrator, SQLite requires none of this--
- it reads and writes directly to database files in your own directory, as
- protected by the operating system permissions. Because of its ease of
- configuration, SQLite is the default database here.
- If you're using a database other than SQLite, create a database user account
- for this program to use, or ask your database administrator to do it. Set
- $DB_USER and $DB_PASS to the username and password, below. This program
- will try to create the required database, named $DB_NAME as set below, but
- if your DBA isn't willing to grant the permission to create databases to
- the CGIProxy user, then you or the DBA will need to create the database.
- This can be done with the SQL command "CREATE DATABASE cgiproxy;" (or
- whatever you set $DB_NAME to below).
- If you are using a database of any kind, it must be purged periodically. In
- Unix or Mac, do this with a cron job. In Windows, use the Task Scheduler.
- In Unix or Mac, the command to purge the database is
- "/path/to/script/nph-proxy.cgi purge-db". (Replace "/path/to/script/"
- with the actual path to the script.) Edit your crontab with "crontab -e",
- and add a line like:
- "0 * * * * /path/to/script/nph-proxy.cgi purge-db" (without quotes)
- to purge the database at the top of every hour, or:
- "0 2 * * * /path/to/script/nph-proxy.cgi purge-db" (without quotes)
- to purge it every night at 2:00am.
- This is the name of the "database driver" for the database software you're using.
- Currently supported values are "SQLite", "MySQL" and "Oracle".
- The default of "SQLite" is the easiest to use. SQLite lets you have database
- functionality by directly reading and writing a database file, without requiring
- a full database engine like MySQL/MariaDB or Oracle to run on your server.
- Note that it is potentially insecure to use a database if there are other
- untrusted people with accounts on the same server, especially if they can read
- this script file and the database password below. The easiest way to securely
- use a database is to have your own server with no untrusted user having shell
- access on it. If this isn't practical, then you need to set file permissions
- appropriately on both this script file and any SQLite database file: set
- permissions (and file ownership and group ownership) on both files to be
- accessible by the web server's userID, but not accessible by anyone else on
- the same server. Note that running this on a virtual private server isn't
- insecure in this way-- even though a VPS is a shared machine, other people
- can't see your files (except the sysadmin).
- Set this to "" or comment it out to not use a database. Note that you will
- probably see "Bad Request" errors when you accumulate too many cookies; using
- a database solves this problem, or you can periodically clear your cookies.
$DB_DRIVER= 'SQLite' ;
- If your database (other than SQLite) is running on a remote server, or on a
- non-default port, set this to "dbserver:port", where dbserver is the name
- or IP address of your database server, and port is the port it is listening
- on. If dbserver is empty (as in ":port"), then it defaults to localhost;
- if port is empty (as in "dbserver:" or just "dbserver"), then it defaults
- to 3306 for MySQL, or 1521 for Oracle.
#$DB_SERVER= "localhost:3306" ;
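The defaulting rules for "dbserver:port" described above can be written out as a small parser. This is a sketch of the stated rules only; `parse_db_server` is a hypothetical helper and is not how the script necessarily implements them.

```perl
use strict;
use warnings;

# Hypothetical helper: split "dbserver:port" and apply the documented defaults.
sub parse_db_server {
    my ($spec, $driver) = @_;
    my %default_port = (mysql => 3306, oracle => 1521);
    my ($host, $port) = split /:/, ($spec // ''), 2;
    $host = 'localhost' unless defined $host && length $host;
    $port = $default_port{ lc $driver } unless defined $port && length $port;
    return ($host, $port);
}

my ($host, $port) = parse_db_server(':3307', 'MySQL');
print "$host:$port\n";
```

For a driver with no default port in the table (such as SQLite, which needs no server), the returned port is undefined.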
- CGIProxy creates (if possible) and uses its own database. If you want to name
- the database something else, change this value. If you need a database
- administrator to create the database, tell him or her this database name.
- This value must only contain letters, numbers, and the "_" character.
$DB_NAME= 'cgiproxy' ;
- These are the username and password of the database account, as described above.
- If you're using SQLite, you don't need to set these-- access to the SQLite
- database files is controlled by the permissions of the filesystem.
$DB_USER= 'proxy' ;
$DB_PASS= '' ;
- If set, then use the server-side database to store cookies. This gets around
- the problem of too many total cookies causing "Bad Request" errors.
- Set this to 1 to use the database (if it's configured), or to 0 to NOT use
- the database.
$USE_DB_FOR_COOKIES= 1 ;
#---- End of database configuration -------------
- This is the default language to use for all CGIProxy messages, until the user
- clicks on a flag in the start form.
$DEFAULT_LANG= 'en' ;
- If set, then proxy traffic will be restricted to text data only, to save
- bandwidth (though it can still be circumvented with uuencode, etc.).
- To replace images with a 1x1 transparent GIF, set $RETURN_EMPTY_GIF below.
$TEXT_ONLY= 0 ; # set to 1 to allow only text data, 0 to allow all
- If set, then prevent all cookies from passing through the proxy. To allow
- cookies from some servers, set this to 0 and see @ALLOWED_COOKIE_SERVERS
- and @BANNED_COOKIE_SERVERS below. You can also prevent cookies with
- images by setting $NO_COOKIE_WITH_IMAGE below.
- Note that this only affects cookies from the target server. The proxy
- script sends its own cookies for other reasons too, like to support
- authentication. This flag does not stop these cookies from being sent.
$REMOVE_COOKIES= 0 ;
- If set, then remove as much scripting as possible. If anonymity is
- important, this is strongly recommended! Better yet, turn off script
- support in your browser.
- On the HTTP level:
- . prevent transmission of script MIME types (which only works if the server
- marks them as such, so a malicious server could get around this, but
- then the browser probably wouldn't execute the script).
- . remove Link: headers that link to a resource of a script MIME type.
- Within HTML resources:
- . remove <script>...</script> .
- . remove intrinsic event attributes from tags, i.e. attributes whose names
- begin with "on".
- . remove <style>...</style> where "type" attribute is a script MIME type.
- . remove various HTML tags that appear to link to a script MIME type.
- . remove script macros (aka Netscape-specific "JavaScript entities"),
- i.e. any attributes containing the string "&{" .
- . remove "JavaScript conditional comments".
- . remove MSIE-specific "dynamic properties".
- To allow scripts from some sites but not from others, set this to 0 and
- see @ALLOWED_SCRIPT_SERVERS and @BANNED_SCRIPT_SERVERS below.
- See @SCRIPT_MIME_TYPES below for a list of which MIME types are filtered out.
- I do NOT know for certain that this removes all script content! It removes
- all that I know of, but I don't have a definitive list of places scripts
- can exist. If you do, please send it to me. EVEN RUNNING A SINGLE
- JAVASCRIPT STATEMENT CAN COMPROMISE YOUR ANONYMITY! Just so you know.
- Richard Smith has a good test site for anonymizing proxies, at
- http://users.rcn.com/rms2000/anon/test.htm
- Note that turning this on removes most popup ads! :)
$REMOVE_SCRIPTS= 0 ;
- If set, then filter out images that match one of @BANNED_IMAGE_URL_PATTERNS,
- below. Also removes cookies attached to images, as if $NO_COOKIE_WITH_IMAGE
- is set.
- To remove most popup advertisements, also set $REMOVE_SCRIPTS=1 above.
$FILTER_ADS= 0 ;
- If set, then don't send a Referer: [sic] header with each request
- (i.e. something that tells the server which page you're coming from
- that linked to it). This is a minor privacy issue, but a few sites
- won't send you pages or images if the Referer: is not what they're
- expecting. If a page is loading without images or a link seems to be
- refused, then try turning this off, and a correct Referer: header will
- be sent.
- This is only a problem in a VERY small percentage of sites, so few that
- I'm kinda hesitant to put this in the entry form. Other arrangements
- have their own problems, though.
$HIDE_REFERER= 0 ;
- If set, insert a compact version of the URL entry form at the top of each
- page. This will also display the URL currently being viewed.
- When viewing a page with frames, then a new top frame is created and the
- insertion goes there.
- If you want to customize the appearance of the form, modify the routine
- mini_start_form() near the end of the script.
- If you want to insert something other than this form, see $INSERT_HTML and
- $INSERT_FILE below.
- Users should realize that options changed via the form only take effect when
- the form is submitted by entering a new URL or pressing the "Go" button.
- Selecting an option, then following a link on the page, will not cause
- the option to take effect.
- Users should also realize that anything inserted into a page may throw
- off any precise layout. The insertion will also be subject to
- background colors and images, and any other page-wide settings.
$INSERT_ENTRY_FORM= 1 ;
- If set, then allow the user to control $REMOVE_COOKIES, $REMOVE_SCRIPTS,
- $FILTER_ADS, $HIDE_REFERER, and $INSERT_ENTRY_FORM. Note that they
- can't fine-tune any related options, such as the various @ALLOWED... and
- @BANNED... lists.
$ALLOW_USER_CONFIG= 1 ;
- If you want to encode the URLs of visited pages so that they don't show
- up within the full URL in your browser bar, then use proxy_encode() and
- proxy_decode(). These are Perl routines that transform the way the
- destination URL is included in the full URL. You can either use
- some combination of the example encodings below, or you can program your
- own routines. The encoded form of URLs should only contain characters
- that are legal in PATH_INFO. This varies by server, but using only
- printable chars and no "?" or "#" works on most servers. Don't let
- PATH_INFO contain the strings "./", "/.", "../", or "/..", or else it
- may get compressed like a pathname somewhere. Try not to make the
- resulting string too long, either.
- Of course, proxy_decode() must exactly undo whatever proxy_encode() does.
- Make proxy_encode() as fast as possible-- it's a bottleneck for the whole
- program. The speed of proxy_decode() is not as important.
- If you're not a Perl programmer, you can use the example encodings that are
- commented out, i.e. the lines beginning with "#". To use them, merely
- uncomment them, i.e. remove the "#" at the start of the line. If you
- uncomment a line in proxy_encode(), you MUST uncomment the corresponding
- line in proxy_decode() (note that "corresponding lines" in
- proxy_decode() are in reverse order of those in proxy_encode()). You
- can use one, two, or all three encodings at the same time, as long as
- the correct lines are uncommented.
- Starting in version 2.1beta9, don't call these functions directly. Rather,
- call wrap_proxy_encode() and wrap_proxy_decode() instead, which handle
- certain details that you shouldn't have to worry about in these functions.
- IMPORTANT: If you modify these routines, and if $PROXIFY_SCRIPTS is set
- below (on by default), then you MUST modify $ENCODE_DECODE_BLOCK_IN_JS
- below!! (You'll need to write corresponding routines in JavaScript to do
- the same as these routines in Perl, used when proxifying JavaScript.)
- Because of the simplified absolute URL resolution in full_url(), there may
- be ".." segments in the default encoding here, notably in the first path
- segment. Normally, that's just an HTML mistake, but please tell me if
- you see any privacy exploit with it.
- Note that a few sites have embedded applications (like applets or Shockwave)
- that expect to access URLs relative to the page's URL. This means they
- may not work if the encoded target URL can't be treated like a base URL,
- e.g. that it can't be appended with something like "../data/foo.data"
- to get that expected data file. In such cases, the default encoding below
- should let these sites work fine, as should any other encoding that can
- support URLs relative to it.
sub proxy_encode {
my($URL)= @_ ;
$URL=~ s#^([\w+.-]+)://#$1/# ; # http://xxx -> http/xxx
# $URL=~ s/(.)/ sprintf('%02x',ord($1)) /ge ; # each char -> 2-hex
# $URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13
return $URL ;
}
sub proxy_decode {
my($enc_URL)= @_ ;
# $enc_URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13
# $enc_URL=~ s/([\da-fA-F]{2})/ sprintf("%c",hex($1)) /ge ;
$enc_URL=~ s#^([\w+.-]+)/#$1://# ; # http/xxx -> http://xxx
return $enc_URL ;
}
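- A minimal round-trip sketch, not part of CGIProxy itself: standalone copies
- of the two routines with all three example encodings uncommented, to show
- that the decoder undoes the encoder's steps in reverse order. The names
- demo_proxy_encode() and demo_proxy_decode() are hypothetical, and the
- sample URL is assumed to be plain ASCII.

```perl
use strict ;
use warnings ;

# Hypothetical copy of proxy_encode() with every example encoding enabled.
sub demo_proxy_encode {
    my($URL)= @_ ;
    $URL=~ s#^([\w+.-]+)://#$1/# ;                 # http://xxx -> http/xxx
    $URL=~ s/(.)/ sprintf('%02x',ord($1)) /ge ;    # each char -> 2-hex
    $URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ;               # rot-13
    return $URL ;
}

# Hypothetical copy of proxy_decode(): the same steps, in reverse order.
sub demo_proxy_decode {
    my($enc_URL)= @_ ;
    $enc_URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ;                       # rot-13
    $enc_URL=~ s/([\da-fA-F]{2})/ sprintf("%c",hex($1)) /ge ;  # 2-hex -> char
    $enc_URL=~ s#^([\w+.-]+)/#$1://# ;             # http/xxx -> http://xxx
    return $enc_URL ;
}

my $URL= 'http://www.example.com/a/b.html' ;
my $encoded= demo_proxy_encode($URL) ;
print "encoded: $encoded\n" ;
print "decoded: ", demo_proxy_decode($encoded), "\n" ;
```

- With hex and rot-13 both enabled, the encoded string contains only the
- digits 0-9 and the letters n-s, so it avoids "?", "#", "./" and "/.".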
- Encode cookies before they're sent back to the user.
- The return value must only contain characters that are legal in cookie
- names and values, i.e. only printable characters, and no ";", ",", "=",
- or white space.
- cookie_encode() is called twice for each cookie: once to encode the cookie
- name, and once to encode the cookie value. The two are then joined with
- "=" and sent to the user.
- cookie_decode() must exactly undo whatever cookie_encode() does.
- Also, cookie_encode() must always encode a given input string into the
- same output string. This is because browsers need the cookie name to
- identify and manage a cookie, so the name must be consistent.
- This is not a bottleneck like proxy_encode() is, so speed is not critical.
- IMPORTANT: If you modify these routines, and if $PROXIFY_SCRIPTS is set
- below (on by default), then you MUST modify $ENCODE_DECODE_BLOCK_IN_JS
- below!! (You'll need to write corresponding routines in JavaScript to do
- the same as these routines in Perl, used when proxifying JavaScript.)
sub cookie_encode {
my($cookie)= @_ ;
# $cookie=~ s/(.)/ sprintf('%02x',ord($1)) /ge ; # each char -> 2-hex
# $cookie=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13
$cookie=~ s/(\W)/ '%' . sprintf('%02x',ord($1)) /ge ; # simple URL-encoding
return $cookie ;
}
sub cookie_decode {
my($enc_cookie)= @_ ;
$enc_cookie=~ s/%([\da-fA-F]{2})/ pack('C', hex($1)) /ge ; # URL-decode
# $enc_cookie=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13
# $enc_cookie=~ s/([\da-fA-F]{2})/ sprintf("%c",hex($1)) /ge ;
return $enc_cookie ;
}
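- A small sketch of the default cookie encoding, using hypothetical
- standalone copies named demo_cookie_encode() and demo_cookie_decode(). It
- checks the two requirements stated above: the decoder undoes the encoder,
- and the same input always yields the same output. The sample cookie name
- is made up and assumed to be ASCII.

```perl
use strict ;
use warnings ;

# Hypothetical copy of the default cookie_encode(): simple URL-encoding.
sub demo_cookie_encode {
    my($cookie)= @_ ;
    $cookie=~ s/(\W)/ '%' . sprintf('%02x',ord($1)) /ge ;
    return $cookie ;
}

# Hypothetical copy of the default cookie_decode(): URL-decode.
sub demo_cookie_decode {
    my($enc_cookie)= @_ ;
    $enc_cookie=~ s/%([\da-fA-F]{2})/ pack('C', hex($1)) /ge ;
    return $enc_cookie ;
}

my $name= 'session id;v=1' ;
my $encoded= demo_cookie_encode($name) ;
print "$encoded\n" ;                         # session%20id%3bv%3d1
print demo_cookie_decode($encoded), "\n" ;   # session id;v=1
```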
- If $PROXIFY_SCRIPTS is true, and if you modify the routines above that
- encode cookies and URLs, then you need to modify $ENCODE_DECODE_BLOCK_IN_JS
- here. Explanation: When proxifying JavaScript, a library of JavaScript
- functions is used. In that library are a few JavaScript routines that do
- the same as their Perl counterparts in this script. Four of those routines
- are proxy_encode(), proxy_decode(), cookie_encode(), and cookie_decode().
- Thus, unfortunately, when you write your own versions of those Perl routines
- (or modify what's already there), you also need to write (or modify) these
- corresponding JavaScript routines to do the same thing. Put the routines in
- this long variable $ENCODE_DECODE_BLOCK_IN_JS, and it will be included in
- the JavaScript library when needed. Prefix the function names with
- "_proxy_jslib_", as below.
- The commented examples in the JavaScript routines below correspond exactly to
- the commented examples in the Perl routines above. Thus, if you modify the
- Perl routines by merely uncommenting the examples, you can do the same in
- these JavaScript routines. (JavaScript comments begin with "//".)
- [If you don't know Perl: Note that everything up until the line "EOB" is one
- long string value, called a "here document". $ENCODE_DECODE_BLOCK_IN_JS is
- set to the whole thing.]
$ENCODE_DECODE_BLOCK_IN_JS= <<'EOB' ;
function _proxy_jslib_proxy_encode(URL) {
URL= URL.replace(/^([\w\+\.\-]+)\:\/\//, '$1/') ;
// URL= URL.replace(/(.)/g, function (s,p1) { return p1.charCodeAt(0).toString(16) } ) ;
// URL= URL.replace(/([a-mA-M])|[n-zN-Z]/g, function (s,p1) { return String.fromCharCode(s.charCodeAt(0)+(p1?13:-13)) }) ;
return URL ;
}
function _proxy_jslib_proxy_decode(enc_URL) {
// enc_URL= enc_URL.replace(/([a-mA-M])|[n-zN-Z]/g, function (s,p1) { return String.fromCharCode(s.charCodeAt(0)+(p1?13:-13)) }) ;
// enc_URL= enc_URL.replace(/([\da-fA-F]{2})/g, function (s,p1) { return String.fromCharCode(eval('0x'+p1)) } ) ;
enc_URL= enc_URL.replace(/^([\w\+\.\-]+)\//, '$1://') ;
return enc_URL ;
}
function _proxy_jslib_cookie_encode(cookie) {
// cookie= cookie.replace(/(.)/g, function (s,p1) { return p1.charCodeAt(0).toString(16) } ) ;
// cookie= cookie.replace(/([a-mA-M])|[n-zN-Z]/g, function (s,p1) { return String.fromCharCode(s.charCodeAt(0)+(p1?13:-13)) }) ;
cookie= cookie.replace(/(\W)/g, function (s,p1) { var h= p1.charCodeAt(0).toString(16) ; return '%'+(h.length<2 ? '0'+h : h) } ) ;
return cookie ;
}
function _proxy_jslib_cookie_decode(enc_cookie) {
enc_cookie= enc_cookie.replace(/%([\da-fA-F]{2})/g, function (s,p1) { return String.fromCharCode(eval('0x'+p1)) } ) ;
// enc_cookie= enc_cookie.replace(/([a-mA-M])|[n-zN-Z]/g, function (s,p1) { return String.fromCharCode(s.charCodeAt(0)+(p1?13:-13)) }) ;
// enc_cookie= enc_cookie.replace(/([\da-fA-F]{2})/g, function (s,p1) { return String.fromCharCode(eval('0x'+p1)) } ) ;
return enc_cookie ;
}
EOB
- Use @ALLOWED_SERVERS and @BANNED_SERVERS to restrict which servers a user
- can visit through this proxy. Any URL at a host matching a pattern in
- @BANNED_SERVERS will be forbidden. In addition, if @ALLOWED_SERVERS is
- not empty, then access is allowed only to servers that match a pattern
- in it. In other words, @BANNED_SERVERS means "ban these servers", and
- @ALLOWED_SERVERS (if not empty) means "allow only these servers". If a
- server matches both lists, it is banned.
- These are each a list of Perl 5 regular expressions (aka patterns or
- regexes), not literal host names. To turn a hostname into a pattern,
- replace every "." with "\.", add "^" to the beginning, and add "$" to the
- end. For example, 'www.example.com' becomes '^www\.example\.com$'. To
- match every host ending in something, leave out the "^". For example,
- '\.example\.com$' matches every host ending in ".example.com". For more
- details about Perl regular expressions, see the Perl documentation. (They
- may seem cryptic at first, but they're very powerful once you know how to
- use them.)
- Note: Use single quotes around each pattern, not double quotes, unless you
- understand the difference between the two in Perl. Otherwise, characters
- like "$" and "\" may not be handled the way you expect.
@ALLOWED_SERVERS= () ;
@BANNED_SERVERS= () ;
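- The precedence described above can be sketched as follows. This is an
- illustration under assumptions, not the script's own code: the helpers
- host_to_pattern() and server_is_allowed() are hypothetical, and matching
- here is case-sensitive, which may differ from what CGIProxy does.

```perl
use strict ;
use warnings ;

# Hypothetical helper: turn a literal host name into an anchored pattern.
sub host_to_pattern {
    my($host)= @_ ;
    return '^' . quotemeta($host) . '$' ;
}

# Hypothetical helper: apply the ban/allow precedence described above.
sub server_is_allowed {
    my($host, $allowed, $banned)= @_ ;
    return 0 if grep { $host=~ /$_/ } @$banned ;        # a ban always wins
    return 1 unless @$allowed ;                         # empty list: allow all
    return (grep { $host=~ /$_/ } @$allowed) ? 1 : 0 ;  # else must be listed
}

my @allowed= ('\.example\.com$') ;
my @banned=  (host_to_pattern('private.example.com')) ;

print host_to_pattern('www.example.com'), "\n" ;   # ^www\.example\.com$
foreach my $host ('www.example.com', 'private.example.com', 'www.example.org') {
    print "$host: ", server_is_allowed($host, \@allowed, \@banned), "\n" ;
}
```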
- If @BANNED_NETWORKS is set, then forbid access to these hosts or networks.
- This is done by IP address, not name, so it provides more certain security
- than @BANNED_SERVERS above.
- Specify each element as a decimal IP address-- all four integers for a host,
- or one to three integers for a network. For example, '127.0.0.1' bans
- access to the local host, and '192.168' bans access to all IP addresses
- in the 192.168 network. Sorry, no banning yet for subnets other than
- 8, 16, or 24 bits.
- IF YOU'RE RUNNING THIS ON OR INSIDE A FIREWALL, THIS SETTING IS STRONGLY
- RECOMMENDED!! In particular, you should ban access to other machines
- inside the firewall that the firewall machine itself may have access to.
- Otherwise, external users will be able to access any internal hosts that
- the firewall can access. Even if that's what you intend, you should ban
- access to any hosts that you don't explicitly want to expose to outside
- users.
- In addition to the recommended defaults below, add all IP addresses of your
- server machine if you want to protect it like this.
- If you're using this with another proxy on the same machine (like a SOCKS
- proxy), you'll need to remove the '127' item below. But see the comments
- above $SOCKS_PROXY, below, for a warning.
- After you set this, YOU SHOULD TEST to verify that the proxy can't access
- the IP addresses you're banning!
- NOTE: According to RFC 1918, network address ranges reserved for private
- networks are 10.x.x.x, 192.168.x.x, and 172.16.x.x-172.31.x.x, i.e. with
- respective subnet masks of 8, 16, and 12 bits. Since we can't currently
- do a 12-bit mask, we'll exclude the entire 172 network here. If this
- causes a problem, let me know and I'll add subnet masks down to 1-bit
- resolution.
- Also included are 169.254.x.x (per RFC 3927) and 244.0.0.x (used for
- routing), as recommended by Waldo Jaquith.
- On some systems, 127.x.x.x all point to localhost, so disallow all of "127".
- This feature is simple now but may be more complete in future releases.
- How would you like this to be extended? What would be useful to you?
@BANNED_NETWORKS= ('127', '192.168', '172', '10', '169.254', '244.0.0') ;
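- A sketch of how such a list can be checked against a dotted-decimal IP
- address, assuming whole-octet prefix matching as described above. Both
- helpers are hypothetical and are not CGIProxy's own routines;
- ip_in_network() shows one way a 12-bit mask such as 172.16.0.0/12 could be
- tested, which the script does not do at present.

```perl
use strict ;
use warnings ;

# Hypothetical helper: true if $ip is one of the listed hosts or lies in one
# of the listed networks. Matching is on whole octets, so '10' covers
# 10.1.2.3 but not 100.1.2.3.
sub ip_is_banned {
    my($ip, @networks)= @_ ;
    foreach my $net (@networks) {
        return 1 if $ip eq $net or index($ip, "$net.")==0 ;
    }
    return 0 ;
}

# Hypothetical helper: compare the top $bits bits of two IPv4 addresses.
sub ip_in_network {
    my($ip, $network, $bits)= @_ ;
    my $mask= $bits ? (0xFFFFFFFF << (32-$bits)) & 0xFFFFFFFF : 0 ;
    my($ip_n, $net_n)= map { unpack('N', pack('C4', split(/\./, $_))) } $ip, $network ;
    return (($ip_n & $mask) == ($net_n & $mask)) ? 1 : 0 ;
}

my @networks= ('127', '192.168', '172', '10', '169.254', '244.0.0') ;
foreach my $ip ('10.1.2.3', '100.1.2.3', '172.20.0.1', '8.8.8.8') {
    print "$ip: ", ip_is_banned($ip, @networks), "\n" ;
}
print ip_in_network('172.20.0.1', '172.16.0.0', 12), "\n" ;   # 1
print ip_in_network('172.32.0.1', '172.16.0.0', 12), "\n" ;   # 0
```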
- Settings to fine-tune cookie filtering, if cookies are not banned altogether
- (by user checkbox or $REMOVE_COOKIES above).
- Use @ALLOWED_COOKIE_SERVERS and @BANNED_COOKIE_SERVERS to restrict which
- servers can send cookies through this proxy. They work like
- @ALLOWED_SERVERS and @BANNED_SERVERS above, both in how their precedence
- works, and that they're lists of Perl 5 regular expressions. See the
- comments there for details.
- If non-empty, only allow cookies from servers matching one of these patterns.
- Comment this out to allow all cookies (subject to @BANNED_COOKIE_SERVERS).
#@ALLOWED_COOKIE_SERVERS= ('\bslashdot\.org$') ;
- Reject cookies from servers matching these patterns.
@BANNED_COOKIE_SERVERS= (
'\.doubleclick\.net$',
'\.preferences\.com$',
'\.imgis\.com$',
'\.adforce\.com$',
'\.focalink\.com$',
'\.flycast\.com$',
'\.avenuea\.com$',
'\.linkexchange\.com$',
'\.pathfinder\.com$',
'\.burstnet\.com$',
'\btripod\.com$',
'\bgeocities\.yahoo\.com$',
'\.mediaplex\.com$',
) ;
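- The list above mixes two anchoring styles, and they behave differently.
- The sketch below uses a hypothetical helper, cookie_server_matches(), to
- show which host names each style catches. The host names are made up, and
- case-sensitive matching is an assumption of this example.

```perl
use strict ;
use warnings ;

# Hypothetical helper: does this host name match this pattern?
sub cookie_server_matches {
    my($host, $pattern)= @_ ;
    return ($host=~ /$pattern/) ? 1 : 0 ;
}

# A leading "\." requires a dot before the domain, so the bare domain
# itself is not matched.
print cookie_server_matches('ad.doubleclick.net', '\.doubleclick\.net$'), "\n" ;   # 1
print cookie_server_matches('doubleclick.net',    '\.doubleclick\.net$'), "\n" ;   # 0

# A leading "\b" only requires a word boundary, so the bare domain matches,
# as does any name where a non-word character such as "-" precedes it.
print cookie_server_matches('tripod.com',         '\btripod\.com$'), "\n" ;        # 1
print cookie_server_matches('members.tripod.com', '\btripod\.com$'), "\n" ;        # 1
print cookie_server_matches('my-tripod.com',      '\btripod\.com$'), "\n" ;        # 1
print cookie_server_matches('nottripod.com',      '\btripod\.com$'), "\n" ;        # 0
```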
- Set this to reject cookies returned with images. This actually prevents
- cookies returned with any non-text resource.
- This helps prevent tracking by ad networks, but there are also some
- legitimate uses of attaching cookies to images, such as captcha, so
- by default this is off.
$NO_COOKIE_WITH_IMAGE= 0 ;
- Settings to fine-tune script filtering, if scripts are not banned altogether
- (by user checkbox or $REMOVE_SCRIPTS above).
- Use @ALLOWED_SCRIPT_SERVERS and @BANNED_SCRIPT_SERVERS to restrict which
- servers you'll allow scripts from. They work like @ALLOWED_SERVERS and
- @BANNED_SERVERS above, both in how their precedence works, and that
- they're lists of Perl 5 regular expressions. See the comments there for
- details.
@ALLOWED_SCRIPT_SERVERS= () ;
@BANNED_SCRIPT_SERVERS= () ;
- Various options to help filter ads and stop cookie-based privacy invasion.
- These are only effective if $FILTER_ADS is set above.
- @BANNED_IMAGE_URL_PATTERNS uses Perl patterns. If an image's URL
- matches one of the patterns, it will not be downloaded (typically for
- ad-filtering). For more information on Perl regular expressions, see
- the Perl documentation.
- Note that most popup ads will be removed if scripts are removed (see
- $REMOVE_SCRIPTS above).
- If ad-filtering is your primary motive, consider using one of the many
- proxies that specialize in that. The classic is from JunkBusters, at
- http://www.junkbusters.com .
- Reject images whose URL matches any of these patterns. This is just a
- sample list; add more depending on which sites you visit.
@BANNED_IMAGE_URL_PATTERNS= (
'ad\.doubleclick\.net/ad/',
'\b[a-z](\d+)?\.doubleclick\.net(:\d*)?/',
'\.imgis\.com\b',
'\.adforce\.com\b',
'\.avenuea\.com\b',
'\.go\.com(:\d*)?/ad/',
'\.eimg\.com\b',
'\bexcite\.netscape\.com(:\d*)?/.*/promo/',
'/excitenetscapepromos/',
'\.yimg\.com(:\d*)?.*/promo/',
'\bus\.yimg\.com/[a-z]/(\w\w)/\1',
'\bus\.yimg\.com/[a-z]/\d-/',
'\bpromotions\.yahoo\.com(:\d*)?/promotions/',
'\bcnn\.com(:\d*)?/ads/',
'ads\.msn\.com\b',
'\blinkexchange\.com\b',
'\badknowledge\.com\b',
'/SmartBanner/',
'\bdeja\.com/ads/',
'\bimage\.pathfinder\.com/sponsors',
'ads\.tripod\.com',
'ar\.atwola\.com/image/',
'\brealcities\.com/ads/',
'\bnytimes\.com/ad[sx]/',
'\busatoday\.com/sponsors/',
'\busatoday\.com/RealMedia/ads/',
'\bmsads\.net/ads/',
'\bmediaplex\.com/ads/',
'\batdmt\.com/[a-z]/',
'\bview\.atdmt\.com/',
'\bADSAdClient31\.dll\b',
) ;
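- A sketch of testing an image URL against such a list. The helper
- image_url_is_banned() is hypothetical, the sample URLs are made up, and
- case-sensitive matching is an assumption. Each pattern is tried on its own
- rather than joined into one big regex, because a backreference like "\1"
- in the us.yimg.com pattern would refer to the wrong group once patterns
- are combined.

```perl
use strict ;
use warnings ;

# Hypothetical helper: true if the URL matches any one of the patterns.
sub image_url_is_banned {
    my($URL, @patterns)= @_ ;
    foreach my $pattern (@patterns) {
        return 1 if $URL=~ /$pattern/ ;
    }
    return 0 ;
}

# Three patterns taken from the sample list above.
my @patterns= (
    'ad\.doubleclick\.net/ad/',
    '\bus\.yimg\.com/[a-z]/(\w\w)/\1',
    '\bnytimes\.com/ad[sx]/',
) ;

foreach my $URL ('http://ad.doubleclick.net/ad/x.gif',
                 'http://us.yimg.com/a/ya/yahoo.gif',
                 'http://us.yimg.com/a/ya/other.gif',
                 'http://www.example.com/logo.gif') {
    print "$URL: ", image_url_is_banned($URL, @patterns), "\n" ;
}
```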
- If set, replace banned images with 1x1 transparent GIF. This also replaces
- all images with the same if $TEXT_ONLY is set.
- Note that setting this makes the response a little slower, since the browser
- must still retrieve the empty GIF.
$RETURN_EMPTY_GIF= 0 ;
- To use an external program to decide whether or not a user at a given IP
- address may use this proxy (as opposed to using server configuration), set
- $USER_IP_ADDRESS_TEST to either the name of a command-line program that
- performs this test, or a queryable URL that performs this test (e.g. a CGI
- script).
- For a command-line program: The program should take a single argument, the
- IP address of the user. The output of the program is evaluated as a
- number, and if the number is non-zero then the IP address of the user is
- allowed; thus, the output is typically either "1" or "0". Note that
- depending on $ENV{PATH}, you may need to enter the path here explicitly.
- For a queryable URL: Specify the start of the URL here (must begin with
- "http://"), and the user's IP address will be appended. For example, the
- value here may contain a "?", thus putting the IP address in the
- QUERY_STRING; it could also be in PATH_INFO. The response body from the
- URL should be a number like for a command line program, above.
$USER_IP_ADDRESS_TEST= '' ;
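- A sketch of what a command-line test program could look like, following
- the contract described above: one argument (the user's IP address) and a
- number on standard output. Everything in it is hypothetical, including
- the function name user_ip_allowed() and the allow list, which uses
- reserved documentation address ranges.

```perl
#!/usr/bin/perl
use strict ;
use warnings ;

# Hypothetical allow list: users whose address starts with one of these.
my @ALLOWED_PREFIXES= ('192.0.2.', '198.51.100.') ;

# Returns 1 if the address is well-formed and on the allow list, else 0.
sub user_ip_allowed {
    my($ip)= @_ ;
    return 0 unless defined($ip) and $ip=~ /^\d{1,3}(?:\.\d{1,3}){3}$/ ;
    foreach my $prefix (@ALLOWED_PREFIXES) {
        return 1 if index($ip, $prefix)==0 ;
    }
    return 0 ;
}

# The output is evaluated as a number: non-zero means the user is allowed.
print user_ip_allowed($ARGV[0]), "\n" ;
```

- If this were saved as, say, /usr/local/bin/check_user_ip.pl (a made-up
- path) and made executable, then $USER_IP_ADDRESS_TEST would be set to
- that path.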
- To use an external program to decide whether or not a destination server is
- allowed (as opposed to using @ALLOWED_SERVERS and @BANNED_SERVERS above),
- set $DESTINATION_SERVER_TEST to either the name of a command-line program
- that performs this test, or a queryable URL that performs this test (e.g. a
- CGI script).
- For a command-line program: The program should take a single argument, the
- destination server's name or IP address (depending on how the user enters
- it). The output of the program is evaluated as a number, and if the number
- is non-zero then the destination server is allowed; thus, the output is
- typically either "1" or "0". Note that depending on $ENV{PATH}, you may
- need to enter the path here explicitly.
- For a queryable URL: Specify the start of the URL here (must begin with
- "http://"), and the destination server's name or IP address will be
- appended. For example, the value here may contain a "?", thus putting the
- name or address in the QUERY_STRING; it could also be in PATH_INFO. The
- response body from the URL should be a number like for a command line
- program, above.
$DESTINATION_SERVER_TEST= '' ;
- If either $INSERT_HTML or $INSERT_FILE is set, then that HTML text or the
- contents of that named file (respectively) will be inserted into any HTML
- page retrieved through this proxy. $INSERT_HTML takes precedence over
- $INSERT_FILE. $INSERT_FILE is assumed to have contents in UTF-8.
- When viewing a page with frames, a new top frame is created and the
- insertions go there.
- NOTE: Any HTML you insert should not have relative URLs in it! The problem
- is that there is no appropriate base URL to resolve them with. So only use
- absolute URLs in your insertion. (If you use relative URLs anyway, then
- a) if $ANONYMIZE_INSERTION is set, they'll be resolved relative to this
- script's URL, which isn't great, or b) if $ANONYMIZE_INSERTION==0,
- they'll be unchanged and the browser will simply resolve them relative
- to the current page, which is usually worse.)
- The frame handling means that it's fairly easy for a surfer to bypass this
- insertion, by pretending in effect to be in a frame. There's not much we
- can do about that, since a page is retrieved the same way regardless of
- whether it's in a frame. This script uses a parameter in the URL to
- communicate to itself between calls, but the user can merely change that
- URL to make the script think it's retrieving a page for a frame. Also,
- many browsers let the user expand a frame's contents into a full window.
- [The warning in earlier versions about setting $INSERT_HTML to '' when using
- mod_perl and $INSERT_FILE no longer applies. It's all handled elsewhere.]
- As with $INSERT_ENTRY_FORM, note that any insertion may throw off any
- precise layout, and the insertion is subject to background colors and
- other page-wide settings.
#$INSERT_HTML= "<h1>This is an inserted header</h1><hr>" ;
#$INSERT_FILE= 'insert_file_name' ;
- If your insertion has links that you don't want anonymized along with the rest
- of the downloaded HTML, then set this to 0. Otherwise leave it at 1.
$ANONYMIZE_INSERTION= 1 ;
- If there's both a URL entry form and an insertion via $INSERT_HTML or
- $INSERT_FILE on the same page, the entry form normally goes at the top.
- Set this to put it after the other insertion.
$FORM_AFTER_INSERTION= 0 ;
- If the insertion is put in a top frame, then this is how many pixels high
- the frame is. If the default of 80 or 50 pixels is too big or too small
- for your insertion, change this. You can use percentage of screen height
- if you prefer, e.g. "20%". (Unfortunately, you can't just tell the
- browser to "make it as high as it needs to be", but at least the frame
- will be resizable by the user.)
- This affects insertions by $INSERT_ENTRY_FORM, $INSERT_HTML, and $INSERT_FILE.
- The default here usually works for the inserted entry form, which varies in
- size depending on $ALLOW_USER_CONFIG. It also varies by browser.
$INSERTION_FRAME_HEIGHT= $ALLOW_USER_CONFIG ? 80 : 50 ;
- NOTE THAT YOU SHOULD BE RUNNING CGIPROXY ON A SECURE SERVER!
- Note also that the meaning of '' has changed-- now, all ports except 80
- are assumed to be using SSL.
- Set this to 1 if the script is running on an SSL server, i.e. it is
- accessed through a URL starting with "https:"; set this to 0 if it's not
- running on an SSL server. This is needed to know how to route URLs back
- through the proxy. Regrettably, standard CGI does not yet provide a way
- for scripts to determine this without help.
- If this variable is set to '' or left undefined, then the program will
- guess: SSL is assumed if SERVER_PORT is not 80. This fails when an
- insecure server uses a port other than 80, or (less commonly) when an SSL
- server uses port 80, but usually it works. Besides being a good default, it lets
- you install the script where both a secure server and a non-secure server
- will serve it, and it will work correctly through either server.
- This has nothing to do with retrieving pages that are on SSL servers.
$RUNNING_ON_SSL_SERVER= '' ;
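- The guessing rule described above can be sketched like this. The helper
- guess_running_on_ssl() is hypothetical and only mirrors the rule as the
- comments state it; it is not the script's own code.

```perl
use strict ;
use warnings ;

# Hypothetical helper: an explicit 0 or 1 is used as given; '' or undef
# falls back to the guess "SSL unless SERVER_PORT is 80".
sub guess_running_on_ssl {
    my($setting, $server_port)= @_ ;
    return $setting ? 1 : 0 if defined($setting) and $setting ne '' ;
    return $server_port!=80 ? 1 : 0 ;
}

print guess_running_on_ssl('', 443), "\n" ;    # 1
print guess_running_on_ssl('', 80), "\n" ;     # 0
print guess_running_on_ssl('', 8080), "\n" ;   # 1, wrong for a plain HTTP server on 8080
print guess_running_on_ssl(0, 8080), "\n" ;    # 0, the explicit setting wins
```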
- If your server doesn't support NPH scripts, then set this variable to true
- and try running the script as a normal non-NPH script. HOWEVER, this
- won't work as well as running it as NPH; there may be bugs, maybe some
- privacy holes, and results may not be consistent. It's a hack.
- Try to install the script as NPH before you use this option, because
- this may not work. NPH is supported on almost all servers, and it's
- usually very easy to install a script as NPH (on Apache, for example,
- you just need to name the script something starting with "nph-").
- One example of a problem is that Location: headers may get messed up,
- because they mean different things in an NPH and a non-NPH script.
- You have been warned.
- For this to work, your server MUST support the "Status:" CGI response
- header.
$NOT_RUNNING_AS_NPH= 0 ;
- Set HTTP and SSL proxies if needed. Also see $USE_PASSIVE_FTP_MODE below.
- The format of the first two variables is "host:port", with the port being
- optional. The format of $NO_PROXY is a comma-separated list of hostnames
- or domains: any request for a hostname that ends in one of the strings in
- $NO_PROXY will not use the HTTP or SSL proxy; e.g. use ".mycompany.com" to
- avoid using the proxies to access any host in the mycompany.com domain.
- The environment variables in the examples below are appropriate defaults,
- if they are available. Note that earlier versions of this script used
- the environment variables directly, instead of the $HTTP_PROXY and
- $NO_PROXY variables we use now.
- Sometimes you can use the same proxy (like Squid) for both SSL and normal
- HTTP, in which case $HTTP_PROXY and $SSL_PROXY will be the same.
- $NO_PROXY applies to both SSL and normal HTTP proxying, which is usually
- appropriate. If there's demand to differentiate those, it wouldn't be
- hard to make a separate $SSL_NO_PROXY option.
#$HTTP_PROXY= $ENV{'http_proxy'} ;
#$SSL_PROXY= 'firewall.example.com:3128' ;
#$NO_PROXY= $ENV{'no_proxy'} ;
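- A sketch of the $NO_PROXY rule described above, where a request skips the
- proxy if its host name ends in one of the listed strings. The helper
- bypasses_proxy() is hypothetical, and ignoring letter case and spaces
- around the commas is an assumption of this example.

```perl
use strict ;
use warnings ;

# Hypothetical helper: true if $host ends in any entry of the
# comma-separated $no_proxy list.
sub bypasses_proxy {
    my($host, $no_proxy)= @_ ;
    foreach my $suffix (split(/\s*,\s*/, $no_proxy)) {
        next if $suffix eq '' ;
        return 1 if $host=~ /\Q$suffix\E\z/i ;
    }
    return 0 ;
}

my $no_proxy= '.mycompany.com, localhost' ;
foreach my $host ('www.mycompany.com', 'localhost', 'www.example.com') {
    print "$host: ", bypasses_proxy($host, $no_proxy), "\n" ;
}
```

- Because this is a plain suffix test, an entry without a leading dot, such
- as "mycompany.com", would also match "notmycompany.com".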
- If your HTTP and SSL proxies require authentication, this script supports
- that in a limited way: you can have a single username/password pair per
- proxy to authenticate with, regardless of realm. In other words, multiple
- realms aren't supported for proxy authentication (though they are for
- normal server authentication, elsewhere).
- Set $PROXY_AUTH and $SSL_PROXY_AUTH either in the form of "username:password",
- or to the actual base64 string that gets sent in the Proxy-Authorization:
- header. Often the two variables will be the same, when the same proxy is
- used for both SSL and normal HTTP.
#$PROXY_AUTH= 'Aladdin:open sesame' ;
#$SSL_PROXY_AUTH= $PROXY_AUTH ;
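- For reference, a sketch of how a "username:password" pair maps to the
- base64 string mentioned above, using the core MIME::Base64 module. The
- helper proxy_auth_header() is hypothetical, and the "Basic" scheme is an
- assumption of this example.

```perl
use strict ;
use warnings ;
use MIME::Base64 ;

# Hypothetical helper: build the header line from "username:password".
# The second argument to encode_base64() suppresses the trailing newline.
sub proxy_auth_header {
    my($userpass)= @_ ;
    return 'Proxy-Authorization: Basic ' . encode_base64($userpass, '') ;
}

print proxy_auth_header('Aladdin:open sesame'), "\n" ;
# Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```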
- Set SOCKS proxy if needed. The format of $SOCKS_PROXY is "host:port", with
- the port being optional (defaults to 1080).
- If your SOCKS proxy supports username/password authentication, then set
- the username and password below.
- Also see @BANNED_NETWORKS above-- you'll need to remove the '127' from the
- default list if you use a SOCKS proxy on the machine where this is running,
- such as with the example here.
- NOTE THAT THE CONNECTION BETWEEN THIS SCRIPT AND YOUR SOCKS PROXY MUST BE
- TRUSTED, BECAUSE CURRENTLY ALL DATA IS SENT IN THE CLEAR BETWEEN THEM!
- In particular, the username and password below will be sent in the clear.
- The solution would be to use the GSSAPI authentication method, which many
- SOCKS proxies do not support, and which CGIProxy doesn't support yet either.
#$SOCKS_PROXY= 'localhost:1080' ;
#$SOCKS_USERNAME= '' ;
#$SOCKS_PASSWORD= '' ;
- This is one way to handle pages that don't work well, by redirecting to other working
- versions of the pages (for example, to a mobile version or another version that
- doesn't have much JavaScript). How it works: If the current domain matches one
- of the keys of %REDIRECTS, then s/// (string substitution) is done on the URL,
- using the match and replacement patterns in the 2-element value array.
- The sites handled this way are Facebook and Gmail, since they don't
- always work well, or are slow, through CGIProxy. If you want to access
- them normally, then comment out or remove the line(s) below for that site.
- If you want to redirect more sites, you can add records to the %REDIRECTS
- hash in the following way: Set the hash key to the name of the server you
- want to redirect, and the value to a reference to a 2-element array containing
- the left and right sides of an s/// string substitution. If that doesn't make
- sense, then try to emulate an example below.
- As of version 2.1.7, the full facebook.com site works pretty well, so the
- redirection below has been commented out.
- ... aaaand, as of version 2.1.8, the full Gmail site works pretty well, so the
- redirection below has been commented out.
- To improve performance with facebook or other JS-busy sites, users can:
- - close other browser windows
- - end other CPU-heavy processes on their browsing machine
- - reload the page or restart the browser when it gets too slow
- - use a browser other than MSIE (it has the most problems)
- If Gmail or facebook is still too slow or crashes a lot, you can remove the
- leading "#" on the appropriate lines below to automatically redirect to
- Gmail's HTML-only site or facebook's mobile site, which may work better.
%REDIRECTS= (
# 'www.facebook.com' => [qr#^https?://www\.facebook\.com#i, 'https://m.facebook.com'],
# 'mail.google.com' => [qr#^https?://mail\.google\.com/.*shva=\w*1.*$#i, 'https://mail.google.com/?ui=html']
) ;
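- A sketch of how one record in such a hash might be applied: look up the
- host name, then run s/// with the two array elements. The helper
- apply_redirect() and the variable %demo_redirects are hypothetical; only
- the Facebook rule is copied from the example above.

```perl
use strict ;
use warnings ;

# Hypothetical copy of the hash with the Facebook rule enabled.
my %demo_redirects= (
    'www.facebook.com' => [qr#^https?://www\.facebook\.com#i, 'https://m.facebook.com'],
) ;

# Hypothetical helper: rewrite $URL if its host has a redirect rule.
sub apply_redirect {
    my($host, $URL)= @_ ;
    my $rule= $demo_redirects{lc($host)} or return $URL ;
    my($match, $replacement)= @$rule ;
    $URL=~ s/$match/$replacement/ ;
    return $URL ;
}

print apply_redirect('www.facebook.com', 'http://www.facebook.com/profile'), "\n" ;
# https://m.facebook.com/profile
print apply_redirect('www.example.com', 'http://www.example.com/'), "\n" ;
# http://www.example.com/
```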
- Some JavaScript-busy sites crash when visiting them through CGIProxy. Increasing
- the delay times in Window.setTimeout() and Window.setInterval() m
Related issues