I’m building a web application that I hope will support itself through advertising. I’m planning to use both Google AdSense and Amazon Associates (both links and widgets). Separately, I decided that the most convenient way to serve the “static” help pages, documentation, etc. for my application was to run a separate wiki in parallel to it (I picked MediaWiki), and link from my web app into the wiki using custom page themes. And finally, it occurred to me that some of my help pages are basically introductions to concepts or technology that (Ah-ha!) people might want to read more about in books. Which I should suggest to them….
So, that’s the chain of thinking that led me to the point, the other day, when I was trying to add Amazon text-link ads to text I was writing in MediaWiki. And it turned out to be far trickier than I ever would have guessed.
The HTML that Amazon generates for a text-link advertisement contains the markup for a link and for a single-pixel image. The link is what the user sees and clicks to get from the ad to Amazon’s web site, and the image is used by Amazon’s analytics for impression counting. Because I wanted to embed these ads into a wiki page, I had to convert the HTML generated by Amazon into MediaWiki wiki markup. For example, the following is the HTML for a text advertisement generated by Amazon (spacing modified for clarity):
<a
href="http://www.amazon.com/gp/product/0132350882?ie=UTF8&tag=wontology-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=0132350882">
Clean Code: A Handbook of Agile Software Craftsmanship
</a>
<img src="http://www.assoc-amazon.com/e/ir?t=wontology-20&l=as2&o=1&a=0132350882"
width="1" height="1" border="0" alt=""
style="border:none !important; margin:0px !important;" />
The first tag is the anchor creating the link, and the second causes the browser to fetch the invisible image from Amazon’s servers any time a page containing the link is displayed. The ad is rewritten in wiki markup as:
[http://www.amazon.com/gp/product/0132350882?ie=UTF8&tag=wontology-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=0132350882 Clean Code: A Handbook of Agile Software Craftsmanship] http://www.assoc-amazon.com/e/ir?t=wontology-20&l=as2&o=1&a=0132350882
The URL in brackets is rendered as a link whose label is the text that follows it inside the brackets. A bare URL that points to a web page would be displayed using the text of the URL itself as a link, but bare URLs for images are handled differently. By default, MediaWiki requires images that are to be displayed within its pages to first be uploaded to the wiki server, and then incorporated in pages using wiki markup like [[File: ...]] or [[Image: ...]]. However, copying the single-pixel image from Amazon’s servers and uploading it to the wiki would defeat the purpose. The user isn’t supposed to see the image; the important part is that the image is fetched from Amazon by the user’s browser every time the page containing it is shown. So, MediaWiki must embed the image into the wiki page with a URL that points to a server other than itself. The MediaWiki documentation refers to this as an “external image.”
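The HTML-to-wiki conversion described above is mechanical enough to script. Here is a minimal sketch, assuming the snippet contains exactly one anchor and one image tag with double-quoted attributes and no HTML entities in the URLs. The helper name amazonAdToWikiText is made up for this example; it is not part of MediaWiki or of anything Amazon provides.

```php
<?php
// Hypothetical helper: turn an Amazon text-link snippet into wiki markup of
// the form "[link-url label] image-url". Returns null if either tag is missing.
function amazonAdToWikiText( string $html ): ?string {
    // Capture the href and the link text of the first <a> element.
    if ( !preg_match( '/<a\s[^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/s', $html, $link ) ) {
        return null;
    }
    // Capture the src of the first <img> element (the tracking pixel).
    if ( !preg_match( '/<img\s[^>]*src="([^"]+)"/s', $html, $img ) ) {
        return null;
    }
    // Collapse line breaks and runs of spaces inside the link text.
    $label = trim( preg_replace( '/\s+/', ' ', $link[2] ) );
    return '[' . $link[1] . ' ' . $label . '] ' . $img[1];
}
```

The bare image URL at the end only turns into an inline image if the wiki is set up to treat it as an external image.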
When MediaWiki is installed, external images are disabled by default. Turning support for them on is accomplished, like most other MediaWiki customizations, by modifying the LocalSettings.php source file. There are two settings that can be added to enable external images in a MediaWiki installation, $wgAllowExternalImages and $wgAllowExternalImagesFrom. The former is simply set to true to enable inline images fetched from any external server, while the latter can be initialized with an array listing the URL prefixes (in practice, the domains) from which external image fetches should be permitted. (Both settings are described in the MediaWiki manual.)
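As a rough illustration, the additions to LocalSettings.php might look like the fragment below. This is a sketch rather than a copy of my actual configuration; the Amazon prefix is simply taken from the tracking-image URL shown earlier, and only one of the two options is needed.

```php
# LocalSettings.php (fragment; the file already begins with <?php)

# Option 1: allow inline images from any external server.
# $wgAllowExternalImages = true;

# Option 2: allow inline images only from URLs that begin with one of
# these prefixes (assumption: the Amazon tracking-pixel host used above).
$wgAllowExternalImagesFrom = array( 'http://www.assoc-amazon.com/' );
```

The second option is the narrower of the two, since it keeps editors from hot-linking images from arbitrary sites.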
So, I modified my MediaWiki server’s configuration, put my modified ad link into my text, and instead of my pages containing an invisible single-pixel image following each ad link, they displayed the full text of the image’s URL. Hmmmm….
For a while I was puzzled, but I eventually figured out that my problem wasn’t that I had enabled external images incorrectly; it was that MediaWiki couldn’t identify Amazon’s URL as pointing to an image. When MediaWiki parses the wiki text of a page, it distinguishes image URLs from page URLs based on the presence of a file name extension matching one of the well-known image formats. Amazon’s image link has no extension, and since MediaWiki doesn’t actually fetch from the link, it doesn’t receive the MIME type of the image. It therefore assumes that the URL points to an external web page, and presents it accordingly.
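To make the failure concrete, here is a small stand-alone sketch of an extension-based check. The pattern is a simplified stand-in that I wrote for illustration, not MediaWiki's real regular expression, and the constant and function names are invented for this example.

```php
<?php
// Simplified stand-in (NOT MediaWiki's actual pattern): a URL counts as an
// image only if it ends in one of a few well-known image extensions.
const SKETCH_IMAGE_REGEX = '/^https?:\/\/[^\s<>"]+\.(gif|png|jpe?g)$/i';

function looksLikeImageUrl( string $url ): bool {
    return preg_match( SKETCH_IMAGE_REGEX, $url ) === 1;
}

// An ordinary image URL passes the check.
var_dump( looksLikeImageUrl( 'http://example.com/pixel.gif' ) );  // bool(true)

// Amazon's tracking pixel has no extension, so it is treated as a page link.
var_dump( looksLikeImageUrl(
    'http://www.assoc-amazon.com/e/ir?t=wontology-20&l=as2&o=1&a=0132350882'
) );  // bool(false)
```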
The code that checks this is located in the file includes/parser/Parser.php, and looks like:
function maybeMakeExternalImage( $url ) {
    ##snip##
    if ( $this->mOptions->getAllowExternalImages()
        || ( $imagesexception && $imagematch ) ) {
        if ( preg_match( self::EXT_IMAGE_REGEX, $url ) ) {
            # Image found
            $text = $sk->makeExternalImage( $url );
        }
    }
    ##snip##
The set of URLs that are identified as pointing to images is determined by the regular expression EXT_IMAGE_REGEX, defined earlier in the same source file. Modifying the regular expression to identify the Amazon image URLs would have been prohibitively complex: the “or” terms required to identify a host name as an alternative to a file name extension would have more than tripled the length of the expression. Instead, I decided to simply modify the detection code, like this:
function maybeMakeExternalImage( $url ) {
    ##snip##
    if ( $this->mOptions->getAllowExternalImages()
        || ( $imagesexception && $imagematch ) ) {
        if ( preg_match( self::EXT_IMAGE_REGEX, $url ) ||
            preg_match( "/assoc-amazon\.com/", $url ) ) {
            # Image found
            $text = $sk->makeExternalImage( $url );
        }
    }
    ##snip##
The regular expression I actually use also checks for my Amazon advertiser ID, and is perhaps too specific. If I were going to turn this fix into a change that could be submitted to the MediaWiki project, I’d create a new configuration variable containing the regular expression to be used, so that wiki site administrators could set it from LocalSettings.php without having to hack a source file.
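A minimal sketch of that idea, written as stand-alone PHP rather than as a patch to Parser.php, might look like the following. Everything named here is hypothetical: $wgExtraExternalImageRegex is not a real MediaWiki setting, and FALLBACK_IMAGE_REGEX is a simplified stand-in for EXT_IMAGE_REGEX.

```php
<?php
// Hypothetical setting (NOT a real MediaWiki variable): an extra pattern an
// administrator could define in LocalSettings.php. Null means "not set".
$wgExtraExternalImageRegex = '/^http:\/\/www\.assoc-amazon\.com\/e\/ir\?/';

// Simplified stand-in for MediaWiki's extension-based pattern (assumption).
const FALLBACK_IMAGE_REGEX = '/^https?:\/\/[^\s<>"]+\.(gif|png|jpe?g)$/i';

// True if the URL has an image extension, or matches the optional
// administrator-supplied pattern.
function isExternalImageUrl( string $url, ?string $extraRegex ): bool {
    if ( preg_match( FALLBACK_IMAGE_REGEX, $url ) === 1 ) {
        return true;
    }
    return $extraRegex !== null && preg_match( $extraRegex, $url ) === 1;
}
```

Anchoring the extra pattern to the start of the URL, as in this example value, is stricter than matching the host name anywhere in the string, which seems like a sensible default for a setting that decides what gets embedded as an image.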
So, if you’ve been having problems getting external images to work in your MediaWiki installation, whether they’re ads or not, this might be the fix you need.