voidynullness

(mis)adventures in software development...

    
23 January 2014

Upgrading to Pelican 3.3

Category Web Development

Bothersome broken pipes and pesky HTTP headers: my experiences upgrading to Pelican 3.3.


Maybe it’s just me, or maybe it comes with the territory of being a geek with a blog, but it seems I often have to fight the urge to tinker with the blogging technology I’m using (in my case, Pelican) in order to get any actual blogging done.

Initially, the tinkering took the form of endlessly tweaking the custom theme I’d hacked together based on Bootstrap 2. Eventually the theme got to the point where I was happy enough with it to actually leave it alone and get some writing done.

Then Bootstrap 3 came out, deliberately breaking backward compatibility with version 2, which meant upgrading my theme would require substantial tinkering. But I resisted, and stayed with the Bootstrap 2 theme for the time being. Then Pelican 3.3 came out, and once again I resisted upgrading, just in case I’d have to deal with time-consuming upgrade issues. Then Font Awesome 4 came out, also breaking compatibility with the previous version.

Eventually I could resist the urge to tinker no longer, and decided to upgrade all three in one time-sucking swoop, starting with Pelican.

So it was with the usual slight sense of trepidation that I fired up my virtualenv and gave the command:

pip install --upgrade pelican

Turned out my trepidation was somewhat justified. Things broke.

Pelican 3.3 devserver issues

After doing a make serve, things didn’t look quite right:

127.0.0.1 - - [21/Jan/2014 21:41:51] "GET /theme/css/pygment.css HTTP/1.1" 200 -
WARNING:root:Unable to find /theme/css/pygment.css file.
WARNING:root:Unable to find /theme/css/pygment.css.html file.
WARNING:root:Unable to find /theme/css/pygment.css/index.html file.
127.0.0.1 - - [21/Jan/2014 21:42:24] "GET /blog/2014/01/21/upgrading-pelican-3-3/ HTTP/1.1" 200 -
WARNING:root:Unable to find /blog/2014/01/21/upgrading-pelican-3-3/ file.
WARNING:root:Unable to find /blog/2014/01/21/upgrading-pelican-3-3/.html file.
127.0.0.1 - - [21/Jan/2014 21:42:24] "GET /blog/2014/01/21/upgrading-pelican-3-3/ HTTP/1.1" 200 -
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 47703)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/usr/lib/python2.7/SocketServer.py", line 704, in finish
    self.wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
----------------------------------------

So there’s the logging thing for starters. It seems one of the Pelican 3.3 devserver “improvements” is to output up to three “Unable to find” warnings for each file it serves, even when the file is actually found and served successfully. While initially confusing, it’s only a superficial annoyance.
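If that noise bothers you more than it bothered me, the standard logging module can filter it. The WARNING:root: prefix suggests these messages are logged directly on the root logger, so a filter attached to the root logger should catch them. This is a hypothetical tweak: you’d have to install it wherever the devserver sets up logging, and the DropUnableToFind name is my own.

```python
import logging


class DropUnableToFind(logging.Filter):
    """Suppress the devserver's spurious "Unable to find ... file." warnings.

    Hypothetical helper, not part of Pelican.
    """

    def filter(self, record):
        # Returning False drops the record; everything else passes through.
        return not record.getMessage().startswith("Unable to find ")


# Logger-level filters only see records logged on that logger itself, which
# fits messages emitted via logging.warning(...) (hence "WARNING:root:").
logging.getLogger().addFilter(DropUnableToFind())
```

Filtering only hides the symptom, of course. The duplicate work behind those messages still happens.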

The “broken pipe” errors were more concerning though, if only because it wasn’t clear whether they were being caused by something messed up in my system’s configuration, or they indicated a bug somewhere in Pelican.

Initially, despite the broken pipe errors, it looked like the pages were still being served correctly. Not always, as I soon found out.

Because I was hacking away at my theme, I was not only using the development server quite a bit, but also scrolling to specific parts of pages (like the bottom) to see how things were looking as I made modifications to the theme. I soon noticed things were definitely not looking right, for reasons other than my limited design skills:

Web page with HTTP headers showing

That screenshot shows what should be the bottom of the page. Obviously the HTTP headers should not be there, but neither should the truncated second copy of the page that appears after them.

I noticed these two things — the broken pipe errors and the HTTP headers in the web page — were happening frequently, but not on every page view, so it was one of those annoying intermittent problems.

I googled and found some reports of similar problems, but nowhere near as many as I’d expected, given how often I was running into it.

Was it something in my setup?

I had a look at the Pelican source on GitHub, and sure enough, there was new stuff added to devserver in version 3.3.

Before version 3.3, the Pelican development server was basically just the standard Python SimpleHTTPServer. In version 3.3, a small wrapper was added around SimpleHTTPServer to provide some extra functionality: basic nginx-style URL rewriting rules that give the illusion of “clean” URLs, without actually having to configure clean URLs the “Pelican way” (i.e. using the *_URL and *_SAVE_AS settings so that content ends up in a directory containing an index.html). Judging by the log above, a request for /foo is also tried as /foo.html and /foo/index.html.

Personally I was quite happy with having clean URLs as directories. It seems a bit hacky at first, but works well in practice, and keeps things appropriately simple. While I can see how this new devserver functionality might be occasionally useful, I would question the need to make this the default behaviour, especially since (AFAIK) it’s entirely undocumented.
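For comparison, here’s roughly what the “Pelican way” looks like in pelicanconf.py. Each article or page is saved as an index.html inside its own directory, and its URL is just that directory. The blog/YYYY/MM/DD/ pattern is a guess modelled on the URLs in the log above, and the page settings are purely illustrative.

```python
# pelicanconf.py (excerpt): "proper" clean URLs via *_URL and *_SAVE_AS.
# The date-based pattern is an assumption modelled on URLs like
# /blog/2014/01/21/upgrading-pelican-3-3/; adjust to taste.
ARTICLE_URL = 'blog/{date:%Y}/{date:%m}/{date:%d}/{slug}/'
ARTICLE_SAVE_AS = 'blog/{date:%Y}/{date:%m}/{date:%d}/{slug}/index.html'

# Pages get the same treatment (illustrative values).
PAGE_URL = '{slug}/'
PAGE_SAVE_AS = '{slug}/index.html'
```

A quick way to sanity-check a pattern is to format it by hand with Python’s str.format. For example, ARTICLE_URL.format(date=datetime.date(2014, 1, 21), slug='upgrading-pelican-3-3') gives 'blog/2014/01/21/upgrading-pelican-3-3/'.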

After a bit of investigation and poking around, I came to the conclusion that both problems I was experiencing with devserver were the result of a bug introduced in the version 3.3 changes to server.py. (And not anything in my setup.) Somewhat ironically, in an attempt to provide “fake” clean URLs, those changes broke handling of “proper” clean URLs (i.e. serving of all URLs that resolve to an actual directory on the server).
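I won’t reproduce Pelican’s code here, but the failure mode is easy to model. The sketch below assumes the handler tries the requested path with each of the suffixes "", ".html" and "/index.html" (the three variants in the warnings above) and doesn’t stop at the first one that exists. For a file like pygment.css only one variant exists, so nothing visibly breaks. For a directory URL, both the directory itself and dir//index.html exist, so the page would be sent twice down the same connection. That would explain the duplicated request line in the log and the stray headers and second page copy. It would also explain the broken pipe if the browser has already closed the connection before the second response is written. All names here are hypothetical.

```python
import os

# Variants tried for each request, judging by the "Unable to find" warnings
# in the log (an assumption, not Pelican's actual code).
SUFFIXES = ("", ".html", "/index.html")


def candidates(root, url_path):
    """Filesystem paths tried for a request, in order."""
    base = os.path.join(root, url_path.lstrip("/"))
    return [base + suffix for suffix in SUFFIXES]


def buggy_lookup(root, url_path):
    """Every existing variant: what gets served if the loop never breaks."""
    return [p for p in candidates(root, url_path) if os.path.exists(p)]


def fixed_lookup(root, url_path):
    """Only the first existing variant (or None), so a request is served once."""
    for path in candidates(root, url_path):
        if os.path.exists(path):
            return path
    return None
```

The difference between buggy_lookup and fixed_lookup is the whole fix in miniature: stop at the first hit.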

I suspect the reason there weren’t as many reports of these problems as I expected is that they are more likely to occur if Pelican is configured to generate “proper” clean URLs. This is something I’ve tried to do consistently in my configuration, but it seems I must be in the minority. The fact that they’re intermittent and not exactly showstopping (though annoying and confusing) is no doubt also a factor.

In any case the whole thing bugged me enough that I decided to do a bit of tangential tinkering to try and fix the problem. (BTW, anyone out there know why SimpleHTTPRequestHandler is not a new style class under Python 2.7?)

Workaround

For anyone also regularly experiencing broken pipe errors with the Pelican 3.3 devserver and/or mangled HTML being served, there are a few ways to work around this. Obviously you could just incorporate the patch into your own setup.

But for those who don’t need the new stuff added in 3.3 (and I imagine very few people would), there are simpler options.

One is just to use SimpleHTTPServer on the command line:

cd output; python -m SimpleHTTPServer
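SimpleHTTPServer only exists on Python 2. On Python 3 the module is http.server, and python3 -m http.server --directory output (Python 3.7+) is the equivalent one-liner. If you’d rather start the server from a script, here’s a minimal standard-library sketch. The make_server name and the localhost-only binding are my own choices, not anything Pelican provides.

```python
import functools
import http.server


def make_server(directory="output", port=8000):
    """Build a plain static-file server for Pelican's output directory.

    Hypothetical helper: it pairs the stdlib SimpleHTTPRequestHandler
    (no URL rewriting at all) with the directory to serve. The directory
    argument needs Python 3.7+.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    # Bind to localhost only; the one-liner listens on all interfaces.
    return http.server.HTTPServer(("127.0.0.1", port), handler)
```

Calling make_server().serve_forever() then behaves much like the one-liner.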

Another, for those who want to keep using the devserver, is to change one line in Pelican’s server.py file. If you’re using a virtualenv (and if not, you should be), this file will be located somewhere like this:

~/path/to/virtualenv/lib/python2.7/site-packages/pelican/server.py

Edit server.py, replacing this line:

Handler = ComplexHTTPRequestHandler

With this:

Handler = srvmod.SimpleHTTPRequestHandler

This disables the new Pelican 3.3 devserver functionality (buggy bits included), so you get essentially the same behaviour as Pelican 3.2: plain old SimpleHTTPRequestHandler. The Makefile serve and devserver targets should now work properly.
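Alternatively, if you’d rather keep the clean-URL fallback than disable it, one option is to resolve the suffix once, in translate_path. That way SimpleHTTPRequestHandler’s normal do_GET only ever sends one response. This is my own sketch for Python 3.7+, not the upstream fix. The CleanURLHandler and make_clean_url_server names and the suffix list are assumptions modelled on the behaviour described above.

```python
import functools
import http.server
import os

# Variants to try, in order (assumed, mirroring the devserver's warnings).
SUFFIXES = ("", ".html", "/index.html")


class CleanURLHandler(http.server.SimpleHTTPRequestHandler):
    """Static handler with a clean-URL fallback that serves each request once."""

    def translate_path(self, path):
        base = super().translate_path(path)
        for suffix in SUFFIXES:
            candidate = base + suffix
            if os.path.exists(candidate):
                # First match wins; send_head() takes it from here
                # (directories still get their index.html as usual).
                return candidate
        # Nothing matched: return the plain path so the usual 404 is sent.
        return base


def make_clean_url_server(directory="output", port=8000):
    handler = functools.partial(CleanURLHandler, directory=directory)
    return http.server.HTTPServer(("127.0.0.1", port), handler)
```

On Python 3 these handler classes are ordinary new-style classes, so super() works. Under Python 2.7 you’d have to call SimpleHTTPServer.SimpleHTTPRequestHandler.translate_path(self, path) explicitly instead.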

Configuration changes for Pelican 3.3

There weren’t many changes to pelicanconf.py required after upgrading to 3.3.

The only significant change was as a result of FILES_TO_COPY being deprecated. It has been replaced with STATIC_PATHS and EXTRA_PATH_METADATA.

So in pelicanconf.py this:

STATIC_PATHS = ['images', ]

FILES_TO_COPY = (('other/robots.txt', 'robots.txt'),
                 ('other/favicon.ico', 'favicon.ico'),
                 ('other/arbitrary.html', 'arbitrary.html'),
                 )

Became this:

STATIC_PATHS = ['images',
                'other/robots.txt',
                'other/favicon.ico',
                'other/arbitrary.html',
                ]

EXTRA_PATH_METADATA = {
    'other/robots.txt': {'path': 'robots.txt'},
    'other/favicon.ico': {'path': 'favicon.ico'},
    'other/arbitrary.html': {'path': 'arbitrary.html'},
    }
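Since the translation is mechanical, a longer FILES_TO_COPY list could be converted with a small helper like this. It’s hypothetical and not part of Pelican, but pelicanconf.py is ordinary Python, so something along these lines could live there directly.

```python
def convert_files_to_copy(files_to_copy, static_paths=()):
    """Translate a Pelican 3.2-style FILES_TO_COPY into 3.3-style settings.

    Hypothetical helper: each (source, destination) pair becomes an entry in
    STATIC_PATHS plus an EXTRA_PATH_METADATA 'path' override.
    """
    static = list(static_paths)
    metadata = {}
    for source, destination in files_to_copy:
        if source not in static:
            static.append(source)
        # Later pairs for the same source overwrite earlier ones.
        metadata[source] = {"path": destination}
    return static, metadata


STATIC_PATHS, EXTRA_PATH_METADATA = convert_files_to_copy(
    (("other/robots.txt", "robots.txt"),
     ("other/favicon.ico", "favicon.ico"),
     ("other/arbitrary.html", "arbitrary.html")),
    static_paths=["images"],
)
```

With the FILES_TO_COPY from the earlier excerpt, this produces the same STATIC_PATHS and EXTRA_PATH_METADATA as the hand-written version.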

The official docs don’t go into much detail about EXTRA_PATH_METADATA, and I’m not sure I’ve got things configured quite right. In the above excerpts, arbitrary.html represents a few arbitrary HTML files I want Pelican to copy to the root output directory. I want them copied unmodified; that’s it. But for reasons I don’t understand, Pelican seems to be looking at their contents, possibly wanting to process them in some way, because I keep getting errors like this:

ERROR: Skipping other/arbitrary.html: could not find information about 'title'

I didn’t see anything like this in version 3.2, and I’m not sure what it’s complaining about. Sure, the HTML files in question don’t actually have a title: they are just HTML fragments, mostly for use by tools rather than for human/browser consumption. I just want the files copied. And they are, in fact, being copied to where I want them, so in that respect things are working. I’m just not sure why Pelican considers this an “error”.

UPDATE: I eventually found a solution to this.

Success

After all that, however, with a fixed devserver and image references appropriately tweaked, I did manage to get a functional, updated Pelican setup, along with a mostly error- and warning-free build of my blog.

With that out of the way, it was now time to go ahead with updating my blog to use the latest versions of Bootstrap and Font Awesome. Or at least find creative ways to procrastinate.

But that’s possibly a topic for another blog post.


 
