Sitescooper 3.0.2 released -- and about time too ;)

Been a long time since I updated the diary. There's a few reasons:

  • been busy :( -- trying to get up a head of steam to fight software patents in Europe -- Ireland is backing the move, so I'm trying to get some ILUG members (myself included) to fight it. Problem is, I don't know where to start, myself -- letterwriting and political campaigning are not my strong points :(

  • Also, I don't think recentlog.html is scaling; it's too difficult to follow the diaries. Generally if I check my diary the morning after posting, it's already scrolled off. That makes it hard to be bothered posting, if there's a 90pc chance no-one's going to read it... after all, who actually goes to a /person page to read their diaries? 's the tragedy of the commons, innit. ;)

But notwithstanding the latter point, I'll throw a few opinions into the ether on what I've read in other diaries. And might as well do an update on WebMake and sitescooper...

---- WebMake

Released 0.7. It works quite well, generates sitemaps, breadcrumb trails, back/forward navigation links, and other nifty metadata things. Not sure what needs to be done next... I have a few non-urgent plans:

generate RDF sitemaps

as suggested in Dan Bricklin's paper, URL on the WebMake todo list. This could be cool, esp. if it can be reused to generate RSS "what's new" lists for My Netscape, Scripting News, oreilly.net, etc.

access to stat() data on links

Allow automatic generation of file size info, by making file size a metadatum on a content item -- this'd be handy for download pages.
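A rough sketch of the idea, in Python rather than WebMake's Perl: take the size from stat() and format it for a download page. The function names (`human_size`, `file_size_metadatum`) are made up for illustration and are not WebMake features.

```python
import os

def human_size(num_bytes):
    """Format a byte count as a short human-readable string."""
    if num_bytes < 1024:
        return f"{num_bytes} bytes"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        size /= 1024
        # Stop at the first unit that fits, or at GB as the largest unit.
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"

def file_size_metadatum(path):
    """Hypothetical 'size' metadatum: the file's size as reported by stat()."""
    return human_size(os.stat(path).st_size)
```

A download page template could then reference the size next to the link, without anyone having to update it by hand when the tarball changes.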

come up with an intermediate XML format for EtText

caolan suggested this one, and it's a goodie. If EtText generates an XML format instead of plain XHTML, it may be a neat way of (a) allowing more flexible styling of the HTML, (b) allowing other output formats (WML, DocBook, etc.), (c) some neat XSL tricks.

"edit-in-browser" functionality

Throw in a CGI which can parse and edit WebMake files and EtText, and you've got good ol' "edit-in-browser" as seen on Advogato, editthispage.com, blogger, etc.

Mebbe I'll just let it get stable first though.

---- Sitescooper

Not much here -- need to fix the NYT login problem (again). Lots of hassle with sites blocking us out of their "AvantGo versions"; AG are taking a strong line with the sites to block us out, it looks like. Nasty.

Mandrake caused a bit of a stink recently, with their announcement that Mandrake News and the Mandrake Forum would be made palm-readable with AvantGo, and not a mention of sitescooper or Plucker. So I've made a site file for MF, which AG still can't handle ;).

Michael Nordström from Plucker asked for the URL of their PDA-friendly version, but got no response. Hmm.

Maybe we should look into making a sitescooper-on-Mandrake RPM for their Cooker distro, and subvert from the inside ;)

---- Comments

lkcl --

i was going to have to send &lt; and friends because of the break-ups in the data flow: jabber has a wrapper around data called a <stream>. this is where things start to get scary.

It's a nasty problem -- you could try using CDATA sections, which act as opaque blocks of data; XML tags in there won't get parsed. Not sure how well libxml supports 'em though.
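For what it's worth, here's what the CDATA approach looks like in a small Python sketch, using the standard library's minidom as a stand-in (it says nothing about libxml or Jabber's actual stream handling). The `message` element name and the helper names are just placeholders.

```python
from xml.dom import minidom

def wrap_in_cdata(payload):
    """Put payload inside a CDATA section of a <message> element."""
    doc = minidom.Document()
    message = doc.createElement("message")
    doc.appendChild(message)
    # Markup inside a CDATA section is carried as literal text, not parsed.
    message.appendChild(doc.createCDATASection(payload))
    return message.toxml()

def unwrap(xml_text):
    """Parse the element back and return the payload unchanged."""
    parsed = minidom.parseString(xml_text)
    return parsed.documentElement.firstChild.data
```

One catch: the payload itself can't contain the sequence `]]>`, since that ends the section, so arbitrary data would still need splitting or escaping around it.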

mrorganic mentioned:

Personal: got the QNX/RTP stuff loaded and working last night. I haven't done much with it yet, but I already know I like it better than anything I've gotten running on Linux. Photon makes X look like the buggy, bloated hack job that it is. I haven't made much use of PhAB yet (the GUI-builder for Photon), and reports indicate it is still unstable, but I'll probably play around with it a bit tonight and see what it's capable of.

I've always been a fan of OSes like VxWorks and QNX because they seem so much *cleaner* than other architectures.

I've been using QNX4 (the previous version before RTP) for the last year + 1/2. It's not much cleaner than Linux, it just has less functionality. And oh, the bugs, don't get me started ;)

BTW someone mentioned shouldexist.org. There's also halfbakery.com with a similar anti-patents concept.

thomasq, the graduation gowns are colour-coded according to institution and the type of degree (BA, M.Sc etc.) -- just encountered this recently at my GF's Ph.D graduation. The stage looked like someone had gone crazy with the flood-fill.

Great paper from the O'Reilly OSS Convention in Monterey about Salon's CMS. Looks cool, must nick some ideas ;)

Hey caolan, re: QNX -- don't believe the hype! It's nice, but not that nice... mark it up as a bit like Be.

Released Sitescooper 3.0.1 today, with quite a few bugs fixed and lots of new sites. It's nice to put that one to bed for a few days; maybe I can get back to WebMake for a while and fix a dependencies-with-perl-code problem.

BTW -- sitescooper users -- note that sitescooper.cx will be disappearing soon. It's sitescooper.org from now on. Those cheap sods in the .cx ccTLD registry folded their "free domains for open source projects" less than 6 months after it was first offered, so I'm f---ed if I'm going to pay them for a .cx after that.

Anyway, nothing I like better in the routine code maintenance dept than firing up the profiler, spotting a hotspot, spending 15 minutes refactoring it and getting a 10% speedup. Beauty!

In other news -- I joined FoRK and got a mail from James Casey, who (a) actually is a friend of Rohit Khare, like the list sez, and (b) I haven't seen in ages. He's apparently off in That London at the mo', but pints will be had next time we're in the same city I should hope.

Argh, Netscape 4.75 crashed while editing the diary, probably due to some weirdness where AbiWord mucked up my fonts. Looking forward to an X11 where fonts just work :(

Anyway, released WebMake 0.5 last night.

It's pretty nice already for static, informational sites like homepages etc.; I rejigged the Irish Internet Users pages to use it in 5 minutes, which was handy, and it's a big improvement on what I had there previously.

However I need to add more support for sites where the index page is dynamically generated from a list of static story files. Here's how it works currently:

  1. WebMake file indicates location of one or more story archives, containing 1 story per file

  2. each file can also include meta tags to indicate metadata, like its title, one-line abstract, priority (aka score), section, etc.

  3. some perl code gets the names of all the story content items

  4. perl code then sorts them by section, score and title

  5. foreach item, set title, url, abstract, section, score variables, and fill out a user-specified template with them

  6. set a content item to contain that list

  7. list is written to whatever <out> files it's used in.

That's all well and good, but it's not tidy; the Perl code makes it too messy... I think steps 3 to 6 need tidying up, and possibly some kind of no-perl-required way to do it.
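Steps 3 to 6 boil down to roughly the following, sketched here in Python instead of the Perl that WebMake actually uses. The story dictionaries, the `$name` template syntax and the function names are all assumptions for illustration, and the sketch assumes a lower score sorts first, which may not match the real ordering.

```python
from string import Template

def sort_stories(stories):
    """Step 4: order stories by section, then score, then title."""
    return sorted(stories, key=lambda s: (s["section"], s["score"], s["title"]))

def build_index(stories, item_template):
    """Steps 5 and 6: fill the template once per story and join the results."""
    return "".join(item_template.substitute(s) for s in sort_stories(stories))

# A user-specified template, with one placeholder per metadata variable.
ITEM = Template('<li><a href="$url">$title</a> ($section): $abstract</li>\n')

stories = [
    {"title": "Zebra", "url": "z.html", "abstract": "Stripes.", "section": "news", "score": 10},
    {"title": "Intro", "url": "i.html", "abstract": "Start here.", "section": "docs", "score": 50},
]
index_html = build_index(stories, ITEM)
```

Seen this way, the only parts a site author really cares about are the sort keys and the template, which is why a no-perl-required version looks plausible.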

Joined FoRK, so now I'm thoroughly snowed ;)

WebMake now has a significant chunk of CMS magic included, in that it can handle metadata and use this to order and query content chunks, in order to generate indices and sitemaps. And better, the dependency checking works with it, so unchanged files do not even need to be read to get their metadata; it's cached in a per-site db file.
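The caching idea, as a minimal Python sketch: keep each file's metadata alongside its modification time, and only re-read the file when the time changes. The JSON cache file, the function names and the `parse` callback are assumptions here; WebMake's real db format and change detection may well differ.

```python
import json
import os

def load_cache(db_path):
    """Load the per-site metadata cache, or start empty if there is none."""
    if os.path.exists(db_path):
        with open(db_path) as f:
            return json.load(f)
    return {}

def save_cache(db_path, cache):
    with open(db_path, "w") as f:
        json.dump(cache, f)

def get_metadata(path, cache, parse):
    """Return metadata for path, reading the file only if it has changed."""
    mtime = os.stat(path).st_mtime_ns
    entry = cache.get(path)
    if entry is not None and entry["mtime"] == mtime:
        return entry["meta"]  # unchanged: the file is never opened
    with open(path) as f:
        meta = parse(f.read())
    cache[path] = {"mtime": mtime, "meta": meta}
    return meta
```

The stat() call is still needed per file, but that's far cheaper than opening and parsing every story on every run.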

BTW the big win of WebMake's dependency support is that it means that WebMake is a CMS which works with web caches nicely. Wes Felter's HtP site brought this point up on the radar last month with a pointer to Resin's caching system.

Anyway, 0.4, just released, does this nicely, and even has some doco ;)

It's getting to the stage where it's satisfied the functionality I needed it to have, so I'll probably be slowing down soon and letting it accumulate some bugfixes and get stable.

One thing first, though: the CVS code now can generate a sitemap using only 3 types of data:

  • an "up" metadatum, pointing to the content item that is "up" from the current node

  • a "root" attribute on a content item, indicating that it's the root of the content tree

  • a pair of content templates which will be filled out with the details of each node, to generate the list

This is a beaut. It means that an RSS site summary file, or even a Slashdot-style "front page", can be generated entirely using a <sitemap> tag. Well, nearly -- I still need to write support for the visibility time range metadata types...
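To show why those three pieces are enough, here's a small Python sketch of the tree walk. It assumes the pair of templates means one for nodes with children and one for leaves, and it uses Python format strings and a plain dictionary of items, none of which is WebMake's actual syntax.

```python
def build_sitemap(items, node_template, leaf_template):
    """Build a sitemap from 'up' links, a 'root' flag and two templates."""
    children = {}
    root = None
    for name, item in items.items():
        if item.get("root"):
            root = name
        else:
            # The "up" metadatum names this item's parent.
            children.setdefault(item["up"], []).append(name)
    if root is None:
        raise ValueError("no content item is marked as root")

    def render(name, depth):
        kids = sorted(children.get(name, []))
        template = node_template if kids else leaf_template
        line = template.format(indent="  " * depth, name=name,
                               title=items[name]["title"])
        return line + "".join(render(kid, depth + 1) for kid in kids)

    return render(root, 0)
```

Swap the templates for RSS item markup or front-page story blurbs and the same walk produces a different kind of list, which is the appeal.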

Other thing on the TODO list: allow WebMake to get content from an external command, and write up a doco on how WebMake can be used from within mod_perl to act as a conventional, dynamic-server-pages style system.

Hmm.... wonder what the wiki tag does? BTW still need a project tag ;)

WebMake now boasts dependency checking, so it won't remake a page that does not contain a chunk of content that has changed. It also now shares a link glossary throughout the entire site (if you use the builtin EtText editable-text format), and does a great job of beautifying the output HTML.
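The dependency check amounts to something like this Python sketch: remember a digest of each content chunk from the last build, and remake only the pages that include a chunk whose digest has changed. The data layout and function names are hypothetical, and using SHA-1 digests is my assumption for the example, not necessarily how WebMake detects changes.

```python
import hashlib

def digest(text):
    """Fingerprint a chunk of content."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def pages_to_rebuild(page_deps, chunks, previous_digests):
    """Return the pages that use at least one changed (or new) chunk."""
    changed = {name for name, text in chunks.items()
               if previous_digests.get(name) != digest(text)}
    return sorted(page for page, deps in page_deps.items()
                  if changed & set(deps))
```

Pages that come back unchanged keep their old timestamps, which is also what lets a web cache in front of the site do its job.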

Disappointingly though, I seem to be the only user at the moment. Go on, take a look, it really is pretty neat... ;)

Can't work out why no-one's checked it out, though. Is there some definition somewhere stating that CMSes must be built using dynamic, server-page technologies?

Holy shit -- I've just found HyperPerl on the c2.com Wikibase Wiki. What an insane concept... for an example check out Wiki in HyperPerl.

Believe it or not, that is executable -- well, it passes through a preprocessor which generates a normal perl script, but the code is extracted from those pages. So there you are -- your code is (very) heavily documented, and due to the way the function references are hyperlinks, it inherently looks like a LXR.

Incredible! What would you call this? Hypertextual literate programming?

Been on holidays since last Wednesday, so quite a lot of stuff to catch up on. However, MiniNTK sported a link to this beauty:

The hackers are members of a cult based in Finland called The Free Source that, among other things, practices communal ownership of software. Its members release their software under something called the Glorious People's License (or GPL) which basically states that no one can own the software or put restrictions on copying it.

"The Free Source has been recruiting on line for years now," says Ted Phillips, an expert on modern cults, "Their membership probably numbers in the thousands, although it is difficult to tell. They often work by enticing teens and young adults with the promise of free software and beer, before they start encouraging them to read parable-laced screeds that further indoctrinate them into the cult. They have been relatively harmless in the past, but now that they seem to be trying to destroy parents' abilities to protect their children it is clear that they are a danger to our society."

Is this so? If so, where's the beer?! Nobody promised me any beer...

