26 Jan 2008 braden   » (Journeyer)

An RSS parser in PHP for SourceForge feeds

I wrote this for parsing RSS feeds associated with a SourceForge project. It should be reasonably capable for that purpose; though I do not expect it to be generally robust for handling arbitrary feeds. Reed suggested that this might be generally useful; however, I don’t want to put it anywhere that I might feel compelled to maintain it. So this seems like a good spot.

//
// Copyright 2008  Braden McDaniel
//
// This program is free software; you can redistribute it and/or modify it
// under the terms of the GNU General Public License as published by the Free
// Software Foundation; either version 3 of the License, or (at your option)
// any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
// more details.
//
// You should have received a copy of the GNU General Public License along
// with this library; if not, see .
//

class RSS2Image
{
    var $url = "";
    var $title = "";
    var $link = "";
    var $width = "";
    var $height = "";
}

class RSS2Item
{
    var $title = "";
    var $link = "";
    var $description = "";
    var $author = "";
    var $category = "";
    var $comments = "";
    var $enclosure = "";
    var $guid = "";
    var $pub_date = "";
    var $source = "";
}

class RSS2Channel
{
    var $title;
    var $link;
    var $description;
    var $copyright;
    var $last_build_date;
    var $generator;
    var $image;
    var $items;

    function RSS2Channel()
    {
        $this->title = "";
        $this->link = "";
        $this->description = "";
        $this->copyright = "";
        $this->last_build_date = "";
        $this->generator = "";
        $this->image = null;
        $this->items = array();
    }
}

class RSS2Parser
{
    var $parser;
    var $channel;

    var $in_channel, $in_image, $in_item;
    var $current_element;

    function RSS2Parser()
    {
        $this->in_channel = false;
        $this->in_image = false;
        $this->in_item = false;
        $this->parser = xml_parser_create();
        xml_set_object($this->parser, $this);
        xml_set_element_handler($this->parser,
                                "start_element",
                                "end_element");
        xml_set_character_data_handler($this->parser,
                                       "character_data");
    }

    function parse($data)
    {
        xml_parse($this->parser, $data);
        return $this->channel;
    }

    function start_element($parser, $name, $attribs)
    {
        $name = strtolower($name);
        if ($name == "channel") {
            $this->in_channel = true;
            $this->channel = new RSS2Channel();
            return true;
        }

        if ($this->in_channel) {
            if ($name == "image") {
                $this->in_image = true;
                $this->channel->image = new RSS2Image();
                return true;
            } elseif ($name == "item") {
                $this->in_item = true;
                $this->channel->items[] = new RSS2Item();
                return true;
            }
        }

        if ($this->in_image) {
            $this->current_element = &$this->channel->image->$name;
        } elseif ($this->in_item) {
            $this->current_element = &end($this->channel->items)->$name;
        }
        return true;
    }

    function end_element($parser, $name)
    {
        $name = strtolower($name);
        if ($name == "channel") {
            $this->in_channel = false;
        } elseif ($name == "image") {
            $this->in_image = false;
        } elseif ($name == "item") {
            $this->in_item = false;
        }
        unset($this->current_element);
        return true;
    }

    function character_data($parser, $data)
    {
        if (isset($this->current_element)) {
            $this->current_element .= $data;
        }
        return true;
    }
}

Syndicated 2008-01-26 20:26:11 (Updated 2008-01-26 20:26:49) from endoframe :: log

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!