Magpie RSS FAQ

  1. General
    1. What is MagpieRSS?
    2. What versions of RSS do you support?
    3. Where can I get more info about MagpieRSS? Where can I get help?
    4. How should I ask a question? What is the best way to get help?
    5. What is RSS? What is Atom?
    6. Is the name Magpie or MagpieRSS?
    7. My question wasn’t answered. I’ve got a better answer.
    8. Can I donate to Magpie? How can I help?
  2. Installation
    1. How do I install MagpieRSS?
  3. MagpieRSS and Caching
    1. How does Magpie caching work?
    2. Why is it important?
    3. Is caching on?
    4. Where is the cache directory?
    5. How do I know if caching is working?
    6. Caching doesn’t seem to be working, what’s wrong?
    7. I can’t follow the above example because I don’t have sufficient permissions (or I don’t have a shell account, or I don’t understand)
  4. Troubleshooting
    1. Error: “Failed to load PHP’s XML Extension.”
    2. Warning: MagpieRSS: Failed to parse RSS file. (not well-formed (invalid token) at line x, column y)
    3. Fatal error: Call to undefined function: array_change_key_case()
    4. Error: MagpieRSS: Failed to fetch http://example.com/rss.xml. (HTTP Error: connection failed (1))
  5. In Use
    1. How do I display the full HTML of an item? How do I access the content:encoded field?
    2. How do I find out what fields Magpie supports? Does Magpie support foo?
  6. The Cookbook:
    Solutions to common programming challenges.

    1. Limit the number of headlines (aka items) returned.
    2. Display a custom error message if something goes wrong
    3. Generate a new rss feed
    4. Display headlines more recent than a given date
    5. Parse a Local File Containing RSS

General

  1. What is MagpieRSS?
    Okay, this actually hasn’t been asked much, but MagpieRSS (aka Magpie) is an
    RSS and Atom parser for PHP.
  2. What versions of RSS do you support?
    MagpieRSS parses RSS 0.9, RSS 1.0, and the various Userland RSS versions (0.9x
    and 2.0). Additionally, it supports Atom 0.3 and many custom RSS
    namespaces.
  3. Where can I get more info about MagpieRSS?
    There is a Magpie links page, which includes links to tutorials, how-tos, and
    open source projects using Magpie (a good place to start if you’re looking
    for examples). Lastly, there is a mailing list, which can be a good place to
    get help.
  4. How should I ask a question? What is the best way to get help?
    Okay, no one asks this question, but they should at least ask themselves
    this question. When asking a question:

    1. Check the feed at the feed validator.

    2. Include the URL of the RSS feed that is causing problems. Without
    this, we can’t help you, we just can’t.

    3. Explain what problem you’re seeing.

    4. Include which version of PHP and which version of Magpie you’re using.

  5. What is RSS? What is Atom?…
  6. Is the name Magpie or MagpieRSS?
    Officially the name is MagpieRSS, but Magpie is the affectionate nickname,
    and probably more accurate since Atom 0.3 support was added.
  7. My question wasn’t answered. I’ve got a better answer.
    Use the mailing list.
  8. Can I donate to Magpie? How can I help?
    I swear, that is a frequently asked question. The other best way to help is
    to answer questions on the mailing list, and to submit question and answer
    pairs for the FAQ.

Installation

Magpie consists of 4 files (rss_fetch.inc, rss_parse.inc, rss_cache.inc, and rss_utils.inc), and the directory extlib (which contains a modified version of the Snoopy HTTP client).

Copy these 5 resources to a directory named ‘magpierss’ in the same directory as your PHP script.

At the top of your script add the following line:

require_once('magpierss/rss_fetch.inc');

Now you can use the fetch_rss() function:

$rss = fetch_rss($url);

Done.

IMPORTANT: you’ll probably want to get the cache directory working in order to speed up your application, and not abuse the webserver you’re downloading the RSS from.

Optionally you can install MagpieRSS in your PHP include path in order to make it available server wide.

Lastly, you might want to look through the constants in rss_fetch.inc to see if there is anything you want to override (the defaults are pretty good).
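
As a rough sketch of what an override looks like: the constants are defined in your own script before rss_fetch.inc is loaded. MAGPIE_CACHE_DIR appears elsewhere in this FAQ. MAGPIE_CACHE_AGE (how many seconds a cached feed counts as fresh) is named here on the assumption that your copy of rss_fetch.inc uses that name, so check the file before relying on it. The path and the 30 minute value are made up for illustration.

```php
<?php
// Overrides go before the require_once line. Both values are illustrative.
define('MAGPIE_CACHE_DIR', '/var/www/mysite/magpie_cache');

// Assumed constant name: cache lifetime in seconds (here, 30 minutes).
define('MAGPIE_CACHE_AGE', 30 * 60);

// Load Magpie only after the overrides are in place:
// require_once('magpierss/rss_fetch.inc');
```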

For more info, or if you have trouble, see TROUBLESHOOTING

See README for more details on using MagpieRSS.

MagpieRSS and Caching

  1. How does Magpie caching work?
    When Magpie successfully fetches and parses a feed, it saves the resulting PHP object to a file in the “cache directory” (this is called “serializing”). Next time Magpie is asked to fetch that feed, Magpie will check for a cached version first.
  2. Why is it important?
    1. Pages will load much faster with caching enabled. Rather than having to fetch and parse the feed each time the page is served, you do these slow operations once per hour (for example), and everyone else will see the speed up.

    2. Many sites will ban you if you fetch their RSS feed too frequently (or at least complain). Caching keeps you from doing this.

    3. Conditional GETs are an important technique for reducing bandwidth consumption, and only work if the cache system is enabled.

    4. If the remote server is down, or slow enough to time out, Magpie can continue to serve the old (stale) version of the RSS until the remote server comes back.

  3. Is caching on?
    Magpie ships with caching on by default, so unless you turned it off Magpie will try to use the cache system.
  4. Where is the cache directory?
    By default Magpie will attempt to create a directory named ‘cache’ in the working directory of the PHP script which invoked it. That is to say, if you have a script named blog.php that resides at /var/www/mysite/blog.php that uses Magpie, Magpie will attempt to create the cache directory /var/www/mysite/cache. You can override this default with:

    define('MAGPIE_CACHE_DIR', '/var/foo/magpie/cache/dir/for/example');
    
  5. How do I know if caching is working?
    Check inside your cache directory for files with names like ‘25cd55bbc2766c84b57a3302daa8ba2e’. Alternatively, if you can’t find a cache directory, try turning on debugging (see: How to debug Magpie), and look for an error message like
    “Cache couldn’t make dir ….”
  6. Caching doesn’t seem to be working, what’s wrong?
    This is a very frequent question. A number of things could be wrong; the most common is that your web server does not have permission to write to your working directory. In this case you’ll want to manually create the cache directory and make it web writeable. How to do this varies from platform to platform, and host to host, but the basic idea is:

    mkdir /var/www/mysite/cache;
    chown _web-user_:_web-group_ /var/www/mysite/cache;
    
    • On Debian web-user and web-group by default will be www-data and www-data
    • On Redhat….. (I don’t know, help?)
    • On BSD…… (I don’t know, help?)
    • On OS X web-user and web-group are www and www
  7. I can’t follow the above example because I don’t have sufficient permissions (or I don’t have a shell account, or I don’t understand)
    Turning off caching is never a good idea, so I recommend figuring out some way to make it work. A few options are:
    • Put your cache in the /tmp directory like so:
      define('MAGPIE_CACHE_DIR', '/tmp/magpie_cache');
      This approach has some security issues that will be addressed in a future version of Magpie.
    • Make your cache directory world writeable. Always a bad idea; I’m not going to cover how to do this.
    • Move your cache into a database. It won’t be as fast, but it is one solution; I’ll discuss more in the future.
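
The caching behaviour described in this section can be sketched in a few lines of plain PHP. This is not Magpie's own code: the function names cache_file(), cache_set(), and cache_get() are hypothetical, and naming each file after an MD5 hash of the feed URL is an assumption based on the 32-character file names mentioned above.

```php
<?php
// Hypothetical helper: map a feed URL to a file inside the cache directory.
function cache_file($cache_dir, $url) {
    return $cache_dir . DIRECTORY_SEPARATOR . md5($url);
}

// Serialize a parsed feed (any PHP value) to its cache file.
// Returns false if the directory cannot be created or the file cannot be written.
function cache_set($cache_dir, $url, $parsed) {
    if (!is_dir($cache_dir) && !@mkdir($cache_dir, 0755, true)) {
        return false;
    }
    return file_put_contents(cache_file($cache_dir, $url), serialize($parsed)) !== false;
}

// Return the cached value if it exists and is younger than $max_age seconds,
// otherwise false (the caller would then fetch and parse the feed again).
function cache_get($cache_dir, $url, $max_age) {
    $file = cache_file($cache_dir, $url);
    if (!is_file($file) || (time() - filemtime($file)) >= $max_age) {
        return false;
    }
    return unserialize(file_get_contents($file));
}

$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'magpie_cache_demo';
$url = 'http://example.com/rss.xml';

cache_set($dir, $url, array('items' => array(array('title' => 'Hello'))));
$hit  = cache_get($dir, $url, 3600);   // fresh: returns the stored array
$miss = cache_get($dir, $url, 0);      // max age of 0: always treated as stale
```

Keying the file on a hash of the URL keeps characters that are illegal in file names out of the cache directory, and a failed cache_set() is the situation in which the web server lacks write permission.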

Troubleshooting

  1. Error: “Failed to load PHP’s XML Extension.”
    Magpie depends on PHP being compiled with XML support. If it hasn’t been,
    you’ll need to rebuild your PHP to support it (or get your ISP to).
    See: http://www.php.net/manual/en/ref.xml.php
  2. Warning: MagpieRSS: Failed to parse RSS file. (not well-formed (invalid token) at line x, column y)
    The RSS feed you’re trying to parse contains an invalid character. Check it at: http://feedvalidator.org
    If the feed validator doesn’t find a problem, then send an email to the mailing list with the problem you’re experiencing and the URL of the
    feed which is causing the error. Some RSS parsers are based on regular expressions, and can
    parse invalid RSS, but they have their own problems.
  3. Fatal error: Call to undefined function: array_change_key_case()
    Magpie requires at least PHP 4.2.0 (released April, 2002), and has been
    tested to work on all versions of PHP, including PHP5. If you must use an ancient version of PHP, download the following file, and
    include it in your scripts.

    http://cvs.php.net/pear/PHP_Compat/Compat/Function/array_change_key_case.php

  4. Error: MagpieRSS: Failed to fetch http://example.com/rss.xml. (HTTP Error: connection failed (1))
    A connection error of type 1 means “permission denied”. This usually means that your
    ISP has configured PHP so that it can’t open outgoing sockets (usually for security reasons). The only solution to this is to ask your ISP for help.

    Sometimes you’ll also get the related connection failed (11) (e.g. on sourceforge.net),
    which also means PHP is configured in such a way that Magpie can’t work.
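
The first and third errors above can be checked for up front. A minimal sketch, using only standard PHP functions; the function name magpie_env_problems() is hypothetical and not part of Magpie:

```php
<?php
// Hypothetical pre-flight check: returns a list of human-readable problems.
function magpie_env_problems() {
    $problems = array();

    // PHP's XML extension provides xml_parser_create().
    if (!function_exists('xml_parser_create')) {
        $problems[] = "PHP's XML extension is not loaded";
    }

    // array_change_key_case() was added in PHP 4.2.0.
    if (version_compare(PHP_VERSION, '4.2.0', '<')) {
        $problems[] = 'PHP ' . PHP_VERSION . ' is older than 4.2.0';
    }

    return $problems;
}

foreach (magpie_env_problems() as $problem) {
    echo "Problem: $problem\n";
}
```

Blocked outgoing sockets (the connection failed errors) cannot be detected this way; that one still needs your ISP.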

In Use

  1. How do I display the full HTML of an item? How do I access the content:encoded field?

    echo $item['content']['encoded'];

  2. How do I find out what fields Magpie supports? Does Magpie support foo?
    The simplest way to find out if Magpie can parse a given field is to find a feed with that field and test it using the scripts/magpie_debug.php from a recent version of Magpie. This will display a var_dump() of the parsed RSS object. Look for your fields. For example, if we dump the RSS feed from the Magpie blog, we could scroll down until we found:
    ["items"]=>
        array(10) {
        [0]=>
            array(9) {
                ["about"]=>
                    string(41) "http://laughingmeme.org/magpie_blog/?p=83"
                ["title"]=>
                    string(32) "Consumer Recall on MagpieRSS 0.7"
                ["link"]=>
                    string(41) "http://laughingmeme.org/magpie_blog/?p=83"
                ["dc"]=>
                    array(3) {
                        ["date"]=>
                            string(20) "2004-12-12T18:59:00Z"
                        ["creator"]=>
                            string(34) "kellan (mailto:kellan@protest.net)"
                        ["subject"]=>
                            string(8) "LM"
                    }
                ["description"]=>
                    string(302) "We have reports of certain..."
                ["content"]=>
                    array(1) {
                        ["encoded"]=>
                            string(595) "We have reports of certain models of MagpieRSS 0.7..."
                    }
                ["date_timestamp"]=>
                    int(1102877940)
            }

    From this we can see that Magpie successfully found and parsed the Dublin Core and content modules, as well as the default fields.

    In general Magpie will support any field of the following form, whether or not it has ever heard of it:

     <field_name>value</field_name>
    

    or

     <namespace:field_name>value</namespace:field_name>
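
Going by the dump above, a namespaced element such as <dc:creator> ends up one level down, under its namespace prefix, while plain elements sit at the top level of the item. A small sketch using a hand-built array in place of a real parsed feed (the field() helper is hypothetical, not a Magpie function):

```php
<?php
// Hand-built stand-in for one entry of $rss->items, shaped like the dump above.
$item = array(
    'title'   => 'Consumer Recall on MagpieRSS 0.7',
    'dc'      => array('creator' => 'kellan (mailto:kellan@protest.net)'),
    'content' => array('encoded' => 'We have reports of certain models of MagpieRSS 0.7...'),
);

// Hypothetical helper: look up "field" or "namespace:field", returning
// $default when the feed did not include it.
function field($item, $name, $default = '') {
    $parts = explode(':', $name, 2);
    if (count($parts) === 2) {
        list($ns, $key) = $parts;
        return isset($item[$ns][$key]) ? $item[$ns][$key] : $default;
    }
    return isset($item[$name]) ? $item[$name] : $default;
}

echo field($item, 'title'), "\n";
echo field($item, 'dc:creator'), "\n";
echo field($item, 'dc:rights', '(not in this feed)'), "\n";
```

Checking with isset() before reading matters because not every feed carries every module, and a missing key would otherwise raise a notice.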
    

The Cookbook

Solutions to common programming challenges.

  1. Limit the number of headlines (aka items) returned.

    Problem

    You want to display the 10 (or 3) most recent headlines, but the RSS feed
    contains 15.

    Solution

    $num_items = 10;
    $rss = fetch_rss($url);
    
    $items = array_slice($rss->items, 0, $num_items);
    

    Discussion

    Rather than trying to limit the number of items Magpie parses, a much
    simpler, and more flexible approach is to take a “slice” of the array of
    items. And array_slice() is smart enough to do the right thing if the
    feed has fewer items than $num_items.

    See: http://www.php.net/array_slice
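
To see the “right thing” in action without fetching a feed, here is a small sketch that uses a hand-built array in place of $rss->items:

```php
<?php
// Stand-in for $rss->items: a feed with only three items.
$items = array(
    array('title' => 'First'),
    array('title' => 'Second'),
    array('title' => 'Third'),
);

$top_two = array_slice($items, 0, 2);    // the first two items
$top_ten = array_slice($items, 0, 10);   // asks for ten, gets all three, no warning

echo count($top_two), ' ', count($top_ten), "\n";
```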

  2. Display a custom error message if something goes wrong

    Problem

    You don’t want Magpie’s error messages showing up if something goes wrong.

    Solution

    # Magpie throws USER_WARNINGS only
    # so you can cloak these, by only showing ERRORs
    error_reporting(E_ERROR);

    # check the return value of fetch_rss()
    $rss = fetch_rss($url);

    if ( $rss ) {
        # ...display rss feed...
    }
    else {
        # single quotes keep PHP from treating $$$ as a variable
        echo 'An error occurred!  ' .
            'Consider donating more $$$ for restoration of services.' .
            '<br>Error Message: '   . magpie_error();
    }
    

    Discussion

    MagpieRSS triggers a warning in a number of circumstances. The 2 most
    common circumstances are: if the specified RSS file isn’t properly formed
    (usually because it includes illegal HTML), or if Magpie can’t download the
    remote RSS file, and there is no cached version.

    If you don’t want your users to see these warnings, change your
    error_reporting settings to only display ERRORs. Another option is to turn
    off display_errors, so that WARNINGs and NOTICEs still go to the error_log
    but not to the webpages.

    You can do this with:

    ini_set('display_errors', 0);
    

    See:

    • http://www.php.net/error_reporting,
    • http://www.php.net/ini_set,
    • http://www.php.net/manual/en/ref.errorfunc.php
  3. Generate a new rss feed

    Problem

    Create an RSS feed for other people to use.

    Solution

    Use Useful Inc’s RSSWriter

    Discussion

    An example of turning a Magpie parsed RSS object back into an RSS file is
    forthcoming. In the meantime RSSWriter has great documentation.

  4. Display headlines more recent than a given date

    PROBLEM

    You only want to display headlines that were published on, or after a
    certain date.

    SOLUTION

    require 'rss_utils.inc';
    
    # get all headlines published today
    $today = getdate();
    
    # today, 12AM
    $date = mktime(0,0,0,$today['mon'], $today['mday'], $today['year']);
    
    $rss = fetch_rss($url);
    
    foreach ( $rss->items as $item ) {
        $published = parse_w3cdtf($item['dc']['date']);
        if ( $published >= $date ) {
            echo "Title: " . $item['title'];
            echo "Published: " . date("h:i:s A", $published);
            echo "<p>";
        }
    }
    

    DISCUSSION

    This recipe only works for RSS 1.0 feeds that include the <dc:date> field
    (which is very good RSS style).

    parse_w3cdtf() is defined in rss_utils.inc, and parses RSS style dates into
    Unix epoch seconds.

    See: http://www.php.net/manual/en/ref.datetime.php
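
The filtering step can be tried without Magpie. In the sketch below, PHP's built-in strtotime() stands in for parse_w3cdtf() (an assumption that holds for simple UTC timestamps like the ones shown, not a claim that the two behave identically), and headlines_since() is a hypothetical helper:

```php
<?php
// Hypothetical helper: keep items whose dc:date is on or after $cutoff
// (a Unix timestamp). Items without a parseable date are skipped.
function headlines_since($items, $cutoff) {
    $recent = array();
    foreach ($items as $item) {
        if (!isset($item['dc']['date'])) {
            continue;
        }
        $published = strtotime($item['dc']['date']);   // stand-in for parse_w3cdtf()
        if ($published !== false && $published >= $cutoff) {
            $recent[] = $item;
        }
    }
    return $recent;
}

// Hand-built stand-in for $rss->items.
$items = array(
    array('title' => 'Old news',   'dc' => array('date' => '2004-12-11T09:00:00Z')),
    array('title' => 'Fresh news', 'dc' => array('date' => '2004-12-12T18:59:00Z')),
    array('title' => 'No date'),
);

// Midnight UTC on 2004-12-12, built with gmmktime() so the server's
// time zone does not shift the cutoff.
$cutoff = gmmktime(0, 0, 0, 12, 12, 2004);

foreach (headlines_since($items, $cutoff) as $item) {
    echo 'Title: ' . $item['title'] . "\n";
}
```

Skipping items that have no date, rather than treating them as new or old, is a design choice made for this sketch; the recipe above assumes every item carries the field.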

  5. Parse a Local File Containing RSS

    PROBLEM

    MagpieRSS provides fetch_rss() which takes a URL and returns a
    parsed RSS object, but what if you want to parse a file stored locally that
    doesn’t have a URL?

    SOLUTION

    require_once('rss_parse.inc');
    
    $rss_file = 'some_rss_file.rdf';
    $rss_string = read_file($rss_file);
    $rss = new MagpieRSS( $rss_string );
    
    if ( $rss and !$rss->ERROR ) {
        # ...display rss...
    }
    else {
        echo "Error: " . $rss->ERROR;
    }
    
    # efficiently read a file into a string
    # in php >= 4.3.0 you can simply use file_get_contents()
    #
    function read_file($filename) {
        $fh = fopen($filename, 'r') or die($php_errormsg);
        $rss_string = fread($fh, filesize($filename) );
        fclose($fh);
        return $rss_string;
    }
    

    DISCUSSION

    Here we are using MagpieRSS’s RSS parser directly without the convenience wrapper
    of fetch_rss(). We read the contents of the RSS file into a
    string, and pass it to the parser constructor. Notice also that error handling
    is subtly different.

    See: http://www.php.net/manual/en/ref.filesystem.php
