Wikipedia

From iPodLinux


This page is about getting Wikipedia downloaded and readable on the iPod. At the moment all progress is in text format, but robamler is planning an ebook viewer that should be superior in that it will have links and compression.


Howto

Here are instructions on how to convert some or all of Wikipedia. This is for Linux users, but if you are a smart Mac or Windows user, you should be able to figure it out for yourself.

First, download the latest Wikipedia MySQL dump from http://dumps.wikimedia.org (download cur under en.wikipedia).

This will take a while, giving you time to install MediaWiki...

Download the latest version of MediaWiki to your htdocs folder.

Extract it:

cd /var/www/localhost/htdocs/
tar -xvzpf mediawiki-*.tar.gz

Rename the extracted directory to wiki, and make wiki/config/ writable so the installer can save its settings there:

mv mediawiki-*/ wiki
chmod a+w wiki/config

In your favorite web browser, visit http://localhost/wiki/config/. Fill out everything as you wish (I'm not helping you here), but make a note of what you name the database. I named my database enwiki, so I'll be using that in my examples. When done, copy config/LocalSettings.php to the main wiki directory:

cp wiki/config/LocalSettings.php wiki/

Create the file wiki/skins/Simple.php with this as its contents:

<?php
if( !defined( 'MEDIAWIKI' ) )
   die();

class SkinSimple extends Skin {
   function initPage() {}
   function getStylesheet() {return '';}
   function getSkinName() {return "simple";}
   function doBeforeContent() {
      $s = "\n<div id='content'>\n<div id='topbar'>";
      $s .= $this->pageTitle() . $this->pageSubtitle() . "\n";
      $s .= "<br /><br />\n</div>\n\n<div id='article'>";
      return $s;
   }
   function topLinks() {return '';}
   function doAfterContent() {return '';}
   function printSource() {return '';}
}
?>

Set simple as the default skin in wiki/LocalSettings.php:

...
$wgDefaultSkin = 'simple';
...

When the download has finished, extract the database dump and restore it to your database:

gunzip *_cur_table.sql.gz
mysql -uroot -ppassword enwiki < *_cur_table.sql

Visit the wiki at http://localhost/wiki to make sure it's working.

Now for the useful part. Make a new folder somewhere to store everything.

mkdir ~/wikipedia
cd ~/wikipedia

Create a file named dumpwiki.sh with this as its contents:

#!/bin/sh

MYSQL="mysql -uroot -ppassword --batch -e"  #use your own MySQL password
WIKIPEDIA="http://localhost/wiki/index.php"

LETTER="." #any letter
#LETTER="a"  #only 'A'
#LETTER="[a-c]"  #'A', 'B' or 'C'

$MYSQL "use enwiki; SELECT cur_title from cur" \
| grep -iv "[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*" \
| grep -iv "\!" \
| grep -iv "Requests_for_" \
| grep -iv "votes_for" \
| grep -iv "votes_on" \
| grep -iv "needing_votes" \
| grep -iv "images_" \
| grep -iv \" \
| grep -iv \' \
| grep -iv "\....$" \
| grep -iv '\%' \
| grep -iv '\$' \
| grep -iv '\&' | grep -iv '\?' \
| grep -iv "cur_title" \
| grep -iv '\-' \
| grep -iv '\*' \
| grep -iv '^\.' \
| grep -iv '\:' \
| grep -iv '\/' \
| grep -i "^$LETTER" \
> wikititles

mkdir -p articles
cd articles

rm -f /tmp/wikiart
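for name in $(cat ../wikititles); do
        echo "$name"
        if [ ! -f "$name" ]
        then
                # Fetch the rendered page, then replace it with its plain-text version
                if wget -nv "$WIKIPEDIA/$name"
                then
                        html2text -ascii -nobs "$name" > /tmp/wikiart
                        rm -f "$name"
                        mv /tmp/wikiart "$name"
                else
                        # Download failed: remove any partial file
                        rm -f "$name"
                fi
        fi
done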

if [ "$LETTER" = "." ]; then
        tar -cjf "../all.tar.bz2" .
else
        rm -f "../$LETTER.tar" "../$LETTER.tar.bz2"
        find ./ -type f | grep -i "^./$LETTER" | tar -cjf "../$LETTER.tar.bz2" -T -
fi

Run the file:

sh dumpwiki.sh
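
If you are wondering which titles survive the long grep chain in dumpwiki.sh, you can try a shortened version of it on a few made-up titles before running the real thing. This is only a sketch: filter_titles is a hypothetical helper, it uses just a handful of the filters from the script, and the sample titles are invented.

```shell
#!/bin/sh
# Hypothetical helper: a shortened version of the filter chain in dumpwiki.sh.
# Reads titles on stdin, one per line; $1 plays the role of $LETTER.
filter_titles() {
    grep -v '[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*' \
    | grep -iv 'Requests_for_' \
    | grep -v ':' \
    | grep -v '/' \
    | grep -v -e '-' \
    | grep -i "^$1"
}

# Made-up sample titles: IP-address pages, namespaced pages, and titles
# containing '/' or '-' are dropped; the rest must start with the letter.
printf '%s\n' 'Apple' 'Talk:Apple' '192.168.0.1' 'Banana' \
              'Jean-Luc_Picard' 'AC/DC' 'avocado' \
    | filter_titles 'a'
```

With the letter a, only Apple and avocado get through. Note that the real script throws away every title containing a hyphen, so a fair number of legitimate articles are skipped.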

If you have any questions, please email timmyisdaman@gmail.com.

Articles

If you do not want to dump the articles yourself, all of the articles have already been dumped and are available here.

Limitations

iPodLinux

  • No hyperlinks
  • Can't fit all the files in one directory

Rockbox

  • No hyperlinks
  • Can't fit all the files in one directory
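
One possible way around the one-directory problem is to spread the dumped articles over subdirectories. The sketch below groups them by the first character of the filename. bucket_articles is a hypothetical helper, the paths in the usage line are placeholders, and this page does not say how many files a directory can actually hold, so one level of grouping may not be enough for a full dump.

```shell
#!/bin/sh
# Hypothetical helper: bucket_articles SRC DEST
# Moves every regular file in SRC into DEST/<c>/, where <c> is the
# lowercased first character of the filename (assumes ASCII titles).
bucket_articles() {
    for path in "$1"/*; do
        [ -f "$path" ] || continue      # skip directories and an empty glob
        name=$(basename "$path")
        first=$(printf '%s' "$name" | cut -c1 | tr '[:upper:]' '[:lower:]')
        mkdir -p "$2/$first"
        mv "$path" "$2/$first/$name"
    done
}

# Example usage (placeholder paths):
# bucket_articles ~/wikipedia/articles ~/wikipedia/sorted
```

Writing to a separate destination directory avoids a clash between an article called A and a bucket called a on case-insensitive filesystems.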

Apple Firmware

  • The iPod software only reads the first 4 KB of any text file in the Notes/ directory. The remainder of the file is ignored.
  • The iPod software only reads a total of 1000 text files from the Notes/ directory. While it's easily possible to place a lot more text files on your iPod, the software will only read the first 1000 files and ignore the rest.

These restrictions apply to all iPod models, including the newer fifth- and sixth-generation iPods.
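
If you still want to try the Notes/ directory, a long article has to be cut into pieces that fit under the 4 KB limit. Here is a minimal sketch using split. split_note is a hypothetical helper, and it assumes that "4 KB" means 4096 bytes; lower the number if the end of each piece gets cut off.

```shell
#!/bin/sh
# Hypothetical helper: split_note FILE OUTDIR
# Cuts FILE into pieces of at most 4096 bytes, written as
# OUTDIR/<name>.aa, OUTDIR/<name>.ab, and so on.
split_note() {
    mkdir -p "$2"
    split -b 4096 "$1" "$2/$(basename "$1")."
}

# Example usage (placeholder paths):
# split_note ~/wikipedia/articles/Apple /mnt/ipod/Notes
```

The split is by byte count, so pieces can end in the middle of a word, and every piece counts toward the 1000-file limit, so only a small selection of articles will fit this way.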