The Wikipedia database is available in XML format here: http://download.wikimedia.org/enwiki/latest/

20090116

wikiparse-1.0.pl: my working Wikipedia XML parser, thanks to the wonderful Perl module XML::Node by Chang Liu and his README.

Invocation: bzcat enwiki-latest-pages-articles.xml.bz2 | head -5000 | ./xmlparse.pl -s "American"
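The parser itself isn't reproduced in these notes, so what follows is a hypothetical stand-in rather than wikiparse-1.0.pl. It replaces XML::Node with a plain line-oriented scan that uses only core Perl. It also assumes that -s means "print the title of every page whose wikitext contains this string". One advantage of scanning line by line is that it copes with input cut off by head -5000. A strict XML parser would probably report an error at the truncation point.

```perl
#!/usr/bin/perl
# Hypothetical sketch, NOT the author's wikiparse-1.0.pl: stream a MediaWiki
# XML dump and collect titles of pages whose <text> contains a search string.
# Assumption: -s TERM is a plain, case-sensitive substring search.
use strict;
use warnings;
use Getopt::Std;

# Read the dump from filehandle $fh; return titles of pages whose wikitext
# contains $term (each page reported at most once).
sub find_pages {
    my ($fh, $term) = @_;
    my (@hits, $title, $in_text, $matched);
    while (my $line = <$fh>) {
        if ($line =~ m{<title>(.*?)</title>}) {
            $title   = $1;    # start of a new page; entities like &amp; are left as-is
            $matched = 0;
        }
        # <text ...> may open and close on one line or span many lines.
        # Inside wikitext '<' is escaped as &lt;, so these tags are unambiguous.
        $in_text = 1 if $line =~ m{<text[\s>]};
        if ($in_text && !$matched && index($line, $term) >= 0) {
            push @hits, $title;
            $matched = 1;
        }
        $in_text = 0 if $line =~ m{</text>|<text[^>]*/>};
    }
    return @hits;
}

# Run as a command only when executed directly (not via require/do).
unless (caller) {
    my %opt;
    getopts('s:', \%opt);
    if (defined $opt{s}) {
        print "$_\n" for find_pages(\*STDIN, $opt{s});
    } else {
        warn "usage: $0 -s TERM < dump.xml\n";
    }
}
```

Under the same assumptions, a hypothetical invocation would be `bzcat enwiki-latest-pages-articles.xml.bz2 | head -5000 | perl findpages.pl -s "American"`. The script name `findpages.pl` is only illustrative. Matching is done on raw bytes, so a UTF-8 search term matches UTF-8 wikitext without any decoding step.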

20090117

I like this guy's ideas, but his setup is messy. I want to eliminate the C++ code and the web server, or at least have the whole thing written in Perl.

http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html