| The Wikipedia database is available in XML format at http://download.wikimedia.org/enwiki/latest/ |
|
20090116 wikiparse-1.0.pl My working Wikipedia XML parser, thanks to the wonderful "Perl Module XML::Node by Chang Liu" and his readme. Invocation: bzcat enwiki-latest-pages-articles.xml.bz2 | head -5000 | ./xmlparse.pl -s "American" |
|
20090117 I like this guy's ideas, but it's messy... I want to eliminate the C++ code and the web server, or at least have the whole thing written in Perl. http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html |
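
For reference, the streaming idea behind the 20090116 invocation (decompress, truncate with head, parse from stdin) can be sketched roughly as below. This is a Python stand-in, not the Perl script itself, which is not reproduced in these notes. It assumes that -s means "print the titles of pages whose title or text contains this string", which is only a guess at what the real option does. The names local, matching_titles and main are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix that the XML export puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def matching_titles(stream, needle):
    """Yield titles of <page> elements whose title or text contains needle."""
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = local(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            if needle in title or needle in text:
                yield title
            elem.clear()  # release this page's children once it is handled
    except ET.ParseError:
        # Input cut short (e.g. by `head -5000`): stop at the last complete page.
        return


def main():
    # Assumed usage: the search string is the first command-line argument.
    needle = sys.argv[1] if len(sys.argv) > 1 else "American"
    for title in matching_titles(sys.stdin.buffer, needle):
        print(title)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as wikisearch.py, it would slot into the same pipeline: bzcat enwiki-latest-pages-articles.xml.bz2 | head -5000 | python3 wikisearch.py "American". Catching the parse error matters because head cuts the XML off mid-document, so the closing tags never arrive.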