User:Dan Nessett/Technical/Notes on setting up CZ clones

  • After running runJobs.php, run initStats.php --update in the maintenance directory. This updates the Statistics special page.
  • Importing the compressed DB dump took about 21 minutes on a dual 1.8 GHz processor system with 4 GB of storage.
  • Loading IE6 under Wine: the directions at http://www.howtoforge.com/how-to-install-internet-explorer-on-ubuntu8.04 ended in an error; those at http://ubuntumanual.org/posts/171/install-internet-explorer-in-ubuntu-the-easiest-way worked.

Notes

Directory                 Files   Blank lines   Comments   PHP code statements
CZ phase3                  1005         56590      69544                460125
CZ includes                 321         14769      33313                 97375
CZ extensions               142          3769       6742                 27350
CZ includes+extensions      463         18583      40055                124725
  • Using importDump.php in /maintenance, I populated a version of CZ as a local development environment. The Statistics special page showed in excess of 129,000 pages. The import reported populating about 116,400 pages (looking at the page table, the exact number is 116,486). This checks out, since the daily dump of CZ does not include histories. There are approximately 12,700 live articles, each of which would have a history page. Since 116,500 + 12,700 = 129,200, it appears all content pages were loaded. However, it took in excess of 3 1/2 days (about 80 hours) to import the content. This suggests looking at more efficient import strategies (e.g., using mwdumper, or converting to SQL with xml2sql and importing directly into the database).
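For comparison, a rough sketch of the mwdumper route (assumptions: mwdumper.jar has been downloaded, the dump is ~/cz.dump.current.xml, and the target is a MySQL database named cz_wiki; mwdumper emits SQL for MediaWiki's MySQL schema, so a PostgreSQL-backed clone like this one would need the output adapted):

# Convert the XML dump to SQL INSERTs and pipe them straight into the database.
# The database name and credentials here are placeholders.
java -jar mwdumper.jar --format=sql:1.5 ~/cz.dump.current.xml | mysql -u wikiuser -p cz_wiki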
  • Had trouble getting skins to work. I needed to set $wgScriptPath to /mediawiki/CZ_1_13_2/phase3. Originally I had it set to $IP, but that expands to /usr/local/src/mediawiki/CZ_1_13_2/phase3, which is not accessible through the apache2 server. The correct value uses the /mediawiki apache2 alias.
I now need to run maintenance/runJobs.php. The statistics page shows 272,975 queued jobs, so running all queued jobs is going to take a while. Dan Nessett 22:39, 23 November 2009 (UTC)
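To watch the job queue drain, the showJobs.php maintenance script (assuming it is present in this MediaWiki version) prints the current number of queued jobs:

cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
# Print how many jobs are currently queued.
php showJobs.php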
  • Had trouble getting texvc to work:
  • The message "failed to parse cannot write to or create math temp directory" signals problems with permissions on the images directory in phase3.
  • Need to ensure the images directory has both a math and a tmp subdirectory with read/write access, and that the images directory is accessible to the apache2 server (I simply chmod 777 both of them; a command sketch follows these texvc notes).
  • Originally had $wgUploadPath set to "$IP/images". This is incorrect. This variable must be set to a URL prefix that is accessible through the apache2 server. Set it to "$wgScriptPath/images" and TeX math worked.
  • Ran into a strange problem where no matter how I changed the permissions on images/math and images/tmp, the message "failed to parse cannot write to or create math temp directory" appeared. Somehow this message stopped showing up. I don't know exactly why, but perhaps you need to clear the browser cache.
  • I tried putting the directories/files in images into the www-data subgroup and owned by www-data and then changing permissions on everything below images to 775. However, subversion needs to get to locks in this directory tree (even when images has the svn:ignore property). So, while math rendering worked, when I committed changes to the repository, subversion failed on attempting to create a lock in images/.svn. So, I finally gave up and executed sudo chmod -R 777 images. This seems to fix all math rendering and subversion problems, but it is very insecure.
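A minimal sketch of the directory setup described above, assuming the wiki root is /usr/local/src/mediawiki/CZ_1_13_2/phase3 (the wide-open permissions mirror what was done here and are insecure; acceptable only for a throwaway local clone):

cd /usr/local/src/mediawiki/CZ_1_13_2/phase3
# texvc needs writable scratch and output directories under images/.
mkdir -p images/math images/tmp
# Insecure but simple: let the apache2 (www-data) process and svn write anywhere under images/.
sudo chmod -R 777 images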
  • Had trouble getting email to work. Since the installation is intended for local development, I chose to set up only local email. Therefore, every user must have an email address of the form <username>@localhost. When Ubuntu is installed, the exim4 MTA/MDA is installed by default, so it is only necessary to set up an email client to receive mail. I used GNOME Evolution (which is also installed by default). To set up Evolution to receive local email, I used the following configuration:
  • Account name: Local Email Account
  • Full Name: Dan Nessett
  • Email Address: dnessett@localhost
  • Server Type (receiving): Local delivery
  • Configuration (path): /var/mail/dnessett
  • Server Type (sending): Sendmail
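A quick way to confirm local delivery works (a sketch; it assumes exim4's sendmail compatibility interface at /usr/sbin/sendmail and the mbox spool path used above):

# Hand a test message to the local MTA via the sendmail interface.
printf 'Subject: local mail test\n\nIt works.\n' | /usr/sbin/sendmail dnessett@localhost
# The message should shortly appear in the spool file that Evolution reads.
tail /var/mail/dnessett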
  • When we have a CZ repository set up, we will need to exclude some directories in phase3 from version control.
  • To exclude all images in phase3/images from version control (other than those preloaded in icons), set the svn:ignore property to * on that directory.
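The corresponding commands might look like this (a sketch; the path assumes the working copy layout used elsewhere in these notes):

cd /usr/local/src/mediawiki/CZ_1_13_2/phase3
# Ignore anything in images/ that is not already under version control.
svn propset svn:ignore '*' images
svn commit -m "Ignore uploaded files under images/" images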
  • svn copy LocalSettings.php into config (after deleting any existing local copy of that file there). Then svn delete LocalSettings.php in phase3, set the svn:ignore property on phase3 to include LocalSettings.php, and commit these changes. Then locally (not using svn) copy LocalSettings.php from config back to phase3. This effectively removes LocalSettings.php from version control, so local developers can modify it and commit other changes without saving LocalSettings.php to the repository. If it ever becomes necessary to change the repository version of LocalSettings.php, the developer should merge the changes in phase3/LocalSettings.php into config/LocalSettings.php and then commit.
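Sketched as commands (assumptions: config and phase3 are sibling directories in the same working copy, and phase3 has no other svn:ignore entries; if it does, edit the property instead of overwriting it):

cd /usr/local/src/mediawiki/CZ_1_13_2
# Put a versioned copy of LocalSettings.php in config/.
svn copy phase3/LocalSettings.php config/LocalSettings.php
# Remove it from version control (and from disk) in phase3/.
svn delete phase3/LocalSettings.php
# Tell svn to ignore it in phase3/ from now on.
svn propset svn:ignore 'LocalSettings.php' phase3
svn commit -m "Move LocalSettings.php out of version control in phase3"
# Finally, restore a working, unversioned copy for this checkout.
cp config/LocalSettings.php phase3/LocalSettings.php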
  • When transferring a file created by svnadmin dump via ftp, make sure the transfer type is set to binary. Otherwise, when you attempt to load it, you will get an error like: "svnadmin: Dump stream contains a malformed header (with no ':') at:". Also, when loading the dump, use svnadmin load --ignore-uuid /path/to/repository < dumpfile. This ensures the UUID specified in the dump file does not clobber the repository's existing UUID (which would otherwise happen if the repository being loaded into has no revisions in it).
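For reference, the dump-and-load round trip might look like this (a sketch; the repository paths and dump filename are placeholders):

# On the source machine: serialize the repository to a portable dump file.
svnadmin dump /path/to/source/repository > cz_repo.dump
# ...transfer cz_repo.dump by ftp, with the transfer type set to binary...
# On the destination machine: load it without adopting the dump's UUID.
svnadmin load --ignore-uuid /path/to/destination/repository < cz_repo.dump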
  • The command used to dump the cz database is: pg_dump cz | gzip > cz_dump.gz. This resulted (on 1-15-2010) in a 154 MB file. Restore with: gunzip -c cz_dump.gz | psql cz.
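On a clean machine the database must exist before restoring; a short sketch (it assumes the PostgreSQL client tools are installed; the owner role wikiuser is a placeholder):

# Create an empty database to restore into (skip if cz already exists).
createdb -O wikiuser cz
# Stream the compressed dump back into PostgreSQL.
gunzip -c cz_dump.gz | psql cz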
  • The daily CZ data dump is located at: http://en.citizendium.org/wiki/CZ:Downloads
  • The bz2 version is uncompressed using the following command: bunzip2 cz.dump.current.xml.bz2
  • To import the current data dump, cd to /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance. If the data dump file is in the home folder, import the dump using: php importDump.php ~/cz.dump.current.xml.
  • After importing the dump, in the maintenance directory execute: php refreshLinks.php. This will create a lot of jobs. When refreshLinks completes, in the maintenance directory execute: php runJobs.php > ~/runJobs.log 2>&1. Running this utility will take a very long time; to reduce the elapsed time, run several instances of it at once. Here is a shell script that starts up 20 instances.
#!/bin/bash
cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
# Launch 20 runJobs.php workers in parallel; each sends stdout and stderr to its own log file.
php runJobs.php > ~/runJobs.log1 2>&1 &
php runJobs.php > ~/runJobs.log2 2>&1 &
php runJobs.php > ~/runJobs.log3 2>&1 &
php runJobs.php > ~/runJobs.log4 2>&1 &
php runJobs.php > ~/runJobs.log5 2>&1 &
php runJobs.php > ~/runJobs.log6 2>&1 &
php runJobs.php > ~/runJobs.log7 2>&1 &
php runJobs.php > ~/runJobs.log8 2>&1 &
php runJobs.php > ~/runJobs.log9 2>&1 &
php runJobs.php > ~/runJobs.log10 2>&1 &
php runJobs.php > ~/runJobs.log11 2>&1 &
php runJobs.php > ~/runJobs.log12 2>&1 &
php runJobs.php > ~/runJobs.log13 2>&1 &
php runJobs.php > ~/runJobs.log14 2>&1 &
php runJobs.php > ~/runJobs.log15 2>&1 &
php runJobs.php > ~/runJobs.log16 2>&1 &
php runJobs.php > ~/runJobs.log17 2>&1 &
php runJobs.php > ~/runJobs.log18 2>&1 &
php runJobs.php > ~/runJobs.log19 2>&1 &
php runJobs.php > ~/runJobs.log20 2>&1 &
wait
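The same thing can be written more compactly as a loop; this is just an equivalent form of the script above, with the number of instances pulled out into a variable:

#!/bin/bash
# Equivalent loop form: start N runJobs.php workers, each with its own log file.
N=20
cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
for i in $(seq 1 "$N"); do
    php runJobs.php > ~/runJobs.log"$i" 2>&1 &
done
wait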