User:Dan Nessett/Technical/Notes on setting up CZ clones

Notes

Creating a CZ clone

Directory                 Files   Blank Lines   Comments   PHP code statements
CZ phase 3                 1005         56590      69544                460125
CZ includes                 321         14769      33313                 97375
CZ extensions               142          3769       6742                 27350
CZ includes+extensions      463         18583      40055                124725
  • Using importDump.php in /maintenance I populated a version of CZ as a local development environment. The Statistics special page showed in excess of 129,000 pages. The import reported populating 116,400 pages (looking at the pages table, the exact number is 116,486). This checks out, since the daily dump of CZ does not include histories. There are approximately 12,700 live articles, each of which would have a history page. Noting that 116,500 + 12,700 = 129,200, it appears all content pages were loaded. However, it took in excess of 3 1/2 days (about 80 hours) to import the content. This suggests looking at more efficient import strategies, e.g., using mwdumper or converting to SQL with xml2sql and importing directly into the database (see the mwdumper sketch after this list).
  • Had trouble getting skins to work. I needed to set $wgScriptPath to /mediawiki/CZ_1_13_2/phase3. I originally had it set to $IP, but that expands to /usr/local/src/mediawiki/CZ_1_13_2/phase3, which is not accessible through the apache2 server. The correct value uses the /mediawiki apache2 alias.
I now need to run maintenance/runJobs.php. The statistics page shows 272,975 queued jobs, so running all queued jobs is going to take a while. Dan Nessett 22:39, 23 November 2009 (UTC)
  • Had trouble getting texvc to work:
  • The message "failed to parse cannot write to or create math temp directory" signals problems with permissions on the images directory in phase3.
  • Need to ensure the images directory has both a /math and a /tmp subdirectory with read/write access, and that the images directory is accessible to the apache2 server (I simply chmod 777 both of them; see the sketch after this list).
  • Originally had $wgUploadPath set to "$IP/images". This is incorrect. This variable must be set to a URL prefix that is accessible to the apache2 server. Set it to "$wgScriptPath/images" and TeX math worked.
  • Ran into a strange problem where, no matter how I changed the permissions on images/math and images/tmp, the message "failed to parse cannot write to or create math temp directory" appeared. The message eventually stopped showing up; I don't know exactly why, but perhaps you need to clear the browser cache.
  • I tried putting the directories/files in images into the www-data group, owned by www-data, and then changing permissions on everything below images to 775. However, subversion needs to create locks in this directory tree (even when images has the svn:ignore property). So, while math rendering worked, when I committed changes to the repository, subversion failed when attempting to create a lock in images/.svn. I finally gave up and executed sudo chmod -R 777 images. This fixes all the math rendering and subversion problems, but it is very insecure.
  • Had trouble getting email to work. Since the installation is intended for local development, I chose to set up only local email. Therefore, every user must have an email address of <username>@localhost. When Ubuntu is installed, the exim4 MTA/MDA is installed by default. It is only necessary to set up an email client to receive emails. I used GNOME Evolution (which is also installed by default). In order to set up Evolution to receive local email, I used the following configuration:
  • Account name: Local Email Account
  • Full Name: Dan Nessett
  • Email Address: dnessett@localhost
  • Server Type: Local delivery
  • Configuration (path): /var/mail/dnessett
  • Server Type: Sendmail
  • When we have a CZ repository set up, we need to exclude some directories in phase3 from version control.
  • In order to exclude all images in phase3/images from version control (other than those preloaded in icons), set the property svn:ignore * on that directory (see the svn sketch after this list).
  • svn copy LocalSettings.php into config (after potentially locally deleting any existing version of that file there). Then svn delete LocalSettings.php in phase3. Set properties on phase3 to include "svn:ignore LocalSettings.php". Commit these changes. Then locally (not using svn) copy LocalSettings.php from config to phase3. This effectively removes LocalSettings.php from version control, so local developers can modify it and commit other changes without saving LocalSettings.php to the repository. If it ever becomes necessary to change the repository version of LocalSettings.php, the developer should merge the changes from phase3/LocalSettings.php into config/LocalSettings.php and then commit.
  • When ftp transferring a file created by svnadmin dump, make sure the transfer type is set to binary. Otherwise, when you attempt to import it, you will get an error like "svnadmin: Dump stream contains a malformed header (with no ':') at:". Also, when loading the dump, use svnadmin load --ignore-uuid /path/to/repository < dumpfile. This ensures the UUID specified in the dump file does not clobber the repository's existing UUID (which would otherwise happen if the repository being loaded has no revisions in it).
  • The command used to dump the cz database is: pg_dump cz | gzip > cz_dump.gz. This resulted (on 1-15-2010) in a 154MB file. Restore with gunzip -c cz_dump.gz | psql cz
  • The daily CZ data dump is located at: http://en.citizendium.org/wiki/CZ:Downloads
  • The bz2 version is uncompressed using the following command: bunzip2 cz.dump.current.xml.bz2
  • To import the current data dump, cd to /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance. If the data dump file is in the home folder, import the dump using: php importDump.php ~/cz.dump.current.xml.
  • After importing the dump, execute php refreshLinks.php in the maintenance directory. This will create a lot of jobs. When refreshLinks completes, execute php runJobs.php > ~/runJobs.log 2>&1 in the maintenance directory (note the redirection order; with 2>&1 placed first, stderr would not go to the log). Running this utility will take a very long time. To reduce the elapsed time, run several instances of the utility at once. Here is a shell script that starts up 20 instances:
#!/bin/bash
cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
# Start 20 runJobs workers, each logging its stdout and stderr to its own file.
for i in $(seq 1 20); do
    php runJobs.php > ~/runJobs.log$i 2>&1 &
done
wait
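
As a comparison point for the faster import strategies mentioned in the first bullet above, here is a hedged sketch of the mwdumper route. It assumes a MySQL-backed wiki with a database named wikidb and user wikiuser (both hypothetical names); since this clone runs on PostgreSQL, the xml2sql conversion route would be needed instead:
# Sketch only: mwdumper converts the XML dump directly into SQL INSERT statements.
# wikidb and wikiuser are hypothetical; adjust to the local setup.
java -jar mwdumper.jar --format=sql:1.5 ~/cz.dump.current.xml | mysql -u wikiuser -p wikidb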
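
The math directory fix described above boils down to something like the following (the fully permissive variant these notes settle on; insecure, so local development only):
cd /usr/local/src/mediawiki/CZ_1_13_2/phase3
mkdir -p images/math images/tmp
# Insecure, but avoids both the texvc and the subversion lock problems.
sudo chmod -R 777 images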
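
The svn:ignore and LocalSettings.php steps described above amount to roughly this sequence (a sketch, assuming the working copy lives in /usr/local/src/mediawiki/CZ_1_13_2 with versioned config and phase3 directories):
cd /usr/local/src/mediawiki/CZ_1_13_2
# Ignore all uploaded images (other than those preloaded in icons).
svn propset svn:ignore '*' phase3/images
# Move LocalSettings.php out of version control.
svn copy phase3/LocalSettings.php config/
svn delete phase3/LocalSettings.php
svn propset svn:ignore 'LocalSettings.php' phase3
svn commit -m "Remove LocalSettings.php from version control"
# Copy it back locally (not via svn) so the wiki still runs.
cp config/LocalSettings.php phase3/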

Setting up a local development environment

  • Netbeans 6.8 plus the Java SE 6 Development Kit (JDK) package is at: NB6.8 + JDK.

Managing subversion source code

Some useful information

  • importDump may create some jobs to run. By default, one job is taken from the job queue and run on each page access, but for a clone this is not going to empty the job queue very quickly; hence the shell script given above. Running that script will not tell you how many jobs remain. Another script, showJobs.php, will do this. Run it by executing the following in a terminal window:
cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
php showJobs.php
An even more useful way to show job queue processing progress is to run showJobs periodically using the watch command:
cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
watch --interval=10 php showJobs.php
  • There is a utility called wikifind that searches wiki xml dumps. This can be used to find mediawiki markup. The URL is http://meta.wikimedia.org/wiki/User:Micke/WikiFind. To download it:
sudo su -
cd /usr/local/src/
mkdir wikifind
cd wikifind
wget http://wikifind.wikispaces.com/space/showimage/wikifind.cpp
To compile:
yum install boost.x86_64 boost-devel.x86_64
g++ wikifind.cpp -o wikifind -lboost_regex
cp wikifind /usr/local/bin
To run:
cd <directory where dumpfile is located>
wikifind

Setting up multiple clones on the same machine

First, check out the CZ code for each duplicate and put it in a separate directory. For example, suppose you want two duplicate clones, one for work on the existing CZ code and one for work on the refactoring branch. We will call these two duplicates CZ_1_13_2 and CZ_RF_1_13_2. Create two directories in /usr/local/src/mediawiki/. For the purpose of these instructions, call them:

/usr/local/src/mediawiki/CZ_1_13_2/
/usr/local/src/mediawiki/CZ_refactor_1_13_2/

Now check out the appropriate subversion code into each. For example, use the following commands:

cd /usr/local/src/mediawiki/CZ_1_13_2
svn co http://svn.citizendium.org/czrepo/trunk/phase3
cd /usr/local/src/mediawiki/CZ_refactor_1_13_2
svn co http://svn.citizendium.org/czrepo/branches/CZ_refactor_1_13_2/phase3

Add an alias for each duplicate clone. For example, for the two clones CZ_1_13_2 and CZ_RF_1_13_2, add the following lines to the site configuration file in /etc/apache2/sites-enabled (Ubuntu).

Alias /CZ_1_13_2 "/usr/local/src/mediawiki/CZ_1_13_2/phase3/index.php"
Alias /CZ_RF_1_13_2 "/usr/local/src/mediawiki/CZ_refactor_1_13_2/phase3/index.php"

These should be contained within the <VirtualHost> tag pair.
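
For concreteness, here is a minimal sketch of how the relevant part of that configuration file might look. The /mediawiki alias (which $wgScriptPath relies on, as noted earlier) is assumed to map the whole source tree:
<VirtualHost *:80>
    # Assumed alias so /mediawiki/... URLs reach the source tree ($wgScriptPath uses it).
    Alias /mediawiki "/usr/local/src/mediawiki"
    # One alias per duplicate clone, each pointing at that clone's index.php.
    Alias /CZ_1_13_2 "/usr/local/src/mediawiki/CZ_1_13_2/phase3/index.php"
    Alias /CZ_RF_1_13_2 "/usr/local/src/mediawiki/CZ_refactor_1_13_2/phase3/index.php"
</VirtualHost>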

Follow the instructions on How_to_create_a_CZ_clone (section Configuring the CZ wiki software) for each duplicate, but before testing each, make the following edits to LocalSettings.php:

$wgScriptPath        = "/mediawiki/<directory_name>/phase3";
$wgScript            = "/<alias_name>/wiki";

For example, for the CZ_1_13_2 clone, the edits would be:

$wgScriptPath        = "/mediawiki/CZ_1_13_2/phase3";
$wgScript            = "/CZ_1_13_2/wiki";

and for the CZ_RF_1_13_2 clone, the edits would be:

$wgScriptPath        = "/mediawiki/CZ_refactor_1_13_2/phase3";
$wgScript            = "/CZ_RF_1_13_2/wiki";

Save these edits and access each duplicate by its alias:

http://localhost/CZ_1_13_2
http://localhost/CZ_RF_1_13_2


Setting up an MW clone is somewhat different. You first install the software and then access the index.php file, referencing the directory in which the software is installed. For example, if you check out the MW software into /usr/local/src/mediawiki/MW_1_13_2, then you start the install by referencing:

http://localhost/mediawiki/MW_1_13_2/phase3/index.php

This takes you to an install web page telling you that you must install MediaWiki. Click on the link and follow the instructions. Then add the alias you want to use for this wiki. For example, if you wish to call it MW_1_13_2, add the following to the /etc/apache2/sites-enabled (Ubuntu) configuration file:

Alias /MW_1_13_2 "/usr/local/src/mediawiki/MW_1_13_2/phase3/index.php"

In your browser you would reference this wiki as:

http://localhost/MW_1_13_2