User:Dan Nessett/Technical/Notes on setting up CZ clones
Latest revision as of 02:39, 22 November 2023
The account of this former contributor was not re-activated after the server upgrade of March 2022.
Notes
Creating a CZ clone
- Had to modify createAndPromote.php to check the password for validity before creating the user. Otherwise, if the password is invalid, a "ghost" user is created in the mwuser table.
- Had to create a dummy user with user_id of 0, so XML dump import would work.
- Need to set Xdebug variables for both apache2/php and php-cli versions of php.ini
- Had to set xdebug.max_nesting_level=200 in /etc/php5/cli/php.ini so dump import wouldn't croak.
- Some useful information on MW XML dumps: http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg01712.html, http://www.gossamer-threads.com/lists/wiki/wikitech/180598, http://meta.wikimedia.org/wiki/Data_dumps, http://meta.wikimedia.org/wiki/Xml2sql
- Used cloc to count lines of PHP code in CZ:
  Directory                 Files   Blank Lines   Comments   PHP code statements
  CZ phase 3                 1005         56590      69544                460125
  CZ includes                 321         14769      33313                 97375
  CZ extensions               142          3769       6742                 27350
  CZ includes+extensions      463         18583      40055                124725
- Using importDump.php in /maintenance, I populated a version of CZ as a local development environment. The Statistics special page showed in excess of 129,000 pages. The import reported populating 116,400 pages (looking at the pages table, the exact number is 116,486). This checks out, since the daily dump of CZ does not include histories. There are approximately 12,700 live articles, each of which would have a history page. Since 116,500 + 12,700 = 129,200, it appears all content pages were loaded. However, it took in excess of 3 1/2 days (about 80 hours) to import the content, which suggests looking at more efficient import strategies (e.g., using mwdumper, or converting to SQL with xml2sql and importing directly into the database).
- Had trouble getting skins to work. I needed to set $wgScriptPath to /mediawiki/CZ_1_13_2/phase3. Originally had it set to $IP. But, that expands to /usr/local/src/mediawiki/CZ_1_13_2/phase3, which is not accessible through the apache2 server. The correct value uses the /mediawiki apache2 alias.
- I now need to run maintenance/runJobs.php. The statistics page shows 272,975 queued jobs, so running all queued jobs is going to take a while. Dan Nessett 22:39, 23 November 2009 (UTC)
- Had trouble getting texvc to work:
- The message "failed to parse cannot write to or create math temp directory" signals problems with permissions on the images directory in phase3.
- Need to ensure images directory has both a /math and /tmp subdirectory with read/write access and the images directory is accessible to the apache2 server (I simply chmod 777 both of them).
- Originally had $wgUploadPath to "$IP/images". This is incorrect. This variable must be set to a URL prefix that is accessible to the apache2 server. Set it to "$wgScriptPath/images" and TeX math worked.
- Ran into a strange problem where, no matter how I changed the permissions on images/math and images/tmp, the message "failed to parse cannot write to or create math temp directory" appeared. The message eventually stopped showing up; I don't know exactly why, but it may be necessary to clear the browser cache.
- I tried putting the directories/files in images into the www-data subgroup and owned by www-data and then changing permissions on everything below images to 775. However, subversion needs to get to locks in this directory tree (even when images has the svn:ignore property). So, while math rendering worked, when I committed changes to the repository, subversion failed on attempting to create a lock in images/.svn. So, I finally gave up and executed sudo chmod -R 777 images. This seems to fix all math rendering and subversion problems, but it is very insecure.
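The permission fixes above can be sketched as follows. MW_ROOT stands for the phase3 directory of the clone (the path used throughout these notes); the sketch falls back to a scratch directory so the commands can be tried anywhere.

```shell
# Recreate the writable directories texvc needs. MW_ROOT is an assumed
# variable naming the phase3 directory; a throwaway directory is used
# here if it is not set, purely for illustration.
root="${MW_ROOT:-$(mktemp -d)}"
mkdir -p "$root/images/math" "$root/images/tmp"
chmod -R 777 "$root/images"   # insecure, but avoids both texvc and svn lock failures
ls -ld "$root/images/math" "$root/images/tmp"
```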
- Had trouble getting email to work. Since the installation is intended for local development, I chose to set up only local email. Therefore, every user must have an email address of <username>@localhost. When Ubuntu is installed, the exim4 MTA/MDA is installed by default, so it is only necessary to set up an email client to receive emails. I used GNOME Evolution (which is also installed by default). To set up Evolution to receive local email, I used the following configuration:
- Account name: Local Email Account
- Full Name: Dan Nessett
- Email Address: dnessett@localhost
- Server Type: Local delivery
- Configuration (path): /var/mail/dnessett
- Server Type: Sendmail
- When we have a CZ repository set up, need to exclude some directories in phase3 from version control.
- In order to exclude all images in phase3/images from version control (other than those preloaded in icons), set property svn:ignore * on that directory.
- Svn copy LocalSettings.php into config (after potentially locally deleting any existing version of that file there). Then svn delete LocalSettings.php in phase3. Set properties on phase3 to include "svn:ignore LocalSettings.php". Commit these changes. Then locally (not using svn) copy LocalSettings.php from config to phase3. This effectively removes LocalSettings.php from version control. So, local developers can make modifications to it and commit other changes without saving LocalSettings.php to the repository. If it ever becomes necessary to change the repository version of LocalSettings.php, the developer should merge changes in phase3/LocalSettings.php into config/LocalSettings.php and then commit the changes.
- When ftp transferring a file created by svnadmin dump, make sure the transfer type is set to binary. Otherwise, when you attempt to import it, you will get an error like: "svnadmin: Dump stream contains a malformed header (with no ':') at:". Also, when loading the dump, use svnadmin load --ignore-uuid /path/to/repository < dumpfile. This ensures the UUID specified in the dump file does not clobber the repository's existing UUID (which would otherwise happen if the repository being loaded into has no revisions in it).
- The command used to dump the cz database is: pg_dump cz | gzip > cz_dump.gz. This resulted (on 1-15-2010) in a 154MB file. Restore with gunzip -c cz_dump.gz | psql cz
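The dump and restore commands form a simple gzip pipeline. The sketch below exercises the same piping with a throwaway file standing in for the live database, since pg_dump and psql need a running PostgreSQL server:

```shell
# Stand-in for: pg_dump cz | gzip > cz_dump.gz
echo "SELECT 1;" | gzip > cz_dump.gz
# Stand-in for: gunzip -c cz_dump.gz | psql cz
gunzip -c cz_dump.gz
```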
- The daily CZ data dump is located at: http://en.citizendium.org/wiki/CZ:Downloads
- The bz2 version is uncompressed using the following command: bunzip2 cz.dump.current.xml.bz2
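The decompression step can be checked on a small scratch file; bunzip2 strips the .bz2 suffix and restores the original file name. The file contents here are a stand-in, not the real dump:

```shell
# Round-trip demo of the bunzip2 step above, on a stand-in dump file.
printf '<mediawiki/>' > cz.dump.current.xml
bzip2 cz.dump.current.xml        # produces cz.dump.current.xml.bz2
bunzip2 cz.dump.current.xml.bz2  # restores cz.dump.current.xml
cat cz.dump.current.xml
```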
- To import the current data dump, cd to /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance. If the data dump file is in the home folder, import the dump using: php importDump.php ~/cz.dump.current.xml
- After importing the dump, in the maintenance directory execute: php refreshLinks.php. This will create a lot of jobs. When refreshLinks completes, in the maintenance directory execute: php runJobs.php > ~/runJobs.log 2>&1. Running this utility will take a very long time. To reduce this, run several instances of the utility at once. Here is a shell script that starts up 20 instances:
  #!/bin/bash
  cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
  php runJobs.php > ~/runJobs.log1 2>&1 &
  php runJobs.php > ~/runJobs.log2 2>&1 &
  php runJobs.php > ~/runJobs.log3 2>&1 &
  php runJobs.php > ~/runJobs.log4 2>&1 &
  php runJobs.php > ~/runJobs.log5 2>&1 &
  php runJobs.php > ~/runJobs.log6 2>&1 &
  php runJobs.php > ~/runJobs.log7 2>&1 &
  php runJobs.php > ~/runJobs.log8 2>&1 &
  php runJobs.php > ~/runJobs.log9 2>&1 &
  php runJobs.php > ~/runJobs.log10 2>&1 &
  php runJobs.php > ~/runJobs.log11 2>&1 &
  php runJobs.php > ~/runJobs.log12 2>&1 &
  php runJobs.php > ~/runJobs.log13 2>&1 &
  php runJobs.php > ~/runJobs.log14 2>&1 &
  php runJobs.php > ~/runJobs.log15 2>&1 &
  php runJobs.php > ~/runJobs.log16 2>&1 &
  php runJobs.php > ~/runJobs.log17 2>&1 &
  php runJobs.php > ~/runJobs.log18 2>&1 &
  php runJobs.php > ~/runJobs.log19 2>&1 &
  php runJobs.php > ~/runJobs.log20 2>&1 &
  wait
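The 20 copied lines can be replaced by a loop. This is a sketch: MAINT_DIR and JOBS are assumed variables, with MAINT_DIR defaulting to the maintenance path used in these notes.

```shell
#!/bin/bash
# Loop-based variant of the 20-instance script above. MAINT_DIR and JOBS
# are assumptions; point MAINT_DIR at your clone's maintenance directory.
MAINT_DIR="${MAINT_DIR:-/usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance}"
JOBS="${JOBS:-20}"
cd "$MAINT_DIR" 2>/dev/null || true   # keep going for illustration if the path is absent
for i in $(seq 1 "$JOBS"); do
    php runJobs.php > "$HOME/runJobs.log$i" 2>&1 &
done
wait
```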
- After running runJobs, in the maintenance directory run initStats.php --update. This will update the Statistics special page.
- Importing compressed DB dump took ~21 minutes on Dual 1.8 GHz processor system with 4 GB of storage.
- Loading IE6 under wine. First tried the directions at http://www.howtoforge.com/how-to-install-internet-explorer-on-ubuntu8.04, which ended in an error. Then tried http://ubuntumanual.org/posts/171/install-internet-explorer-in-ubuntu-the-easiest-way, which worked. However, when I tried to install IE 5.5 or IE 5.0, Python seg-faulted.
Setting up a local development environment
- Netbeans 6.8 plus the Java SE 6 Development Kit (JDK) package is at: NB6.8 + JDK.
Managing subversion source code
- To delete all .svn directories (and their contents) from a source tree, execute: find . -name ".svn" -exec rm -rf {} \; (from http://snippets.dzone.com/posts/show/2486)
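A slightly safer variant of that command adds -type d -prune, so find matches only directories and does not try to descend into the directories it is deleting. The scratch tree below exists only to demonstrate it:

```shell
# Demonstrate the pruned variant on a throwaway tree.
tree="$(mktemp -d)"
mkdir -p "$tree/a/.svn/props" "$tree/a/b/.svn"
find "$tree" -name ".svn" -type d -prune -exec rm -rf {} +
find "$tree" -name ".svn"   # prints nothing once the directories are gone
```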
- Nice subversion cheat sheet: http://www.abbeyworkshop.com/howto/misc/svn01/
Some useful information
- importDump may create some jobs to run. By default, one job is taken from the job queue and run on each page access; for a clone, that is not going to empty the job queue very quickly, hence the shell script given above. However, running that script will not tell you how many jobs remain. The maintenance script showJobs.php will. Run it by executing in a terminal window:
  cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
  php showJobs.php
- An even more useful way to show job queue processing progress is to run showJobs periodically using the watch command:
  cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
  watch --interval=10 php showJobs.php
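Where watch is unavailable, a plain shell loop gives the same periodic view. In this sketch, poll() is a stand-in for php showJobs.php so the loop itself can be tried anywhere; in practice use sleep 10, matching --interval=10.

```shell
# Poll a job-count command at a fixed interval. poll() is a stand-in
# for "php showJobs.php".
poll() { echo "jobs remaining: 42"; }
for _ in 1 2 3; do
    poll
    sleep 0.1    # use 10 in practice
done
```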
- There is a utility called wikifind that searches wiki XML dumps. It can be used to find MediaWiki markup. The URL is http://meta.wikimedia.org/wiki/User:Micke/WikiFind. To install on a CentOS 5.5 system:
  sudo su -
  cd /usr/local/src/
  mkdir wikifind
  cd wikifind
  wget http://wikifind.wikispaces.com/space/showimage/wikifind.cpp
- To compile:
  yum install boost.x86_64 boost-devel.x86_64
  g++ wikifind.cpp -o wikifind -lboost_regex
  cp wikifind /usr/local/bin
- To run:
  wikifind
(wikifind prompts for the language, the XML dump file to search, where to store the results, and which string to search for.)
Setting up multiple clones on same machine
First check out the CZ code for each duplicate and put it in a separate directory. For example, suppose you want two duplicate clones, one for work on the existing CZ code and one for work on the refactoring branch. We will call these two duplicates CZ_1_13_2 and CZ_RF_1_13_2. Create two directories in /usr/local/src/mediawiki/. For the purpose of these instructions call them:
  /usr/local/src/mediawiki/CZ_1_13_2/
  /usr/local/src/mediawiki/CZ_refactor_1_13_2/
Now checkout the appropriate subversion code into each. For example, use the following commands:
  cd /usr/local/src/mediawiki/CZ_1_13_2
  svn co http://svn.citizendium.org/czrepo/trunk/phase3
  cd /usr/local/src/mediawiki/CZ_refactor_1_13_2
  svn co http://svn.citizendium.org/czrepo/branches/CZ_refactor_1_13_2/phase3
Add an alias for each duplicate clone. For example, for the two clones CZ_1_13_2 and CZ_RF_1_13_2, add the following lines to the site configuration file in /etc/apache2/sites-enabled/ (Ubuntu):
  Alias /CZ_1_13_2 "/usr/local/src/mediawiki/CZ_1_13_2/phase3/index.php"
  Alias /CZ_RF_1_13_2 "/usr/local/src/mediawiki/CZ_refactor_1_13_2/phase3/index.php"
These should be contained within the <VirtualHost> tag pair.
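Putting the alias lines in context, the relevant part of the site configuration file might look like the sketch below. Only the Alias lines come from these notes; the ServerName and the surrounding directives are assumptions to show placement.

```apache
<VirtualHost *:80>
    ServerName localhost
    # ... existing DocumentRoot and Directory directives ...
    Alias /CZ_1_13_2 "/usr/local/src/mediawiki/CZ_1_13_2/phase3/index.php"
    Alias /CZ_RF_1_13_2 "/usr/local/src/mediawiki/CZ_refactor_1_13_2/phase3/index.php"
</VirtualHost>
```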
Follow the instructions on How_to_create_a_CZ_clone (section Configuring the CZ wiki software) for each duplicate, but before testing each, make the following edits to LocalSettings.php:
  $wgScriptPath = "/mediawiki/<directory_name>/phase3";
  $wgScript = "/<alias_name>/wiki";
For example for the CZ_1_13_2 clone the edits would be:
  $wgScriptPath = "/mediawiki/CZ_1_13_2/phase3";
  $wgScript = "/CZ_1_13_2/wiki";
and for the CZ_RF_1_13_2 clone the edits would be:
  $wgScriptPath = "/mediawiki/CZ_refactor_1_13_2/phase3";
  $wgScript = "/CZ_RF_1_13_2/wiki";
Save these edits and access each duplicate by their aliases:
  http://localhost/CZ_1_13_2
  http://localhost/CZ_RF_1_13_2
Setting up an MW clone is somewhat different. You first install the software and then access the index.php file. You must do this by referencing the directory in which the software is installed. For example, if you check out the MW software into /usr/local/src/mediawiki/MW_1_13_2, then you must start the install by referencing:
http://localhost/mediawiki/MW_1_13_2/phase3/index.php
This takes you to an install web page telling you that you must install MediaWiki. Click the link and follow the instructions. Then add the alias you want to use for this wiki. For example, if you wish to call it MW_1_13_2, add the following to the /etc/apache2/sites-enabled (Ubuntu) configuration file:
Alias /MW_1_13_2 "/usr/local/src/mediawiki/MW_1_13_2/phase3/index.php"
In your browser you would reference this wiki as:
http://localhost/MW_1_13_2