CleanupTitles.php

From The System Administrator Zone

Revision as of 14:36, 10 March 2012


The Problem

The Game Zone wiki was attacked by Chinese-language spammers, who left over two hundred corrupted pages that could not be edited or deleted because their titles contained invalid UTF-8 sequences.

[Image: UncategorizedPages.PNG]
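To see which titles are actually affected, each raw title can be tested for valid UTF-8. The Python sketch below is not part of MediaWiki; it assumes the raw title bytes have already been read from the page_title column of the page table, and the sample titles are made up for illustration.

```python
from __future__ import annotations


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw title bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def find_bad_titles(titles: dict[int, bytes]) -> list[int]:
    """Return the (sorted) page ids whose titles contain invalid UTF-8."""
    return sorted(pid for pid, raw in titles.items() if not is_valid_utf8(raw))


if __name__ == "__main__":
    # Hypothetical sample data, keyed by page id.
    sample = {
        1: "Main_Page".encode("utf-8"),
        2: "戦記".encode("utf-8"),        # valid multi-byte UTF-8
        3: "戦記".encode("utf-8")[:-1],   # truncated sequence: invalid
        4: b"abc\xff",                     # 0xFF never appears in UTF-8
    }
    print(find_bad_titles(sample))  # [3, 4]
```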

cleanupTitles.php

See: Manual:cleanupTitles.php

Running the script with the --help option shows how to use it.

# php cleanupTitles.php --help

Script to clean up broken, unparseable titles

Usage: php cleanupTitles.php [--conf|--dbpass|--dbuser|--dry-run|--globals|--help|--memory-limit|--quiet|--server|--wiki]

Generic maintenance parameters:
    --help (-h): Display this help message
    --quiet (-q): Whether to supress non-error output
    --conf: Location of LocalSettings.php, if not default
    --wiki: For specifying the wiki ID
    --globals: Output globals at the end of processing for debugging
    --memory-limit: Set a specific memory limit for the script, "max"
        for no limit or "default" to avoid changing it
    --server: The protocol and server name to use in URLs, e.g.
        http://en.wikipedia.org. This is sometimes necessary because server name
        detection may fail in command line scripts.

Script dependant parameters:
    --dbuser: The DB user to use for this script
    --dbpass: The password to use for this script

Script specific parameters:
    --dry-run: Perform a dry run

#
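The same script can be driven from a wrapper so that a dry run is always the default. This is a minimal Python sketch, not part of MediaWiki: it uses only the --dry-run and --conf flags listed in the help output, and the helper names and the maintenance_dir argument are hypothetical.

```python
from __future__ import annotations

import subprocess
from typing import Optional


def cleanup_titles_command(dry_run: bool = True,
                           conf: Optional[str] = None) -> list[str]:
    """Build the argument list for cleanupTitles.php.

    Uses only flags listed in the script's --help output and
    defaults to a dry run so nothing is renamed by accident.
    """
    args = ["php", "cleanupTitles.php"]
    if dry_run:
        args.append("--dry-run")
    if conf is not None:
        # Location of LocalSettings.php, if not the default one.
        args += ["--conf", conf]
    return args


def run_cleanup(maintenance_dir: str, **kwargs) -> str:
    """Run the script from the given (hypothetical) maintenance directory
    and return its output. Undecodable bytes are replaced, because the
    titles being reported may not be valid UTF-8."""
    result = subprocess.run(
        cleanup_titles_command(**kwargs),
        cwd=maintenance_dir,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout
```

For example, run_cleanup("/var/www/wiki/maintenance") would return the dry-run report as text; the path is only a placeholder.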

The Corruption

The first step was a dry run:

# php cleanupTitles.php --dry-run|more

The output did not look too promising:

page 446 (同城异性交友网站www.7moo.info最好的同城交友网站) doesn't match self.
DRY RUN: would rename 446 (0,'同城异性交友网站www.7moo.info最好的同城交友网站') to (0,'ŐŒåŸŽå¼‚性交友网站www.7moo.info最好的å
�城交友网站')
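Before running the script for real, it can help to save the list of proposed renames for review. The Python sketch below is a hypothetical helper that pulls the page ID and the old and new titles out of the output; it assumes each rename is reported on a single line in one of the two forms seen on this page ("DRY RUN: would rename ..." and "renaming ..."), so output that has been wrapped onto several lines will not match.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Rename(NamedTuple):
    page_id: int
    old_ns: int
    old_title: str
    new_ns: int
    new_title: str


# Matches both the dry-run form and the real-run form of a rename line.
RENAME_RE = re.compile(
    r"^(?:DRY RUN: would rename|renaming) (\d+) "
    r"\((-?\d+),'(.*?)'\) to \((-?\d+),'(.*)'\)$"
)


def parse_renames(log_text: str) -> list[Rename]:
    """Collect every rename reported in the script's output."""
    renames = []
    for line in log_text.splitlines():
        m = RENAME_RE.match(line.strip())
        if m:
            pid, old_ns, old_title, new_ns, new_title = m.groups()
            renames.append(
                Rename(int(pid), int(old_ns), old_title, int(new_ns), new_title)
            )
    return renames
```

Lines such as "page 935 (...) doesn't match self." and the progress lines are simply skipped.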

The Cleanup

First, a backup was made.
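One way to make that backup, assuming the wiki runs on MySQL, is a full mysqldump of the wiki database. The Python sketch below only builds and runs the command; the helper names, database name and user name are placeholders. The --default-character-set=binary option is used so the dump carries the stored bytes without character-set conversion, which matters when the titles themselves are broken.

```python
from __future__ import annotations

import subprocess


def backup_command(db_name: str, db_user: str, out_file: str) -> list[str]:
    """Build a mysqldump command for a full backup of one database."""
    return [
        "mysqldump",
        "--single-transaction",            # consistent snapshot for InnoDB tables
        "--default-character-set=binary",  # no charset conversion of stored bytes
        f"--user={db_user}",
        "--password",                      # no value given: mysqldump prompts for it
        f"--result-file={out_file}",
        db_name,
    ]


def run_backup(db_name: str, db_user: str, out_file: str) -> None:
    """Run the backup; raises CalledProcessError if mysqldump fails."""
    subprocess.run(backup_command(db_name, db_user, out_file), check=True)
```

For example, run_backup("wikidb", "wikiuser", "wiki-backup.sql") would write the dump to wiki-backup.sql after asking for the password.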

Then the script was run for real:

# php cleanupTitles.php

It produced the following output:

[snip]
page 935 (アラド戦記_rmt) doesn't match self.
renaming 935 (0,'アラド戦記_rmt') to (0,'¢ラド戦記_rmt')
page 936 (アラド戦記_rmt_コンビニ) doesn't match self.
renaming 936 (0,'アラド戦記_rmt_コンビニ') to (0,'¢ラド戦記_rmt_コンビニ')
page 937 (アラド_rmt_最安値) doesn't match self.
renaming 937 (0,'アラド_rmt_最安値') to (0,'¢ラド_rmt_最安値')
page 938 (戦記) doesn't match self.
renaming 938 (0,'戦記') to (0,'ƈ¦è¨˜')
equoria_gz 2012-03-10 19:34:18:  72.90% done on page; ETA 2012-03-10 19:34:19 [600/823] 1324.11/sec <38.50% updated>
equoria_gz 2012-03-10 19:34:18:  85.05% done on page; ETA 2012-03-10 19:34:18 [700/823] 1489.30/sec <33.00% updated>
Finished page... 231 of 720 rows updated

The script believes that it worked. A second dry run finds no rows left to update:

# php cleanupTitles.php --dry-run
Checking for bad titles...
Processing page...
equoria_gz 2012-03-10 19:35:53:  12.15% done on page; ETA 2012-03-10 19:35:53 [100/823] 3793.76/sec <0.00% updated>
equoria_gz 2012-03-10 19:35:53:  24.30% done on page; ETA 2012-03-10 19:35:53 [200/823] 4460.29/sec <0.00% updated>
equoria_gz 2012-03-10 19:35:53:  36.45% done on page; ETA 2012-03-10 19:35:53 [300/823] 4192.17/sec <0.00% updated>
equoria_gz 2012-03-10 19:35:53:  48.60% done on page; ETA 2012-03-10 19:35:53 [400/823] 4179.25/sec <0.00% updated>
equoria_gz 2012-03-10 19:35:53:  60.75% done on page; ETA 2012-03-10 19:35:53 [500/823] 3772.68/sec <0.00% updated>
equoria_gz 2012-03-10 19:35:53:  72.90% done on page; ETA 2012-03-10 19:35:53 [600/823] 3567.41/sec <0.00% updated>
equoria_gz 2012-03-10 19:35:53:  85.05% done on page; ETA 2012-03-10 19:35:53 [700/823] 3796.47/sec <0.00% updated>
Finished page... 0 of 720 rows updated
#
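That final check can be scripted: read the last "Finished ... rows updated" line of a dry run and treat 0 updated rows as done. This Python sketch is a hypothetical helper, not part of MediaWiki, and a clean result only means that cleanupTitles.php finds nothing left to rename.

```python
from __future__ import annotations

import re

FINISHED_RE = re.compile(r"Finished (\w+)\.\.\. (\d+) of (\d+) rows updated")


def rows_updated(log_text: str) -> tuple[int, int]:
    """Return (updated, total) from the last 'Finished ...' line."""
    matches = FINISHED_RE.findall(log_text)
    if not matches:
        raise ValueError("no 'Finished ... rows updated' line found")
    _table, updated, total = matches[-1]
    return int(updated), int(total)


def is_clean(dry_run_log: str) -> bool:
    """True when a follow-up dry run would update no rows."""
    updated, _total = rows_updated(dry_run_log)
    return updated == 0
```

Applied to the two runs above, rows_updated gives (231, 720) for the real run and (0, 720) for the follow-up dry run.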