CZ Talk:The Big Cleanup
Suggestions
- Questions that it would be useful to have answered. Is this a good idea? Is it worth the effort? Is this something we can expect to implement on a large scale? Is it too confusing to be implemented on a large scale? (If the answer is "yes," please be honest.) Should we add, or delete, any fields to the checklist? Should we add, or delete, any items to the cleanup "to do" list?
Particularly if you have been doing some testing, please give feedback here. Are there Article Checklist fields that you'd like to see added? Would you like to see new categories tracked? New things to put in the checklist?
- If basic cleanup includes removing underlinking, will the links have to be reinserted as and when those new (related) articles come up in CZ? But will anyone be tracking them at that time? On the other hand, if the red tags remain, that may stimulate some of the contributors to start articles on those - at least in WP I had created many new pages from the underlinks. Supten 22:56, 8 March 2007 (CST)
- Basic cleanup does not involve removing red links to articles, but only red links to nonexistent templates, pictures, categories, and interwiki links. --Larry Sanger 21:56, 11 March 2007 (CDT)
Hmmm. The workflow feels broken... or maybe I just didn't quite fall into one. Might do a bit more somewhere (there were four actual articles and eight redirects to Baha'i Faith in my dozen!) 'Dragon' Dave McKee 14:00, 9 March 2007 (CST)
- Well, as the veteran of countless hours of busywork, my advice is to pick one workflow and stick to it. As you practice exactly that workflow, you get better and more efficient at it quite quickly. --Larry Sanger 14:57, 9 March 2007 (CST)
There seem to be some problems with the template implementation. Not specifying either "y" or "n" in the "underlinked" attribute leads to a "Not specifiedNo" output. As other attributes have no problem with a missing entry, that seems strange. Also, specifying "status = 2" leads to the article (or rather the talk page) becoming a member of "Computers Developing Articles" as well as "Computers Nonstub articles" (and if one specifies the status as 1, one additionally gains the "developed article" category). Shouldn't it be only one category? Otherwise "Nonstub articles" will become quite full. --Markus Baumeister 15:04, 9 March 2007 (CST)
- I'll investigate and fix the first bug. "Developing Articles" are a subset of "Nonstub Articles," which is 1 + 2--a useful category, because it shows us how many (and which) articles are under active development and beyond stub stage. --Larry Sanger 16:02, 9 March 2007 (CST)
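A minimal sketch of how one status number can land an article in several tracking groups at once. It assumes 0 = approved, 1 = developed, 2 = developing, 3 = stub, 4 = external: the values 0, 1, 2 and 4 are mentioned in comments on this page, while 3 = stub is inferred from the order in which the statuses are listed. "Nonstub" is taken as 1 + 2 as stated above, and "internal" (0-3) and "advanced" (0-1) follow the definitions given further down this page. The function name and group labels are hypothetical and are not the actual template code.

```python
# Assumed mapping of checklist status numbers to names (3 = stub is inferred).
STATUS_NAMES = {
    0: "approved",
    1: "developed",
    2: "developing",
    3: "stub",
    4: "external",
}


def tracking_groups(status):
    """Return every group an article with this status would be counted in."""
    if status not in STATUS_NAMES:
        raise ValueError(f"unknown status: {status!r}")
    groups = {STATUS_NAMES[status]}
    if status <= 3:
        groups.add("internal")   # categories 0-3
    if status <= 1:
        groups.add("advanced")   # categories 0-1
    if status in (1, 2):
        groups.add("nonstub")    # "1 + 2", as described in the reply above
    return groups


if __name__ == "__main__":
    for status in sorted(STATUS_NAMES):
        print(status, sorted(tracking_groups(status)))
```

Under these assumptions a status of 2 yields "developing", "nonstub" and "internal" together, which is why a single article shows up in more than one category.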
Another thing: It would be useful if the Template:Cite needed template existed. I just removed it several times from Computer Science because it was red. That seems wasteful. If WP thinks some citations are lacking, we should not remove those hints during cleanup. --Markus Baumeister 15:53, 9 March 2007 (CST)
- I disagree. Articles are written for the end-user, not for contributors, and the Template:Cite needed template is a hint strictly for the use of contributors. Users should take everything in the article with great caution, regardless of any template, if it hasn't been approved by an expert. Besides, Wikipedians do not seem to be particularly good at deciding what does and does not require citations. --Larry Sanger 15:59, 9 March 2007 (CST)
- I agree with Larry. In practice, many "POV-pushing" Wikipedians use Template:Fact or Template:Cite needed (or the worse Template:Dubious) just as a mark for the content they do not like (or a first step to removing something if the requested source is not given). So it needs a general cleanup. While sometimes it is well-intended and, in Wikipedia, perfectly OK, it looks useless here, as I guess we make the well-intended requests on talk pages. --AlekStos 04:05, 10 March 2007 (CST)
Hi Guys - I missed out canis familiaris and Canonical Gospels for now, and did an extra 2 instead. The canis article seems to have been developed a lot from the original Wikipedia version, so "External article: from another source, with little change" didn't seem right. And I wasn't sure about Canonical Gospels. --luke 19:25, 11 March 2007 (CDT)
- Hi Luke, thanks, and as to canis f.--it's a redirection page and therefore needs no checklist. As to the gospel article, it's definitely beyond a stub, but with no intro at all you can't bold the title and therefore cleanup is not done. --Larry Sanger 11:54, 14 March 2007 (CDT)
Some feedback. It looks like a good idea. However, it was not as simple to do as I thought before. For example, it is not always clear whether the "WP checkbox" should be checked (see e.g. FCLB). Sometimes it was difficult to judge the status (is the article nearly complete or not that much??), especially when it concerned topics outside my domain (so virtually all articles given by the "standard" alphabetical range). I assessed as "almost complete" a viper article; at the same time a much longer article on another viper was labeled "developing" by a more qualified editor (BTW, the serpents are somehow special).
Is it worth the effort? I think so. It gives a good general framework. It can be useful for workgroups and improve some "management" of the project. I see an analogy with bookkeeping in a business - while sometimes it can be excessive or "virtual", basically it reflects the state of the project.
Is this something we can expect to implement on a large scale? I guess it's not impossible. I'd prefer to deal with my domain. This would at least double the speed and divide the number of doubts by two. However, if we put "cleanup your home" as the general rule, a problem arises, since not all workgroups are really active at the moment. So, maybe we could start the cleanup by domains where we have some active users and then pass to what remains.
Now, how many articles per author on average? As far as my scripts can tell, we have about 100 users visible on recent changes from March 1; about 40 of them have made more than 10 edits. So, roughly 30-40 articles per author. Looks feasible, not effortless, though. Not sure how many articles here on board; nor how many authors would be interested in "accountancy".
Other remarks/suggestions.
- Almost all our articles - as far as I can tell - are "underlinked". I guess it still will be the case a few months from now. So the checklist entry looks superfluous.
- Technical issue observed: the "Show changes" button does not work properly when editing a section (if editing the whole page it works well). BTW, glad to see "WP content" checkbox working properly ;-)
- Serpent articles are somehow special... These are often status 2 or 1 CZ live _internal_ articles, and at the same time not very different from their Wikipedia versions, so, formally, status 4 _external_. I think this will be the case more frequently when we have more former WP authors on board. It is not clear to me whether the "content-from-WP" checkbox applies in such cases. After all, such an article draws on the author's own knowledge and not that much on the content of WP. Maybe a declaration by the author would do?
Sorry, it was long (just some thoughts). --AlekStos 16:21, 12 March 2007 (CDT)
Thank you, Alek, for the best and most comprehensive comments yet. You're right that it isn't easy, it requires many judgment calls, and we need to expand our rule set carefully. But I also have become convinced that it will help tremendously to have all the various categories that the checklist creates--and to preliminarily assign all of our articles to workgroups.
You can deal with your domain, but first we need to assign articles to workgroups, so that you can review the articles in your domain.
As to the snake articles, they should be 4, I think. They are special in that the main author of those articles has declared that he wants to edit/maintain them here on CZ (well, perhaps pending a Forum discussion--I haven't checked in on that). But that doesn't make them any less 4s: they have not been significantly changed from the WP copies. I could be mistaken (a check of the page history is in order), but I doubt any of them have three or more edits in three different places, which is the minimum criterion for something to move out of the 4 category.
I believe you are mistaken about the proportion of underlinked articles. It's true that most articles are underlinked, but at present writing, 46 of 79 articles are underlinked. I would hope that this proportion will decline over time, and this is important to work on, especially within particular domains. For example, I can more easily find a "home" for an orphan philosophy article.
--Larry Sanger 20:27, 13 March 2007 (CDT)
- Right. Assigning workgroups seems absolutely necessary; perhaps some snake checklists should be reconsidered; I have no big problem with the "underlinked" entry - I just had no chance to put "no" there. As a side note, please do not think that the term "accountancy" was used in a pejorative way. To make it clear, I think it is necessary for efficient management. --AlekStos 09:21, 14 March 2007 (CDT)
Any last objections, suggestions, etc. before we begin??? --Larry Sanger 13:53, 14 March 2007 (CDT)
Some stats. As of today, we have 1584 articles found e.g. on special pages->most viewed pages (and counting!). 1101 of them are marked CZ-Live, not too bad. 159 are marked as content-from-wikipedia. 832 have no workgroup assigned (I can provide lists if anyone is interested). I cannot believe that only 159 articles come from WP, and I think that virtually every article should have its workgroup. This is why the Big Cleanup is needed. --AlekStos 05:01, 15 March 2007 (CDT)
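For anyone who wants to redo the arithmetic as the counts change, here is a small sketch that turns the figures quoted above into shares of the total. The figure of 40 active authors is an assumption borrowed from the 12 March comment (users with more than 10 edits), and the variable and function names are only illustrative.

```python
# Counts quoted in the 15 March 2007 comment above.
total_articles = 1584
counts = {
    "cz_live": 1101,
    "from_wikipedia": 159,
    "no_workgroup": 832,
}

# Assumption: roughly 40 users with more than 10 edits (12 March estimate).
active_authors = 40


def percent(part, whole):
    """Share of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


shares = {name: percent(n, total_articles) for name, n in counts.items()}
articles_per_author = round(total_articles / active_authors, 1)

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share}% of {total_articles} articles")
    print(f"articles per active author: {articles_per_author}")
```

With these numbers a little over half of the articles have no workgroup, and the load works out to just under 40 articles per active author, at the upper end of the earlier 30-40 estimate.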
Questions?
Please list any questions you have below, and Larry (or someone) will answer them.
What if I don't know what category to put an article in?
- Choose a category from our list of workgroups that seems most likely to you, and then make sure that on the checklist you set cat_check to "yes," so that one line in the checklist template looks like this:
cat_check = y
- If none of the categories look right, then add Category:Needs Workgroup to the article.
An article is not linked from other expected articles not because the links were not made, but because those other articles do not exist yet. Expected links from existing articles instead exist. Does such an article qualify as underlinked? --Nereo Preto 14:19, 8 March 2007 (CST)
- Yes, it does. The point of tracking "underlinked" articles is that we want to encourage the development of the important conceptual pathways, as it were, to our relatively specialized articles. The more of these "in demand" articles we create, the more sense CZ will make to the end user. "Underlinked" articles are a superset of orphaned articles (articles to which no other articles link), but the reason for caring about the concept is roughly the same.
Does Topic Informant Workgroup exist? I assigned it once and it was red. --AlekStos 16:47, 12 March 2007 (CDT)
- It does. Check the link: Category:Topic Informant Workgroup.
- I see. It works in the article (=mainspace); in the checklist, however, it is red.
Maybe I did something wrong, maybe it's a problem with the template -- see F. Albert Cotton, perhaps the first TI Workgroup's checklisted. BTW, I believe this Workgroup should appear somewhere over there. I put it under "Humanities", but do not know whether it is the right place (easy to revert). --AlekStos 09:55, 14 March 2007 (CDT)
- Ok, I solved my problem: TI Workgroup Home Page does not exist yet (and this is what is linked from the checklist). --AlekStos 10:12, 14 March 2007 (CDT)
What is the workgroup for Acetabulum? -Versuri 05:56, 16 March 2007 (CDT)
- Anything about ancient Greece or Rome goes in Category:Classics Workgroup. If you believe this might also properly belong in some other workgroup as well, you can always set 'cat_check = y'. --Larry Sanger 08:56, 16 March 2007 (CDT)
I have not removed the red link images of vipers. I think that Jaap will use it. -Versuri 07:54, 17 March 2007 (CDT)
- All right; I would have removed them, but I don't think it matters much either way in this case. You could ask him. --Larry Sanger 08:40, 17 March 2007 (CDT)
A number of the "V" sets that people can sign up for contain only redirects to snake articles. I assume this means we can just cross those sets off as done? --Joe Quick | Talk 04:31, 19 March 2007 (CDT)
- Yep! You get credit for discovering that there's nothing to do. --Larry Sanger 20:52, 20 March 2007 (CDT)
How should we deal with approved pages and their draft pages? It would seem like the approved version should be marked as "approved" while the draft page is marked as "developed." Does that make sense or are we not even going to add the checklist to both versions? --Joe Quick (Talk) 20:04, 20 March 2007 (CDT)
- Add the checklist only to the draft talk page, not the talk page of the approved article, and call it approved (status = 0). The article talk page is supposed to redirect to the draft talk page, actually. --Larry Sanger 20:52, 20 March 2007 (CDT)
What are your techniques for searching WP content in CZ articles? Any tools for doing this effectively? Yuval Langer 17:57, 24 March 2007 (CDT)
- You should probably make a trip to the page history in most if not all articles. If you want to determine whether an article is sourced from Wikipedia, then just look at the first version in the edit history. Virtually all Wikipedia articles left in the database have templates and images (that we have not uploaded, and thus are distinctive red links). That should be enough for us to tell whether to check the "Content is from Wikipedia?" box. If you want to determine how much an article has been changed from its Wikipedia original, go to the page history and press the radio buttons next to the oldest and the newest edits, and hit "compare". You'll be able to see the differences there. We have mostly been assuming that the original-uploaded version is identical to a Wikipedia original. --Larry Sanger 18:20, 24 March 2007 (CDT)
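The oldest-versus-newest comparison described above can also be approximated offline once the two revisions have been copied out as plain text. The sketch below uses Python's standard difflib to count how many separate places differ between two texts. Treating each contiguous run of changed lines as one "place" is an assumption made for illustration, as is the function name; it is not an official CZ tool, and it does not count the number of edits in the page history.

```python
import difflib


def changed_places(old_text, new_text):
    """Count contiguous runs of changed lines between two revisions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    # Every opcode that is not "equal" marks one changed region.
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


if __name__ == "__main__":
    oldest = "Intro line\nSecond line\nThird line\nFourth line\nFifth line"
    newest = "Intro line\nSecond line, reworded\nThird line\nFourth line, expanded\nFifth line"
    print(changed_places(oldest, newest))
```

A result of three or more would match the "three different places" figure mentioned earlier on this page, but the judgment about whether the changes are significant still has to be made by reading the diff.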
Should I be deleting categories in an article that are not directly one of the workgroups? (i.e., delete neurology and replace it with health sciences) David Martin 09:40, 29 March 2007 (CDT)
- Yep. Particularly if a category is red. Delete all red categories; leave the blue categories so that we don't lose any potentially useful info. --Larry Sanger 12:26, 29 March 2007 (CDT)
After an article has had 3 major edits and has been tagged CZ Live, should the flag be removed stating that it is from Wikipedia? I was unsure if this flag is to give credit to Wikipedia for the basis of the article or if it is just to track those articles that need editing. David Martin 20:16, 2 April 2007 (CDT)
How much information is needed for an article to be flagged as containing content from Wikipedia? If there are only a few sentences or one section, does this constitute "containing content"? David Martin 22:10, 2 April 2007 (CDT)
update this sentence
I noticed a difference in this sentence: "We divide our body of articles into five categories: approved, developed, developing, stub, and "external" (i.e., borrowed from Wikipedia but not significantly changed). Furthermore, since every article is also marked with its..." and the categories listed in the template you are developing. -Tom Kelly (Talk) 12:58, 17 March 2007 (CDT)
Well, I recently updated the template. The sentence was updated to reflect the template update. Clear enough? --Larry Sanger 14:39, 17 March 2007 (CDT)
I did not see advanced or internal in the sentence but it is listed further in the article. I added internal and advanced to the sentence - feel free to remove if that was not the right thing to do. I think internal and advanced could be defined here like external is. what do you think? is it defined somewhere else? -Tom Kelly (Talk) 03:04, 18 March 2007 (CDT)
"Internal" refers to all articles in categories 0-3. "Advanced" refers to all articles in categories 0-1. Yes, it is defined somewhere else, namely, CZ:The Article Checklist (section "Article Status"). --Larry Sanger 08:10, 19 March 2007 (CDT)
Don't forget to include Health Sciences Workgroup in some Biology / Chemistry articles
I noticed that the articles on Vitamins were not tagged with Health Sciences Workgroup. I would like to stress that it is important that articles related to medicine, even though their main focus may be biology, chemistry, or biochemistry, should be included in the Health Sciences Workgroup. -Tom Kelly (Talk) 21:55, 22 March 2007 (CDT)
Cleaning up "Articles for deletion"
In my first group of articles was the article "Tcf", which is marked "Articles for deletion." Should I bother doing the cleanup tasks to articles that are so marked? Bruce M.Tindall 18:18, 30 March 2007 (CDT)
Should we bother cleaning up big unchanged external articles?
Yesterday's Notice Board announcement said that external articles are being deleted "pretty aggressively." So if, during a Big Cleanup run, I find large articles copied verbatim from Wikipedia without significant changes, should I even bother removing the images, adding workgroup categories, etc., or should I just fill out a checklist with the category "External" and assume that the article will be deleted?