CZ:Proposals/Create a page for all notable genes in the human genome

Latest revision as of 02:45, 8 March 2024

I suppose it's time that this proposal be officially withdrawn. Archiving the text below in case the proposal ever gets resurrected in the future. Andrew Su 20:10, 14 July 2008 (CDT)

This proposal is presently driverless. Why not become its driver?
You can sign up on its proposal record, which may be found on the driverless proposals page.

At first sight, this seems to be something that the relevant workgroups (Biology and Health Sciences) can decide by themselves. However, the proposal may easily create a precedent with wide-ranging implications, for instance on what type of stubs are acceptable, and whether we want a bot to write a large number of articles. For that reason I think it's best that the full Editorial Council decides.

Driver: None

Complete explanation

This proposal will entail the automated creation of some number of pages relating to human genes and proteins. The exact number of pages and the criteria for selection are open to discussion (but generally this effort will only be worth it if that number is greater than 1000). These stubs will be seeded with content from the public domain. The types of information currently available to use in the stubs can be seen in the WP links below, and additional information can be added if we've missed any relevant databases.

Formatting of stubs is also up for discussion. The WP links below can serve as one template to work from. We also prototyped a layout that uses subpages at APP. The bot can be programmed to run every quarter or so to keep the gene pages in sync with the underlying databases. Importantly, all bot-maintained content will be clearly contained in bot templates, so no human edits to the rest of the page are in any danger of being overwritten. (See source of the WP pages for examples.)
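A minimal sketch of how a periodic bot run could refresh its own content without touching human edits. It assumes, purely for illustration, that the bot-owned region of each page is delimited by a pair of marker comments; the marker strings and the helper name `update_bot_block` are hypothetical and are not taken from the actual bot or its templates.

```python
import re

# Hypothetical markers delimiting the bot-owned region of a page.
BOT_START = "<!-- BOT:START -->"
BOT_END = "<!-- BOT:END -->"

# Non-greedy match so only the first marked region is captured.
_BOT_REGION = re.compile(
    re.escape(BOT_START) + r".*?" + re.escape(BOT_END),
    flags=re.DOTALL,
)


def update_bot_block(page_text: str, fresh_content: str) -> str:
    """Replace only the bot-owned region; leave all other text as-is."""
    new_region = f"{BOT_START}\n{fresh_content}\n{BOT_END}"
    if _BOT_REGION.search(page_text) is None:
        # No bot region yet: prepend one rather than touching human text.
        return new_region + "\n" + page_text
    # A function replacement keeps backslashes in fresh_content literal.
    return _BOT_REGION.sub(lambda _m: new_region, page_text, count=1)
```

Under these assumptions, anything a human writes outside the markers survives every bot run, while anything inside the markers is regenerated from the underlying databases.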

Past discussions

This proposal was previously discussed in late 2007, so please read the threads below for context. (If there are questions which are not addressed there, please let me know and I will expand here.) The proposal was tabled at that time due to uncertainty around the licensing of CZ content.

Forum: [1]
Mailing list: [2] (all seem to be confined to September 2007 as linked here)

Previously-raised issues

~~License~~ solved...
Does the existence of similar WP pages reduce (or increase) enthusiasm?
Does creation of up to 10k pages render the "Random page" link useless (or at least horribly biased)?
Are there enough biologists and geneticists so that this effort won't stagnate?
The driver is employed by a pharma company, so is impartiality a concern?
Would this effort be sufficiently different from the array of other "gene portals" out there?
Are gene stubs (which individually fall far short of encyclopedic articles) useful to CZ?

Reasoning

This would be a relatively unique tool in biology. All existing gene portals are "top-down" from centralized "gene annotation authorities". Of course, those authorities are a bottleneck for sharing new findings in a gene-centric web database. Looking several years down the road, I hope each of these stubs evolves into a gene-specific review article on every gene in the human genome.

I say relatively unique since one version of this project is now nearing completion at WP. Approximately 9500 pages are currently in existence (listed here: [3]). Some CZ users previously expressed concern about whether CZ should embark on this project given the similar/parallel WP effort, and that certainly should be discussed again below. (But note that this rationale was not the reason we did not move forward previously.)

Implementation

As I see it:

1. Discuss to decide if this project is appropriate for CZ. (If no, end here.)
2. Finalize a format, perhaps continuing the use of APP as the model.
3. Find a willing programmer (ideally an undergrad or master's student interested in a cool project).
4. Make the pages.

Assuming Step 3 doesn't drag on too long, I'd estimate that the whole project would take approximately six months.

Discussion

A discussion section, to which anyone may contribute.

I am in no way knowledgeable in this area, but two things come to my mind after reading this. First, the Recent Changes issue. I agree with Larry Sanger that we should have some sort of exclusion code to prevent these from showing up. The whole purpose of the link (to me at least) is to get an interesting page on a different topic than one would normally read.

The second question is whether this would be better suited for a catalog. I know that would be a lot of pages, but having a central area to pick out what gene you want to look at doesn't seem like such a bad idea.

Well, there's my opinion on the matter. John Dvorak 16:08, 16 April 2008 (CDT)

I have the same arguments against this as previously.

Almost nothing is known about most genes, so most pages will essentially be blank.
People are much more interested in gene products, proteins, than the gene.

For example, one might look up insulin, but who knows what the name of the gene is for making it? How do you write an article about a gene? Gene so and so makes the protein TNF-alpha, then what? You might say it is upregulated or suppressed by other proteins, but what else? I will, however, admit that my interests and work may have clouded my thinking into one that is protein-centric. To me it seems natural to write articles about proteins, and add a link to GenBank or another gene bank data set, much like we have been doing with the drug articles where we add links to DrugBank, MedMaster and an FDA site. David E. Volk 14:44, 17 April 2008 (CDT)

See the Wikipedia version of the insulin gene. While proteins are the functional unit biologists often consider when thinking about physiology and biochemistry, if you're a developmental biologist or interested in human disease then the regulation of genes and the various alleles of a particular gene are of great interest. Bear in mind too that this project can be tailored to our needs.

While I agree that many of the articles will initially have minimal content, will this not serve as a nucleus for content related to human biology? Also, I would imagine that most articles on proteins will not be specific to humans. Possibly these human gene articles could be broader and include protein information specific to human diseases and physiology too, while directing readers to our articles that discuss the broader picture for a given protein. Chris Day 15:15, 17 April 2008 (CDT)

Regarding David's first comment, let me provide a few links to pages at WP as they were created by our bot, just to give people an idea of the range of stub content. Obviously, we can set the bar wherever we want here at CZ:

http://en.wikipedia.org/wiki/PMM2 -- brief summary, protein structure
http://en.wikipedia.org/wiki/WASF1 -- brief summary, no protein structure
http://en.wikipedia.org/wiki/POLQ -- no summary, no protein structure

Regarding David's second comment, it's pretty rare over at WP to have separate pages about proteins and the genes that encode them. For example, consider this page. I would be happy to change all references from the "gene wiki" to "protein wiki" ;) and/or incorporate any more protein-centric content that can be systematically harvested. As Chris alludes to, we are completely open to modifying the format according to what the CZ community feels is appropriate. Cheers, Andrew Su 15:45, 17 April 2008 (CDT)

Those articles are great examples of why I am against this proposal in general. It seems very odd to have 1-3 sentence articles with 30 references listed. It looks and feels like an automated fact picker, but not an encyclopedia article. Most of the information in Gene Ontology in fact refers to what the protein does. Same for the Biological Process parts. The few things that can be said regarding up/down regulation could easily be put in a "Gene" subsection of the protein article. Finally, I notice that WP does not italicize genes like they should. Thus, the MutY protein is made by the MutY gene. David E. Volk 16:14, 17 April 2008 (CDT)

As I said above the articles could/should be about the human proteins and the gene. Re: MutY/MutY, does it matter what Wikipedia does wrong with respect to our own proposal? Another thing to consider is that much of this information can be distributed into different subpages, so potentially the creation of subpage content is the desired goal? Do we need an article to start a subpage? I ask this in all seriousness as we do have some subpages that exist without an article (related articles pages to date). Possibly this is a model to work with? Chris Day 16:22, 17 April 2008 (CDT)

Let me be more clear on the subject. Originally, this proposal started out as an automated procedure to make thousands of stub articles (now the word notable is added). That is my main point of contention. I have no objection to an author writing in great detail about a particular gene. For example, they might include a family tree for that gene showing some evolutionary fact, or discuss particularly important alleles and list the nucleotide changes that cause a particular disease. My previous analogy was to ask, "why don't we then just create a stub of every word in the largest dictionary available and wait for folks to come along later to fill in the data?" We could even list 30 dictionaries as references but still leave a stub article with just the word. Would that look like an encyclopedia article? I think the automated approach is the biggest problem.

If I ever write an article about the Ford Mustang, it will be about the car, not the blueprints at the factory, or even the older blueprints showing the evolution of the Ford Mustang over the years. I do realize that genes are very important to evolutionary biologists, but it is the end product that makes or breaks the baby. David E. Volk 16:45, 17 April 2008 (CDT)

But isn't this about populating subpage type material? While this is automated it seems like a helpful start. For me the blueprint analogy would be like having articles with the sequence of the gene alone. I agree that would not be useful. Also, I'm not sure the dictionary analogy is comparable, as the links in this case are to resources and information about the specific human gene/protein. Chris Day 17:35, 17 April 2008 (CDT)

David, am I correct in reading that your objection, at least primarily, comes down to whether or not stubs are useful to CZ? I agree with all of your comments above (except for the dictionary analogy). This proposal does in many ways boil down to "an automated fact picker", and no question that the output is not an encyclopedic article. In my mind, the question is whether the CZ community feels that those output pages (or something like them) are useful in the CZ context. The stub proponent in me says they are (since having minimally useful content draws experts who then contribute less-minimally useful content which then eventually evolves into an encyclopedic article), but I'm also open to the conclusion that CZ doesn't want to encourage mass stub creation. Andrew Su 17:59, 17 April 2008 (CDT)

Andrew, why is the dictionary analogy not correct? It would just require a larger group of experts to come and fill in the details. Likewise, we could write a bot to make stubs for 1 million known chemical compounds that are listed in the Chemical Abstracts Service (CAS) and automatically pick up the melting points, molecular mass and so forth, and then wait for experts to fill in the data. We could have 1 million articles in a few days, all stubs.

I would be against the chemical bot too. Can you explain, in any way, why these two analogies are not the same as your proposal other than these analogies do not fit your area of knowledge? I would hate to see 90% of the content at CZ be stubs. I know you feel this proposal will bring in experts, but frankly, the experts already know to go to the sites that you will be copying data from. CZ will not be creating any knowledge with this proposal, just regurgitating it, at least for the very near term. The experts you seek are the people depositing their data into GenBank, etc. Your best bet would be to get some students involved so that a few sentences at least could be written for each article.

CZ is trying for quality over quantity, and this proposal is the opposite of our overall goal. Perhaps we could make a list of the hundred or so notable human genes, and then some of us in the biology/chemistry groups could then split the list and start writing articles aided by the bot. David E. Volk 08:25, 18 April 2008 (CDT)

Please bear in mind that if we move forward with this proposal, we will almost certainly have to twiddle with article lists, e.g., with Category:CZ Live, as well as with Random Page, etc. But suppose that all that content is in CZ, but we don't count it (the articles are not included in our "Live Articles" count). Does that affect the above argument(s)? --Larry Sanger 11:14, 18 April 2008 (CDT)

David, no disrespect intended, but I feel like we need more voices in this debate. It sounds like both you and I are very familiar with the pros and cons here. And while interesting, I don’t think the discussion above is bringing up any new issues that are likely to sway our individual views. I think we need more people in decision-making capacities (Editorial Council?) to chime in here to ask questions, express concerns, etc etc. Based on that input, we can summarize relevant pros and cons in a focused way. Absent interest (and ultimately enthusiasm) by others in the community, I’m happy to let this proposal die by neglect. (On that note, thanks Larry for chiming in. I'll let David reply as to whether that change would assuage his concerns, but I think they have more to do with whether the stubs individually are sufficiently useful to be included in CZ on their own.) Cheers, Andrew Su 11:28, 18 April 2008 (CDT)

I think it is accurate to say that this work is being done better elsewhere, and the only way to make this even remotely viable would be to somehow convince all of the contributors and maintainers working on the databanks to shift their operations over to CZ, which has a very low probability of success. --Robert W King 11:40, 18 April 2008 (CDT)

Robert, there are two big limitations of current gene portals. First, while they are great at storing and displaying tag/value data ("structured content"), they are not so good at displaying free-text interspersed with images and graphics and tables ("unstructured data"). Second, existing gene portals are pretty exclusively one-way communication. If a reader notices an error or omission, there is no way for them to actively fix that problem. Of course, those two weaknesses dovetail nicely with two of the primary strengths of a wiki. So, we wouldn't be promoting a competitor to existing resources, but a complementary tool. Does that answer your concern? Andrew Su 15:23, 18 April 2008 (CDT)

Andrew, no disrespect taken. I agree that more voices are needed regarding this proposal, but like last time, it would seem only a handful of people care enough either way to chime in. Making this material subpages, rather than Main domain articles, certainly improves its favorability. I do wonder about maintainability. Are you suggesting the bot runs every few weeks for updates? If so, will it kill the intervening human adjustments? Do you envision a 1-time grab of data? Perhaps the proposal description can be updated to reflect exactly what the actual plans are. A catalog of genes might inspire people to write about the genes, after which a specific page could be promoted up to the Main domain. David E. Volk 09:55, 21 April 2008 (CDT)

All, I've added a few more details up above, but the key point to emphasize is that this very general proposal can be tailored however the CZ community sees fit. Of course, as David alludes to, this requires some degree of enthusiasm by the biology workgroup. I don't want to invest lots of time on this proposal without a clear consensus and community backing (which, I agree with David, have yet to materialize). Andrew Su 12:56, 21 April 2008 (CDT)

Why this initiative requires a recruitment component

Andrew, a few comments. First, assuming that there is a reasonable chance (however we define that) that this initiative can be made to work, I am in favor of it. So, what are the chances that it can be made to work here? I do not put much stock in arguments like Robert's above (basically, others are doing this already and it's hard to get them to move here), for the reasons you cite: if we can perform a service better than others, then we'll gain support. The mere fact that others have started something is no reason not to start a competitor, if you can clearly do better. That is the principle that motivates CZ in the first place. But CZ already has a lot of articles that people started some time ago, and they have not been edited since; I think of the large number of "asp" articles Jaap Winius uploaded last year. There is one and only one reason that we would want to serve as host to such inactive articles, and that is that the same (or nearly equivalent) information is not as easily available elsewhere. But my impression (correct me if I'm wrong) is that the information you want to upload is available elsewhere (in multiple locations?). If that's correct, the entire value and purpose of CZ's hosting this information is that it can be systematically improved here. And then the issue obviously devolves to this: can we reasonably hope to get enough people interested in systematically improving these articles?

If you are relying on the enthusiasm of the current Biology Workgroup, then our answer is obviously "Nope, we can't do it." Because obviously the enthusiasm isn't there. This is just not how self-selecting open-ended collaborative projects work: in such projects, people pursue what interests them, and so it would be an amazing coincidence if among the biologists who happened to show up in CZ there was a strong interest in gene articles. More to the point, the fact that there is a group and that group members do a remarkable amount of work (over 800 articles in the Biology Workgroup, including 16 approved, 73 developed, and 266 developing) does not entail that they are an organized and assignable group. This is something that a lot of people who look at Wikipedia and CZ and similar projects never perfectly understand: we don't have a staff that we tell what to do!

If you want to interest enough people in systematically improving the gene articles, there are two ways to do this, it seems to me. The first is to wait for a few years when, hopefully, CZ is the main game in town for credible encyclopedic information, and we've got zillions of biologists already involved or willing to get involved. The second is to recruit geneticists (i.e., people who are willing to work on the gene articles) systematically. You might be able to use the Workgroup Weeks framework I've set up (perhaps you already had that thought?) to accomplish that. We could discuss the details of that. But I would caution that you can't expect people to jump on the bandwagon even of getting the Workgroup Week started. As you can see on Archive:Workgroup Weeks, not even many of the most active Citizens are interested in helping. If you wanted to guarantee that a Gene (or Genetics) Week would happen--in which case I would probably support uploading many gene articles for people to work on--you'd either have to show that there were many people willing to help you out, or you'd have to tell us that you'd do all the work yourself!

I don't mean to be discouraging and it looks ungrateful (looking a gift horse in the mouth, as it were), but it all comes down to keeping CZ relatively free of "cruft" that will never be maintained. I can imagine various ways that the project could be maintained. It's just that it would take considerable leadership--and, since no one else is currently in a position to take that on, it would have to be leadership from you, I'm afraid! --Larry Sanger 10:48, 19 May 2008 (CDT)

Larry, thanks for the thoughtful reply. Unfortunately, I don't think I'm comfortable combining this proposal with any sort of recruitment effort. It's just not my area of expertise or interest. And like many people here, I only have so much time to devote here so I better stick to things I'm good at and passionate about. So with that as a backdrop...

I think the question for the ed council is whether this is potentially a recruitment tool. Not necessarily as part of a formal recruitment effort, but imagine for a moment that you're a CZ-naive molecular biologist coming here for the first time. (Read about CZ in a blog, or heard about it from a friend, etc.) As a molecular biologist, the first thing I'd do is search for my favorite gene, and if the search came up empty, I'd probably say "oh well" and browse away without too much thought. *But*, if I did find a page on my gene, I'd think "oh cool", then probably "oh, this page is crap", and then "let me add one sentence of this critical information". You've sucked in one more editor.

So obviously, you don't know what the next molecular biologist's favorite gene is, so I'm arguing one thing to do is to create these pages en masse. Clearly the vast majority of pages won't get touched anytime in the near term. But I think the question for the ed council is whether the few pages that do get edited (and the new CZ editors who make those edits) are worth having a bunch of "cruft" that admittedly will just sit there. (Implied references to the Long Tail go here...)

Anyway, I don't want to be a single-issue nagging voice here. Larry advises that the next step would be for an ed council member to formally pose this as a resolution. If there is such a person who buys the argument above and wants to propose it, please do. If there are more questions, I'll be happy to answer. If there's no other action here in the next week or so, I'll go ahead and withdraw the proposal. Cheers, Andrew Su 20:23, 19 May 2008 (CDT)

Well, that's an interesting argument: if you make a lot of basic gene pages, pre-filled with facts, that will by itself attract a lot of new geneticists. Is that true? Maybe, but I'm sorry to say I'm skeptical. I hope you'll indulge me a little bit as I try to reason this through. (Many people seem not to have patience for this boring habit of mine, but I can't seem to stop.)

Now, as a general rule, content can of course attract more people and content. But I don't know if in our case that will be enough. I've found that most academics and scientists just won't contribute to a project in their own field, if they think it won't enhance their CV. This means the project has to be "academically distinguished"--it has to boost their CV. Well, CZ is great, as we all know, and for various reasons deserves the support of academics, but it is not yet particularly academically distinguished, if it ever will be. Our editors tend to work for the love of it (and I think that has more to do with a love of the beauty of knowledge than with ), and for other reasons, but not so much because they think it will advance their careers (unless they have or want a career having to do with Internet projects like this).

There's another way to get new people on board, and that is to get them excited about the project in general. I think your story, above, relies on people doing that. They're predisposed to get excited about things like CZ, you seem to assume, if they see content that is to their liking. But most academics and scientists aren't predisposed to get excited about things like CZ. Trust me--I know. The only reason that CZ is doing as well as it is, is that we are using enormous economies of scale: we allow people to create reference content about anything, and we let people from the high school level on up to contribute, and we let English speakers from all over the world contribute. The potential body of contributors for this project is hence many orders of magnitude larger than the potential body of contributors for a Gencyclopedia. This again is why CZ is growing while zillions of specialist expert wikis, even many of the best known of them, are floundering.

That's why I say it's crucial that this project have a recruitment component. It just is not the case that if you build it, they will come--again, trust me, I know. In fact, I suspect that even if we held a Gene Week or Genetics Week (or something like that), there's still a rather high chance that the project would never reach the critical mass needed to take off.

Biology Week, though, is going to be a going concern, and it will get a lot of new biologists involved. People are already lining up for that, I'm delighted to say. I think it would be a good idea to return after we've got a bigger group of biologists involved, and then combine a Gene Week (and you would probably get help with that then) with the whole auto-creation project.

Again, I'm sorry to seem to put you off--I really don't want to, because I love all reliable content... --Larry Sanger 13:30, 20 May 2008 (CDT)

Larry, I certainly trust your real-world experience more than my hypothetical assertions. But two additional points as food for thought. First, it seems like right now you're targeting to get a few people very excited about CZ. This gene wiki effort would be aimed at getting lots of people a little bit excited. (The Long Tail says that both populations are useful to cultivate.) Second, even though CZ has a role for experts, it doesn't mean that all (or even most) of the content has to be written by experts. In fact, although you seem to be a bit disappointed by it, I personally believe for CZ to be successful the vast majority of content will be written by lay people, undergraduates, and high school students. Experts are there to ensure completeness and objectivity and to resolve disputes, but as you allude to above, it's hard to get academics to directly contribute. My two cents... Andrew Su 16:04, 20 May 2008 (CDT)

Bot-assisted article creation?

Andrew (or any of the tech-minded folks here), would it be possible to insert such bot-generated content "on the fly" (similar to the metadata) along with the manual creation of the respective CZ article? I envision a list of topics (currently, but not in the long term, restricted to genes) on which a template, e.g. {{Gla}} for "Gene List Assistant", is applied to the individual entries (much like {{Rpl}} is used on CZ:Core_Articles/Biology and the like), thus signaling (e.g. by icons like those in the series) that bot assistance for fact picking is available for the creation of an article on this particular topic. Such a treatment could perhaps lure new contributors in, especially if applied to lists like Special:Wantedpages or Category:Definition Only. Similarly, one could think of a corresponding {{Cla}} template for chemicals, {{Sla}} for stars, {{Spla}} for species and so on as new bots become available. -- Daniel Mietchen 10:45, 20 May 2008 (CDT)

Hi Daniel, interesting idea... I definitely see the advantages of this type of system. (Although truth be told, I'd still be in favor of creating the stubs outright, since I don't think this template system will reel in my hypothetical naive molecular biologist above.) Feasibility would definitely depend on the input of a MediaWiki expert (which I am not), but I'm confident that if those issues were resolved, we could adapt things on the bot side to make it work. Cheers, Andrew Su 12:32, 20 May 2008 (CDT)

As a biologist I think that Andrew is correct that this would draw in biologists to edit Citizendium. Some may only make a few edits but others may stay. Especially attractive is the fact that CZ has Eduzendium, which serves as an excellent platform to aid teaching in the new active learning environment that is becoming very popular and is supported by Howard Hughes and the National Academy. For me this whole proposal is about the future. Right now it might not look great on paper for some of the reasons above. But Citizendium has a chance to be at the front of the next big wave here. At worst these pages lie dormant, potentially as orphaned subpages until their time in the limelight arrives. Chris Day 13:27, 20 May 2008 (CDT)

Good point there Chris--that's an angle I hadn't approached. Maybe it's just not that damaging to CZ to have "cruft" in the form of dormant gene articles. I think I'd buy that. The whole strength of my own arguments rests on the assumption that unmaintained gene articles would be cruft. But if we can reconcile ourselves to the view that they aren't, then I'd support it.

I don't mean to deny that we would get some more biologists on board. I'm sure we would; I just doubt it would be enough to develop a lot of gene articles en masse. But if that's not a problem, importing the articles as a means of attracting some more biologists might not be a bad idea. --Larry Sanger 15:07, 20 May 2008 (CDT)

Another technical question: Would it be possible to have such high numbers of subpages for few articles, like Gene? If so, we could place the contents there and make it easy to start an article on any given entry (e.g. by having an "update bot text" button). Still, Random pages and perhaps other things would have to be remodeled, with side effects to be expected. -- Daniel Mietchen 03:07, 21 May 2008 (CDT)

If we're really going to have tens of thousands of these pages, why not give them their own namespace? It seems to me that a clearly defined major project like this might be a perfect application for that. I'd have to look at the code, but I suspect that RandomPages, etc would probably not go into non-main namespace pages. J. Noel Chiappa 07:31, 21 May 2008 (CDT)
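A toy model of the namespace idea, not MediaWiki's actual code: it assumes that pages outside the main namespace carry a "Namespace:" prefix in their title, and that a random-page feature draws only from unprefixed titles. The namespace name "Gene" and both function names are hypothetical.

```python
import random

MAIN_NAMESPACE = ""  # assumption: main-namespace titles carry no prefix


def namespace_of(title: str) -> str:
    """Return the namespace prefix of a title, or "" for the main namespace."""
    prefix, sep, _rest = title.partition(":")
    return prefix if sep else MAIN_NAMESPACE


def random_main_page(titles, rng=random):
    """Pick a random title from the main namespace only."""
    candidates = [t for t in titles if namespace_of(t) == MAIN_NAMESPACE]
    if not candidates:
        raise ValueError("no main-namespace pages to choose from")
    return rng.choice(candidates)
```

In this model, adding tens of thousands of "Gene:" pages would not change the odds of landing on any main-namespace article. The sketch treats every colon as a namespace separator, which a real wiki would not do for main-namespace titles that happen to contain a colon.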

Daniel, I'm not sure this answers your question, but I think I forgot to direct people to APP, which is a page I created manually that shows all the different possible types of auto-generated content. Not exactly sure what you mean by "make it easy to start an article on any given entry". Are you proposing that we have this content available on subpages but not visible until a human editor "activates" it? Noel, how would you propose handling these articles as they evolve? For example, insulin and innexin appear to be developing quite well in the main namespace. Would those get moved to the alternate namespace? Thanks for the feedback and ideas.... Andrew Su 08:47, 21 May 2008 (CDT)

Yes, this means "that we have this content available somewhere but not visible in the main namespace of CZ until a human editor activates it." What I meant by "make it easy to start an article on any given entry" is that bot contents (brief summary, protein structure, references or whatever is available) can be generated ("on the fly" scenario) or retrieved (subpage/namespace scenario) upon creation of such an article in the main namespace, perhaps along with the creation of pre-formatted metadata, including a definition. It would also be cool if intrawiki links (e.g. to proteins or other genes, to diseases or species) could be suggested by the bot. -- Daniel Mietchen 10:53, 21 May 2008 (CDT)

Hi Daniel, I'm less enthusiastic about hidden-until-activated pages because it partially detracts from the potential benefit of my hypothetical naive biologist. Personally, I think the vast majority of editors get hooked because they found an existing page and wanted to improve it, rather than finding nothing and wanting to create. Your proposal sounds like something in between those two extremes... Andrew Su 14:49, 22 May 2008 (CDT)

I agree that hidden is less attractive. It seems to me that there are three main reasons for not doing this, 1) maintainability, 2) the random page problem and 3) not encyclopedic (i.e. could do it for every chemical). Personally I disagree with these three issues as reasons not to proceed. The first two appear to be more short term oriented objections. For me, in the long term, these will not be major problems, even if they are at the present. The last one is trickier but I don't find the analogy exactly true. Biology is so young compared to chemistry and the nature of genes in development and disease is a massively important question in biology and we have barely scraped the surface. Could this be said for chemicals? (not rhetorical) Also the number of chemicals is far greater than the number of genes, and this proposal is only suggesting notable genes.

Who said "Build it and they will come"? That seems like the right attitude for a project like this that is trying to establish its place in the i-world. The worst case scenario is we have a lot of clusters that sit dormant for a while as the project gets rolling. But are they really dormant? How many scientists will take a peek? Possibly pass on the word of an interesting new forum to bring together data that is spread all over the web? With no pages we know this will not happen. With the pages something might happen. Personally I see no reason not to take a risk and see what develops. Chris Day 15:05, 22 May 2008 (CDT)

How about a test page?

Given the current stage of the discussion, I assume it would be insightful to test the bot assistance on a small scale, i.e. one page limited to 400 gene names, formatted by {{Gla}}. I am currently drafting an initial version for this template, based on {{Rpl}}. Help is very welcome. Also, if any of you have a suggestion for an icon that signals availability of bot assistance (or flavours thereof - e.g. short or extended summary, with or without protein structure), please post the links to those image files here. Ideally, the template would then accept a click on the relevant icon as input and generate (or retrieve) the associated content. -- Daniel Mietchen 10:53, 21 May 2008 (CDT)

Hi Daniel, pardon my inexperience, but I'm not exactly sure how {{Gla}} and {{Rpl}} work. I can see the test page APP linked here, but I'm not sure how you'd envision this all working in the end. Would this page evolve/expand into an index of gene pages? I think I'm missing something here... Cheers, Andrew Su 14:53, 22 May 2008 (CDT)

Hmm, okay, a bit more poking around helped... This would be a visual view of available content which would enhance your proposal above for the create-on-demand bot. (One doesn't seem to depend on the other, right?) Oh, and fyi, we have something planned that would behave like the Diberri tool, but that's still quite some ways off. And, I'm not sure how we'd get it to work with subpages. Anyway, that I think is also a very good create-on-demand option. Andrew Su 15:01, 22 May 2008 (CDT)

I am basically quite sympathetic to bot-assisted fact-picking from any suitable database, and genes could be a good start. However, I agree with David in that bot-assisted content (though it can quickly get to high standards nowadays, as highlighted by diberri, Eureka science news and other sites) is not necessarily what people look for in an encyclopedia, and I think an overwhelming mass of anything (genes, airplanes or sandpaper types) in the Random Articles might deter more non-specialists than it brings in specialists (not sure how much this feature is used, though, and by whom). Both on-the-fly creation and transfer from a different namespace would alleviate this problem. I would be fine either way, and so would the test page. What I imagine in the automated part is that we have a list containing nothing other than {{gla|Gene_001}} ... to {{Gla|Gene_400}} (that's the quick part) to which the template (once functional) could then add contents that the bot has provided somewhere on CZ, perhaps not in the main namespace. This content would include data for the page-to-be plus a definition like "A gene encoding a cell surface receptor and transmembrane precursor protein." for APP. Once we have agreed on which bot-assisted information to put where, the template can be told to gather it there. More documentation on the template is at {{Gla}}. -- Daniel Mietchen 18:49, 22 May 2008 (CDT)

Does anyone know how many people get to CZ via a web search to a specific article page, and how many people go directly to CZ and click the Random Article link? This seems to be a point of great concern, and I'm curious what the numbers say. Andrew Su 19:16, 22 May 2008 (CDT)

What distinguishes genes from all other categories

I just don't get it. Am I missing something here? I just don't see why genes, or "notable genes", are in any way distinguished from hundreds of other categories, each of which has publicly available databases. Examples being proteins, chemicals (or notable human metabolites), Nobel prize winners, dictionary words, types of pistons, types of airplanes, sandpaper types, Kings and Queens, and on and on. What do those in favor of this project have to say that distinguishes this list of things from all of the myriad lists of other things? I simply think this is just a list of things. The fact that the science is young and lacks detail, as someone alluded to earlier, is not very convincing. David E. Volk 17:06, 22 May 2008 (CDT)

Hi David, no, I don't think you're missing anything. I think we just have a different idea of how CZ would or would not benefit from this effort. To speak to your examples above, I actually do think that many of those categories would be reasonable to create stubs for. For example, a stub for all Nobel Prize winners would be fantastic (if there were a database of biographic information we could use to populate it). Chemicals? Yeah, I say do it for notable ones (maybe all FDA-approved drugs), again assuming there's enough to make the stub useful. What differentiates the genes/proteins effort that I'm proposing is that I'm interested in genes and proteins, I'm spelling out exactly what kind of stub content is available, and I'm offering to do the work. That in my book is the biggest difference. (Again, I don't think the dictionary analogy holds. CZ is an encyclopedia, not a dictionary.)

On a slightly tangential note, I'm happy that there is some activity here again. But if this is going to take an EC resolution to resolve this one way or another, it would help if there were actually EC members here to discuss and ask questions. Then, "regular folks" like us can state our cases one way or another so the EC can make an informed decision. So any EC members watching? If not, then I think this proposal is destined to die by neglect... (And scanning through the recent EC activity, gotta say, if EC has time to vote on a systematic recipe effort and not one on genes/proteins, yikes...) Cheers, Andrew Su 19:06, 22 May 2008 (CDT) Apologies, just noticed that both David and Chris are EC members. I withdraw my plea for greater EC involvement, and I'll leave it up to you guys when/if to formally propose it... Andrew Su 19:12, 22 May 2008 (CDT)

Andrew, what I am asking is why we should devote programmer/developer time to this initiative vs others. I think if we accept this proposal, there is no justification to deny all of the other lists. This would seem to entail, at least initially, the creation of a new namespace so as not to drop our average word count to 30. There are so many databases available these days. As for the drug articles, I am actually doing them one at a time, up to perhaps 120 or more so far, with structures, etc, but there are at least 4000 drugs out there now. So my arguments to date have basically been "why this subject". Is it more worthy than others? Are you going to do the programming?

Could we do 100 at a time and then have you develop each one, then do the next 100? David E. Volk 22:02, 22 May 2008 (CDT)

Nope, we can do bot programming, but not MW programming. If the EC feels like the namespace change is necessary, then I agree it significantly detracts from the desirability of this proposal on both ends. As to the question "why this subject", again, it's a subject that I'm interested in and that I am willing to devote time and resources to. As to the proposal of doing 100 at a time, I'd suggest the minimum is 1000 to make it worth everyone's effort. Again, the point is not that I'm going to develop these articles, but that these are stubs to draw new editors in. If the EC feels that this effort is not likely to bring new editors in, or that it's not worth having the "cruft" lying around, then that would be a good reason to pass on the proposal. Andrew Su 23:31, 22 May 2008 (CDT)

To get an idea what kind of information could be similarly harvested by article creation bots (and on what scale), WP's size comparison of information collections on the web might be of interest. -- Daniel Mietchen 04:52, 23 May 2008 (CDT)

Straw poll

In an effort to gauge support and opposition (and indirectly, ambivalence), please sign below if you have an opinion on this proposal...

Support

Chris Day

Oppose

David E. Volk (note, we could mine ChemSpider for >20 million chemicals. Is this a better analogy than the dictionary?)


@@ Line 1: / Line 1: @@
-{{proposal assignment|Edit}}
+<div class="boilerplate metadata" style="background-color: #dedaca; margin: 2em 0 0 0; padding: 0 10px 0 10px; border: 1px solid #AAAAAA;">
+:''I suppose it's time that this proposal be official withdrawn.  Archiving the text below in case the proposal ever gets resurrected in the future. [[User:Andrew Su|Andrew Su]] 20:10, 14 July 2008 (CDT) ''
+{{proposal assignment|Dless}}
 At first sight, this seems to be something that the relevant workgroups (Biology and Health Sciences) can decide by themselves. However, the proposal may easily create a precedent with wide-ranging implications, for instance on what type of stubs are acceptable, and whether we want a bot write a large number of articles. For that reason I think it's best that the full Editorial Council decides.
-'''Driver:''' [[User:Andrew Su|Andrew Su]]
+'''Driver:''' None
 == Complete explanation ==
@@ Line 94: / Line 99: @@
 If you are relying on the enthusiasm of the current Biology Workgroup, then our answer is obviously "Nope, we can't do it."  Because obviously the enthusiasm isn't there.  This is just not how self-selecting open-ended collaborative projects work: in such projects, people pursue what interests them, and so it would be an amazing coincidence if among the biologists who happened to show up in CZ there was a strong interest in gene articles.  More to the point, the fact that there is a group and that group members do a remarkable amount of work (over 800 articles in the Biology Workgroup, including 16 approved, 73 developed, and 266 developing) does not entail that they are an ''organized'' and ''assignable'' group.  This is something that a lot of people who look at Wikipedia and CZ and similar projects never perfectly understand: we don't have a staff that we told what to do!
-If you want to interest enough people in systematically improving the gene articles, there are two ways to do this, it seems to me.  The first is to wait for a few years when, hopefully, CZ is the main game in town for credible encyclopedic information, and we've got zillions of biologists already involved or willing to get involved.  The second is to recruit geneticists (i.e., people who are willing to work on the gene articles) systematically.  You ''might'' be able to use the [[CZ:Workgroup Weeks|Workgroup Weeks]] framework I've set up (perhaps you already had that thought?) to accomplish that.  We could discuss the details of that.  But I would caution that you can't expect people to jump on the bandwagon even of getting the Workgroup Week started.  As you can see on [[CZ:Workgroup Weeks]], not even many of the most active Citizens are interested in helping.  If you wanted to guarantee that a Gene (or Genetics) Week would happen--in which case I would probably support uploading many gene articles for people to work on--you'd either have to show that there were many people willing to help you out, or you'd have to tell us that you'd do all the work yourself!
+If you want to interest enough people in systematically improving the gene articles, there are two ways to do this, it seems to me.  The first is to wait for a few years when, hopefully, CZ is the main game in town for credible encyclopedic information, and we've got zillions of biologists already involved or willing to get involved.  The second is to recruit geneticists (i.e., people who are willing to work on the gene articles) systematically.  You ''might'' be able to use the [[Archive:Workgroup Weeks|Workgroup Weeks]] framework I've set up (perhaps you already had that thought?) to accomplish that.  We could discuss the details of that.  But I would caution that you can't expect people to jump on the bandwagon even of getting the Workgroup Week started.  As you can see on [[Archive:Workgroup Weeks]], not even many of the most active Citizens are interested in helping.  If you wanted to guarantee that a Gene (or Genetics) Week would happen--in which case I would probably support uploading many gene articles for people to work on--you'd either have to show that there were many people willing to help you out, or you'd have to tell us that you'd do all the work yourself!
 I don't mean to be discouraging and it looks ungrateful (looking a gift horse in the mouth, as it were), but it all comes down to keeping CZ relatively free of "cruft" that will never be maintained.   I can imagine various ways that the project ''could'' be maintained.  It's just that it would take considerable leadership--and, since no one else is currently in a position to take that on, it would have to be leadership from you, I'm afraid! --[[User:Larry Sanger|Larry Sanger]] 10:48, 19 May 2008 (CDT)
@@ Line 146: / Line 151: @@
 ::Hmm, okay, a bit more poking around helped...  This would be a visual view of available content which would enhance your proposal above for the create-on-demand bot.  (One doesn't seem to depend on the other, right?)  Oh, and fyi, we have something planned that would be have like the [http://diberri.dyndns.org/cgi-bin/templatefiller/index.cgi?ddb=&type=hgnc_id&id=2475 Diberri tool], but that's still quite some ways off.  And, I'm not sure how we'd get it to work with subpages.  Anyway, that I think is also a very good create-on-demand option. [[User:Andrew Su|Andrew Su]] 15:01, 22 May 2008 (CDT)
 :::I am basically quite sympathetic to bot-assisted fact-picking from any suitable database, and genes could be a good start. However, I agree with David in that bot-assisted content (though it can quickly get to high standards nowadays, as highlighted by diberri, [http://esciencenews.com/ Eureka science news] and other sites) is not necessarily what people look for in an encyclopedia, and I think an overwhelming mass of anything (genes, airplanes or sandpaper types) in the Random Articles might deter more non-specialists than it brings in specialists (not sure how much this feature is used, though, and by whom). Both the on the fly creation and transfer from a different namespace would alleviate this problem. I would be fine either way, and so would the test page. What I imagine in the automated part is that we have a list containing nothing else than <nowiki>{{gla|Gene_001}} ... to {{Gla|Gene_400}}</nowiki> (that's the quick part) to which the template (once functional) could then add contents that the bot has provided ''somewhere on CZ, perhaps not in the main namespace''. This content would include data for the page-to-be plus a definition like "''{{def|APP}}''" for [[APP]]. Once we have agreed on whether to put what bot-assisted information where, the template can be told to gather it there. More documentation on the template is at {{tl|Gla}}. -- [[User:Daniel Mietchen|Daniel Mietchen]] 18:49, 22 May 2008 (CDT)
+::::Does anyone know how many people get to CZ via a web search to a specific article page, and how many people go directly to CZ and click the Random Article link?  This seems to be a point of great concern, and I'm curious what the numbers say.  [[User:Andrew Su|Andrew Su]] 19:16, 22 May 2008 (CDT)
 == What distinguishes genes from all other categories ==
@@ Line 153: / Line 160: @@
 :On slightly tangential note, I'm happy that there is some activity here again.  But if this is going to take an EC resolution to resolve this one way or another, it would help if there were actually EC members here to discuss and ask questions.  Then, "regular folks" like us can state our cases one way or another so the EC can make an informed decision.  So any EC members watching?  If not, then I think this proposal is destined to die by neglect...  (And scanning through the recent EC activity, gotta say, if EC has time to vote on a systematic recipe effort and not one on genes/proteins, yikes...)  Cheers, [[User:Andrew Su|Andrew Su]] 19:06, 22 May 2008 (CDT)  Apologies, just noticed that both David and Chris are EC members.  I withdraw my plea for greater EC involvement, and I'll leave it up to you guys when/if to formally propose it...  [[User:Andrew Su|Andrew Su]] 19:12, 22 May 2008 (CDT)
+::Andrew, what I am asking is why should we devote progammer/developer time for this initiative vs others.  I think if we accept this proposal, there is no justification to deny all of the other lists.  This would seem to entail, at least initially, the create of a new name space so as to drop our average word count to 30.  There are so many data bases available these days.  As for the drug articles, I am actually doing them one at a time, up to perhaps 120 or more so far, with structures, etc, but there are at least 4000 drugs out there now.  So my arguments so far to date have basically been "why this subject".  Is it more worthy than others.  Are you going to do the programming?
+::Could we do 100 at a time and then have you develope each one, then do the next 100? [[User:David E. Volk|David E. Volk]] 22:02, 22 May 2008 (CDT)
+:::Nope, we can do bot programming, but not MW programming.  If the EC feels like the namespace change is necessary, then I agree it significantly detracts from the desirability of this proposal on both ends.  As to the question "why this subject", again, it's a subject that I'm interested in and that I am willing to devote time and resources to.  As to the proposal of doing 100 at a time, I'd suggest the minimum is 1000 to make it worth everyone's effort.  Again, the point is not that ''I'm'' going to develop these articles, but that these are stubs to draw new editors in.  If the EC feels that this effort is not likely to bring new editors in, or that it's not worth having the "cruft" laying around, then that would be a good reason to pass on the proposal.  [[User:Andrew Su|Andrew Su]] 23:31, 22 May 2008 (CDT)
+::::To get an idea what kind of information could be similarly harvested by article creation bots (and on what scale), [http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons#Size_of_other_information_collections WP's size comparison of information collections on the web] might be of interest. -- [[User:Daniel Mietchen|Daniel Mietchen]] 04:52, 23 May 2008 (CDT)
 ==Straw poll==
@@ Line 164: / Line 178: @@
 {{Proposals navigation}}
+</div>
