CZ:Proposals/Create a page for all notable genes in the human genome: Difference between revisions

From Citizendium
imported>Andrew Su
imported>Jitse Niesen
(remove driver)
(22 intermediate revisions by 5 users not shown)
Line 1: Line 1:
{{proposal assignment|Edit}}
<div class="boilerplate metadata" style="background-color: #dedaca; margin: 2em 0 0 0; padding: 0 10px 0 10px; border: 1px solid #AAAAAA;">
 
:''I suppose it's time that this proposal be officially withdrawn.  Archiving the text below in case the proposal ever gets resurrected in the future. [[User:Andrew Su|Andrew Su]] 20:10, 14 July 2008 (CDT) ''
 
 
{{proposal assignment|Dless}}
At first sight, this seems to be something that the relevant workgroups (Biology and Health Sciences) can decide by themselves. However, the proposal may easily create a precedent with wide-ranging implications, for instance on what type of stubs are acceptable, and whether we want a bot to write a large number of articles. For that reason I think it's best that the full Editorial Council decides.




'''Driver:''' [[User:Andrew Su|Andrew Su]]
'''Driver:''' None


== Complete explanation ==
Line 119: Line 124:
=== Bot-assisted article creation? ===
Andrew (or any of the tech-minded folks here), would it be possible to insert such bot-generated content "on the fly" (similar to the metadata) along with the manual creation of the respective CZ article? I envision a list of topics (currently, but not in the long term, restricted to genes) on which a template, e.g. {{tl|Gla}} for "Gene List Assistant", is applied to the individual entries (much as {{tl|Rpl}} is used on [[CZ:Core_Articles/Biology]] and the like), thus signaling (e.g. by icons like those in the [[Image:Level4.jpg|10px]] [[Image:Level3.jpg|10px]] [[Image:Level2.jpg|10px]] [[Image:Level1.jpg|10px]] [[Image:Level0.jpg|10px]] series) that '''bot assistance for fact picking is available for the creation of an article on this particular topic'''. Such a treatment could perhaps lure new contributors in, especially if applied to lists like [[Special:Wantedpages]] or [[:Category:Definition Only]]. Similarly, one could think of a corresponding {{tl|Cla}} template for chemicals, {{tl|Sla}} for stars, {{tl|Spla}} for species and so on as new bots become available. -- [[User:Daniel Mietchen|Daniel Mietchen]] 10:45, 20 May 2008 (CDT)
:Hi Daniel, interesting idea...  I definitely see the advantages of this type of system.  (Although truth be told, I'd still be in favor of creating the stubs outright, since I don't think this template system will reel in my hypothetical naive molecular biologist above.)  Feasibility would definitely depend on the input of a MediaWiki expert (which I am not), but I'm confident that if those issues were resolved, we could adapt things on the bot side to make it work.  Cheers, [[User:Andrew Su|Andrew Su]] 12:32, 20 May 2008 (CDT)
::As a biologist I think that Andrew is correct that this would draw in biologists to edit Citizendium. Some may only make a few edits but others may stay.  Especially attractive is the fact that CZ has Eduzendium, which serves as an excellent platform to aid teaching in the new active learning environment that is becoming very popular and is supported by [http://www.hhmi.org/news/college20080422.html Howard Hughes] and the [http://www.lifescied.org/cgi/content/full/3/4/215 National Academy]. For me this whole proposal is about the future.  Right now it might not look great on paper for some of the reasons above.  But Citizendium has a chance to be at the front of the next big wave here. At worst these pages lie dormant, potentially as orphaned subpages until their time in the limelight arrives. [[User:Chris Day|Chris Day]] 13:27, 20 May 2008 (CDT)
:::Good point there Chris--that's an angle I hadn't approached.  Maybe it's just not that damaging to CZ to have "cruft" in the form of dormant gene articles.  I think I'd buy that.  The whole strength of my own arguments rests on the assumption that unmaintained gene articles would be cruft.  But if we can reconcile ourselves to the view that they aren't, then I'd support it.
:::I don't mean to deny that we would get some more biologists on board.  I'm sure we would; I just doubt it would be enough to develop a lot of gene articles ''en masse.''  But if that's not a problem, importing the articles as a means of attracting ''some'' more biologists might not be a bad idea. --[[User:Larry Sanger|Larry Sanger]] 15:07, 20 May 2008 (CDT)
:::Another technical question: Would it be possible to have such high numbers of subpages for few articles, like [[Gene]]? If so, we could place the contents there and make it easy to start an article on any given entry (e.g. by having an "update bot text" button). Still, Random pages and perhaps other things would have to be remodeled, with side effects to be expected. -- [[User:Daniel Mietchen|Daniel Mietchen]] 03:07, 21 May 2008 (CDT)
:::: If we're really going to have tens of thousands of these pages, why not give them their own [[CZ:Namespaces|namespace]]? It seems to me that a clearly defined major project like this might be a perfect application for that. I'd have to look at the code, but I suspect that RandomPages, etc would probably not go into non-main namespace pages. [[User:J. Noel Chiappa|J. Noel Chiappa]] 07:31, 21 May 2008 (CDT)
:::::<u>Daniel</u>, I'm not sure this answers your question, but I think I forgot to direct people to [[APP]], which is a page I created manually that shows all the different possible types of auto-generated content.  Not exactly sure what you mean by "make it easy to start an article on any given entry".  Are you proposing that we have this content available on subpages but not visible until a human editor "activates" it?  <u>Noel</u>, how would you propose handling these articles as they evolve?  For example, [[insulin]] and [[innexin]] appear to be developing quite well in the main namespace.  Would those get moved to the alternate namespace?  Thanks for the feedback and ideas.... [[User:Andrew Su|Andrew Su]] 08:47, 21 May 2008 (CDT)
::::::Yes, this means "that we have this content available ''somewhere'' but not visible ''in the main namespace of CZ'' until a human editor ''activates'' it." What I meant by "make it easy to start an article on any given entry" is that bot contents (brief summary, protein structure, references or whatever is available) can be generated ("on the fly" scenario) or retrieved (subpage/namespace scenario) upon creation of such an article in the main namespace, perhaps along with the creation of pre-formatted metadata, including a definition. It would also be cool if intrawiki links (e.g. to proteins or other genes, to diseases or species) could be suggested by the bot. -- [[User:Daniel Mietchen|Daniel Mietchen]] 10:53, 21 May 2008 (CDT)
:::::::Hi Daniel, I'm less enthusiastic about hidden-until-activated pages because it partially detracts from the potential benefit of my hypothetical naive biologist.  Personally, I think the vast majority of editors get hooked because they found an existing page and wanted to improve it, rather than finding nothing and wanting to create.  Your proposal sounds like something in between those two extremes...  [[User:Andrew Su|Andrew Su]] 14:49, 22 May 2008 (CDT)
::::::::I agree that hidden is less attractive. It seems to me that there are three main reasons for not doing this, 1)  maintainability, 2) the random page problem and 3) not encyclopedic (i.e. could do it for every chemical).  Personally I disagree with these three issues as reasons not to proceed.  The first two appear to be more short term oriented objections.  For me, in the long term, these will not be major problems, even if they are at the present. The last one is trickier but I don't find the analogy exactly true. Biology is so young compared to chemistry and the nature of genes in development and disease is a massively important question in biology and we have barely scraped the surface. Could this be said for chemicals? (not rhetorical)  Also the number of chemicals is far greater than the number of genes, and this proposal is only suggesting notable genes.
::::::::Who said "''Build it and they will come''"? That seems like the right attitude for a project like this that is trying to establish its place in the i-world. The worst case scenario is we have a lot of clusters that sit dormant for a while as the project gets rolling.  But are they really dormant?  How many scientists will take a peek? Possibly pass on the word of an interesting new forum to bring together data that is spread all over the web? With no pages we know this will not happen. With the pages something might happen. Personally I see no reason not to take a risk and see what develops. [[User:Chris Day|Chris Day]] 15:05, 22 May 2008 (CDT)
=== How about a test page? ===
Given the current stage of the discussion, I assume it would be insightful to test the bot assistance on a small scale, i.e. [[CZ:Proposals/Create_a_page_for_all_notable_genes_in_the_human_genome/Testpage|one page]] limited to 400 gene names, formatted by {{tl|Gla}}. I am currently drafting an initial version for this template, based on {{tl|Rpl}}. Help is very welcome. Also, if any of you have a suggestion for an icon that signals availability of bot assistance (or flavours thereof - e.g. short or extended summary, with or without protein structure), please post the links to those image files here. Ideally, the template would then accept a click on the relevant icon as input and generate (or retrieve) the associated content. -- [[User:Daniel Mietchen|Daniel Mietchen]] 10:53, 21 May 2008 (CDT)
:Hi Daniel, pardon my inexperience, but I'm not exactly sure how {{tl|Gla}} and {{tl|Rpl}} work.  I can see the test page [[APP]] linked [[CZ:Proposals/Create_a_page_for_all_notable_genes_in_the_human_genome/Testpage|here]], but I'm not sure how you'd envision this all working in the end.  Would this page evolve/expand into an index of gene pages?  I think I'm missing something here...  Cheers, [[User:Andrew Su|Andrew Su]] 14:53, 22 May 2008 (CDT)
::Hmm, okay, a bit more poking around helped...  This would be a visual view of available content which would enhance your proposal above for the create-on-demand bot.  (One doesn't seem to depend on the other, right?)  Oh, and fyi, we have something planned that would behave like the [http://diberri.dyndns.org/cgi-bin/templatefiller/index.cgi?ddb=&type=hgnc_id&id=2475 Diberri tool], but that's still quite some ways off.  And, I'm not sure how we'd get it to work with subpages.  Anyway, that I think is also a very good create-on-demand option. [[User:Andrew Su|Andrew Su]] 15:01, 22 May 2008 (CDT)
:::I am basically quite sympathetic to bot-assisted fact-picking from any suitable database, and genes could be a good start. However, I agree with David in that bot-assisted content (though it can quickly get to high standards nowadays, as highlighted by diberri, [http://esciencenews.com/ Eureka science news] and other sites) is not necessarily what people look for in an encyclopedia, and I think an overwhelming mass of anything (genes, airplanes or sandpaper types) in the Random Articles might deter more non-specialists than it brings in specialists (not sure how much this feature is used, though, and by whom). Both the on the fly creation and transfer from a different namespace would alleviate this problem. I would be fine either way, and so would the test page. What I imagine in the automated part is that we have a list containing nothing else than <nowiki>{{gla|Gene_001}} ... to {{Gla|Gene_400}}</nowiki> (that's the quick part) to which the template (once functional) could then add contents that the bot has provided ''somewhere on CZ, perhaps not in the main namespace''. This content would include data for the page-to-be plus a definition like "''{{def|APP}}''" for [[APP]]. Once we have agreed on whether to put what bot-assisted information where, the template can be told to gather it there. More documentation on the template is at {{tl|Gla}}. -- [[User:Daniel Mietchen|Daniel Mietchen]] 18:49, 22 May 2008 (CDT)
::::Does anyone know how many people get to CZ via a web search to a specific article page, and how many people go directly to CZ and click the Random Article link?  This seems to be a point of great concern, and I'm curious what the numbers say.  [[User:Andrew Su|Andrew Su]] 19:16, 22 May 2008 (CDT)
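The list described above, containing nothing but <nowiki>{{Gla|Gene_001}} ... to {{Gla|Gene_400}}</nowiki>, could be produced by a few lines of script. The following is only a minimal sketch: the function name <code>gla_list</code> and its parameters are hypothetical, the <code>Gene_001</code>-style names are the placeholders used in the discussion rather than real gene symbols, and the sketch only builds the wikitext. It does not talk to the wiki or implement what the template itself would do.

```python
def gla_list(count=400, prefix="Gene_", template="Gla"):
    """Build wikitext with one template call per placeholder entry.

    All names here are illustrative assumptions: real use would pass
    actual gene symbols instead of numbered placeholders.
    """
    width = len(str(count))  # zero-pad so that 1 becomes 001 when count is 400
    lines = []
    for i in range(1, count + 1):
        name = prefix + str(i).zfill(width)
        lines.append("{{" + template + "|" + name + "}}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the wikitext so it can be pasted into the test page by hand.
    print(gla_list())
```

Pasting the output into the test page would give the "quick part" of the list; gathering the bot-provided content for each entry would remain the template's job once it is functional.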
== What distinguishes genes from all other categories ==
I just don't get it.  Am I missing something here?  I just don't see why genes, or "notable genes", are in any way distinguished from hundreds of other categories, each of which has publicly available databases.  Examples being proteins, chemicals (or notable human metabolites), Nobel prize winners, dictionary words, types of pistons, types of airplanes, sandpaper types, Kings and Queens, and on and on.  What do those in favor of this project have to say that distinguishes this list of things from all of the myriad lists of other things?  I simply think this is just a list of things.  The fact that the science is young and lacks detail, as someone alluded to earlier, is not very convincing. [[User:David E. Volk|David E. Volk]] 17:06, 22 May 2008 (CDT)
:Hi David, no, I don't think you're missing anything.  I think we just have a different idea of how CZ would or would not benefit from this effort.  To speak to your examples above, I actually do think that many of those categories would be reasonable to create stubs for.  For example, a stub for all Nobel Prize winners would be fantastic (if there were a database of biographic information we could use to populate it).  Chemicals?  Yeah, I say do it for notable ones (maybe all FDA-approved drugs), again assuming there's enough to make the stub useful.  What differentiates the genes/proteins effort that I'm proposing is that I'm interested in genes and proteins, I'm spelling out exactly what kind of stub content is available, and I'm offering to do the work.  That in my book is the biggest difference.  (Again, I don't think the dictionary analogy holds.  CZ is an encyclopedia, not a dictionary.) 
:On a slightly tangential note, I'm happy that there is some activity here again.  But if this is going to take an EC resolution to resolve this one way or another, it would help if there were actually EC members here to discuss and ask questions.  Then, "regular folks" like us can state our cases one way or another so the EC can make an informed decision.  So any EC members watching?  If not, then I think this proposal is destined to die by neglect...  (And scanning through the recent EC activity, gotta say, if EC has time to vote on a systematic recipe effort and not one on genes/proteins, yikes...)  Cheers, [[User:Andrew Su|Andrew Su]] 19:06, 22 May 2008 (CDT)  Apologies, just noticed that both David and Chris are EC members.  I withdraw my plea for greater EC involvement, and I'll leave it up to you guys when/if to formally propose it...  [[User:Andrew Su|Andrew Su]] 19:12, 22 May 2008 (CDT)
::Andrew, what I am asking is why should we devote programmer/developer time for this initiative vs others.  I think if we accept this proposal, there is no justification to deny all of the other lists.  This would seem to entail, at least initially, the creation of a new namespace so as to drop our average word count to 30.  There are so many databases available these days.  As for the drug articles, I am actually doing them one at a time, up to perhaps 120 or more so far, with structures, etc, but there are at least 4000 drugs out there now.  So my arguments to date have basically been "why this subject".  Is it more worthy than others?  Are you going to do the programming?
::Could we do 100 at a time and then have you develop each one, then do the next 100? [[User:David E. Volk|David E. Volk]] 22:02, 22 May 2008 (CDT)
:::Nope, we can do bot programming, but not MW programming.  If the EC feels like the namespace change is necessary, then I agree it significantly detracts from the desirability of this proposal on both ends.  As to the question "why this subject", again, it's a subject that I'm interested in and that I am willing to devote time and resources to.  As to the proposal of doing 100 at a time, I'd suggest the minimum is 1000 to make it worth everyone's effort.  Again, the point is not that ''I'm'' going to develop these articles, but that these are stubs to draw new editors in.  If the EC feels that this effort is not likely to bring new editors in, or that it's not worth having the "cruft" lying around, then that would be a good reason to pass on the proposal.  [[User:Andrew Su|Andrew Su]] 23:31, 22 May 2008 (CDT)
::::To get an idea what kind of information could be similarly harvested by article creation bots (and on what scale), [http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons#Size_of_other_information_collections WP's size comparison of information collections on the web] might be of interest. -- [[User:Daniel Mietchen|Daniel Mietchen]] 04:52, 23 May 2008 (CDT)


==Straw poll==
Line 134: Line 178:


{{Proposals navigation}}
</div>

Revision as of 05:39, 15 July 2008