CZ:Proposals/Create a page for all notable genes in the human genome: Difference between revisions

From Citizendium
imported>Andrew Su
m (→‎Discussion: reformat -- hope that's okay)
imported>Andrew Su

Revision as of 12:56, 21 April 2008

This proposal has not yet been assigned to any decisionmaking group or decisionmaker(s).
The Proposals Manager will do so soon if and when the proposal or issue is "well formed" (including having a driver).
For now, the proposal record can be found in the new proposals queue.


Driver: Andrew Su

Complete explanation

This proposal will entail the automated creation of some number of pages relating to human genes and proteins. The exact number of pages and the criteria for selection are open to discussion (but generally this effort will only be worth it if that number is greater than 1000). These stubs will be seeded with content from the public domain. The types of information currently available to use in the stubs can be seen in the WP links below, and additional information can be added if we've missed any relevant databases.

Formatting of stubs is also up for discussion. The WP links below can serve as one template to work from. We also prototyped a layout that uses subpages at APP. The bot can be programmed to run every quarter or so to keep the gene pages in sync with the underlying databases. Importantly, all bot-maintained content will be clearly contained in bot templates, so no human edits to the rest of the page are in any danger of being overwritten. (See source of the WP pages for examples.)
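As a rough illustration of how bot-maintained content could be kept separate from human edits, here is a minimal Python sketch. It assumes the bot's content sits between two marker strings in the page source; the marker names and the function `update_bot_region` are hypothetical and are not the actual bot templates used at WP or planned for CZ.

```python
import re

# Hypothetical markers; the real bot template names are not given in this proposal.
BOT_START = "<!-- BOT:BEGIN -->"
BOT_END = "<!-- BOT:END -->"

# Non-greedy match of everything between the first start marker and the next end marker.
_BOT_REGION = re.compile(
    re.escape(BOT_START) + r".*?" + re.escape(BOT_END),
    flags=re.DOTALL,
)


def update_bot_region(page_text: str, fresh_content: str) -> str:
    """Replace only the bot-delimited region, leaving human-written text untouched."""
    replacement = f"{BOT_START}\n{fresh_content}\n{BOT_END}"
    if _BOT_REGION.search(page_text) is None:
        # No bot region yet: prepend one rather than guessing where it belongs.
        return replacement + "\n" + page_text
    # A function is used as the replacement so that backslashes in
    # fresh_content are not interpreted as regex escapes.
    return _BOT_REGION.sub(lambda _m: replacement, page_text, count=1)


if __name__ == "__main__":
    page = (
        "Intro written by a human.\n"
        f"{BOT_START}\nold database values\n{BOT_END}\n"
        "More human-written text."
    )
    print(update_bot_region(page, "new database values"))
```

Under these assumptions, a quarterly run would rewrite only the delimited region, so text outside the markers survives every update.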

Past discussions

This proposal was previously discussed in late 2007, so please read the threads below for context. (If there are questions which are not addressed there, please let me know and I will expand here.) The proposal was tabled at that time due to uncertainty around the licensing of CZ content.

  • Forum: [1]
  • Mailing list: [2] (all seem to be confined to September 2007 as linked here)

Previously-raised issues

  • License solved...
  • Does the existence of similar WP pages reduce (or increase) enthusiasm?
  • Does creation of up to 10k pages render the "Random page" link useless (or at least horribly biased)?
  • Are there enough biologists and geneticists so that this effort won't stagnate?
  • The driver is employed by a pharma company, so is impartiality a concern?
  • Would this effort be sufficiently different from the array of other "gene portals" out there?
  • Are gene stubs (which individually fall far short of encyclopedic articles) useful to CZ?

Reasoning

This would be a relatively unique tool in biology. All existing gene portals are "top-down" from centralized "gene annotation authorities". Of course, those authorities are a bottleneck for sharing new findings in a gene-centric web database. Looking several years down the road, I hope each of these stubs evolves into a gene-specific review article on every gene in the human genome.

I say relatively unique since one version of this project is now nearing completion at WP. Approximately 9500 pages are currently in existence (listed here: [3]). Some CZ users previously expressed concern about whether CZ should embark on this project given the similar/parallel WP effort, and that certainly should be discussed again below. (But note that this rationale was not the reason we did not move forward previously.)

Implementation

As I see it:

  1. Discuss to decide if this project is appropriate for CZ. (If no, end here.)
  2. Finalize a format, perhaps continuing the use of APP as the model.
  3. Find a willing programmer (ideally an undergrad or master's student interested in a cool project).
  4. Make the pages.

Assuming Step 3 doesn't drag on too long, I'd estimate that the whole project would take approximately six months.

Discussion

A discussion section, to which anyone may contribute.

I am in no way knowledgeable in this area, but two things come to my mind after reading this. First, the Recent Changes issue. I agree with Larry Sanger that we should have some sort of exclusion code to prevent these from showing up. The whole purpose of the link (to me at least) is to get an interesting page on a different topic than one would normally read.

The second question is whether this would be better suited for a catalog. I know that would be a lot of pages, but having a central area to pick out what gene you want to look at doesn't seem like such a bad idea.

Well, there's my opinion on the matter. John Dvorak 16:08, 16 April 2008 (CDT)

I have the same arguments against this as previously.
  1. Almost nothing is known about most genes, so most pages will essentially be blank.
  2. People are much more interested in gene products, proteins, than the gene.
For example, one might look up insulin, but who knows what the name of the gene is for making it? How do you write an article about a gene? Gene so and so makes the protein TNF-alpha, then what? You might say it is upregulated or suppressed by other proteins, but what else? I will, however, admit that my interests and work may have clouded my thinking into one that is protein-centric. To me it seems natural to write articles about proteins, and add a link to GenBank or another gene bank data set, much like we have been doing with the drug articles where we add links to DrugBank, MedMaster and an FDA site. David E. Volk 14:44, 17 April 2008 (CDT)
See the wikipedia version of the insulin gene. While proteins are the functional unit biologists often consider when thinking about physiology and biochemistry, if you're a developmental biologist or interested in human disease then the regulation of genes, and the various different alleles of a particular gene are of great interest. Bear in mind too that this project can be tailored to our needs.
While I agree that many of the articles will initially have minimal content, will this not serve as a nucleus for content related to human biology? Also, I would imagine that most articles on proteins will not be specific to humans. Possibly these human gene articles could go further and include protein information specific to human diseases and physiology too, while directing readers to our articles that discuss the broader picture for a given protein. Chris Day 15:15, 17 April 2008 (CDT)
Regarding David's first comment, let me provide a few links to pages at WP as they were created by our bot, just to give people an idea of the range of stub content. Obviously, we can set the bar wherever we want here at CZ:
Regarding David's second comment, it's pretty rare over at WP to have separate pages about proteins and the genes that encode them. For example, consider this page. I would be happy to change all references from the "gene wiki" to "protein wiki"  ;) and/or incorporate any more protein-centric content that can be systematically harvested. As Chris alludes to, we are completely open to modifying the format according to what the CZ community feels is appropriate. Cheers, Andrew Su 15:45, 17 April 2008 (CDT)
Those articles are great examples of why I am against this proposal in general. It seems very odd to have 1-3 sentence articles with 30 references listed. It looks and feels like an automated fact picker, but not an encyclopedia article. Most of the information in Gene Ontology in fact refers to what the protein does. Same for the Biological Process parts. The few things that can be said regarding up/down regulation could easily be put in a "Gene" subsection of the protein article. Finally, I notice that WP does not italicize genes like they should. Thus, the MutY protein is made by the MutY gene. David E. Volk 16:14, 17 April 2008 (CDT)
As I said above, the articles could/should be about the human proteins and the gene. Re: MutY/MutY, does it matter what wikipedia does wrong with respect to our own proposal? Another thing to consider is that much of this information can be distributed into different subpages, so potentially the creation of subpage content is the desired goal? Do we need an article to start a subpage? I ask this in all seriousness as we do have some subpages that exist without an article (related articles pages to date). Possibly this is a model to work with? Chris Day 16:22, 17 April 2008 (CDT)

Let me be more clear on the subject. Originally, this proposal started out as an automated procedure to make thousands of stub articles (now the word notable is added). That is my main point of contention. I have no objection to an author writing in great detail about a particular gene. For example, they might include a family tree for that gene showing some evolutionary fact, or discuss particularly important alleles and list the nucleotide changes that cause a particular disease. My previous analogy was to ask, "why don't we then just create a stub of every word in the largest dictionary available and wait for folks to come along later to fill in the data?" We could even list 30 dictionaries as references but still leave a stub article with just the word. Would that look like an encyclopedia article? I think the automated approach is the biggest problem.

If I ever write an article about the Ford Mustang, it will be about the car, not the blueprints at the factory, or even the older blueprints showing the evolution of the Ford Mustang over the years. I do realize that genes are very important to evolutionary biologists, but it is the end product that makes or breaks the baby. David E. Volk 16:45, 17 April 2008 (CDT)

But isn't this about populating subpage type material? While this is automated, it seems like a helpful start. For me the blueprint analogy would be like having articles with the sequence of the gene alone. I agree that would not be useful. Also, I'm not sure the dictionary analogy is comparable, as the links in this case are to resources and information about the specific human gene/protein. Chris Day 17:35, 17 April 2008 (CDT)
David, am I correct in reading that your objection, at least primarily, comes down to whether or not stubs are useful to CZ? I agree with all of your comments above (except for the dictionary analogy). This proposal does in many ways boil down to "an automated fact picker", and no question that the output is not an encyclopedic article. In my mind, the question is whether the CZ community feels that those output pages (or something like them) are useful in the CZ context. The stub proponent in me says they are (since having minimally useful content draws experts who then contribute less-minimally useful content which then eventually evolves into an encyclopedic article), but I'm also open to the conclusion that CZ doesn't want to encourage mass stub creation. Andrew Su 17:59, 17 April 2008 (CDT)


Andrew, why is the dictionary analogy not correct? It would just require a larger group of experts to come and fill in the details. Likewise, we could write a bot to make stubs for 1 million known chemical compounds that are listed in the Chemical Abstract Service (CAS) and automatically pick up the melting points, molecular mass and so forth, and then wait for experts to fill in the data. We could have 1 million articles in a few days, all stubs.
I would be against the chemical bot too. Can you explain, in any way, why these two analogies are not the same as your proposal other than these analogies do not fit your area of knowledge? I would hate to see 90% of the content at CZ be stubs. I know you feel this proposal will bring in experts, but frankly, the experts already know to go to the sites that you will be copying data from. CZ will not be creating any knowledge with this proposal, just regurgitating it, at least for the very near term. The experts you seek are the people depositing their data into GenBank, etc. Your best bet would be to get some students involved so that a few sentences at least could be written for each article.
CZ is trying for quality over quantity, and this proposal is the opposite of our overall goal. Perhaps we could make a list of the hundred or so notable human genes, and then some of us in the biology/chemistry groups could then split the list and start writing articles aided by the bot. David E. Volk 08:25, 18 April 2008 (CDT)

Please bear in mind that if we move forward with this proposal, we will almost certainly have to twiddle with article lists, e.g., with Category:CZ Live, as well as with Random Page, etc. But suppose that all that content is in CZ, but we don't count it (the articles are not included in our "Live Articles" count). Does that affect the above argument(s)? --Larry Sanger 11:14, 18 April 2008 (CDT)

David, no disrespect intended, but I feel like we need more voices in this debate. It sounds like both you and I are very familiar with the pros and cons here. And while interesting, I don’t think the discussion above is bringing up any new issues that are likely to sway our individual views. I think we need more people in decision-making capacities (Editorial Council?) to chime in here to ask questions, express concerns, etc. Based on that input, we can summarize relevant pros and cons in a focused way. Absent interest (and ultimately enthusiasm) by others in the community, I’m happy to let this proposal die by neglect. (On that note, thanks Larry for chiming in. I'll let David reply as to whether that change would assuage his concern, but I think it has more to do with whether the stubs individually are sufficiently useful to be included in CZ on their own.) Cheers, Andrew Su 11:28, 18 April 2008 (CDT)
I think it is accurate to say that this work is being done better elsewhere, and the only absolute way to make this even remotely viable would be to somehow convince all of the contributors and maintainers working on the databanks to shift their operations over to CZ, which has a very low probability of success. --Robert W King 11:40, 18 April 2008 (CDT)
Robert, there are two big limitations of current gene portals. First, while they are great at storing and displaying tag/value data ("structured content"), they are not so good at displaying free-text interspersed with images and graphics and tables ("unstructured data"). Second, existing gene portals are pretty exclusively one-way communication. If a reader notices an error or omission, there is no way for them to actively fix that problem. Of course, those two weaknesses dovetail nicely with two of the primary strengths of a wiki. So, we wouldn't be promoting a competitor to existing resources, but a complementary tool. Does that answer your concern? Andrew Su 15:23, 18 April 2008 (CDT)
Andrew, no disrespect taken. I agree that more voices are needed regarding this proposal, but like last time, it would seem only a handful of people care enough either way to chime in. Making this material subpages, rather than Main domain articles, certainly improves its favorability. I do wonder about maintainability. Are you suggesting the bot runs every few weeks for updates? If so, will it kill the intervening human adjustments? Do you envision a one-time grab of data? Perhaps the proposal description can be updated to reflect exactly what the actual plans are. A catalog of genes might inspire people to write about the genes, after which a specific page could be promoted up to the Main domain. David E. Volk 09:55, 21 April 2008 (CDT)
All, I've added a few more details up above, but the key point to emphasize is that this very general proposal can be tailored however the CZ community sees fit. Of course, as David alludes to, this requires some degree of enthusiasm by the biology workgroup. I don't want to invest lots of time on this proposal without a clear consensus and community backing. Andrew Su 12:56, 21 April 2008 (CDT)
