Just a quick post, meant to be fun, not to insult anyone. Because we can all use a little laugh in times of stress.
Update March 18:
Yay! We got 40 new people interested in helping out with the blog, which means that the volunteers only need to post 1x per month. It will also mean that the posts might be a bit more variable from day to day, or that there might be an error here and there. Please be kind to all the new people as they are learning to post. This is all volunteer work and I am so grateful for their help.
****** Original post *******
Do you love #MicrobiomeDigest.com and/or #SciCom ? MicrobiomeDigest.com needs 10 to 20 more volunteers for the daily updates! Without new volunteers we will have to shut it down. The more volunteers we have, the fewer times per month each volunteer will need to post. With 20 volunteers total, each person will do a post about 2x per month. It takes 2-3 h to do a post, and you can pick the days you want to post using a sign-up calendar, so it is very flexible.
If you sign up, you will also have the opportunity to use this platform to write your own posts. Anything goes, as long as it is related to microbiology or science. All volunteers have authorship rights, so you can write and upload your own posts as often as you like.
Volunteers need to have some background knowledge of microbiology/microbiome, and be able to screen lists of PubMed or publisher’s alerts for relevant scientific papers.
If you are interested, shoot me an email at eliesbik on the gmail platform (you got this!). In that email, include a short paragraph about yourself. No CV/resume is needed, just the willingness to contribute about once or twice a month. Ability to follow these instructions is part of the selection process 🙂
This post has nothing to do with microbiology, but everything with my love for maps. It is about the former Hetch Hetchy Railroad.
Hetch Hetchy reservoir is the main source of drinking water for the city of San Francisco and other parts of the Bay Area. The densely populated SF Bay Area only gets rainfall in the winter months and is dependent on reservoirs in the Sierra Nevada mountains for the collection and storage of rain and melting snow water for the dry summer months.
The Hetch Hetchy area was once a glacier-formed valley in what is now Yosemite National Park, California. It is located in the northwestern part of the national park, far away from the much more famous Yosemite Valley. Hetch Hetchy Valley was equally beautiful, but it was turned into a reservoir, an artificial lake, in 1923 upon the completion of the O’Shaughnessy Dam.
Here are two photos comparing Hetch Hetchy valley before and after the dam was completed, taken from roughly the same position.
As of today, the Hetch Hetchy Project consists of the reservoir, the dam, hydroelectric plants, and a long aqueduct that carries the water to the SF Bay Area through a long series of tunnels, using only gravity.
The story of the Hetch Hetchy Railroad, built in the 1910s and 1920s to bring in the construction crews and materials needed to build the dam, hydroelectric plants, and aqueduct, is very interesting if you like history and maps. After completion of the dam in 1923, the HHRR was used to carry tourists and mail to the northern Yosemite area. However, the steep terrain, sharp curves, and heavy snowfall made the railroad very hard and expensive to operate. During World War II, HHRR rail materials such as steel and wood were repurposed for war operations, and the railroad was finally completely dismantled in 1949.
Friends of ours who live near an area where the railroad used to run told us about the rise and decline of the railroad, and pointed us to some sections where you can still see remnants of how the tracks ran. Even though the rails and wooden cross-ties are now all gone, you can still see the flattened trackbed in some sections. Close to the Hetch Hetchy reservoir, the access road to the lake now runs over the old railroad track.
Hearing about the now-gone railroad got me interested in mapping the complete course of the railroad. I first searched online and found a couple of railroad and other sites that mentioned it.
The information I found online was not detailed enough to know exactly how the HHRR ran. Luckily, our friends lent us their copy of the book Yosemite’s Hetch Hetchy Railroad by Ted Wurm. The author was part of a group of railfans who traveled the railroad around 1937 and took lots of photos. The book has a map of the railroad route and several descriptions that allowed me to start mapping the exact position of the tracks.
The Groveland Yosemite Gateway Museum also has a small exhibition on the Hetch Hetchy Railroad. It consists of several photos and a topographical map on which a person named Bill Dagg drew the course of the railroad by hand.
These two maps by Ted Wurm and Bill Dagg were a lot more detailed than what I could find online. I was very excited to see how much work these two men had put into mapping the railroad.
Since I could not find an electronic version of the railroad’s course online, I decided to make a KML file myself. KML stands for “Keyhole Markup Language” and is a way to visualize geographic data in, for example, Google Earth. Google Earth is a more advanced counterpart of Google Maps that allows you to add extra layers. An easy way to make a KML file is to draw a line or a polygon on top of the map displayed in Google Earth.
In order to draw the map, I downloaded Google Earth Pro. There is also a Google Earth Web version that runs in Chrome and that can display KML files, but it did not allow me to make one or edit points.
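If you are curious what such a file actually looks like under the hood, a KML route is just XML: a Placemark containing a LineString of longitude,latitude pairs. Here is a minimal sketch in Python that writes one out; the two coordinates are made-up placeholders roughly in the Sierra foothills, not the actual HHRR alignment.

```python
# Sketch: generate a minimal KML file with a single route line,
# similar to what Google Earth produces when you trace a path by hand.
# The coordinates below are illustrative placeholders, NOT the real HHRR route.

KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Hetch Hetchy Railroad (sketch)</name>
      <LineString>
        <tessellate>1</tessellate>
        <coordinates>{coords}</coordinates>
      </LineString>
    </Placemark>
  </Document>
</kml>
"""

def route_to_kml(points):
    """points: list of (longitude, latitude) tuples -> KML string."""
    # KML coordinates are lon,lat,altitude triples separated by whitespace
    coords = " ".join(f"{lon},{lat},0" for lon, lat in points)
    return KML_TEMPLATE.format(coords=coords)

kml = route_to_kml([(-120.23, 37.83), (-119.78, 37.95)])
with open("hhrr_sketch.kml", "w") as f:
    f.write(kml)
```

Opening the resulting file in Google Earth (or dragging it onto Google Maps via My Maps) draws the line on the map.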
Using Google Earth satellite views, the maps from Ted Wurm’s book, and Bill Dagg’s map from the Groveland Museum, it was easy to spot some parts of the now-dismantled railroad. In some places, the railroad is now an unpaved or paved road, while in other spots the old foundations are still visible in the landscape.
I was also very excited to find some historical topographical maps on the website of the USGS from around 1947, just before the railroad was removed. These maps are available as KMZ (zipped KML) files, so you can just overlay them in Google Earth. This allowed me to draw the railroad even more precisely.
This was a fun project to do in Google Earth. In some cases, the historic topographical maps were a bit shifted from the current Google map, and in other cases new roads were put over the historic railroad, so I had to make an educated guess where the tracks had been. But in most spots it was easy to see the old tracks from the satellite images.
So here it is, a link to the historic Hetch Hetchy Railroad route (KML file) as drawn by me in Google Earth. If you click on the link it will show you the route in Google Maps.
Or, if you want to edit, download the KMZ file to display in Google Earth. Enjoy!
One thing that struck me as amazing is how beautifully constant the grade of the route is in certain areas. Look at this elevation profile that Google Earth drew over the complete 68-mile route. In several places, the tracks climb or descend at a nearly perfect 4 percent grade. What a great piece of 1920s engineering that was!
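For readers who have never worked with elevation profiles: a grade is simply rise over run, expressed per 100 units of horizontal distance, which is the quantity Google Earth plots. A tiny sketch with invented numbers (the function name and sample values are mine, not taken from the profile):

```python
# Sketch: computing a railroad grade from an elevation profile.
# A 4% grade means the track climbs 4 feet for every 100 feet of distance.

def grade_percent(rise_ft, run_ft):
    """Grade as vertical rise per 100 units of horizontal distance."""
    return 100.0 * rise_ft / run_ft

# e.g. climbing about 211 ft over one mile (5280 ft) of track:
g = grade_percent(211, 5280)
print(f"{g:.1f}% grade")  # → 4.0% grade
```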
Another ThreadReader unroll of a recent Twitter thread that I did.
I got several questions from other scientists who are all interested in how to detect science misconduct and how to report it. So here is a new thread.
First of all, this is a risky business. Reporting misconduct close to home (e.g. in your own lab or by a close collaborator) might damage not only other people’s careers, but also risk your own job, especially if you are early career.
Second, you need to be able to objectively word your concerns. Yelling “misconduct!” is not going to bring you very far. You have to stick to facts.
“These 2 protein bands look unexpectedly similar” is good.
“These 2 protein bands have been copied and pasted” is subjective.
Third, you have to be patient. If you report a paper or a set of papers to a journal or institute, the investigation might take years to complete. Of the 1016 papers I reported to journals in 2014 and 2015, 54 have been retracted and 181 have been corrected. The rest? … crickets.
Let’s assume that after these 3 disclaimers, you are still interested. There are a couple of possible scenarios. Maybe you are generally interested in how to spot misconduct and how to respond. Or maybe you suspect misconduct in your lab and are not sure what to do.
If you are generally interested in cases of science misconduct, there are a couple of places you can start:
@PubPeer is the place where people can comment on a paper – anonymously or signed, positive or negative. Check there regularly for the types of comments people leave.
Also, @PubPeer has a great Chrome plugin that will flag papers (based on DOI) on e.g. PubMed, so you can see that in your literature searches. It does not seem to work with Google Scholar, unfortunately.
Here is how the Chrome extension works with PubMed searches. It will show which papers have comments on @PubPeer.
If you found a paper and you have concerns, you can leave a comment yourself, either under your full name (not recommended when you are just starting) or anonymously (will be moderated, so comments don’t appear immediately).
Make sure to remain objective in your comments. Again, stick to wording such as “unexpected similarities” or “sharp transition between two adjacent bands” instead of “cloned”, “fabricated”, “manipulated”, etc. Assume there is a slight chance that it was an honest mistake.
Most of the problems found in biomedical papers are potential duplications of photographic images. My coauthors @ACasadevall1 and @FangFerric and I wrote about those types of problems here (apologies for the weird associated photo):
The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications. mBio: https://mbio.asm.org/content/7/3/e00809-16
But there are many other potential problems that you might spot in a paper. You can check for plagiarism by putting part of a sentence (5-8 words works well) between quotes and searching in Google Scholar. See if you get a single result, or re-use of the sentence.
Definitions will of course give many results, so that is not plagiarism. Here is an example of a definition sentence that will give many results in Scholar: “Probiotics are live organisms that, when administered in adequate amounts”. That is NOT plagiarism.
But multiple hits in Google Scholar for a sentence such as “Some of these health problems include bone loss, muscle atrophy, cardiac dysrhythmias” could be cause for concern, especially if there are many other sentences in the paper with multiple hits.
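If you want to semi-automate picking candidate phrases to paste (between quotes) into Google Scholar, a simple sliding window over the text works. A minimal sketch; the function name is my own illustration, not part of any real tool:

```python
# Sketch: extract candidate 7-word phrases from a paragraph, each of which
# can then be searched between quotes in Google Scholar for exact matches.

def candidate_phrases(text, n=7):
    """Split text into consecutive non-overlapping n-word phrases."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words) - n + 1, n)]

para = ("Some of these health problems include bone loss, "
        "muscle atrophy, cardiac dysrhythmias and decreased immunity.")
for phrase in candidate_phrases(para):
    print(f'"{phrase}"')
```

The searching itself is still manual – Google Scholar has no public API for this – but a list of ready-quoted phrases speeds up the screening of a suspicious paper.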
Posting on PubPeer is one thing, but the conventional – albeit much slower and less effective – way of reporting papers with concerns is to write to the Editor of the journal in which the paper was published, or to the institution in the case of multiple papers by the same lab.
Most journals have information on their website with their contact information and their Editor in Chief. It is most effective to write to more than one email address (pick a couple of senior editors as well) so that there is more chance that a journal will actually respond.
I always write by email, not paper letters, so as to leave a record. Unfortunately, some journals make it very hard to find their contact information. You might have to cyberstalk the editors and search PubMed publications or faculty pages for their email addresses.
If there are multiple problematic papers, you can also report to the institution / university. There might even be multiple institutions, if a person moves from lab to lab. Search for “Research Compliance” or “Research Integrity” and the university’s name. It might be hidden.
How about if you suspect misconduct close to yourself (e.g. by a co-worker in your lab)? If you trust the PI, you could first raise it with them. If not, you could report it to the Research Integrity office of your university. Write them an anonymous (paper) letter or email.
Unfortunately, it is very risky to write under your own name. Research Integrity officers might promise you anonymity, but might reveal your name to the defendant in a later stage of the investigation. This has happened to me and it sucks.
That is just some general advice that I have about how to spot and report cases of misconduct. Happy to talk more about this in a different thread.
I have decided to take (at least) a year off from paid work to focus on my research integrity work. Since 2013, I have worked on finding plagiarism and image duplication in scientific papers. Every free minute I searched for papers, made reports highlighting the potential problems, and wrote to journals and institutions about these concerns.
Together with Arturo Casadevall, Ferric Fang, and other co-authors, I published three papers about our work on the frequency of image duplication in biomedical papers. You can find them here:
As of today, I have reported 85 papers and theses for extensive plagiarism, and over 1200 papers for potential image duplication. But there are many more that I still need to report, I am getting more and more requests for help, and the work – all done in my spare time – has been piling up.
So after working for 2.5 years at microbiome startup companies, I have decided to take at least a year off – maybe longer – so that I will have much more time to work on this project. Of course, I will still closely follow the microbiome field as well.
Thank you all, for your support!
I am taking a year off from paid work to focus more on my science misconduct volunteer work. Science needs more help to detect image duplication, plagiarism, fabricated results, and predatory publishers.
Most of the work detecting these problems in science papers is done by volunteers like me. It takes perseverance and patience. Many journals, authors, and academic institutions will not take action.
Even if they respond, it might take years before papers with serious flaws are corrected. All that time, those papers are not flagged by the journals, and other researchers might cite them or base their research on them.
As of now, we can only flag papers on @PubPeer and install their plugin so you can see which papers have a comment, e.g. when doing literature searches.
And I will still write to journals or institutions about all papers with concerns that I found so far. Even if it takes hours to find their contact info.
I still have 100s of papers that I need to officially report and 100s of reported ones to follow up on. The only way I felt I could catch up on that was to quit my paid job. Which is scary.
It would be nice if journals, institutions, funding agencies, and countries would care more about the quality of their research, and had more guts to respond to concerns raised by readers – and take action.
The work that volunteers like us do is obviously not very rewarding. No one likes criticism. It can also be dangerous: authors might start personal attacks on us or sue us for libel.
I am also very aware of the collateral damage, e.g. to coauthors who did not commit misconduct, to other workers in those labs, and to family members.
With that in mind, it is important to focus on facts, on the potential problems in the papers and how to address those. The focus should be on the papers, not the authors.
I might make an exception in cases of authors using false affiliations and fake coauthors.
The more you dig into these cases, the more weird stuff you find. I can probably do this full-time for the rest of my life. Maybe I will.
An open peer review of a preprint paper by Hatch A et al. from Viome, posted on OSF Preprints, January 2019.
Full disclosure: I worked for Viome’s competitor uBiome from October 2016 – December 2018. I am currently an independent consultant.
Andrew Hatch, James Horne, Ryan Toma, Brittany L. Twibell, Kalie M. Somerville, Benjamin Pelle, Kinga P. Canfield, Guruduth Banavar, Ally Perlina, Helen Messier, Niels Klitgord and Momchilo Vuyisich*
Viome, Inc, Los Alamos, NM 87544, United States.
In a recent paper authored by Stanford researchers, biotech startups were criticized for not sharing their discoveries through peer-reviewed research studies. So in that light, it is great to see a biotech company such as Viome publish a study about their microbiome consumer product. Viome’s leaders have been very vocal about the superiority of their product – which is based on RNA transcription – over that of other microbiome consumer tests, which are based on DNA amplification and sequencing. But until this preprint came out, no research on the Viome product had been published. So I was excited to hear about this preprint!
It is important to first point out that this paper is not a peer-reviewed paper. It is a preprint, which means it is written as an academic paper, but it has not been peer-reviewed by other scientists. It is a first step, however, to share the work that Viome did to build a metatranscriptomics platform, and show some of their first results. I hope my comments will be useful in the process of getting this study published in a peer-reviewed journal.
The paper describes Viomega, Viome’s automated stool metatranscriptomics method that involves RNA extraction from stool samples, sequencing, and bioinformatics analysis. Let’s go over each of the sections.
The Introduction of the paper mostly states how metatranscriptomics is superior to other techniques. It has a somewhat oversimplified table comparing other methods (bad!) to their own method (good!).
For example, under “Method Biases”, the 16S Gene Sequencing column states “Heavily influenced by amplification method, but also sequencing quality, sample lysis, and bioinformatics“, where “heavily influenced” sounds a bit denigrating, while sequencing quality and bioinformatics are conveniently left out in the Metatranscriptomics column.
The claim that the method “identifies all living organisms” appears oversimplified as well. First, is a phage or virus alive? Then, how about a bacterium in a viable-but-non-culturable or spore state? It is alive, but it is probably not transcribing much RNA – can metatranscriptomics detect those?
The statement that Metatranscriptomics “allows assessment of pathway activities that can lead to personalized health insights and recommendations with molecular-level precision” is overly subjective and is not proven in this paper. That last part of the sentence sounds like a Viome commercial, not like something that belongs in a scientific paper.
The introduction text states that 16S gene sequencing misses archaea or eukaryotes, but fails to acknowledge that most of these can be identified with broad-range primers as well.
In short, the Introduction presents a too black-and-white comparison between different microbial community methods that is not very objective.
Not unexpectedly for a biotech company, the paper does not provide a lot of technical details on sample extraction, library preparation, or bioinformatics analysis. In order to pass peer review, the authors will likely need to provide more details on their methods – so that others can easily replicate them. The statements about participant consent and IRB approval were also very short; most journals and peer reviewers would like to see more than “all study procedures were approved by an IRB” from a non-academic institution.
The first part of the Results shows that the Viomega method can detect a range of different microorganisms with relatively equal efficiency and precision. Using mock communities, the results are accurate, and reported to contain no false positives or negatives and no sample-to-sample crosstalk. However, the data given here were very sparse.
Figure 4 is a colorful representation of the mock community sequencing experiment, but it lacks percentages and other details, such as the number of reads. Were there really no false positive or false negative reads? Not a single one? There is also no word on how the negative controls performed across their tests of 10,000 samples. It would have been really valuable to compare this mock community using all three methods listed in Table 1: 16S, metagenomics, and metatranscriptomics.
This section of the Results also shows that the Viome test is reproducible. Three different experiments were performed in which small numbers of participants (3 to 7 persons) collected stool samples testing the following:
In all three experiments, samples from the same stool specimen or individual were very similar to each other, showing that the Viome test results are reproducible (the dark squares along the diagonal in Figures 5, 6, 7).
Figures 5/6/7 show the test results in two different ways. The A panels in each figure are based on which active microbes are present in the samples, so based on the taxonomy of the RNA reads. The B panels are the results based on gene function composition (which genes are being expressed in that sample).
It was interesting to see that in all cases the microbial composition (taxonomy; A panels) was better able to tell individuals apart than the gene expression (B panels).
Detailed explanation: If you look at the A panels, the comparison of each person to other samples from the same person shows they are very similar (purple) while each individual is very different (light blue) from other persons. In the B panels, the individuals do not differ that much from each other; all individual-to-individual comparisons are darker shades of purple that are more difficult to tell apart.
That is a bit ironic, because the preprint states at the beginning (Abstract and Table 1) that functional gene analysis, not microbial composition based on taxonomy, is needed for personalized health insights. Instead, the paper appears to show that each person has their own personal gut microbiome, while the functional capacities appear to be pretty similar between individuals.
The paper does not explain how these small person-to-person variations in microbial gene expression will lead to the “goal to develop personalized nutrition algorithms” stated in the Abstract.
Metatranscriptomics is superior to other techniques, stated the paper in the Introduction and Table 1, but the paper does not do anything to prove that. There was no comparison of 16S sequencing vs metatranscriptomics on e.g., the mock community shown in Figure 4. Such a comparison could add a lot of value to the paper, and could support the bold statements made in the Introduction.
Also, the amount of viruses, archaea, and eukaryotes in the sample set was not very high, suggesting that 16S sequencing does not miss as much diversity as the authors claim in the Introduction. For example, strain-level analysis (Table S1) shows crAssphage as the most prevalent virus, but it is ranked only #144 of all taxa (26% of samples). Similarly, Methanobrevibacter clocks in at #269 (14%), while Entamoeba is the most prevalent eukaryote at #336 (10% of samples).
Many of the viruses/phages found in the stool samples appear to be plant associated (Suppl tables). For example, Phaseolus vulgaris endornavirus (beans), Pepper mild mottle virus, Cannabis cryptic virus, and shallot latent virus are among the most prevalent viruses. Could the authors comment on this? Could these be transient microbiome components that were part of a food item, or are they present in the same person over longer periods of time? It would be nice to see some data on that.
Several taxa appear to behave very differently on strain/species/genus level (Tables S1-S3). Examples where genus-level prevalence is much lower than expected based on strain prevalence are given below. In other examples, taxa with high prevalence at genus level seem to disappear at species or strain level; also given below.
Unfortunately, relative abundance data appear to be missing. How abundant are viruses, archaea, and eukaryotes? What percentage of the transcriptomics reads were assigned to each of these groups? These numbers, now missing, would tell us which groups 16S sequencing would miss and would allow for a better comparison of different microbial community analysis tools.
The paper contained very few functional results on the data from the 10,000 Viome samples. Viome has repeatedly claimed that transcriptomics is much more informative than just lists of microbial taxa, and the abstract promises “several small clinical studies to demonstrate the connections between diet and the gut metatranscriptome.”
Therefore, it was disappointing to see that the paper was mainly limited to taxonomic assignments. This dataset, which is one of the largest of its kind, sounded very promising, and I had hoped to see many more functional analyses on these samples.
Table 4 is the only part of the study that analyzes the functional capacity of the data from 10,000 samples. It lists the top 12 KEGG functions (although the legend says “top 10”; see below). Unfortunately, this table is not very informative and might contain some errors (see below). Most importantly, it is just a list of genes without any discussion on their function.
In addition, if the prevalence of these genes is over 99.9% (meaning that almost every person’s microbiome contains those genes), how can one use that to correlate microbial taxa or genes with lifestyle or diet? Based on the top 10 or top 100 (Table S4) KEGG functions, all subjects’ microbiomes appear to have the exact same genes. How is this functional data superior to the big inter-individual differences found by the more conventional 16S types of microbiome analyses? The paper would be much stronger if this important point were discussed.
In summary, the strength of this paper lies in the experiments showing that the Viomega technique is reproducible. That would be a great paper by itself, but the current title and abstract promise much more than the paper currently delivers. With a transcriptomics dataset of 10,000 stool samples, and repeated claims that transcriptomics will give much more functional insight than just a list of taxa, it is disappointing that the results are limited to microbial taxon prevalence and lack any functional analysis.
Based on this paper, Viome’s claims that their test can connect the microbiome transcriptome to a subject’s diet – let alone give dietary advice – appear to be very far-fetched.
Examples where genus-level prevalence is much lower than expected based on strain prevalence:
Strain: Eggerthella lenta 1_1_60AFAA, at 97.08% prevalence
Species: Eggerthella lenta at 91.76%
Genus: Eggerthella at 61.12%
Strain: Veillonella dispar A at 92.34%
Species: Veillonella dispar at 88.99%
Genus: Veillonella at 78.62%
Strain: Entamoeba nuttalli P19 at 9.85% / Entamoeba dispar SAW760 0.74%
Species: Entamoeba nuttalli: 9.87% / Entamoeba dispar 0.74%
Genus: Entamoeba: 0.79%
In other examples, taxa with high prevalence at genus level seem to disappear at species or strain level:
Strain: Lactococcus piscium MKFS47 at 0.99% / Lactococcus raffinolactis NBRC 100932 at 0.23%
Species: Lactococcus piscium at 0.98%
Genus: Lactococcus at 96.88%
Strain level: Escherichia coli isolate 15 at 37.8%; Escherichia coli strain LS5218 at 9.5%, Escherichia coli M17 at 7.7%.
Species level: Escherichia coli at 27.40%
Genus level: Escherichia: 85.92% prevalence
Strain: Saccharomyces sp. ‘boulardii’ strain unique28 is present at 1.82% of samples, as the most prevalent Saccharomyces strain. Also Saccharomyces cerevisiae S288C at 0.80%.
Species level: S. cerevisiae is present in 3.9% of samples.
Genus level: Saccharomyces is present in 25.44%.
Strain: Streptococcus sp. 263_SSPC 5.56% / Streptococcus mutans U138 at 1.62%
Species: Streptococcus thermophilus 28.15% / Streptococcus mutans 9.92% / Streptococcus parasanguinis 7.54%
Genus: Streptococcus 90.47%
Strain: Salmonella enterica subsp. enterica strain SE696A 2.30%; Salmonella enterica subsp. enterica strain ADRDL-LA-5-2013 2.28% / Salmonella enterica subsp. enterica serovar Typhimurium strain 1.52%
Species: Salmonella enterica 11.71%
Genus: Salmonella 15.61%
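To spell out why these numbers are surprising: every sample that contains a given strain also contains the corresponding species and genus, so prevalence should never decrease going up the taxonomic ranks. Here is a small sanity check I wrote on the reported values; the function name is mine, and the numbers are copied from the lists above.

```python
# Sketch: a sanity check on prevalence across taxonomic ranks.
# Prevalence (% of samples containing a taxon) can only stay equal or grow
# from strain to species to genus, never shrink.

def consistent(strain_pct, species_pct, genus_pct):
    """True if prevalence is monotonically non-decreasing up the ranks."""
    return strain_pct <= species_pct <= genus_pct

# Eggerthella lenta: strain 97.08%, species 91.76%, genus 61.12%
print(consistent(97.08, 91.76, 61.12))  # → False

# Lactococcus: strain 0.99%, species 0.98%, genus 96.88%
print(consistent(0.99, 0.98, 96.88))    # → False (species below strain)
```

Both examples fail the check, which suggests the strain/species/genus assignments in Tables S1-S3 are not nested consistently.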