Publications
scReady - an automated and accessible pipeline for single-cell RNA-Seq preprocessing: Empowering novice bioinformaticians [version 1; peer review: 1 approved with reservations]
Authors: Domenico Somma, William Haese-Hill, Fiona Achcar, Yiyi Cheng, Scott Arkison, Thomas D Otto
Wellcome Open Research
Single-cell RNA sequencing is a powerful technology that lets researchers see which genes are active in individual cells, and its use is growing rapidly. Between generating the raw data in the lab and performing the final analysis that reveals cell function, the data must undergo an essential “cleaning” step called preprocessing. This step reduces noise and enhances the true biological signal. However, this process can be tedious and demands significant computing skills, and mistakes or skipped steps can negatively affect all later results and slow down research. Our team, which includes both laboratory and bioinformatics scientists, developed a new software tool called scReady to help other researchers preprocess their single-cell RNA sequencing data. scReady performs all the essential cleaning steps in a standardized way. Researchers provide their raw data, set the main parameters for their analysis, and run a single command. The tool then produces a clean, analysis-ready dataset along with a report that details the steps performed. The main benefit of scReady is that it makes single-cell data analysis more reliable and accessible, especially for researchers with limited computing experience. This has the potential to accelerate discoveries in biology and medicine by allowing more scientists to confidently use single-cell technologies.
RNAcare: integrating clinical data with transcriptomic evidence using rheumatoid arthritis as a case study
Authors: Mingcan Tang, William Haese-Hill, Fraser Morton, Carl Goodyear, Duncan Porter, Stefan Siebert, Thomas D Otto
BMC Medical Genomics
Gene expression analysis is a crucial tool for uncovering the biological mechanisms that underlie differences between patient subgroups, offering insights that can inform clinical decisions. However, despite its potential, gene expression analysis remains challenging for clinicians due to the specialised skills required to access, integrate, and analyse large datasets. Existing tools primarily focus on RNA-Seq data analysis, providing user-friendly interfaces but often falling short in several critical areas: they typically do not integrate clinical data, lack support for patient-specific analyses, and offer limited flexibility in exploring relationships between gene expression and clinical outcomes in disease cohorts. Meanwhile, users who have a general knowledge of transcriptomics but may have limited programming experience, including clinicians, are increasingly seeking tools that go beyond traditional analysis. To overcome these issues, computational tools must incorporate advanced techniques, such as machine learning, to better understand how gene expression correlates with patient symptoms of interest.
paraCell: a novel software tool for the interactive analysis and visualization of standard and dual host–parasite single-cell RNA-seq data
Authors: Edward Agboraw, William Haese-Hill, Franziska Hentzschel, Emma Briggs, Dana Aghabi, Anna Heawood, Clare R Harding, Brian Shiels, Kathryn Crouch, Domenico Somma, Thomas D Otto
Nucleic Acids Research
Advances in sequencing technology have led to a dramatic increase in the number of single-cell transcriptomic datasets. In the field of parasitology, these datasets typically describe the gene expression patterns of a given parasite species at the single-cell level under experimental conditions, in specific hosts or tissues, or at different life cycle stages. However, while this wealth of available data represents a significant resource, analysing these datasets often requires expert computational skills, preventing a considerable proportion of the parasitology community from meaningfully integrating existing single-cell data into their work. Here, we present paraCell, a novel software tool that allows the user to visualize and analyse pre-loaded single-cell data without requiring any programming ability. The source code is freely available to allow remote installation. On our web server, we demonstrate how to visualize and re-analyse published Plasmodium and Trypanosoma datasets. We have also generated Toxoplasma–mouse and Theileria–cow scRNA-seq datasets to highlight the functionality of paraCell for pathogen–host interaction. The analysis of these data highlights, respectively, the impact of the host interferon-γ response and the gene expression profiles associated with disease susceptibility for these intracellular parasites.
Annotation and visualization of parasite, fungi and arthropod genomes with Companion
Authors: William Haese-Hill, Kathryn Crouch, Thomas D Otto
Nucleic Acids Research
As sequencing genomes has become increasingly popular, the need for annotation of the resulting assemblies is growing. Structural and functional annotation is still challenging, as it includes finding the correct gene sequences, annotating other elements such as RNA, and being able to submit those data to databases to share them with the community. Compared to de novo assembly, where contiguous chromosomes are a sign of high quality, it is difficult to visualize and assess the quality of annotation. We developed the Companion web server to allow non-experts to annotate their genome using a reference-based method, enabling them to assess the output before submitting to public databases. In this update paper, we describe how we have included novel methods for gene finding and made the Companion server more efficient for annotation of genomes of up to 1 Gb in size. The reference set was increased to include genomes of interest for human and animal health from the fungi and arthropod kingdoms. We show that Companion outperforms existing comparable tools where closely related references are available.