Research Statement

My research focuses on developing robust, reusable software infrastructure for modern genomics and data-intensive life science. I have worked at the intersection of research software engineering, genome annotation and bioinformatics workflows, building tools that make complex analyses reproducible, maintainable and easier for others to use.

A recurring theme is turning fragile, one-off pipelines into sustainable platforms: modular workflows, containerisation, automated testing and clear documentation, developed in close collaboration with domain scientists. This improves the reliability of downstream biological insights and reduces the cost of keeping tools alive as methods and datasets evolve.

Research Areas

Genome annotation & comparative genomics

I am interested in how genome annotations are generated, assessed and improved, particularly for complex eukaryotic genomes such as those of parasites. My work uses comparative genomics, orthology and transcriptomic evidence to evaluate annotation quality, benchmark automated approaches against curated standards, and identify systematic patterns of misannotation that can be fed back into future assemblies and pipelines.

Bioinformatics pipelines & FAIR research software

A core strand of my work is developing and maintaining bioinformatics pipelines and software that turn biological data into reproducible, shareable analyses. This includes genome annotation systems as well as workflows that standardise quality control, analysis and post-processing for data types such as bulk and single-cell RNA-seq. Throughout, I apply practices that support the FAIR principles (Findable, Accessible, Interoperable, Reusable) for research software: modular design, semantic versioning, containerisation, automated testing and documentation. As a result, workflows can be installed and reused rather than remaining one-off scripts tied to a single project.

Interactive visualisation & web-based analysis tools

I also contribute to web-based tools that sit on top of complex genomic or clinical datasets and make them usable by researchers with limited coding experience. This involves helping to design and implement browser-based interfaces, guided workflows and integrated analyses that let users explore data, generate figures and interrogate results without managing infrastructure or writing scripts. A consistent priority is keeping these tools transparent and reproducible, so that analyses can be revisited, compared and incorporated into wider research workflows.

Current Projects

Supporting manual annotation of helminth genomes

In summer 2024 I carried out a comparative genomics analysis to support an upcoming grant application. I compared the genome assembly and annotation quality of several automated tools against a "gold-standard" set of helminth reference genomes that had been manually curated using tools such as genome browsers. The aim was to demonstrate the continued need to invest in platforms that facilitate manual annotation by highlighting cases where automated methods consistently under-performed. In particular, I focused on exonic differences between orthologues across a set of closely related schistosome species, where high conservation would be expected. Most of the results are written up in a draft manuscript, which I hope to submit soon.

Hackathon project: Improved visualisation of alternative splicing

I led a team of 10 bioinformaticians at the 2025 Glasgow CompBio Community Hackathon to develop workflows for visualising evidence of alternative splicing. From the outset, we recognised that existing ways of displaying transcriptomic evidence (e.g. RNA-seq coverage tracks) do not clearly indicate where alternative transcripts might exist. We developed two complementary workflows, both compatible with the genome browser JBrowse 2. The first is a plugin that renders additional BED-format fields at splice junction sites, using colour to indicate cell type and opacity to indicate strength of support. The second is a custom BAM tag that records, for each aligned read, the alternative transcript it most likely supports; this allows reads to be sorted and coloured with existing JBrowse 2 functionality so that supporting evidence is visually grouped. A publication describing the analysis and hackathon context is in preparation.
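To make the junction-track idea more concrete, the following is a minimal sketch rather than the plugin's actual input specification. It assumes a hypothetical BED6+2 layout in which two extra columns, cell_type and opacity, follow the six standard BED columns, and it takes opacity to be each junction's supporting read count divided by the maximum in the set. The Junction class and junctions_to_bed function are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Junction:
    """A splice junction observed in one cell type (hypothetical record type)."""
    chrom: str
    start: int      # 0-based start, as in BED
    end: int        # 0-based, exclusive end, as in BED
    strand: str     # "+" or "-"
    cell_type: str  # e.g. a cluster label from single-cell data
    reads: int      # number of split reads supporting the junction


def junctions_to_bed(junctions: List[Junction]) -> List[str]:
    """Format junctions as BED6+2 lines: six standard columns, then the
    assumed extra columns cell_type and opacity."""
    max_reads = max((j.reads for j in junctions), default=0)
    lines = []
    for i, j in enumerate(junctions, start=1):
        # Opacity: support relative to the best-supported junction in the set.
        opacity = j.reads / max_reads if max_reads else 0.0
        # The standard BED score column is limited to 0-1000, so rescale.
        score = round(opacity * 1000)
        fields = [
            j.chrom, str(j.start), str(j.end), f"junc{i}", str(score), j.strand,
            j.cell_type, f"{opacity:.2f}",
        ]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    example = [
        Junction("chr1", 1000, 1500, "+", "neuron", 40),
        Junction("chr1", 1000, 1800, "+", "glia", 10),
    ]
    for line in junctions_to_bed(example):
        print(line)
```

Normalising against the maximum keeps opacity in the range 0 to 1 regardless of sequencing depth. A renderer could then map cell_type to a colour palette and opacity to an alpha value.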

Collaborative genome browsing with Apollo3

I am part of the international development team for Apollo3, a substantial JBrowse 2 plugin written in TypeScript that adds a collaborative annotation layer along with many other features. My contributions have focused on the user interface, in particular a "six-frame" view familiar to users of Artemis, a genome browser that is no longer supported, to encourage those users to migrate to the new platform. Apollo3 is currently in beta, and version 1 is planned for release alongside a paper that is in preparation.

Melodramatick: Audience data for Digital Humanities

Melodramatick is an open-source web framework where audiences record their participation habits and discover new repertoire across a range of performing arts, broadening engagement with less-mainstream artforms that are under pressure to adapt. Users log their attendance at live or streamed performances by "ticking" works in the interface, and researchers can access open, privacy-safe data on canon formation and participation to support digital humanities research. Deployed prototypes include Operatick, which catalogues more than 1,000 operatic works. Current plans include supporting the ingestion of historic performance data to enable longer-term analysis of repertoire programming trends.

Research Impact

A central impact of my work is the development of research software that lowers the barrier to high-quality genomic analysis and is reused across multiple projects, groups and domains.

Tools that I have contributed to and continue to maintain include Companion, an automated annotation platform that delivers high-quality structural and functional annotations within hours. Companion combines reference-guided annotation with visualisation tools and is available both as a public web server and as a maintained Docker image. Since its launch, more than 1,000 researchers worldwide have used it to annotate over 10,000 genomes across parasitology, crop science and ecology (see Papers).

Another tool, peaks2utr, is a Python command-line tool that I developed from scratch for annotating the 3′ untranslated regions (3′ UTRs) of mRNAs. These regions regulate key mRNA processes such as localisation, stability and translation, which makes them important targets for functional inference. Since publication, peaks2utr has been downloaded thousands of times and is being cited increasingly often.