Summary of "Finding & Analyzing Influenza Data in the BVBRC"
Summary of Scientific Concepts, Discoveries, and Natural Phenomena
This video is a tutorial on accessing, exploring, and analyzing influenza virus data in the Bacterial and Viral Bioinformatics Resource Center (BVBRC), with a focus on influenza viruses within the Orthomyxoviridae family. It explains how several viral and bacterial databases were integrated into BVBRC and demonstrates how researchers can use the platform for influenza genomics and bioinformatics research.
Key Scientific Concepts and Data Types Covered
Influenza Virus Taxonomy and Genomic Organization
- Influenza viruses belong to the Orthomyxoviridae family.
- Influenza A, B, C, and D viruses are included, along with other related viruses in the same family.
- Influenza genomes are segmented (eight segments for influenza A and B, seven for C and D), so one strain is spread across several sequence records, which complicates data navigation.
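Because a segmented genome is stored as several sequence records, a common first step is to group records by strain and check which strains have every segment. The following is a minimal sketch of that idea, not BVBRC code: the record layout and the `strain` and `segment` keys are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: one dict per segment sequence. The keys "strain"
# and "segment" are illustrative assumptions, not a documented schema.
records = (
    [{"strain": "A/example/1/2020", "segment": n} for n in range(1, 9)]
    + [{"strain": "A/example/2/2021", "segment": n} for n in range(1, 8)]
)

def group_segments(records):
    """Map each strain name to the set of segment numbers seen for it."""
    by_strain = defaultdict(set)
    for rec in records:
        by_strain[rec["strain"]].add(int(rec["segment"]))
    return dict(by_strain)

def complete_strains(records, expected_segments=8):
    """Return strains for which segments 1..expected_segments are all present."""
    wanted = set(range(1, expected_segments + 1))
    grouped = group_segments(records)
    return sorted(name for name, segs in grouped.items() if wanted <= segs)

print(complete_strains(records))
```

The `expected_segments` parameter can be lowered to 7 when working with influenza C or D records.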
Data Types Available in BVBRC
- Genomic sequences (whole genomes, segments)
- Protein sequences (features, open reading frames)
- Protein structures (from electron microscopy, NMR, X-ray crystallography)
- Domains and motifs within proteins
- Immune epitopes (B cell, T cell, MHC class I/II)
- Surveillance data (geographical and host-specific)
- Serology data
- Host factor data and experimental omics data (e.g., transcriptomics, proteomics)
- Metadata including host species, geographic origin, collection date, etc.
Views and Navigation Methods
- Taxon View: Browse data by taxonomic rank (family, genus, species).
- Genome View: Detailed view of a single genome record and associated metadata.
- Genome Group View: Tabular view of multiple genomes with customizable metadata columns.
- Feature View: View of proteins or other genomic features for a virus.
- Feature Group View: Tabular view of multiple protein or gene features.
Methodologies and Tools for Influenza Data Analysis in BVBRC
Data Filtering and Customization
- Filters based on segment completeness, subtype or lineage (e.g., Victoria or Yamagata for influenza B), host, country, collection year, etc.
- Customizable metadata columns for detailed data exploration (e.g., H and N subtypes, genome quality, isolate info).
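The same kind of metadata filtering can be reproduced on an exported table. The sketch below assumes rows have already been loaded as dictionaries; the column names (`lineage`, `host`, `country`, `year`, `complete`) and the sample rows are hypothetical, not the platform's actual field names.

```python
# Hypothetical rows shaped like an exported genome table; the column
# names and values are assumptions for illustration only.
records = [
    {"strain": "B/example/1/2019", "lineage": "Victoria", "host": "Human",
     "country": "USA", "year": 2019, "complete": True},
    {"strain": "B/example/2/2015", "lineage": "Yamagata", "host": "Human",
     "country": "Japan", "year": 2015, "complete": True},
    {"strain": "B/example/3/2021", "lineage": "Victoria", "host": "Human",
     "country": "USA", "year": 2021, "complete": False},
]

def filter_records(records, year_range=None, **exact):
    """Keep rows matching every exact-value criterion and the year range."""
    kept = []
    for rec in records:
        # Reject the row if any requested column differs from the wanted value.
        if any(rec.get(key) != value for key, value in exact.items()):
            continue
        if year_range is not None:
            year = rec.get("year")
            if year is None or not (year_range[0] <= year <= year_range[1]):
                continue
        kept.append(rec)
    return kept

selected = filter_records(
    records, lineage="Victoria", country="USA", complete=True,
    year_range=(2018, 2022),
)
print([rec["strain"] for rec in selected])
```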
Data Download and Export
- Export tables as text, CSV, or Excel files.
- View FASTA sequences as DNA or protein.
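For readers who post-process exported data, the sketch below shows what the CSV and FASTA outputs amount to, plus a DNA-to-protein translation using the standard genetic code. It is an illustration only: the function names and the sample row are hypothetical, and the platform's own export may format files differently.

```python
import csv
import io

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), ordered by BASES.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def to_csv(rows, columns):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_fasta(header, sequence, width=60):
    """Format one sequence as a FASTA record wrapped at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"

def translate(dna):
    """Translate DNA in frame 1, stopping at the first stop codon.

    Codons with ambiguous bases are rendered as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

rows = [{"genome_id": "x1", "host": "Human"}]  # hypothetical row
print(to_csv(rows, ["genome_id", "host"]))
print(to_fasta("x1 example", "ATGGCATAA"))
print(to_fasta("x1 example protein", translate("ATGGCATAA")))
```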
Search Methods
- Browsing via taxonomy or virus family.
- Global search for keywords across all data.
- Advanced search with multiple criteria (protein name, taxon, host, geography, collection date, genome length, etc.).
Visualization Tools
- Genome browser for sequence and feature visualization.
- Protein structure viewer (Mol*), enabling interactive exploration of 3D structures.
- Phylogenetic tree builder with customizable coloring based on metadata.
- Multiple sequence alignment and SNP analysis tools.
- Meta-CATS (metadata-driven comparative analysis tool) for statistical comparison of genome or protein groups.
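To make the SNP analysis and group comparison ideas concrete, the sketch below finds variable columns in an alignment and computes a Pearson chi-square statistic for the residue counts of two groups at one column. It is a simplified illustration in the spirit of these tools, not their implementation; the function names and the toy sequences are hypothetical.

```python
from collections import Counter

def variable_columns(alignment):
    """Return 0-based alignment columns with more than one residue (gaps ignored)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in alignment if seq[i] != "-"}
        if len(residues) > 1:
            columns.append(i)
    return columns

def chi_square(group_a, group_b, column):
    """Pearson chi-square statistic for residue counts at one column.

    Both groups must be non-empty. Gap characters are counted as their
    own state here, which is a simplification.
    """
    counts_a = Counter(seq[column] for seq in group_a)
    counts_b = Counter(seq[column] for seq in group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = n_a + n_b
    statistic = 0.0
    for residue in set(counts_a) | set(counts_b):
        residue_total = counts_a[residue] + counts_b[residue]
        for observed, group_size in ((counts_a[residue], n_a), (counts_b[residue], n_b)):
            expected = residue_total * group_size / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Toy aligned sequences for two hypothetical groups (e.g., two host species).
group_a = ["ACGT", "ACGT"]
group_b = ["ACTT", "ACTT"]

for col in variable_columns(group_a + group_b):
    print(col, chi_square(group_a, group_b, col))
```

A larger statistic means the residue distribution at that column differs more between the groups; converting it to a p-value would additionally need the chi-square distribution, which is left out of this sketch.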
New and Updated Tools
- FastQC and de novo assembly for next-generation sequencing data quality control and genome assembly.
- Annotation services for viral genomes, including influenza A, B, and C, with support for influenza D in progress.
- Taxonomic classification and metagenomic binning tools for analyzing complex samples (e.g., nasal swabs).
- BLAST service against local viral reference databases.
- PCR primer design.
- Subspecies classification for influenza strains (H5 and swine H1 are available; migration of the H1N1 classifier is ongoing).
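As a small illustration of what read quality control measures, the sketch below decodes Phred+33 quality strings from FASTQ text and keeps reads whose mean quality reaches a threshold. FastQC itself reports many more metrics; this is only a sketch, and the function names, the threshold, and the sample reads are assumptions.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, sequence, _plus, quality = lines[i:i + 4]
        yield header[1:], sequence, quality

def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def passing_reads(text, min_mean_quality=20):
    """Return IDs of reads whose mean Phred score meets the threshold."""
    return [
        read_id
        for read_id, _seq, quality in parse_fastq(text)
        if mean_phred(quality) >= min_mean_quality
    ]

# Two made-up reads: one high quality, one very low quality.
sample = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
print(passing_reads(sample))
```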
Workspace and Data Management
- Genome groups for nucleotide datasets.
- Feature groups for protein or gene datasets.
- Upload, download, share, and organize data files.
- Command Line Interface (CLI) and API for programmatic access to large datasets.
- Incremental data retrieval based on metadata filters.
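Programmatic access usually means composing a filtered query and paging through the results. The sketch below only builds query URLs and makes no network calls. The base URL, the RQL-style operators (`eq`, `select`, `limit`), the collection name, and the field names are all assumptions that should be checked against the current API documentation before use.

```python
from urllib.parse import quote

# Assumed base URL for the data API; verify against the official docs.
BASE_URL = "https://www.bv-brc.org/api"

def build_query(collection, filters, fields, limit=100, offset=0):
    """Compose an RQL-style query URL (operator syntax is an assumption)."""
    clauses = [
        "eq({},{})".format(name, quote(str(value), safe=""))
        for name, value in filters.items()
    ]
    clauses.append("select(" + ",".join(fields) + ")")
    clauses.append("limit({},{})".format(limit, offset))
    return "{}/{}/?{}".format(BASE_URL, collection, "&".join(clauses))

def page_queries(collection, filters, fields, total, page_size=100):
    """Build one URL per page so a large result set is fetched in chunks."""
    return [
        build_query(collection, filters, fields, page_size, start)
        for start in range(0, total, page_size)
    ]

# Hypothetical field names, used only to show the shape of a request.
filters = {"host_common_name": "Human", "collection_year": 2022}
fields = ["genome_id", "strain"]

for url in page_queries("genome", filters, fields, total=250, page_size=100):
    print(url)
```

Keeping URL construction separate from the HTTP call makes the filter logic easy to test offline and lets the same builder serve both one-off queries and incremental retrieval.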
Differences and Improvements Compared to Previous IRD Platform
- BVBRC integrates multiple databases (Influenza Research Database, Virus Pathogen Resource, PATRIC) into a single platform.
- Broader taxonomic coverage beyond influenza to include the entire Orthomyxoviridae family and others.
- Enhanced data views and navigation options.
- Improved protein structure viewer (Mol*).
- Added metagenomic and taxonomic classification tools.
- Workspace and group-based data management system.
- Ongoing migration of legacy IRD tools and data; some features like phenotype data and antiviral risk assessment are still in progress.
- Legacy IRD site remains available until full migration is complete, but no new data updates occur there.
Researchers and Sources Featured
- Anna Maria Niewiadomska (Presenter)
- References to:
  - Parrello et al. (publication related to the metagenomic binning pipeline)
  - CheckV tool for viral genome quality assessment
- BVBRC team (Bioinformatics Resource Centers for Infectious Diseases, funded by NIH NIAID)
- Legacy Influenza Research Database (IRD)
- Virus Pathogen Resource (ViPR)
- PATRIC (Pathosystems Resource Integration Center)
Summary
The video provides a detailed tutorial on how to find, filter, visualize, and analyze influenza virus data using the BVBRC platform. It covers the types of data available, how to navigate the database, and introduces a suite of bioinformatics tools for sequence quality control, assembly, annotation, phylogenetics, metagenomics, and comparative genomics. The platform offers new functionalities and integrates multiple legacy resources, facilitating advanced influenza research and data analysis for the scientific community.