
CUBiMed.RUB - Core Unit Bioinformatics

The Core Unit Bioinformatics "CUBiMed.RUB" at the Faculty of Medicine offers a wide range of bioinformatics resources for academics in the life sciences. We offer consulting and analyses, training and tutorials, software and access to high-performance hardware. The focus is on proteomics, genomics and transcriptomics and their combination ("multi-omics"), but also on other related or emerging omics technologies and on clinical research data.

The Core Unit Bioinformatics "CUBiMed.RUB" at the Faculty of Medicine builds on the de.NBI-BioInfra.Prot project, which was funded by the BMBF from 2014 to 2021. The Core Unit allows this established range of resources to be continued: CUBiMed.RUB provides RUB's own contribution to the extended de.NBI 2.0 project, which is funded until 2027.

A close cooperation with the European infrastructure initiative ELIXIR has been established and will continue in the future.

The CUBiMed.RUB is run by members of the Medical Bioinformatics and the Research Area Bioinformatics of the Medizinisches Proteom-Center.

If you have any questions or inquiries, feel free to contact us: cubimed@rub.de

Resources

We offer consulting and analyses, training and tutorials, software and access to high-performance hardware for academics in the life sciences. These are provided as scientific cooperations and are usually free of charge; for more details, see our FAQs.

Consulting and analyses

CUBiMed.RUB offers support for the analysis of proteomics, genomics, transcriptomics and related clinical data. Consulting and analyses are free of charge for academics in the life sciences, which particularly supports groups without an associated bioinformatics or biostatistics department.

We offer a bioinformatics consulting service on the application of our own as well as third-party proteomics software. We advise on data analysis and the selection of suitable software tools. We also develop user-friendly workflows for frequently required analyses and make them available. Beyond the expert advice provided by our bioinformaticians, we also offer to perform the corresponding analyses on our high-performance hardware.

We also provide support in the planning of omics studies, in the selection of suitable statistical analysis methods and in the interpretation and visual presentation of the results obtained. This also includes up-to-date machine learning methods.

Additionally, we help researchers run their analyses on high-performance hardware in the cloud.

Training and Tutorials

We offer training courses for scientists on programming languages, data analysis and software tools. These courses are held regularly in person (in Bochum or at scientific conferences) or online in the context of the de.NBI project. On request, we also offer additional training courses for small groups. Here are examples of such training courses:

"Differential analysis of proteomics data using R": R is a programming language especially suited for statistical analysis. Our basic training courses introduce life science researchers to R and help them start their first analysis. The courses teach not only the programming language but also the methodology and ways of interpreting the results. Advanced courses focus on more specialized topics and on data visualization.

"Introduction to Python": Our course covers the basic programming paradigms of Python, the handling of research data with the library "pandas" and the visualization of research data with the libraries "plotly" and "sweetviz", up to the integration of Python with the high-performance programming language Rust.

Our training courses are announced via de.NBI:
Link to de.NBI training homepage https://www.denbi.de/training

Software

MaCPepDB

MaCPepDB is a tryptic digest of the complete UniProtKB (Swiss-Prot & TrEMBL). It is designed to support not only queries of peptide sequences, returning information about the connected proteins and thus whether a peptide is unique, but also queries of specific masses of peptides or of precursors of MS/MS spectra. Furthermore, posttranslational modifications as well as different mass deviations can be considered in a query. Hence, the database can be used not only to check, for example, in which proteins of the UniProt database a tryptic peptide can be found, but also to find possibly interfering peptides in PRM/SRM experiments.

Web: https://macpepdb.mpc.rub.de

API-Documentation: https://macpepdb.mpc.rub.de/docs/api

Source Code
Version 1.x & 2.x

Version 3.x - Under development
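The following Python sketch illustrates the kind of in-silico tryptic digest and mass query that MaCPepDB precomputes for all of UniProtKB. It is not MaCPepDB's implementation: it assumes the common cleavage rule (after K or R, but not before P), a hypothetical limit of two missed cleavages, a hypothetical length window of 6 to 50 residues, unmodified peptides and monoisotopic residue masses. All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein, max_missed_cleavages=2, min_length=6, max_length=50):
    """Hypothetical helper: cleave after K/R unless the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    # Join up to max_missed_cleavages neighbouring fragments into one peptide.
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + max_missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def query_by_mass(peptides, target_mass, tolerance_ppm=5.0):
    """Return all peptides whose mass lies within the ppm tolerance."""
    tolerance = target_mass * tolerance_ppm / 1e6
    return sorted(p for p in peptides if abs(peptide_mass(p) - target_mass) <= tolerance)
```

A mass query over such a precomputed set is what makes it possible to spot peptides from other proteins that fall into the same precursor window. Fixed or variable modifications could be modelled by adding mass shifts per residue, which this sketch omits.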

PIA - Protein Inference Algorithms

PIA is a toolbox for MS based protein inference and identification analysis.

PIA allows you to inspect the results of common proteomics spectrum identification search engines, combine them seamlessly and conduct statistical analyses. The main focus of PIA lies on the integrated inference algorithms, i.e. inferring the proteins from a set of identified spectra. It also allows you to inspect your peptide-spectrum matches (PSMs), calculate FDR values across different search engine results and visualize the correspondence between PSMs, peptides and proteins.

https://github.com/medbioinf/pia
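As a generic illustration of the target-decoy idea behind FDR estimation for PSMs (not PIA's own code or its specific FDR scores), the sketch below estimates a q-value for each PSM. It assumes that higher scores are better, that there is one PSM per spectrum, that a decoy database was searched alongside the target database, and it does not treat tied scores specially. The `PSM` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    spectrum_id: str
    score: float    # assumption: higher score = better match
    is_decoy: bool  # True if the matched sequence comes from the decoy database


def target_decoy_qvalues(psms):
    """Estimate a q-value for each PSM with FDR = #decoys / #targets."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR when accepting every PSM down to this score (capped at 1)
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)
    # q-value: the lowest FDR at which this PSM would still be accepted
    qvalues, running_min = {}, 1.0
    for psm, fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        qvalues[psm.spectrum_id] = running_min
    return qvalues
```

Accepting all target PSMs with a q-value of at most 0.01 then corresponds to filtering at an estimated 1 % FDR.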

CalibraCurve

CalibraCurve is a tool for generating and visualizing calibration curves for targeted proteomics data. CalibraCurve enables automated batch mode determination of dynamic linear ranges and quantification limits for targeted proteomics and similar assays. The software uses a variety of measurements to assess the accuracy of the calibration and provides intuitive visualizations.

Link to GitHub:
https://github.com/mpc-bioinformatics/CalibraCurve
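As a rough illustration of how a dynamic linear range can be derived from calibration data, the sketch below fits a weighted least-squares line (weights 1/concentration², a common but here assumed choice), back-calculates the concentration of each level and trims the worse-fitting end level until every level's mean percent bias lies within a hypothetical ±20 % limit. The remaining lowest and highest levels then act as simplified lower and upper limits of quantification. This is not CalibraCurve's algorithm; all names and thresholds are hypothetical.

```python
def fit_weighted_line(x, y, w):
    """Weighted least squares for y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_percent_bias(levels, slope, intercept):
    """Mean percent bias of back-calculated concentrations for each level."""
    bias = {}
    for conc, responses in levels.items():
        back = [(r - intercept) / slope for r in responses]
        bias[conc] = sum((b - conc) / conc * 100 for b in back) / len(back)
    return bias


def linear_range(levels, max_bias=20.0, min_levels=3):
    """Trim the worse end level until all levels meet the bias limit.

    levels: {nominal concentration: [measured responses]}
    Returns (lowest level, highest level, slope, intercept) or None.
    Simplification: only end levels are removed, never inner ones.
    """
    concs = sorted(levels)
    while len(concs) >= min_levels:
        x = [c for c in concs for _ in levels[c]]
        y = [r for c in concs for r in levels[c]]
        w = [1.0 / c ** 2 for c in x]  # assumed 1/x^2 weighting
        slope, intercept = fit_weighted_line(x, y, w)
        bias = mean_percent_bias({c: levels[c] for c in concs}, slope, intercept)
        if all(abs(b) <= max_bias for b in bias.values()):
            return concs[0], concs[-1], slope, intercept
        if abs(bias[concs[0]]) >= abs(bias[concs[-1]]):
            concs = concs[1:]   # lowest level deviates more
        else:
            concs = concs[:-1]  # highest level deviates more
    return None
```

The weighting matters here: with unweighted least squares, a saturated top level pulls the line so strongly that the low levels show the largest relative bias and would be trimmed first.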

Workflows

To obtain reproducible results in bioinformatics analyses, we use specialized workflow systems such as Nextflow and Snakemake. We not only apply well-defined, general approaches but also help researchers create, or create for them, custom-tailored workflows for various omics analyses. Because they run containerized software, these workflows ensure reproducibility and thus help to provide FAIR (findable, accessible, interoperable, reusable) results.

Support for SDRF generation

SDRF (Sample and Data Relationship Format) for proteomics provides sample metadata in a structured and consistent way. This often missing information not only greatly improves the reproducibility of proteomics studies but also makes it possible to periodically and semi-automatically reanalyse large datasets with new and modern tools. Because of these advantages, an SDRF file is now recommended for the submission of any dataset to the PRIDE database.

Depending on your dataset, generating the SDRF can be complex, so we can support you in the process.
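To give an impression of the format, the following sketch writes a minimal tab-separated SDRF table with Python's standard `csv` module, one row per raw file. The column subset, the fixed organism, label-free and trypsin values and the layout of the hypothetical `samples` dictionaries are illustrative assumptions; the SDRF-Proteomics specification defines the full set of required columns and controlled-vocabulary terms, so a real file should be validated against it.

```python
import csv
import io

# Illustrative subset of SDRF-Proteomics columns (assumption: not complete).
COLUMNS = [
    "source name",
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[disease]",
    "characteristics[biological replicate]",
    "assay name",
    "technology type",
    "comment[data file]",
    "comment[fraction identifier]",
    "comment[label]",
    "comment[cleavage agent details]",
    "factor value[disease]",
]


def build_sdrf(samples):
    """Return SDRF text; `samples` is a hypothetical list of dicts, one per raw file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for run, s in enumerate(samples, start=1):
        writer.writerow([
            s["source"],
            "homo sapiens",  # assumption: human samples only
            s["tissue"],
            s["disease"],
            s["replicate"],
            f"run {run}",
            "proteomic profiling by mass spectrometry",
            s["raw_file"],
            "1",  # assumption: unfractionated samples
            "AC=MS:1002038;NT=label free sample",
            "AC=MS:1001251;NT=Trypsin",
            s["disease"],  # the studied factor mirrors the disease column
        ])
    return buffer.getvalue()
```

Most of the effort in practice lies in collecting the per-sample values and mapping them to the correct ontology terms, not in writing the table itself.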

Support for repository submissions

To preserve the foundation of scientific analyses and their results, most communities and journals now require that the raw instrument data of any omics method be deposited, together with the main findings, in specialized repositories. In proteomics, these are the ProteomeXchange repositories; for sequencing-based omics, they include, for example, the European Genome-phenome Archive (EGA), the European Nucleotide Archive (ENA) and ArrayExpress.

As these uploads can be cumbersome, we offer support in handling them.

Additional software not listed here, as well as projects under development, can be found on our GitHub pages (https://github.com/cubimedrub, https://github.com/mpc-bioinformatics and https://github.com/medbioinf).

Hardware and compute cluster

CUBiMed runs a compute cluster with several hundred CPU cores and gigabytes of memory in the RUB's data centre. Together with low-latency storage devices and a fast internet connection, this cluster enhances our ability to support researchers in answering scientific questions.

On this cluster, we provide virtual machines using the OpenStack environment, which are free for scientific use. You can run workflow engines such as Nextflow or Snakemake, or perform advanced computations in R or any scientific analysis.

If you are planning computational analyses that you cannot perform on your laptop or desktop, please contact us for further information.

Legacy

Here we list tools and infrastructure that were part of our services in the past but that we can no longer maintain.

BIONDA

Bionda is a free database providing fast access to information on biomarkers for several diseases. It was created by text mining the abstracts of published open-access scientific literature. Biomarkers can include genes, proteins and miRNAs. If a biomarker and a disease co-occurred within the same sentence of an abstract, the pair was treated as a potential biomarker-disease pair; if this pair occurred often enough in the literature to reach a significant p-value in a χ² test, it was added to the database as a relevant biomarker. Bionda also had a web application, which was taken offline due to the maintenance overhead. The database freeze of 2021 can still be obtained as a CSV file from Zenodo at https://doi.org/10.5281/zenodo.14770260.

The publication for Bionda can be found here: https://doi.org/10.1093/bioadv/vbab015.
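The χ² test on co-occurrence counts can be sketched for a single biomarker-disease pair as a 2×2 contingency table: sentences mentioning both, only the biomarker, only the disease, or neither. This Python sketch assumes sentence-level counts, Pearson's χ² without continuity correction and no multiple-testing correction; Bionda's actual counting and thresholds may differ, and the function names are hypothetical. With one degree of freedom, the upper-tail p-value equals erfc(√(χ²/2)).

```python
import math


def chi_square_2x2(both, marker_only, disease_only, neither):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    a, b, c, d = both, marker_only, disease_only, neither
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column carries no evidence either way
    return n * (a * d - b * c) ** 2 / denominator


def p_value_df1(chi2):
    """Upper-tail p-value of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))
```

Note that the χ² statistic is symmetric: a pair that co-occurs less often than expected also yields a small p-value, so an additional check such as `a * d > b * c` would be needed to keep only positive associations.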

Heads of core unit

Jun.-Prof. Dr. Julian Uszkoreit

Tel.: +49 234 32 18175


Prof. Dr. Martin Eisenacher

Tel.: +49 234 32 18104
