Invited speakers

Flora Jay
Evolution, Phylogeny & Population genetics

Creating Artificial Genomes using Generative Networks

Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation of this field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although these databases would provide a rich resource for data mining and integration towards advancing genetic studies. We demonstrate that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the high-dimensional distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little privacy loss.
We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection and genotype-phenotype associations. To illustrate the promising outcomes of our method, we show that imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs and that the RBM latent space provides a relevant encoding of the data, hence allowing further exploration of the reference dataset and providing features that could help solve supervised tasks.



Núria López-Bigas
Genetics & Precision Medicine

Computational analysis of cancer genomes

Somatic mutations are the driving force of cancer genome evolution. The rate of somatic mutations appears to vary greatly across the genome due to variations in chromatin organization, DNA accessibility and replication timing. In addition, other variables that influence the mutation rate at a local scale are starting to emerge. I will discuss recent findings from our lab on how DNA-binding proteins, nucleosomes and differences between exons and introns influence mutation rate. These findings have important implications for our understanding of mutational and DNA repair processes, genome evolution and the identification of cancer driver mutations.
Given the evolutionary principles of cancer, one effective way to identify genomic elements involved in cancer is by tracing the signals left by the positive selection of driver mutations across tumors. We analyze thousands of tumor genomes to identify cancer genes and driver mutations (available in The analysis of tumor cohorts provides valuable information to improve the interpretation of individual variants detected in newly sequenced tumors in clinical or research settings. We have developed, a tool designed to identify driver mutations and biomarkers of drug response in individual tumors.
Some cancer therapies damage DNA and cause mutations in both the cancer cells and the healthy cells of the patient. Currently, the mutation burden caused by different cancer treatments is unknown. We have recently identified mutational signatures, or footprints, of six widely used anti-cancer therapies across more than 3,500 metastatic tumors originating from different organs. These include previously known and new mutational signatures generated by platinum-based drugs, and a novel signature of nucleoside metabolic inhibitors. Exploiting these mutational footprints, we estimate the contribution of different treatments to the mutation burden of tumors and their risk of contributing coding and potential driver mutations in the genome. These mutational footprints pave the way for precisely assessing the mutational risk of different cancer therapies to understand their long-term side effects.



Andrea Rau
Statistics & Machine learning

Integrative and interactive analyses of multi-omics data

The increased availability and affordability of high-throughput sequencing technologies in recent years have facilitated the use of multi-omic studies to expand and enrich our understanding of complex biological systems. However, defining a holistic and meaningful way to exploit these heterogeneous and multi-faceted ‘omics data can be complicated by several major obstacles. These include the unknown hierarchy and potentially ambiguous relationships among different sources of data, the explosion in data dimension, issues due to batch effects and quality control, potentially incomplete or missing data, limited sample sizes, and the occasional difficulty in posing well-defined and answerable research questions of such data. In light of these challenges, I will provide an overview of some of our methodological contributions to integrative multi-omic analyses, and I will discuss how the development of interactive tools can be a useful addition to the multi-omic analysis toolbox.



Johannes Söding
Structural Bioinformatics, Genome Wide Association Study & Gene Regulatory Network


Martin Weigt
Proteomics & Structural Bioinformatics

Protein sequence landscapes: from data-driven models to evolution-guided sequence design

Thanks to the sequencing revolution in biology, protein sequence databases have been growing exponentially over the last years. Data-driven computational approaches are becoming more and more popular in exploring this increasing data richness. In my talk, I will show that global statistical modeling approaches, like (Restricted) Boltzmann Machines, are able to accurately capture the natural variability of amino-acid sequences across entire families of evolutionarily related but distantly diverged proteins. We show that these models, also known under the name Direct Coupling Analysis, are biologically interpretable: they allow the extraction of information about the three-dimensional protein structure and about protein-protein interactions from sequence data, and they unveil distributed sequence motifs. These models can be seen as highly performant generative models: they capture the natural sequence variability far beyond fitted quantities, and they allow the design of novel, fully functional proteins. This last observation opens the field towards data-driven, evolution-guided approaches to protein design.



IFB invited speakers

Kjell Petersen
Technical Coordinator, Elixir Norway

Elixir Norway provides a wide range of services to the life science community. They range from specialized and focused services within the marine and human health domains targeting an international audience, to more generic data management and data analysis services targeting a national user base. When it comes to human genetic data, additional requirements come into play that heavily influence how we can develop and operate e-infrastructure services for research.
In this talk I will present the philosophy, architecture and platforms on which we in the Norwegian Elixir node are building our e-infrastructure for life science services. I will start with the less restricted case of non-sensitive data, and then move towards the case of sensitive data, including our work to prepare the Norwegian Federated EGA node for production, and the hosting of sensitive human data as part of the network of Federated EGA services.

