Mikhail Karasikov

ML Engineer, PhD

kaiko.ai

About me

I am a Machine Learning engineer at Kaiko.AI, where I develop self-supervised and supervised Machine Learning methods for clinical pathology. I focus on building foundation models for pathology images and RNA-Seq data to help hospitals with diagnostics and other clinical problems.

I completed my PhD at ETH Zurich, where I designed novel algorithms and compressed data structures for indexing huge collections of biological sequences and developed methods that scale to the entire Sequence Read Archive. These methods enable analyses and queries that would be practically impossible on the raw data alone.

Before that, I studied Math, Physics, and Optimal Control at the Moscow Institute of Physics and Technology (MIPT), and then completed a double Master’s program in Mathematics and Machine Learning at MIPT and Skoltech. In parallel, I completed a two-year CS program at the Yandex School of Data Analysis and then interned at Inria Grenoble-Rhône-Alpes, working on problems in computational structural biology.

Interests

Machine Learning
Bioinformatics
Computational Biology
Compressed Data Structures

Free time

Hiking/Camping, Skiing, Biking
Guitar, Piano
Reading, Theater, Art

Education

Ph.D. in Computer Science, 2023
ETH Zurich, Zurich, Switzerland

M.Sc. in Math. and Computer Science, 2017
Skoltech, Moscow, Russia

M.Sc. in Applied Math. and Physics, 2017
MIPT, Moscow, Russia

PG Dip. in Computer Science, 2016
Yandex School of Data Analysis, Moscow, Russia

B.Sc. in Applied Math. and Physics, 2015
MIPT, Moscow, Russia

Recent & Upcoming Talks

PhD defense

Doctoral examination (ETH Zurich, Zurich, Switzerland).

Jul 10, 2023 20:30 — 22:30

Searching in nucleotide archives at Petabase scale with MetaGraph

Zurich Seminars in Bioinformatics 2022 (UZH Irchel Y55-l-06/08, Zurich, Switzerland).

Nov 24, 2022 12:15

Searching in nucleotide archives at Petabase scale with MetaGraph

Biological Data Science - CSHL Meeting 2022 (Cold Spring Harbor, New York, USA).

Nov 9, 2022 — Nov 12, 2022

Scalable Indexing of Sequence Data in Annotated Genome Graphs

Invited talk at JOBIM 2022 (Rennes, France).

Jul 7, 2022 15:00

Multi-genome Representations with MetaGraph

Talk at IGGSy 2022 (Ascona, Switzerland).

Jul 4, 2022 11:15

Projects

Ocean Microbiomics Database

Collaboration with the Sunagawa Lab. Provided k-mer-based sequence search and Counting de Bruijn graph indexes for the Genome Collection of the Ocean Microbiomics Database. Other contributors: Lucas Paoli, Harun Mustafa, André Kahles. (Published in Nature.)

MetaGraph

A C++ framework and library for indexing very large collections of DNA/protein sequences and a tool for sequence search, alignment, and assembly. Although the target use cases of MetaGraph overlap with BLAST, MetaGraph mainly focuses on the scalable indexing of raw sequencing data in annotated de Bruijn graphs with up to $\sim 10^{12}$ nodes and $\sim 10^{7}$ annotation labels. It also provides an online platform, MetaGraph Online. Other contributors: Marc Zimmermann, Thomas Zhou, the MetaGraph team.

Compressed Hybrid Bit Vector Representations

A C++ library with hybrid schemes for representing bit vectors in compressed space.

GeoDNA

A portal for sequence search and geographical positioning based on the metagenomic MetaSUB data. The initial prototype was set up over a weekend, but it served well and was later used as the basis for the MetaGraph Search platform. Other contributors: Marc Zimmermann, Jiayu Chen, André Kahles, Thomas Zhou. (Published in Cell.)

De Bruijn Graph Visualizer

A web app visualizing de Bruijn graphs and the BOSS table (Bowe et al.). Developed to interactively illustrate the core data structure used as a k-mer index for graph representation in MetaGraph.

Protein Scoring

A method for single-model coarse-grained protein quality assessment developed during my internship at Inria Grenoble-Rhône-Alpes (with Guillaume Pagès and Sergei Grudinin).

Activity Prediction

Classification of time-series data from a smartphone accelerometer. Implemented as a practical demonstration of the methods developed in my B.Sc. thesis.

Featured Publications

Mikhail Karasikov, Harun Mustafa, Gunnar Rätsch, André Kahles (2021). Lossless Indexing with Counting de Bruijn Graphs. In RECOMB 2022.

Daniel Danciu, Mikhail Karasikov, Harun Mustafa, André Kahles, Gunnar Rätsch (2021). Topology-based Sparsification of Graph Annotations. In ISMB/ECCB 2021.

Mikhail Karasikov, Harun Mustafa, Daniel Danciu, Marc Zimmermann, Christopher Barber, Gunnar Rätsch, André Kahles (2020). MetaGraph: Indexing and Analysing Nucleotide Archives at Petabase-scale. In bioRxiv.

Mikhail Karasikov, Guillaume Pagès, Sergei Grudinin (2018). Smooth Orientation-Dependent Scoring Function for Coarse-Grained Protein Quality Assessment. In Bioinformatics.

Mikhail Karasikov, Harun Mustafa, Amir Joudaki, Sara Javadzadeh No, Gunnar Rätsch, André Kahles (2018). Sparse Binary Relation Representations for Genome Graph Annotation. In RECOMB 2019.

Teaching

Courses I TAed at the Institute for Machine Learning, ETH Zurich:

Deep Learning (Fall 2017)
Computational Intelligence Lab (Spring 2018, 2019)
Advanced Machine Learning (Fall 2018, 2019, 2020)
Introduction to Machine Learning (Spring 2020)
Statistical Learning Theory (Spring 2021)
Computational Challenges in Medical Genomics (Spring 2019, 2020, 2021, 2022)

Contact

DM me