Niko Kesten

Mathematical molecular biologist interested in using spatially arranged epigenetic data to connect internal cellular state with the local intercellular communication environment in order to better understand disease progression.

Research Statement

I use geometry and statistics with minimal prior assumptions to interpret large biological datasets in order to characterize molecular mechanisms with low bias and high sensitivity.

Computational Molecular Biology Work

I used this approach to build my repetitive region analysis pipeline, which combines multi-mapped reads, often thrown away due to analytical difficulty, with more localizable information from fully mapped reads to gain statistical power. This allows insight into processes in hard-to-align regions of the genome that most analyses previously ignored. The program can look at binding near repeat regions as well as transcription of the regions themselves. The computation can be parallelized, which helps with limitations in RNA-seq read alignment speed in noncoding regions. The software was used in three high-profile projects that all produced mechanistic insights useful in targeted drug development:
 
In Repeat expansions confer WRN dependence in microsatellite-unstable cancers, the pipeline was able to characterize binding patterns at TA-dinucleotide repeats in ChIP-seq data.
In CDK4/6 inhibition reprograms the breast cancer enhancer landscape by stimulating AP-1 transcriptional activity, the program found that CDK4/6 inhibition was causing increased transcription of ERV repeats in tumor cells, helping to induce apoptosis. This analysis required multiple virtual servers and relied fully on the pipeline's concurrent design to remain tractable.
In Reprogramming of the esophageal squamous carcinoma epigenome by SOX2 promotes ADAR1 dependence, the software found active ERVs near SOX2 binding sites in mouse data. 
 
I used the same approach to write a multiomics pipeline that combines transcriptome and cistrome data, allowing inference of which transcription factors or open chromatin regions may be driving the transcription of specific genes. This was useful in three projects across different areas of cancer research:
 
In Distinct oncogenic signatures in malignant PEComa and leiomyosarcoma identified by integrative RNA-seq and H3K27ac ChIP-seq analysis, the algorithm identified which epigenetic markers were affecting transcription and oncogenesis.
In Combination therapies to improve the efficacy of immunotherapy in triple-negative breast cancer, the program found genetic sites of interest for determining the pathways leading to apoptosis.
In Abstract PR014: Cytokine mediated epigenetic reprogramming of CRC primary tumors drives liver metastasis, the software profiled colorectal tumors.
 
This way of thinking has been useful in both computational and lab methods development projects. In CoBRA: containerized bioinformatics workflow for reproducible ChIP/ATAC-seq analysis, I added features, debugged the pipeline, and checked its results for correctness and completeness. In FiTAc-seq: fixed-tissue ChIP-seq for H3K27ac profiling and super-enhancer analysis of FFPE tissues, I performed the epigenetic profiling and super-enhancer analysis, checking that the method was reliable and in concordance with existing assays. This approach also helped identify pathways to overcome drug resistance in Selective CDK7 inhibition suppresses cell cycle progression and MYC signaling while enhancing apoptosis in therapy-resistant estrogen receptor positive breast cancer.
 
I led the computational analysis and interpretation in four projects:
ERG-Mediated Coregulator Complex Formation Maintains Androgen Receptor Signaling in Prostate Cancer,
Enhanced Efficacy of Aurora Kinase Inhibitors in G2/M Checkpoint Deficient TP53 Mutant Uterine Carcinomas Is Linked to the Summation of LKB1–AKT–p53 …,
Collective extrusion initiates dissemination in organotypic model of high-grade serous carcinoma, and
Dissecting mechanisms of replication fork stabilization in patient-derived high-grade serous organoid cultures and their impact on therapeutic sensitivity and the immune-tumor ….
In these projects I performed data quality control, ran pipelines and wrote analysis scripts. I worked closely with experimental biologists to ensure proper interpretation and use of the different types of data, considering the biology as well as both computational and experimental limitations.

Current Research and Future Direction

In 2024 I plan to build on this body of work by refining my mathematical model of spatial transcriptomic data to take into consideration the differences between assayed gene expression and activity predicted from morphological profiling via computer vision. This allows isolation of the effects of the spatial distribution of cells on the slide, which should reduce bias when finding cell-cell interaction patterns from gene expression correlations across space. The model works around the weaknesses of automated vision in certain aspects of understanding image composition so that it can pick up patterns involving multiple cell types. It also avoids the circular reasoning inherent in using expression for scRNA-seq cell clustering and aggregation.

This view would be useful for any organoid or solid tissue sample where the arrangement of cells on the assayed slide is expected to affect gene expression. For example, it could be used to look at immune infiltrates and muscle-tendon communication. It could also be used to look at how the epigenetically induced neuroplasticity from certain serotonergic compounds drives rewiring of maladaptive neural network connectivity patterns, providing a molecular mechanism for their observed effectiveness in addiction and mental health treatment.
 
My previous work looked at continuous variation of epigenetic data, such as cell type transitions, by viewing cells as reference points in a map that identifies position with gene expression, looking for spatial transitions instead of using the pseudotime analysis common in single-cell work. The new model builds on that approach by looking more closely at the cell imaging data and using it to determine which cells are similar before considering the expression and position data, greatly increasing statistical power. I hope to integrate this into a spatial transcriptomics analysis computing kit to be called STACK.
 
Spatial transcriptomics is a great way for me to combine my RNA-seq analysis experience and abstract geometrical training because, assuming smooth expression variation over space, the biology is easily modeled using concepts from differential geometry. This allows correlation of internal cell state with morphology and local environment.

I am open to working on any other projects that could benefit from a mathematical perspective. It is worthwhile to use models that minimize assumptions and false positives at the cost of increased computing time because sequencing is still much more expensive than processor cycles. Recent advances in sequencing technology have greatly increased the need for careful statistical analysis of NGS data: the massive number of reads produced offers many places to search for patterns, and therefore many opportunities for false positives. Multiomics has significantly increased the sophistication of appropriate models, requiring deeper mathematical expertise in biology.

More generally, I hope to promote the use of dynamic models in NGS analysis in order to connect the data to the biology. This allows more information to be used, for example by recognizing that transcript count is related to protein production rate. Time-series analysis in particular can benefit from this approach. Every data analysis program is written with assumptions in mind, and using the right model for the experiment at hand is key. For example, because genes and mRNA are generally thought of as abstractions of proteins, it is important when analyzing mRNA and genes to keep in mind what is happening to the final protein product. This is relevant to most epigenetic analysis, since analysis of noncoding regions and epigenetic markers is usually tied to nearby genes.

As of October 2024 I have several manuscripts in progress including projects about MET amplification in lung cancer, distinguishing PEComa from LMS, and cell differentiation in AITL. I am also working on using more differentiable and dynamical systems models in RNA-seq analysis in general.

Stay tuned!

Previous Work and Education

Before working in biology at MGH and DFCI, I worked in industrial process automation writing software for electrical grid management and tax fraud detection at hmx.ai. Before that I worked in complex systems research at NECSI using ideas from physics in development economics. While there, I helped write an article featured in the CFC Annual Report 2017.
 
My undergraduate honors thesis at Washington University in St. Louis, Lie Groups and Lie Algebras, advised by Xiang Tang, focused on the differential geometry used to characterize continuous spacetime symmetries in mathematical physics. For it I was awarded highest honors, inducted into Sigma Xi and chosen as an MAA student nominee member. I worked on the software for the IEEE LED Modular Dance Floor. I was also jazz director of KWUR 90.3 FM, where I had a show called Rhomboid Dreamscape on which I performed live surreal dream analysis on callers and played jazz, classical and electronic music for the people of St. Louis.
  
While a student at Wellesley High School I worked at Northeastern University doing statistical analysis for fuel cell catalyst design in the lab of Eugene Smotkin, taught Java and C++ classes to advanced middle schoolers, and was given the Bausch and Lomb award as the top science student in my graduating class. I also threw shot for the track and field team and played bass in the band that was a national finalist in the Essentially Ellington Jazz Festival.
  
See my Google Scholar for a complete list of publications and metrics.

About Me

I am a collaborative researcher who maintains strong relationships with co-authors in Boston and around the world. I try to take a biologically informed, careful and statistically sound approach to computational data analysis. I enjoy learning from and teaching others. I have general quantitative training and knowledge that could be useful in any scientific field. 
  
I believe that healthcare, food, and shelter are human rights. In my free time I enjoy music, comedy, and weightlifting. I am also interested in philosophy and history. I love to read random articles on Wikipedia. I live in Cambridge, MA.
My pronouns are he/him.

Contact Information

Please contact me with collaboration or employment opportunities at:
 
nikokesten@gmail.com
 
Coming 2025: gmail@nikokesten.com will forward to nikokesten@gmail.com, doubling the number of permutations of words that you can contact me at by making my address invariant under the transposition (2 1 3).