Niko Kesten
Mathematical molecular biologist interested in using spatially arranged epigenetic data to connect internal cellular state with local intercellular communication environment in order to better understand disease progression.
Research Statement
I use geometry and statistics with minimal prior assumptions to interpret large biological datasets in order to characterize molecular mechanisms with low bias and high sensitivity.
Computational Molecular Biology Work
I used this approach to build my repetitive region analysis pipeline, which combines multi-mapped reads, often thrown away due to analytical difficulty, with more localizable information from fully mapped reads to gain statistical power. This allows insight into processes in hard-to-align regions of the genome that most analyses previously ignored. The program can examine both binding near repeat regions and transcription of the regions themselves. The computation can be parallelized, which helps with limitations in RNA-seq read alignment speed in noncoding regions. The software was used in three high-profile projects that all produced mechanistic insights useful in targeted drug development:
I also used this approach to write a multiomics pipeline that combines transcriptome and cistrome data, allowing inference of which transcription factors or open chromatin regions may be driving the transcription of specific genes. This was useful in three projects across different areas of cancer research:
I led the computational analysis and interpretation in four projects:
In these projects I performed data quality control, ran pipelines and wrote analysis scripts. I worked closely with experimental biologists to ensure proper interpretation and use of the different types of data, considering the biology as well as both computational and experimental limitations.
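The idea of borrowing strength from well-localized reads to place multi-mapped reads can be illustrated with a minimal sketch. This is not the pipeline's actual algorithm; it is a generic fractional-assignment heuristic, and the function name `rescue_multimapped`, the region labels, and the `pseudocount` parameter are all hypothetical choices made for illustration.

```python
from collections import Counter


def rescue_multimapped(unique_counts, multi_reads, pseudocount=1.0):
    """Split each multi-mapped read across its candidate regions.

    unique_counts: dict mapping region name -> count of unambiguously placed reads
    multi_reads:   list of lists; each inner list holds the candidate regions of one read
    pseudocount:   added to every weight so regions with no unambiguous reads
                   can still receive a share (an assumption of this sketch)

    Returns a dict mapping region name -> estimated read count.
    """
    estimates = Counter({region: float(n) for region, n in unique_counts.items()})
    for candidates in multi_reads:
        # Weight each candidate region by its unambiguous evidence.
        weights = [unique_counts.get(region, 0) + pseudocount for region in candidates]
        total = sum(weights)
        if total == 0:
            # No evidence at all: fall back to an even split.
            weights = [1.0] * len(candidates)
            total = float(len(candidates))
        for region, weight in zip(candidates, weights):
            estimates[region] += weight / total
    return dict(estimates)


if __name__ == "__main__":
    unique = {"repeat_A": 3, "repeat_B": 1}
    ambiguous = [["repeat_A", "repeat_B"]]
    print(rescue_multimapped(unique, ambiguous, pseudocount=0.0))
```

Each read contributes exactly one unit of count in total, so the sum of the estimates equals the number of input reads. Because every read is handled independently given the unambiguous counts, the loop can be split across workers.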
Current Research and Future Direction
In 2024 I plan to build on this body of work by refining my mathematical model of spatial transcriptomic data to take into consideration the differences between assayed gene expression and activity predicted from morphological profiling via computer vision. This allows isolation of the effects of the spatial distribution of cells on the slide, which should reduce bias when finding cell-cell interaction patterns from gene expression correlations across space. To pick up patterns involving multiple cell types, the model avoids relying on automated vision for the aspects of image composition that it handles poorly. It also avoids the circular reasoning inherent in using expression for scRNA-seq cell clustering and aggregation.
This view would be useful in any organoid or solid tissue samples where the arrangement of cells on the assayed slide is expected to affect gene expression. For example, this could be used to look at immune infiltrates and muscle-tendon communication. This could also be used to look at how the epigenetically induced neuroplasticity from certain serotonergic compounds drives rewiring of maladaptive neural network connectivity patterns, providing a molecular mechanism for their observed effectiveness in addiction and mental health treatment.
My previous work looked at continuous variation of epigenetic data, such as cell type transitions, by viewing the cells as reference points in a map that identifies position with gene expression, looking for spatial transitions instead of the pseudotime analysis used in single cell work. My current work builds on that approach by looking more closely at the cell imaging data and using it to determine which cells are similar before considering the expression and position data, greatly increasing statistical power. I hope to integrate this into a spatial transcriptomics analysis computing kit to be called STACK.
Spatial transcriptomics is a great way for me to combine my RNA-seq analysis experience and abstract geometrical training because, assuming smooth expression variation over space, the biology is easily modeled using concepts from differential geometry. This allows correlation of internal cell state with morphology and local environment.
I am open to working on any other projects that could benefit from a mathematical perspective. It is worthwhile to use models that minimize assumptions and false positives at the cost of increased computing time because sequencing is still much more expensive than processor cycles. The recent advancements in sequencing technology have greatly increased the need for careful statistical analysis of NGS data: the massive number of reads produced offers many places to search for patterns, and every additional search is another chance for a false positive. Multiomics has increased the sophistication of appropriate models significantly, requiring deeper mathematical expertise in biology.
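One standard way to keep false positives in check when many regions or genes are tested at once is the Benjamini-Hochberg step-up procedure. The sketch below is a generic illustration of that procedure, not code from any of the projects described here. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank k with p_(k) <= alpha * k / m.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Illustrative p-values only.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

With these ten illustrative p-values, an uncorrected 0.05 threshold would flag five tests, while the procedure keeps only the two smallest.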
More generally, I hope to promote the use of dynamic models in NGS analysis in order to connect the data to the biology. This allows more information to be used, such as the fact that transcript count is related to protein production rate. Time-series analysis in particular can benefit from this approach. Every data analysis program is written with assumptions in mind, and using the right model for the experiment at hand is key. For example, because genes and mRNA are generally thought of as abstractions of proteins, it is important when analyzing mRNA and genes to keep in mind what is happening to the final protein product. This is relevant to most epigenetic analysis, since analysis of noncoding regions and epigenetic markers is usually tied to nearby genes.
As of October 2024 I have several manuscripts in progress including projects about MET amplification in lung cancer, distinguishing PEComa from LMS, and cell differentiation in AITL. I am also working on using more differentiable and dynamical systems models in RNA-seq analysis in general.
Stay tuned!
Previous Work and Education
Before working in biology at MGH and DFCI, I worked in industrial process automation writing software for electrical grid management and tax fraud detection at hmx.ai. Before that I worked in complex systems research at NECSI using ideas from physics in development economics. While there, I helped write an article featured in the CFC Annual Report 2017.
I focused on the differential geometry used to characterize continuous spacetime symmetries in mathematical physics for my undergraduate honors thesis at Washington University in St. Louis, titled Lie Groups and Lie Algebras and advised by Xiang Tang, for which I was awarded highest honors, inducted into Sigma Xi and chosen as an MAA student nominee member. I worked on the software for the IEEE LED Modular Dance Floor. I was also jazz director of KWUR 90.3 FM, where I had a show called Rhomboid Dreamscape on which I performed live surreal dream analysis on callers and played jazz, classical and electronic music for the people of St. Louis.
While a student at Wellesley High School I worked at Northeastern University doing statistical analysis for fuel cell catalyst design in the lab of Eugene Smotkin, taught Java and C++ classes to advanced middle schoolers, and was given the Bausch and Lomb award as the top science student in my graduating class. I also threw shot for the track and field team and played bass in the band that was a national finalist in the Essentially Ellington Jazz Festival.
About Me
I am a collaborative researcher who maintains strong relationships with co-authors in Boston and around the world. I try to take a biologically informed, careful and statistically sound approach to computational data analysis. I enjoy learning from and teaching others. I have general quantitative training and knowledge that could be useful in any scientific field.
I believe that healthcare, food, and shelter are human rights. In my free time I enjoy music, comedy, and weightlifting. I am also interested in philosophy and history. I love to read random articles on Wikipedia. I live in Cambridge, MA.
My pronouns are he/him.
Contact Information
Please contact me with collaboration or employment opportunities at:
nikokesten@gmail.com
Coming 2025: gmail@nikokesten.com will forward to nikokesten@gmail.com doubling the number of permutations of words that you can contact me at by making my address invariant under the transposition (2 1 3).