Sergey V. Samsonau

DC
h-index: 10
Papers: 5
Citations: 2
Novelty: 33%
AI Score: 40

5 Papers

11.5 · ED-PH · Apr 21
The Research Guide: From Informal Role to Profession

Sergey V. Samsonau, Matthew Pearce

Guiding others through authentic scientific research outside of PhD programs has been practiced for decades in specialized secondary schools, undergraduate research programs, and independent settings. These practitioners work in the middle, between the classroom science teacher and the PhD advisor, guiding learners with aptitude or serious interest. Sport and music have dedicated professions for this middle position (the school-team coach and the school band director); research does not. This paper names that missing profession the Research Guide: the practitioner who develops another person's capacity to do research, from framing a question to communicating findings. Hundreds of thousands of middle and high school students already pursue authentic research each year, even more college undergraduates participate in research with a faculty member, and millions of adults engage in citizen science. In current practice, the programs that serve this middle group mostly default to a simplified version of the PhD apprenticeship model structured around one mentor with a few students at a time, without systematic training; they overwhelmingly frame research as the hypothetico-deductive cycle alone. The role calls for cognitive apprenticeship, a pedagogical approach in which an expert's tacit moves on open-ended problems are made visible and scaffolded, then faded as the learner develops, while the research outcomes themselves remain unpredictable. It spans multiple modes of inquiry (not only the hypothetico-deductive cycle) and demands a combination that no existing training program produces: pedagogy, research methodology, developmental assessment, risk and productive struggle management, domain flexibility, and community building. Together these demands warrant a dedicated profession: a named role, a training pathway, a career ladder, hiring standards, and institutional recognition.

DC · Mar 7, 2024 · Code
Improvements & Evaluations on the MLCommons CloudMask Benchmark

Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao et al.

In this paper, we report performance benchmarking results for deep learning models on the MLCommons Science cloud-masking benchmark, run on NYU Greene, a high-performance computing cluster at New York University (NYU). MLCommons is a consortium that develops and maintains several scientific benchmarks that can benefit from developments in AI. We provide a description of the cloud-masking benchmark task, updated code, and the best model for this benchmark when using our selected hyperparameter settings. Our benchmarking results include the highest accuracy achieved on the NYU system as well as the average time taken for both training and inference on the benchmark across several runs/seeds. Our code can be found on GitHub. The MLCommons team has been kept informed about our progress and may use the developed code for their future work.

14.1 · DL · Apr 13
Visible, Trackable, Forkable: Opening the Process of Science

Sergey V. Samsonau

The way science is currently practiced shows conclusions but hides how they were reached. Researchers work privately, polish their results, publish a finished paper, and defend it. Errors are punished by retraction rather than corrected by amendment. Alternative directions are pursued through competing papers with no shared history. The reasoning, the dead ends, the trade-offs, the corrections: everything that would let others understand how a conclusion was reached is invisible. Two decades of open science reform have addressed this by opening specific artifacts: papers, data, code, notebooks, protocols. Each is valuable, but the unit remains a finished product. None opens the thinking process itself: the evolving sequence of questions, interpretations, dead ends, and direction changes that constitutes the actual scientific contribution. This paper argues that opening the process of science (not just its outputs) would produce a step change in the speed of scientific progress, the accessibility of scientific reasoning, the trustworthiness of scientific claims, and the scalability of scientific quality assurance. We identify three properties the workflow needs: visible (the process is open, not just the product), trackable (every change is recorded and attributable), and forkable (anyone can branch from any point with shared history preserved). A visible, trackable flow is inherently verifiable: by humans, by automated tools, by AI agents. Software development adopted this flow decades ago, and the results (faster correction, broader contribution, maintained quality at scale) demonstrate the opportunity for science.

37.5 · SE · Mar 18
scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns

Sergey V. Samsonau

Methodology bugs in scientific Python code produce plausible but incorrect results that traditional linters and static analysis tools cannot detect. Several research groups have built ML-specific linters, demonstrating that detection is feasible. Yet these tools share a sustainability problem: dependency on specific pylint or Python versions, limited packaging, and reliance on manual engineering for every new pattern. As AI-generated code increases the volume of scientific software, the need for automated methodology checking (such as detecting data leakage, incorrect cross-validation, and missing random seeds) grows. We present scicode-lint, whose two-tier architecture separates pattern design (frontier models at build time) from execution (small local model at runtime). Patterns are generated, not hand-coded; adapting to new library versions costs tokens, not engineering hours. On Kaggle notebooks with human-labeled ground truth, preprocessing leakage detection reaches 65% precision at 100% recall; on 38 published scientific papers applying AI/ML, precision is 62% (LLM-judged) with substantial variation across pattern categories; on a held-out paper set, precision is 54%. On controlled tests, scicode-lint achieves 97.7% accuracy across 66 patterns.
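To make the "preprocessing leakage" category named in this abstract concrete, here is a minimal sketch of the bug class itself, not of scicode-lint's detection logic. The helper names `fit_scaler` and `transform` and the toy data are hypothetical and chosen only for illustration. The leaky variant estimates scaling statistics from training and test data together; the clean variant uses the training split only.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Return (mean, std) estimated from the given values only."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero on constant input
    return mu, sigma

def transform(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Toy data (hypothetical): the test split has a very different scale.
train = [1.0, 2.0, 3.0, 4.0]
test = [100.0, 200.0]

# Leaky: statistics are fitted on train + test, so information about
# the test distribution flows into the preprocessing step.
leaky_mu, leaky_sigma = fit_scaler(train + test)
leaky_test = transform(test, leaky_mu, leaky_sigma)

# Clean: statistics are fitted on the training split only and then
# reused unchanged for the test split.
clean_mu, clean_sigma = fit_scaler(train)
clean_test = transform(test, clean_mu, clean_sigma)

print("leaky mean:", leaky_mu, "clean mean:", clean_mu)
print("leaky test:", leaky_test)
print("clean test:", clean_test)
```

Both variants run without error and produce plausible-looking numbers, which is why this kind of bug escapes conventional linters: nothing is syntactically or type-wise wrong, only the order in which data is split and statistics are fitted.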

DC · Dec 11, 2023
MLCommons Cloud Masking Benchmark with Early Stopping

Varshitha Chennamsetti, Gregor von Laszewski, Ruochen Gu et al.

In this paper, we report on work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons is a consortium that develops and maintains several scientific benchmarks that aim to benefit developments in AI. The benchmarks are conducted on the High Performance Computing (HPC) clusters of New York University and the University of Virginia, as well as on a commodity desktop. We provide a description of the cloud masking benchmark, as well as a summary of our submission to MLCommons on the benchmark experiment we conducted. The submission includes a modification to the reference implementation of the cloud masking benchmark that enables early stopping. This benchmark is executed on the NYU HPC through a custom batch script that runs the various experiments through the batch queuing system while allowing the number of training epochs to vary. Our submission includes the modified code, a custom batch script to modify epochs, documentation, and the benchmark results. We report the highest accuracy (scientific metric) and the average time taken (performance metric) for training and inference achieved on NYU HPC Greene. We also provide a comparison of the compute capabilities of different systems by running the benchmark for one epoch. Our submission can be found in a Globus repository that is accessible to the MLCommons Science Working Group.
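The abstract does not state which early stopping criterion was added to the reference implementation. As a rough sketch only, the snippet below shows one common variant, patience-based stopping on validation loss. The function name `early_stopping_epoch`, its parameters, and the loss values are assumptions made for illustration and are not taken from the benchmark code.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on
    the best value seen so far by more than `min_delta` for `patience`
    consecutive epochs. If that never happens, all epochs are run.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation-loss curve: improves for three epochs,
# then stalls for two, so training stops before the final epoch.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.5]
print(early_stopping_epoch(losses, patience=2))
```

A rule like this trades a possible later improvement (the final value in the toy curve) for shorter runs, which matters when the performance metric is time to train.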