Mar 1, 2022
Determining Research Priorities for Astronomy Using Machine Learning
Brian Thomas, Harley Thronson, Anthony Buonomo et al.
We summarize the first exploratory investigation into whether Machine Learning techniques can augment science strategic planning. We find that an approach based on Latent Dirichlet Allocation, applied to abstracts drawn from high-impact astronomy journals, may provide a leading indicator of future interest in a research topic. We show two topic metrics that correlate well with the high-priority research areas identified by the 2010 National Academies' Astronomy and Astrophysics Decadal Survey science frontier panels. One metric is the sum, over all scientific papers, of each paper's fractional contribution to a topic ("counts"), while the other is the Compound Annual Growth Rate of these counts. These metrics show the same degree of correlation with the whitepapers submitted to the same Decadal Survey. Our results suggest that the Decadal Survey may under-emphasize fast-growing research. A preliminary version of our work was presented by Thronson et al. (2021).
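The two metrics named in this abstract can be illustrated with a minimal sketch. It assumes each paper is represented by a list of per-topic weights summing to 1 (as a topic model such as Latent Dirichlet Allocation would produce) and that papers are grouped by publication year. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def topic_counts(doc_topic_weights):
    """Sum each topic's fractional contribution over all papers.

    doc_topic_weights: list of per-paper weight lists, one weight per topic.
    Returns one "count" per topic.
    """
    n_topics = len(doc_topic_weights[0])
    return [sum(doc[k] for doc in doc_topic_weights) for k in range(n_topics)]


def compound_annual_growth_rate(start_value, end_value, n_years):
    """Standard CAGR: (end / start) ** (1 / n_years) - 1."""
    return (end_value / start_value) ** (1.0 / n_years) - 1.0


# Toy data (assumed for illustration): two topics, papers grouped by year.
weights_2000 = [[0.5, 0.5], [0.5, 0.5]]
weights_2002 = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

counts_2000 = topic_counts(weights_2000)
counts_2002 = topic_counts(weights_2002)

# Growth of each topic's counts over the two-year span.
growth = [
    compound_annual_growth_rate(start, end, 2)
    for start, end in zip(counts_2000, counts_2002)
]
```

In this toy case the first topic's counts quadruple over two years while the second topic's counts stay flat, so only the first shows positive growth. How the paper bins papers in time or normalizes counts is not stated in the abstract and is not assumed here.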
Feb 23, 2015
Knowledge Discovery Framework for the Virtual Observatory
Brian Thomas, Edward Shaya, Zenping Huang et al.
We describe a framework that allows a scientist-user to easily query for information across all Virtual Observatory (VO) repositories and pull it back for analysis. This framework hides the gory details of metadata remediation and data formatting from the user, allowing them to get on with search, retrieval and analysis of VO data as if it were drawn from a single source, using a science-based terminology rather than a data-centric one.
Feb 23, 2015
A User Interface for Semantically Oriented Data Mining of Astronomy Repositories
Brian Thomas, Edward Shaya
We present a user-friendly but powerful interface for the data mining of scientific repositories. We present the tool in use with actual astronomy data and show how it may be used to achieve many different types of powerful semantic queries. The tool itself hides the gory details of query formulation and data retrieval from the user, and allows the user to create workflows which may be used to transform the data into a convenient form.
Feb 20, 2015
Development of a VO Registry Subject Ontology using Automated Methods
Brian Thomas
We report on our initial work to automate the generation of a domain ontology using subject fields of resources held in the Virtual Observatory registry. Preliminary results are comparable to more generalized ontology learning software currently in use. We expect to be able to refine our solution to improve both the depth and breadth of the generated ontology.
Feb 3, 2015
Learning from FITS: Limitations in use in modern astronomical research
Brian Thomas, Tim Jenness, Frossie Economou et al.
The Flexible Image Transport System (FITS) standard has been a great boon to astronomy, allowing observatories, scientists and the public to exchange astronomical information easily. The FITS standard, however, is showing its age. It was developed in the late 1970s, and its authors made a number of implementation choices that, while common at the time, are now seen to limit its utility with modern data. The authors of the FITS standard could not anticipate the challenges which we are facing today in astronomical computing. Difficulties we now face include, but are not limited to, addressing the need to handle an expanded range of specialized data product types (data models), being more conducive to the networked exchange and storage of data, handling very large datasets, and capturing significantly more complex metadata and data relationships. There are members of the community today who find some or all of these limitations unworkable, and have decided to move ahead with storing data in other formats. If this fragmentation continues, we risk abandoning the advantages of broad interoperability and ready archivability that the FITS format provides for astronomy. In this paper we detail some selected important problems which exist within the FITS standard today. These problems may provide insight into deeper underlying issues which reside in the format, and we provide a discussion of some lessons learned. It is not our intention here to prescribe specific remedies to these issues; rather, it is to call the attention of the FITS and greater astronomical computing communities to these problems in the hope that it will spur action to address them.
Jun 27, 2012
Managing Distributed Software Development in the Virtual Astronomical Observatory
Janet D. Evans, Raymond L. Plante, Nina Bonaventura et al.
The U.S. Virtual Astronomical Observatory (VAO) is a product-driven organization that provides new scientific research capabilities to the astronomical community. Software development for the VAO follows a lightweight framework that guides development of science applications and infrastructure. Challenges to be overcome include distributed development teams, part-time efforts, and highly constrained schedules. We describe the process we followed to conquer these challenges while developing Iris, the VAO application for analysis of 1-D astronomical spectral energy distributions (SEDs). Iris was successfully built and released in less than a year with a team distributed across four institutions. The project followed existing International Virtual Observatory Alliance interoperability standards for spectral data and contributed a SED library as a by-product of the project. We emphasize lessons learned that will be folded into future development efforts. In our experience, a well-defined process that provides guidelines to ensure the project is cohesive and stays on track is key to success. Internal product deliveries with a planned test and feedback loop are critical. Release candidates are measured against use cases established early in the process, and provide the opportunity to assess priorities and make course corrections during development. Also key is the participation of a stakeholder such as a lead scientist who manages the technical questions, advises on priorities, and is actively involved as a lead tester. Finally, frequent scheduled communications (for example, a bi-weekly teleconference) ensure that issues are resolved quickly and that the team is working toward a common vision.