The National Science Foundation's (NSF) research program on Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA) seeks to further develop data science, a transdisciplinary field of research, to understand phenomena through data analytics and massive computation on vast amounts of empirical data. This solicitation invites proposals that focus on the foundations of data science and innovative applications of data science. Of particular interest are proposals for which data science and the availability of big data are creating opportunities for research that was not previously possible, as well as proposals that explore research topics identified by participating NSF directorates listed in this solicitation. Proposers must choose one of two categories: Foundations (F) or Innovative Applications (IA).
Foundations (BIGDATA: F): Proposals in this category are expected to address the development of highly innovative, fundamental techniques, theories, methodologies, and technologies for big data management and/or analytics—including knowledge management, semantic technologies, and foundational mathematical, statistical, and probabilistic approaches—that have wide applicability beyond specific narrow domains. Proposals must justify why the new methods are needed, and why general extensions of small data models, methods, and/or approaches are not adequate.
Proposals focusing on design and development of novel systems addressing emerging challenges in big data such as fairness, interpretability, modeling transparency, algorithmic accountability, reproducibility, and multi-modal interfaces to data are encouraged.
Innovative Applications (BIGDATA: IA): Proposals in this category must focus on the development of innovative big data techniques, methodologies, and technologies for specific application areas, or innovative adaptations of existing big data techniques, methodologies, and technologies to new application areas. Proposals must address a big data challenge of key importance to at least one application domain from one of the participating NSF directorates. Proposals should be clear about how research in the domain is enabled by the availability of big data and insights provided by the analysis of these big data, i.e., how new problems can be addressed that could not previously be addressed.
Projects in this category are expected to require close interaction between researchers from technical or methodological disciplines and those from science and engineering application domains, in order to explore complex, data-driven questions in one or more domains, including the development of domain-specific and cross-domain knowledge structures. Projects are therefore expected to be collaborative, involving researchers from domain disciplines and from one or more technical disciplines such as computer science, mathematics, statistics, and computational science, and stimulating further research on all sides of the collaboration.
Applicants considering submitting proposals in this category are strongly encouraged to discuss their planned research with one or more program officers from the respective NSF directorate(s) before submitting the proposal. It is anticipated that proposals awarded in this category will be jointly or fully funded by the participating NSF directorate(s) interested in the application area(s).
The BIGDATA solicitation encourages projects in both categories to address the reproducibility and replicability of the proposed experimental, methodological, and computational approaches. Because definitions of "replicability", "repeatability", "reproducibility", and similar terms vary across contexts, domains, and user communities, PIs should clearly indicate which definition or definitions are most appropriate for their approaches. PIs are also encouraged to retain relevant digital data, software, and/or algorithms, together with documentation describing the study design, preprocessing pipelines, statistical methods, and computational platforms. This information may be provided as part of a project's Data Management Plan.