Data Science Core Facility

Donald Danforth Plant Science Center • St. Louis, MO

Contact Info:

Director Noah Fahlgren, PhD

Pricing:

Available Upon Request

Summary:

Data Science at the Donald Danforth Plant Science Center is a computing and data analytics hub that develops and deploys technologies in computer science, mathematics, and statistics to accelerate discoveries from data and models in plant science.

The facility supports computing through several modalities: 1) high-performance computing and workflow management on an on-premises HTCondor cluster and a cloud-based (Amazon Web Services) auto-scaling cluster; 2) virtualized applications using machine- and container-level virtualization; and 3) web/database applications and support. Currently, the on-premises infrastructure contains over 1,300 processors and 2,800 graphics processors, more than 8 terabytes of memory, and a single, high-performance 987-terabyte storage area network. These resources are shared in a managed, multi-user environment and communicate via a 10 gigabit Ethernet network. Virtualization of key services simplifies system management and allows diverse applications and platforms to be deployed simultaneously. Additionally, the cloud-based infrastructure can be used for computing and storage needs that exceed on-premises resources or capabilities, such as high-performance, large-scale GPU computing.

Services offered by the Data Science Facility include 1) user services: authentication services/user accounts, software installation, patches and upgrades, troubleshooting, advising, Slack (virtual help desk), GitHub (version control), training (system usage, specific software, workflows), documentation, and outreach; 2) computing: cluster resources, web server hosting, database server hosting, maintenance/upgrades, system monitoring, and virtual machine and container management; and 3) storage: monitoring, performance configuration, and maintenance. Additionally, the facility consults on the development of computational, data analysis, and experimental design components of proposals and assists with editing the computational and statistical analysis sections of manuscripts. The core facility also offers analysis services, ranging from whole-project consulting to individual analyses.

Intellectual development is offered by members of the facility through regular workshops and training events, custom application development for lab or group projects, and community-based sharing of software, ideas, and methods. In addition, the facility enhances interaction between groups at the center and partner institutions and facilitates interoperation between local computing and storage resources and public/private cloud/cyberinfrastructures such as Amazon Web Services, CyVerse, and Open Science Grid.

Equipment:

  • 1,300 processors
  • 10 gigabit Ethernet network
  • 2,800 graphics processors
  • 8 TB memory
  • Cloud-based (Amazon Web Services) auto-scaling cluster
  • CyVerse
  • HTCondor cluster
  • Open Science Grid
  • Single, high-performance 987 TB storage area network
