This 5-day in-person workshop will provide researchers with an introduction to advanced topics in computationally reproducible research in Python, including software and techniques for working with very large datasets.

Coverage includes cloud computing environments, Docker containers, and parallel processing with tools such as Parsl and Dask. The workshop will also cover concrete methods for documenting and uploading data to the Arctic Data Center, advanced approaches to tracking data provenance, responsible research and data management practices including data sovereignty and the CARE principles, and ethical concerns with data-intensive modeling and analysis.

Topics include:

  • Scalable computing
  • Cloud computing concepts
  • Docker environments
  • Remote computing
  • Parallel processing and concurrency
  • Large data transfer, data staging
  • Data extraction, I/O efficiency