Infrastructure Transition Plan 2014

Revision as of 11:15, 28 May 2014

This is a draft document for collaborative planning of the new server transition. This notice will be removed once SAC has determined its final course of action.

Background

The physical machines currently hosted at Oregon State University's Open Source Lab (OSL) are entering the latter part of their life expectancy. With the recent replacement of hard drives and RAID card batteries affecting performance, it is time to start planning for the next 3-5 years of computing needs. We have a recently acquired large backup machine at OSL with 9 TB of usable space. OSGeo1 at Peer will be off as of May 2014.

Past Performance

Current hardware has for the most part met the original stated goals of hosting websites for projects, issue tracking, version control, and mailing lists. Uptime has been generally good; performance has occasionally suffered when things aren't configured right (open proxies, excessive WMS requests, large numbers of 404 responses from bots). Most services were not configured with redundancy because small amounts of downtime were deemed acceptable, which may no longer be the case.

Our biggest dilemma has been lack of people power. We currently have only about 4-5 people who take part in core system administration. Several other people kindly manage Nabble (mail archive hosting) and some other external resources. Ideas on how to balance the workload and recruit more help are important to keeping the systems running.

Future Needs

  • Build services
    • More projects are using static websites built from version control, primarily with Sphinx.
    • Some projects have expressed interest in continuous integration services.
  • There's a renewed interest in global mirroring or GeoCDN type setup for redundancy and speed. Something similar to OSM, or maybe even swapping space with OSM.
  • More redundancy to increase uptime of important websites.
  • Separate web serving from other operations
  • Long term archive of Foss4g sites
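The Sphinx-based static sites mentioned above can be produced by a simple pull-and-build step. A minimal sketch of such a build job, assuming a git checkout with its Sphinx sources in a docs/ directory (the repository and web-root paths are hypothetical, not actual OSGeo locations):

```shell
#!/bin/sh
# Hypothetical periodic build job: refresh a project's checkout and
# regenerate its static HTML site with Sphinx. Paths are illustrative.
set -e
REPO=/var/builds/example-project   # assumed git checkout of the project
DEST=/var/www/example-project      # assumed document root served by the web server

git -C "$REPO" pull --ff-only              # update to the latest committed docs
sphinx-build -b html "$REPO/docs" "$DEST"  # rebuild the HTML tree in place
```

Run from cron or a post-commit hook; a continuous integration service would generalize this same step across projects.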


Projects, please list specific needs you would like met.

  • avoid downtime longer than a few hours; may require automatic mirror and failover setups (projects: GRASS GIS, ... others...?)
  • A Document Management System for storing Contributor Licensing Agreements for projects.
  • Support for git & github sync
  • More isolation so projects don't take each other out when one is misconfigured or has extremely heavy usage.

Ideas

Hardware

  • Buy new hardware (1U 8-drive $3000-$5000 USD, 1U 6-drive $2000-$3000 USD)
    • Possibly use SSDs
    • or, pick hard disks that will last for years over faster things that need more careful maintenance
  • Take advantage of various free hosting
  • Pay for external hosting
    • github pro
    • hetzner (QGIS is currently renting a server)
    • bluehost
    • digitalocean
    • rackspace
    • linode
    • etc...
  • Pool resources with projects that have bigger budgets
  • Leverage Category:ICA OSGeo Lab Network for hosting nodes

VM configuration

  • Provide consistent, baseline setup in general, then implement faster hardware configurations where there is time, attention and resources to do so
    • Puppet, Chef, Juju, etc...
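As a concrete illustration of such a baseline, it could be expressed as a small Puppet manifest applied to every VM. This is only a sketch; the class name, package list, and user below are hypothetical, not an agreed OSGeo configuration:

```puppet
# Hypothetical baseline manifest: the kind of consistent setup a
# configuration management tool could enforce on every OSGeo VM.
class osgeo_baseline {
  # Packages every VM should carry (illustrative selection).
  package { ['openssh-server', 'ntp', 'fail2ban']:
    ensure => installed,
  }

  # Keep clocks in sync across machines.
  service { 'ntp':
    ensure  => running,
    enable  => true,
    require => Package['ntp'],
  }

  # A shared admin account for the SAC crew (name is invented).
  user { 'sacadmin':
    ensure => present,
    groups => ['sudo'],
  }
}
```

Chef or Juju would express the same baseline with their own recipe/charm equivalents; the key point is that the baseline lives in version control rather than in each machine's history.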

Admin crew

  • Funded sysadmin time? (Short of outsourcing everything, can we find ways to avoid relying solely on volunteer time to support the infrastructure? How can we handle funded sysadmin time in a fair way vs volunteer contributions? Are there good examples to follow in other non-profit orgs?)
  • Have a "fire crew" on alert throughout the entire day, 24x7
    • at least one "fire crew" member (ready to handle any emergency SAC issues) on alert at any time in the day, which includes one in a North American timezone, one in a European timezone, and one in an Asia/Pacific timezone
    • "fire crew" positions should be funded/paid
    • "fire crew" schedule and contact info should be made available publicly, so at any given time issues can be brought to the fire crew member on alert

Mirrors

  • Mirrors or distributed services (these people have offered to host some services or mirrors; the GRASS GIS project already runs a mirror system)
  • CDN
    • Cloudflare

Storage

  • Dedicated Disk Storage
    • Put all files into networked storage mounted via NFS (or something similar)
    • GlusterFS (supports georeplication)
    • Use XFS (long supported in Linux), ZFS, or another filesystem that handles lots of small files well.
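For the GlusterFS option, the setup might look like the following sketch. The hostnames, brick paths, and volume names are invented for illustration; this is not a tested OSGeo configuration:

```shell
# Hypothetical GlusterFS volume replicated across two OSL nodes,
# geo-replicated to a remote mirror. All names are illustrative.

# Create a 2-way replicated volume from one brick on each node:
gluster volume create osgeo-data replica 2 \
    osl-node1:/bricks/osgeo-data osl-node2:/bricks/osgeo-data
gluster volume start osgeo-data

# Set up asynchronous geo-replication to a remote mirror host:
gluster volume geo-replication osgeo-data mirror-host::osgeo-data \
    create push-pem
gluster volume geo-replication osgeo-data mirror-host::osgeo-data start

# Clients mount the volume over the native protocol (NFS also works):
mount -t glusterfs osl-node1:/osgeo-data /mnt/osgeo-data
```

Geo-replication is asynchronous, so the remote copy lags the primary slightly; that suits mirrors and disaster recovery rather than synchronous failover.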