Infrastructure Transition Plan 2014

'''This is a draft document for collaborative planning of the new server transition. This notice will be removed once SAC has determined its final course of action.'''

= Background =

The current physical machines hosted at Oregon State University's Open Source Lab (OSL) are entering the latter part of their life expectancy. With the recent replacement of hard drives and RAID card batteries affecting performance, it's time to start planning for the next 3-5 years of computing needs. We recently acquired a large backup machine at OSL with 9TB of usable space. OSGeo1 at Peer will be off as of May 2014.

== Past Performance ==
Current hardware has for the most part met the original stated goals of hosting websites for projects, issue tracking, version control, and mailing lists. Uptime has been generally good; performance has occasionally suffered when things weren't configured right (open proxy, excessive WMS requests, large numbers of 404s from bots). Most services were not configured with redundancy because small amounts of downtime were deemed acceptable, which may no longer be the case.

Our biggest dilemma has been a lack of people power. We currently have only about 4-5 people who take part in core system administration. Several other people kindly manage Nabble (mail archive hosting) and some other external resources. Ideas on how to balance the workload and recruit more help are important to keeping the systems running.

== Future Needs ==

 * Build services
   * More projects are using static websites built from version control, primarily with Sphinx.
   * Some projects have expressed interest in continuous integration services.
 * There's renewed interest in global mirroring or a GeoCDN-type setup for redundancy and speed: something similar to OSM's setup, or maybe even swapping space with OSM.
 * More redundancy to increase uptime of important websites.
 * Separate web serving from other operations
 * Long-term archive of FOSS4G sites

Projects, please list specific needs you would like met.


 * Avoid downtime longer than a few hours; this may require automatic mirror and failover setups (projects: GRASS GIS, ... others...?)
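
As a rough illustration of what an automatic failover setup has to decide, the sketch below picks the first healthy server from an ordered list of a primary and its mirrors. It is a minimal sketch only: the host names and the `is_healthy` check are hypothetical placeholders, and a real setup would more likely rely on DNS, a load balancer, or a monitoring tool than on hand-written code.

```python
from typing import Callable, Iterable, Optional


def choose_backend(
    backends: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first backend that passes the health check, or None.

    `backends` is ordered by preference (primary first, then mirrors).
    `is_healthy` is supplied by the caller; how health is measured
    (HTTP probe, ping, ...) is deliberately left open here.
    """
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None


# Hypothetical host names, for illustration only.
servers = ["primary.example.org", "mirror1.example.org", "mirror2.example.org"]

# Stand-in health check that pretends the primary is down.
down = {"primary.example.org"}
active = choose_backend(servers, lambda host: host not in down)
print(active)
```

Keeping the health check as a parameter means the same selection logic can be exercised with a stub, as above, before any real probe is wired in.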

= Ideas =


 * Buy new hardware
   * Possibly use SSDs
   * or, pick hard disks that will last for years over faster things that need more careful maintenance
 * Take advantage of various free hosting
   * e.g. readthedocs.org
 * Pay for external hosting
   * GitHub Pro
   * Hetzner (QGIS is currently renting a server)
   * Bluehost
   * DigitalOcean
   * Rackspace
   * Linode
   * etc...
 * Pool resources with projects that have bigger budgets
 * Provide consistent, baseline setup in general, then implement faster hardware configurations where there is time, attention and resources to do so
 * Leverage the ICA OSGeo Lab Network for hosting nodes
 * Funded sysadmin time? (Short of outsourcing everything, can we find ways to avoid relying solely on volunteer time to support the infrastructure? How can we handle funded sysadmin time in a fair way vs volunteer contributions? Are there good examples to follow in other non-profit orgs?)
 * Have a "fire crew" on alert around the clock (24x7)
   * At least one "fire crew" member (ready to handle any emergency SAC issues) on alert at any time of day, with one in a North American timezone, one in a European timezone, and one in an Asia/Pacific timezone
   * "fire crew" positions should be funded/paid
   * The "fire crew" schedule and contact info should be publicly available, so that issues can be brought to the fire crew member on alert at any given time
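
To make the idea of a public "fire crew" schedule more concrete, the sketch below maps each hour of the UTC day to the region whose crew member would be on alert. The three 8-hour shift boundaries and the region labels are assumptions chosen for illustration, not an agreed SAC policy.

```python
from datetime import datetime, timezone

# Hypothetical 8-hour shifts in UTC as (start_hour, end_hour, region).
# The boundaries are illustrative assumptions, roughly matching
# daytime hours in each region.
SHIFTS = [
    (0, 8, "Asia/Pacific"),
    (8, 16, "Europe"),
    (16, 24, "North America"),
]


def region_on_alert(now=None):
    """Return the region on alert at `now`.

    `now` must be a timezone-aware datetime; it defaults to the
    current time in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    hour = now.astimezone(timezone.utc).hour
    for start, end, region in SHIFTS:
        if start <= hour < end:
            return region
    raise ValueError("shift table does not cover hour %d" % hour)


print(region_on_alert())
```

A table like `SHIFTS` could be published next to the contact details, so that anyone reporting an emergency can see which crew member to reach.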