Infrastructure Transition Plan 2010

'''This is a draft document for the purposes of collaborative planning of the new server transition. This notice will be removed once SAC has determined its final course of action.'''

= Background =

SAC and the board have allocated a budget to purchase new server machines. These new servers have been specified, quoted and ordered, with delivery expected by Feb 22, 2010. They will be physically hosted by the Open Source Lab (OSL), and the main host OS on which the virtual machines run will be managed in part by OSL. We will continue to use the current Telescience blades but plan to discontinue use of PEER1 services for osgeo1 and osgeo2 once all services have been migrated.

= New Hardware =

osl1 (osgeo3)

 * 2x 4-core 2.5 GHz CPU
 * 6x 146 GB 15K rpm, 3 Gb/s hard drives in RAID 5 configuration
 * 48 GB of RAM
 * Dual-NIC Ethernet

osl2 (osgeo4)

 * 2x 4-core 2.5 GHz CPU
 * 6x 300 GB 15K rpm, 6 Gb/s hard drives in RAID 6 configuration
 * 48 GB of RAM
 * Dual-NIC Ethernet
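
As a rough check on how much storage the two arrays provide, the sketch below computes nominal usable capacity from the drive counts and RAID levels listed above. It assumes the textbook rule (RAID 5 gives up one drive's worth of space to parity, RAID 6 two) and ignores filesystem overhead, hot spares and GB/GiB differences; the function name is illustrative only.

```python
def usable_capacity_gb(drives: int, drive_gb: int, raid_level: int) -> int:
    """Nominal usable capacity of one RAID 5 or RAID 6 array, in GB."""
    # Drives' worth of space consumed by parity, and minimum array size.
    parity_drives = {5: 1, 6: 2}
    min_drives = {5: 3, 6: 4}
    if raid_level not in parity_drives:
        raise ValueError("only RAID 5 and RAID 6 are handled in this sketch")
    if drives < min_drives[raid_level]:
        raise ValueError("not enough drives for this RAID level")
    return (drives - parity_drives[raid_level]) * drive_gb


# Figures taken from the hardware lists above.
osl1_gb = usable_capacity_gb(drives=6, drive_gb=146, raid_level=5)
osl2_gb = usable_capacity_gb(drives=6, drive_gb=300, raid_level=6)
print("osl1:", osl1_gb, "GB usable")
print("osl2:", osl2_gb, "GB usable")
```

Under these assumptions osl1 offers about 730 GB and osl2 about 1200 GB before any space is set aside for the host OS.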

= Resource Allocation =

The plan is to run virtual machines on the new machines. OSL has suggested KVM, as it is their preferred VM solution and they can provide support for it. OSL plans to install Ganeti to manage the virtual machines; it allows things like live migration of VMs between machines, scaling of RAM, running VM creation/installation scripts, and VNC connections to guests (in case SSH is down).

Ideas (Virtual Machines)
Each line should be a suggested virtual machine (VM) or, in the case of Telescience, one blade. There are lots of possible scenarios, but this list will try to capture the most common options (expect the final selection to be a subset).

One alternative is to simply give each service/project its own VM. This may make administration easier (for security) or harder (for backup and general management), and it may not use resources efficiently. For example, if there were 12 or more VMs on any one machine, they would each have at most 4 GB of RAM (48 GB / 12). By pooling some services that use the same infrastructure we could essentially balance 16 GB of RAM across 4 sites: assuming that heavy loads occur only occasionally, any one of the 4 sites could use up to the full 16 GB as needed and would be unlikely to conflict with the other 3.
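
The arithmetic behind that example can be sketched as follows. This is only an illustration: the 48 GB total comes from the hardware list, while the `host_reserve_gb` parameter is an assumption added here because the amount of RAM to reserve for the host OS is still an open question.

```python
def ram_per_vm_gb(total_gb: float, vm_count: int, host_reserve_gb: float = 0.0) -> float:
    """Even split of a machine's RAM across its VMs, after a host reserve."""
    if vm_count < 1:
        raise ValueError("need at least one VM")
    if host_reserve_gb < 0 or host_reserve_gb >= total_gb:
        raise ValueError("host reserve must be between 0 and the total RAM")
    return (total_gb - host_reserve_gb) / vm_count


TOTAL_RAM_GB = 48  # per machine, from the hardware list

# One VM per service: 12 VMs share the machine evenly.
per_vm = ram_per_vm_gb(TOTAL_RAM_GB, 12)

# Pooling 4 sites into one VM gives that VM four shares to balance between them.
pooled = 4 * per_vm

print("per VM:", per_vm, "GB")
print("pooled VM for 4 sites:", pooled, "GB")
```

With no host reserve this reproduces the 4 GB and 16 GB figures; passing a non-zero `host_reserve_gb` shows how quickly the per-VM share shrinks.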

osl1

 * Trac/SVN with or without Postgres - Trac from source
 * Apache/PHP (Drupal + MediaWiki) (with or without MySQL + Postgres)
 * LAMP (Drupal + MySQL)
   * www.osgeo.org
   * mapguide.osgeo.org
   * fdo.osgeo.org
 * LAPP (MediaWiki + Postgres)
   * wiki.osgeo.org
 * MySQL
 * Postgres
 * Secure VM
   * LDAP
   * LDAP Python admin scripts
   * Secure admin notes for OSGeo admins
   * ''not'' using LDAP for logins

osl2

 * Postfix/Mailman
 * download.osgeo.org mirror (rsynced from telascience)
 * Local Backup
 * QGIS VM (Apache/Joomla + MySQL)
   * qgis.org Joomla site
 * GRASS VM
   * GRASS web site (static from SVN)
   * GRASS wiki (MediaWiki on MySQL)
   * automated Linux builds (for binary distribution)
 * Lower load project websites (hosted on xblade14 now - relatively low priority to migrate)
   * mapserver.org
   * gdal.org
   * geotools.org

Telescience Blades

 * Lower load project websites
 * Buildbot slaves
 * Offsite Backup
 * download.osgeo.org

= Base Image =


 * Debian Stable 64-bit + Backports
 * 10 GB HD (This is the default set by OSL; we can request a different size, and the images can always be grown.)
 * ? GB RAM
 * 64-bit
 * Standard partitioning: /boot, swap, / (This is the OSL default for backup and management purposes; we can request something different.)
 * ext3 (Currently investigating ext4 instead)

Package List
Policy: Install from packages unless exception agreed on by SAC

Standard Packages

 * OpenSSH server

Selective Packages

 * Apache
 * PHP (Apache should by default be a non-PHP build, except on the servers that require PHP)
 * MySQL
 * PostgreSQL
 * SVN
 * Postfix
 * Mailman

Source Exceptions
Packages that will be installed from source in order to obtain a specific version or customizations.


 * Trac (mod_wsgi? or mod_python?)

= Migration Plan & Schedule =

Priority

 * 1) Migrate osgeo2 (qgis.org Joomla site, wiki.osgeo.org, backups, moodle? ocs? wiktionary? fossgis wiki? community.osgeo.org? planet?)
 * 2) Trac/SVN

Schedule
(All dates are approximate, alternative schedule suggestions welcome)


 * Order - Feb 10, 2010
 * General Plan - Feb 26, 2010
 * Physical Installation - Feb 22-March ? 2010
 * Specific Plan - March 5, 2010
 * Software Setup (Start) - March 8, 2010
 * Migration - March 2010

= TODO List =


 * Create a base virtual machine image for all new VMs - OSL will do this for us.
 * Naming scheme for virtual machines.
 * Upgrade Telescience blade OS (May require service shuffle rotation or downtime)
 * Contingency plan for unexpected hardware failure

= Questions to ask OSL/Ourselves =


 * Can RAM be increased/decreased live? No.
 * Can RAM be increased/decreased via a web interface, live or with a power cycle? With a power cycle, via the Ganeti CLI.
 * Is it easy to move VMs between the machines? Yes, using the Ganeti CLI.
 * Should the LDAP be hosted on one of the host OSes for reliability?
 * Would LVM snapshot backups of virtual machines be a viable backup method? Should be doable; still needs some testing.
 * Define our base VM (OSL does not recommend Gentoo, though that is what they use as the base KVM host):
   * Choose a standard: Debian Stable + backports, Ubuntu LTS, CentOS ... (Does it need to implement SELinux or is that overkill?)
   * ext4 formatting? OSL is still testing that backup and management tools work with ext4; otherwise ext3.
   * 32-bit vs 64-bit? In some cases smaller VMs with only 2 GB etc. could perform better with 32-bit. Answer: 64-bit.
   * Default HD size? Remember to leave lots of room for /var, logs and database dumps even if there's not much in the VM. Answer: 10 GB.
 * How much RAM should we reserve for the host OS?
 * Naming of the virtual machines? Options:
   * Latitude, Longitude, Northing, Easting, Parallels, etc.
   * Mercator, Albers, Robinson, Sinusoidal, etc.
   * wiki, mail, web, ldap, etc.
   * vm1, vm2, vm3, etc.
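
On the LVM snapshot question above, one possible shape for such a backup is sketched below as a dry run: it only builds and prints the commands, it does not execute them. The volume group `vg0`, logical volume `wiki-disk0`, snapshot size and destination path are all hypothetical placeholders, and the actual layout of VM disks under Ganeti at OSL is not known here. The general pattern assumed is: create a snapshot with `lvcreate`, copy the frozen snapshot device with `dd`, then drop the snapshot with `lvremove`.

```python
from typing import List


def snapshot_backup_commands(vg: str, lv: str, snap_size: str, dest: str) -> List[List[str]]:
    """Build (but do not run) the commands for one LVM snapshot backup."""
    origin = f"/dev/{vg}/{lv}"
    snap_name = f"{lv}-snap"            # hypothetical naming convention
    snap_dev = f"/dev/{vg}/{snap_name}"
    return [
        # 1. Take a copy-on-write snapshot of the VM's logical volume.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # 2. Copy the snapshot to the backup location as a raw image.
        ["dd", f"if={snap_dev}", f"of={dest}", "bs=4M"],
        # 3. Remove the snapshot so it stops consuming space.
        ["lvremove", "--force", snap_dev],
    ]


# Dry run with placeholder names; nothing is executed.
for cmd in snapshot_backup_commands("vg0", "wiki-disk0", "2G", "/backup/wiki-disk0.img"):
    print(" ".join(cmd))
```

A snapshot taken this way is only crash-consistent from the guest's point of view, so database dumps inside the VM would likely still be needed alongside it.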