Infrastructure Transition Plan 2010

'''This is a draft document for the purposes of collaborative planning of the new server transition. This notice will be removed once SAC has determined its final course of action.'''

= Background =

SAC and the board have allocated a budget to purchase new server machines. These new servers have been specified and quoted, and will be ordered soon. They will be physically hosted by the Open Source Lab (OSL), which will also manage, in part, the main host OS on which the virtual machines run. We will continue to use the current Telescience blades, but plan to discontinue use of PEER1 services for osgeo1 and osgeo2 once all services have been migrated.

= New Hardware =

osl1

 * 2x 4 core 2.5 GHz CPU
 * 6x 146 GB 15K rpm, 3 Gb/s hard drives in RAID 5 configuration
 * 48 GB of RAM
 * Dual NIC Ethernet

osl2

 * 2x 4 core 2.5 GHz CPU
 * 6x 300 GB 15K rpm, 6 Gb/s hard drives in RAID 6 configuration
 * 48 GB of RAM
 * Dual NIC Ethernet
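
For planning purposes, the usable capacity of each array can be estimated from the drive counts above. This is a minimal sketch, assuming the standard overhead of one drive's worth of parity for RAID 5 and two for RAID 6, and ignoring filesystem and formatting overhead. The function name is illustrative only.

```python
def usable_capacity_gb(drives, size_gb, raid_level):
    """Estimate usable capacity (in GB) of a RAID 5 or RAID 6 array."""
    # Drives' worth of capacity consumed by parity at each RAID level.
    parity_drives = {5: 1, 6: 2}[raid_level]
    # RAID 5 needs at least 3 drives, RAID 6 at least 4.
    if drives < parity_drives + 2:
        raise ValueError("not enough drives for this RAID level")
    return (drives - parity_drives) * size_gb


print("osl1:", usable_capacity_gb(6, 146, 5), "GB")  # osl1: 730 GB
print("osl2:", usable_capacity_gb(6, 300, 6), "GB")  # osl2: 1200 GB
```

Under these assumptions osl1 offers roughly 730 GB and osl2 roughly 1200 GB before any space is set aside for the host OS.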

= Resource Allocation =

The plan is to run virtual machines on the new servers. OSL has suggested KVM, as it is their preferred VM solution and they could provide support for it. OSL plans to install Ganeti to manage the virtual machines. Ganeti allows live migration of VMs between machines, scaling of RAM, running VM creation/installation scripts, VNC connections to guests (in case SSH is down), and so on.

Ideas
Each line below is a suggested virtual machine (VM) or, in the case of Telescience, one blade. There are many possible scenarios, but this list tries to capture the most common options; expect the final selection to be a subset.

One alternative is to simply give each service/project its own virtual machine (VM). This may make administration easier (for security) or harder (for backup and general management), and may not use resources efficiently. For example, with 12 or more VMs on one 48 GB machine, an even split gives each VM at most 4 GB of RAM. By pooling services that use the same infrastructure, we could instead balance 16 GB of RAM across 4 sites: assuming heavy loads occur only occasionally, any one of the 4 sites could use most of the 16 GB as needed and would be unlikely to conflict with the other 3.
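
The trade-off can be made concrete with a little arithmetic. The sketch below assumes the full 48 GB of a host is available to guests (in practice the host OS reserves some) and that an idle site needs about 1 GB; both figures and the function names are assumptions for illustration, not measurements.

```python
HOST_RAM_GB = 48  # RAM per new server, from the hardware list


def ram_per_vm(total_gb, vm_count):
    """RAM each VM gets when host RAM is split evenly across dedicated VMs."""
    return total_gb / vm_count


def pooled_headroom(pool_gb, idle_use_gb, sites):
    """RAM one busy site can use in a shared VM while the others stay idle."""
    return pool_gb - idle_use_gb * (sites - 1)


# One VM per service: 12 VMs share the host evenly.
print(ram_per_vm(HOST_RAM_GB, 12))   # 4.0

# Four sites pooled in one 16 GB VM, three of them idle at an assumed 1 GB each.
print(pooled_headroom(16, 1, 4))     # 13
```

With these numbers a busy site in the pooled VM could use about 13 GB, against a fixed 4 GB in a dedicated VM, at the cost of sharing one OS and its failure modes with three other sites.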

osl1

 * Trac/SVN with or without Postgres
 * Apache/PHP (Drupal + MediaWiki) (with or without MySQL + Postgres)
 * LAMP (Drupal + MySQL)
 * LAPP (MediaWiki + Postgres)
 * MySQL
 * Postgres
 * LDAP (On whichever machine is least likely to have I/O conflicts)
 * Apache/Joomla (with or without MySQL)

osl2

 * LDAP
 * Postfix/Mailman
 * Download (If each project had its own VM, would its downloads live in that VM or on a shared download server?)
 * Local Backup
 * Lower load project websites

Telescience Blades

 * Lower load project websites
 * Buildbot
 * Offsite Backup
 * Download Mirror

= Migration Plan & Schedule =

Priority

 * 1) DNS
 * 2) Trac/SVN
 * 3) Migrate osgeo2

Schedule
(All dates are approximate; alternative schedule suggestions welcome.)
 * Order - Feb 10, 2010
 * General Plan - Feb 26, 2010
 * Physical Installation - Feb/March 2010
 * Specific Plan - March 5, 2010
 * Software Setup (Start) - March 8, 2010
 * Migration - March 2010

= TODO List =
 * Create a base virtual machine image for all new VMs
 * Upgrade Telescience blade OS (may require shuffling services between blades in rotation, or downtime)
 * Contingency plan for unexpected hardware failure

= Questions to ask OSL/Ourselves =
 * Can RAM be increased/decreased live?
 * Can RAM be increased/decreased via a web interface, live or with a power cycle?
 * Is it easy to move VMs between the machines? Via a web interface?
 * Should LDAP be hosted on one of the host OSes for reliability?
 * Would LVM snapshot backups of virtual machines be a viable backup method?
 * Define our base VM (OSL does not recommend Gentoo, though that is what they use as the base KVM host):
   * Choose a standard: Debian Stable, Ubuntu LTS, CentOS ... (Does it need to implement SELinux, or is that overkill?)
   * ext4 formatting
   * 32-bit vs 64-bit: in some cases smaller VMs with only 2 GB or so could perform better with 32-bit
   * Default HD size? Remember to leave lots of room for /var, logs and database dumps, even if there's not much in the VM