Difference between revisions of "Infrastructure Transition Plan 2010"

From OSGeo
m (→‎Background: ordered)
(updates from mailing list discussion)
Line 1: Line 1:
'''This is a draft document for the purposes of collaborative planning of the new server transition. This notice will be removed once SAC has determined its final course of action.
+
'''This is a draft document for the purposes of collaborative planning of the new server transition. This notice will be removed once SAC has determined its final course of action.'''  
'''
 
= Background =
 
[[SAC]] and the board have allocated a budget to purchase new server machines. These new servers have been specified, quoted and ordered, for delivery by Feb 22, 2010. They will be physically hosted by the Open Source Lab (OSL), and the main host OS, on which the virtual machines will run, will be managed in part by OSL. We will continue to use the current Telescience blades but plan to discontinue use of PEER1 services for osgeo1 and osgeo2 once all services have been migrated.
 
  
= New Hardware =
+
= Background  =
== osl1 ==
 
* 2x 4 core 2.5 Ghz cpu
 
* 6x 146 GB 15K rpm, 3GB/s hard drives in RAID 5 configuration.
 
* 48 GB of RAM
 
* Dual NIC ethernet
 
  
== osl2 ==
+
[[SAC]] and the board have allocated a budget to purchase new server machines. These new servers have been specified, quoted and ordered, for delivery by Feb 22, 2010. They will be physically hosted by the Open Source Lab (OSL), and the main host OS, on which the virtual machines will run, will be managed in part by OSL. We will continue to use the current Telescience blades but plan to discontinue use of PEER1 services for osgeo1 and osgeo2 once all services have been migrated.
* 2x 4 core 2.5 Ghz cpu
 
* 6x 300 GB 15K rpm, 6GB/s hard drives in RAID 6 configuration.
 
* 48 GB of RAM
 
* Dual NIC ethernet
 
  
= Resource Allocation =
+
= New Hardware  =
The plan includes running virtual machines on the new machines. OSL has suggested KVM as that's their preferred vm solution and they could provide support.
 
OSL plans to install [http://code.google.com/p/ganeti/ ganeti] to manage the virtual machines - it allows things like live moving of VMs between machines, scaling of RAM, running VM creation/installation scripts, vnc connection to guests(in case ssh is down), etc...
 
  
== Ideas ==
+
== osl1(osgeo3)  ==
Each line should be a suggested virtual machine(VM) (or in the case of Telescience 1 blade). There are lots of possible scenarios but this list will try to capture the most common options (expect the final selection to be a subset).
+
 
 +
*2x 4 core 2.5 GHz CPU
 +
*6x 146 GB 15K rpm, 3 Gb/s hard drives in RAID 5 configuration.
 +
*48 GB of RAM
 +
*Dual NIC ethernet
 +
 
 +
== osl2(osgeo4)  ==
 +
 
 +
*2x 4 core 2.5 GHz CPU
 +
*6x 300 GB 15K rpm, 6 Gb/s hard drives in RAID 6 configuration.
 +
*48 GB of RAM
 +
*Dual NIC ethernet
 +
 
 +
= Resource Allocation  =
 +
 
 +
The plan includes running virtual machines on the new machines. OSL has suggested KVM, as that is their preferred VM solution and they could provide support. OSL plans to install [http://code.google.com/p/ganeti/ ganeti] to manage the virtual machines. It allows things like live migration of VMs between machines, scaling of RAM, running VM creation/installation scripts, and VNC connections to guests (in case ssh is down).
 +
 
 +
== Ideas(Virtual Machines)<br>  ==
 +
 
 +
Each line should be a suggested virtual machine (VM) (or, in the case of Telescience, one blade). There are lots of possible scenarios, but this list will try to capture the most common options (expect the final selection to be a subset).  
  
 
One alternative is to simply give each service/project its own virtual machine (VM). This may make administration easier (for security) or harder (for backup and general management) and may not use resources efficiently. For example, if there were 12 or more VMs on any one machine, they would each have at most 4 GB of RAM on average (48 GB split 12 ways). By pooling some services that use the same infrastructure, we could essentially balance 16 GB of RAM across 4 sites: assuming that heavy loads occur only occasionally, any one of the 4 sites could use the full 16 GB as needed and would be unlikely to conflict with the other 3.  
 
One alternative is to simply give each service/project its own virtual machine (VM). This may make administration easier (for security) or harder (for backup and general management) and may not use resources efficiently. For example, if there were 12 or more VMs on any one machine, they would each have at most 4 GB of RAM on average (48 GB split 12 ways). By pooling some services that use the same infrastructure, we could essentially balance 16 GB of RAM across 4 sites: assuming that heavy loads occur only occasionally, any one of the 4 sites could use the full 16 GB as needed and would be unlikely to conflict with the other 3.  
Line 28: Line 33:
 
=== osl1  ===
 
=== osl1  ===
  
*Trac/SVN with or<strike>without </strike>Postgres - Trac from source  
+
*Trac/SVN with <strike>orwithout </strike>Postgres - Trac from source  
*Apache/PHP (Drupal +<strike>Mediawiki</strike>)(with or <strike>without</strike> MySQL + <strike>Postgres</strike>)  
+
*Apache/PHP (Drupal +<strike>Mediawiki</strike>)(with <strike>or without</strike> MySQL + <strike>Postgres</strike>)  
**LAMP (Drupal + MySQL)
+
**LAMP (Drupal + MySQL)  
*** www.osgeo.org
+
***www.osgeo.org  
*** mapguide.osgeo.org
+
***mapguide.osgeo.org  
*** fdo.osgeo.org
+
***fdo.osgeo.org  
 
**LAPP (MediaWiki+Postgres)  
 
**LAPP (MediaWiki+Postgres)  
*<strike>Mysql</strike>
+
***wiki.osgeo.org
*<strike>Postgres</strike>
+
*<strike>Mysql</strike>  
* Secure VM
+
*<strike>Postgres</strike>  
** LDAP  
+
*Secure VM  
** LDAP Python admin scripts.
+
**LDAP  
** Secure admin notes for OSGeo admins
+
**LDAP Python admin scripts.  
** *not* using LDAP for logins.  
+
**Secure admin notes for OSGeo admins  
*Apache/Joomla (with or without MySQL) (What is this VM for?)
+
** *not* using LDAP for logins.
 +
 
 +
=== osl2  ===
 +
 
 +
*Postfix/Mailman
 +
*download.osgeo.org mirror (rsynced from telascience)
 +
*Local Backup
 +
*QGIS VM (Apache/Joomla + MySQL)<br>
 +
**qgis.org joomla site
 +
*GRASS VM
 +
**grass web site (static from svn)
 +
**grass wiki (mediawiki on mysql)
 +
**automated linux builds (for binary distribution)
 +
*Lower load project websites (hosted on xblade14 now - relatively low priority to migrate)
 +
**mapserver.org
 +
**gdal.org
 +
**geotools.org
 +
 
 +
=== Telescience Blades  ===
 +
 
 +
*Lower load project websites
 +
*Buildbot slaves
 +
*Offsite Backup
 +
*download.osgeo.org
 +
 
 +
== Final Plan  ==
 +
 
 +
=== osl1  ===
 +
 
 +
=== osl2  ===
 +
 
 +
=== Telescience Blades  ===
 +
 
 +
= Base Image  =
 +
 
 +
*Debian Stable 64bit + Backports
 +
*10 GB HD (This is the default set by OSL, we can request a different size and the images can always be grown)  
 +
*&nbsp;? GB RAM
 +
*64 bit
 +
*Standard partitioning /boot, swap, / (This is OSL&nbsp;default for backup and management purposes, we can request something different.)
 +
*ext3 (Currently investigating ext4 instead)
 +
 
 +
== Package List  ==
 +
 
 +
Policy: Install from packages unless exception agreed on by SAC
 +
 
 +
=== Standard Packages  ===
 +
 
 +
*OpenSSH server
 +
 
 +
=== Selective Packages  ===
 +
 
 +
*Apache
 +
*PHP (Apache by default should be the non-PHP builds, except for the servers that require PHP)
 +
*MySQL
 +
*PostgreSQL
 +
*SVN
 +
*Postfix
 +
*Mailman
 +
 
 +
'''Source Exceptions:''' Packages that will be installed from source in order to obtain a specific version and customizations.
 +
 
 +
*Trac (mod_wsgi? or mod_python?)
 +
 
 +
= Migration Plan &amp; Schedule  =
  
=== osl2 ===
+
== Priority  ==
* Postfix/Mailman
 
* download.osgeo.org mirror (rsynced from telascience)
 
* Local Backup
 
* QGIS VM
 
** qgis.org joomla site
 
* GRASS VM
 
** grass web site (static from svn)
 
** grass wiki (mediawiki on mysql)
 
** automated linux builds (for binary distribution)
 
* Lower load project websites (hosted on xblade14 now - relatively low priority to migrate)
 
** mapserver.org
 
** gdal.org
 
** geotools.org
 
  
=== Telescience Blades ===
+
#Migrate osgeo2 (qgis.org joomla site, wiki.osgeo.org, backups, moodle? ocs? wiktionary? fossgis wiki? community.osgeo.org? planet? )
* Lower load project websites
+
#Trac/SVN
* Buildbot slaves
+
#
* Offsite Backup
 
* download.osgeo.org
 
  
== Final Plan ==
+
== Schedule  ==
=== osl1 ===
 
=== osl2 ===
 
=== Telescience Blades ===
 
  
= Base Image =
+
(All dates are approximate, alternative schedule suggestions welcome)
* Debian Stable + Backports
 
* GB HD
 
* GB RAM
 
* x bit
 
  
== Package List ==
+
*Order - Feb 10, 2010
Policy: Install from packages unless exception agreed on by SAC
+
*General Plan - Feb 26, 2010
* Apache
+
*Physical Installation - Feb 22-March&nbsp;? 2010
* Open-ssh server
+
*Specific Plan - March 5, 2010
 +
*Software Setup (Start) - March 8, 2010
 +
*Migration - March 2010
  
'''Source Exceptions'''
+
= TODO: List  =
* Trac
 
  
= Migration Plan & Schedule =
+
*<strike>Create a base virtual machine image for all new VMs</strike> - OSL will do this for us.
== Priority ==
+
*Naming scheme for virtual machines.
# DNS
+
*Upgrade Telescience blade OS (May require service shuffle rotation or downtime)  
# Trac/SVN
+
*Contingency plan for unexpected hardware failure
# Migrate osgeo2 (qgis.org joomla site, wiki.osgeo.org, backups, moodle? ocs? wiktionary? fossgis wiki? community.osgeo.org? planet? )
 
  
== Schedule ==
+
= Questions to ask OSL/Ourselves  =
(All dates are approximate, alternative schedule suggestions welcome)
 
* Order - Feb 10,2010
 
* General Plan - Feb 26, 2010
 
* Physical Installation - Feb/March 2010
 
* Specific Plan - March 5, 2010
 
* Software Setup(Start) - March 8, 2010
 
* Migration - March 2010
 
  
= TODO: List =
+
*Can RAM be increased/decreased live? No
* Create a base virtual machine image for all new VMs
+
**Can RAM be increased/decreased via a web interface live or with power cycle? With power cycle via [[Ganeti]] CLI<br>
* Upgrade Telescience blade OS (May require service shuffle rotation or downtime)
+
*Is it easy to move VMs between the machines? Yes, using [[Ganeti]] CLI
* Contingency plan for unexpected hardware failure
+
*Should LDAP be hosted on one of the host OSes for reliability?
 +
*Would LVM snapshot backups of virtual machines be a viable backup method? Should be doable, still needs some testing.
 +
*Define our base VM: (OSL does not recommend Gentoo, though that is what they use as the base KVM host)
 +
**Choose a standard: '''Debian Stable + backports''', Ubuntu LTS, CentOS ... (Does it need to implement SELinux or is that overkill?)  
 +
***ext4 formatting? OSL is still testing that backup and management tools work with ext4, otherwise ext3
 +
***32bit vs '''64bit''' - in some cases smaller VMs with only 2 GB etc could perform better with 32 bit&nbsp;: '''64 bit'''
 +
***default HD size? - remember to leave lots of room for /var, logs and database dumps even if there's not much in the VM&nbsp;: '''10 GB'''
 +
*How much RAM should we reserve for the host OS?
 +
*Naming of the Virtual Machines?
 +
**Latitude, Longitude, Northing, Easting, Parallels, etc.
 +
**Mercator, Albers, Robinson, Sinusoidal, etc.
 +
**wiki, mail, web, ldap, etc.
 +
**vm1, vm2, vm3, etc.
  
= Questions to ask OSL/Ourselves =
 
* Can ram be increased/decreased live?
 
** Can ram be increased/decreased via a web interface live or with power cycle?
 
* Is it easy to move VMs between the machines? Via web interface?
 
* Should the LDAP be hosted on one of the Host OS' for reliability?
 
* Would LVM snapshot backups of virtual machines be a viable backup method?
 
* Define our base VM: (OSL does not recommend gentoo, though that is what they use as the base KVM host)
 
** Choose a standard: '''Debian Stable + backports''', Ubuntu LTS, Centos ... (Does it need to implement SELinux or is that overkill?)
 
*** ext4 formatting
 
*** 32bit vs 64bit - in some cases smaller VMs with only 2 GB etc could perform better with 32 bit
 
*** default HD size? - remember to leave lots of room for /var, logs and database dumps even if there's not much in the VM
 
 
[[Category:Infrastructure]]
 
[[Category:Infrastructure]]

Revision as of 12:55, 20 February 2010

This is a draft document for the purposes of collaborative planning of the new server transition. This notice will be removed once SAC has determined its final course of action.

Background

SAC and the board have allocated a budget to purchase new server machines. These new servers have been specified, quoted and ordered, for delivery by Feb 22, 2010. They will be physically hosted by the Open Source Lab (OSL), and the main host OS, on which the virtual machines will run, will be managed in part by OSL. We will continue to use the current Telescience blades but plan to discontinue use of PEER1 services for osgeo1 and osgeo2 once all services have been migrated.

New Hardware

osl1(osgeo3)

  • 2x 4 core 2.5 GHz CPU
  • 6x 146 GB 15K rpm, 3 Gb/s hard drives in RAID 5 configuration.
  • 48 GB of RAM
  • Dual NIC ethernet

osl2(osgeo4)

  • 2x 4 core 2.5 GHz CPU
  • 6x 300 GB 15K rpm, 6 Gb/s hard drives in RAID 6 configuration.
  • 48 GB of RAM
  • Dual NIC ethernet
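
As a rough check on how much disk each new server offers, the RAID levels listed above can be turned into usable capacity. This is only a sketch: it assumes the standard parity overhead (one disk's worth for RAID 5, two for RAID 6) and ignores filesystem, controller and hot-spare overhead, none of which the plan specifies. The function name is made up for illustration.

```python
def usable_capacity_gb(disks: int, disk_gb: int, parity_disks: int) -> int:
    """Usable space of a parity RAID array, ignoring filesystem/controller overhead."""
    if disks <= parity_disks:
        raise ValueError("array needs more disks than parity disks")
    return (disks - parity_disks) * disk_gb


# RAID 5 spends one disk's worth of space on parity, RAID 6 spends two.
osl1_gb = usable_capacity_gb(disks=6, disk_gb=146, parity_disks=1)
osl2_gb = usable_capacity_gb(disks=6, disk_gb=300, parity_disks=2)

print(f"osl1 (RAID 5): {osl1_gb} GB usable")
print(f"osl2 (RAID 6): {osl2_gb} GB usable")
```

Under these assumptions osl2 has noticeably more room, which fits the idea of placing backups and the download mirror there.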

Resource Allocation

The plan includes running virtual machines on the new machines. OSL has suggested KVM, as that is their preferred VM solution and they could provide support. OSL plans to install ganeti to manage the virtual machines. It allows things like live migration of VMs between machines, scaling of RAM, running VM creation/installation scripts, and VNC connections to guests (in case ssh is down).

Ideas(Virtual Machines)

Each line should be a suggested virtual machine (VM) (or, in the case of Telescience, one blade). There are lots of possible scenarios, but this list will try to capture the most common options (expect the final selection to be a subset).

One alternative is to simply give each service/project its own virtual machine (VM). This may make administration easier (for security) or harder (for backup and general management) and may not use resources efficiently. For example, if there were 12 or more VMs on any one machine, they would each have at most 4 GB of RAM on average (48 GB split 12 ways). By pooling some services that use the same infrastructure, we could essentially balance 16 GB of RAM across 4 sites: assuming that heavy loads occur only occasionally, any one of the 4 sites could use the full 16 GB as needed and would be unlikely to conflict with the other 3.
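
The RAM arithmetic in the paragraph above can be written out as a small sketch. The 48 GB figure comes from the hardware list; the `host_reserved_gb` parameter is a placeholder for the still-open question of how much RAM to keep for the host OS, and is an assumption rather than a decided value.

```python
HOST_RAM_GB = 48  # RAM per new server, from the hardware list


def ram_per_vm_gb(vm_count: int, host_reserved_gb: float = 0) -> float:
    """Even split of host RAM across VMs, after an optional host OS reserve."""
    if vm_count < 1:
        raise ValueError("need at least one VM")
    return (HOST_RAM_GB - host_reserved_gb) / vm_count


# One VM per service: 12 VMs on one host get 4 GB each.
per_service_gb = ram_per_vm_gb(12)

# Pooling 4 of those services into one VM gives that VM their combined share.
pooled_gb = 4 * per_service_gb

print(f"one VM per service: {per_service_gb:.1f} GB each")
print(f"4 services pooled:  {pooled_gb:.1f} GB shared")
```

Any host reserve shrinks both numbers proportionally, so the comparison between the two layouts stays the same.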

osl1

  • Trac/SVN with Postgres ("or without" is struck out) - Trac from source
  • Apache/PHP (Drupal; MediaWiki is struck out) (with MySQL; "or without" and Postgres are struck out)
    • LAMP (Drupal + MySQL)
      • www.osgeo.org
      • mapguide.osgeo.org
      • fdo.osgeo.org
    • LAPP (MediaWiki+Postgres)
      • wiki.osgeo.org
  • MySQL (struck out)
  • Postgres (struck out)
  • Secure VM
    • LDAP
    • LDAP Python admin scripts.
    • Secure admin notes for OSGeo admins
    • *not* using LDAP for logins.

osl2

  • Postfix/Mailman
  • download.osgeo.org mirror (rsynced from telascience)
  • Local Backup
  • QGIS VM (Apache/Joomla + MySQL)
    • qgis.org joomla site
  • GRASS VM
    • grass web site (static from svn)
    • grass wiki (mediawiki on mysql)
    • automated linux builds (for binary distribution)
  • Lower load project websites (hosted on xblade14 now - relatively low priority to migrate)
    • mapserver.org
    • gdal.org
    • geotools.org

Telescience Blades

  • Lower load project websites
  • Buildbot slaves
  • Offsite Backup
  • download.osgeo.org

Final Plan

osl1

osl2

Telescience Blades

Base Image

  • Debian Stable 64bit + Backports
  • 10 GB HD (This is the default set by OSL, we can request a different size and the images can always be grown)
  •  ? GB RAM
  • 64 bit
  • Standard partitioning /boot, swap, / (This is OSL default for backup and management purposes, we can request something different.)
  • ext3 (Currently investigating ext4 instead)

Package List

Policy: Install from packages unless exception agreed on by SAC

Standard Packages

  • OpenSSH server

Selective Packages

  • Apache
  • PHP (Apache by default should be the non-PHP builds, except for the servers that require PHP)
  • MySQL
  • PostgreSQL
  • SVN
  • Postfix
  • Mailman

Source Exceptions: Packages that will be installed from source in order to obtain a specific version and customizations.

  • Trac (mod_wsgi? or mod_python?)

Migration Plan & Schedule

Priority

  1. Migrate osgeo2 (qgis.org joomla site, wiki.osgeo.org, backups, moodle? ocs? wiktionary? fossgis wiki? community.osgeo.org? planet? )
  2. Trac/SVN

Schedule

(All dates are approximate, alternative schedule suggestions welcome)

  • Order - Feb 10, 2010
  • General Plan - Feb 26, 2010
  • Physical Installation - Feb 22-March ? 2010
  • Specific Plan - March 5, 2010
  • Software Setup (Start) - March 8, 2010
  • Migration - March 2010

TODO: List

  • Create a base virtual machine image for all new VMs - OSL will do this for us.
  • Naming scheme for virtual machines.
  • Upgrade Telescience blade OS (May require service shuffle rotation or downtime)
  • Contingency plan for unexpected hardware failure

Questions to ask OSL/Ourselves

  • Can RAM be increased/decreased live? No
    • Can RAM be increased/decreased via a web interface live or with power cycle? With power cycle via Ganeti CLI
  • Is it easy to move VMs between the machines? Yes, using Ganeti CLI
  • Should LDAP be hosted on one of the host OSes for reliability?
  • Would LVM snapshot backups of virtual machines be a viable backup method? Should be doable, still needs some testing.
  • Define our base VM: (OSL does not recommend Gentoo, though that is what they use as the base KVM host)
    • Choose a standard: Debian Stable + backports, Ubuntu LTS, CentOS ... (Does it need to implement SELinux or is that overkill?)
      • ext4 formatting? OSL is still testing that backup and management tools work with ext4, otherwise ext3
      • 32bit vs 64bit - in some cases smaller VMs with only 2 GB etc could perform better with 32 bit: 64 bit
      • default HD size? - remember to leave lots of room for /var, logs and database dumps even if there's not much in the VM : 10 GB
  • How much RAM should we reserve for the host OS?
  • Naming of the Virtual Machines?
    • Latitude, Longitude, Northing, Easting, Parallels, etc.
    • Mercator, Albers, Robinson, Sinusoidal, etc.
    • wiki, mail, web, ldap, etc.
    • vm1, vm2, vm3, etc.
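
The answers above say that RAM changes need a power cycle and that VMs are moved with the Ganeti CLI. A hedged sketch of what those command lines might look like follows. It only builds the argument lists and prints them; nothing is executed. The `gnt-instance` subcommands shown (`modify`, `shutdown`, `startup`, `migrate`) exist in Ganeti, but the `-B memory=` backend parameter is an assumption about the version OSL runs (later releases name the memory parameters differently), and the instance name is purely hypothetical.

```python
from typing import List


def resize_ram_commands(instance: str, memory_mb: int) -> List[List[str]]:
    """Commands to change a VM's RAM; the new size applies after a stop/start."""
    return [
        # Assumed parameter name; check the installed Ganeti version's man page.
        ["gnt-instance", "modify", "-B", f"memory={memory_mb}", instance],
        ["gnt-instance", "shutdown", instance],
        ["gnt-instance", "startup", instance],
    ]


def migrate_command(instance: str) -> List[str]:
    """Command to move a running VM to its secondary node."""
    return ["gnt-instance", "migrate", instance]


vm = "wiki.example.org"  # hypothetical instance name
for argv in resize_ram_commands(vm, 4096) + [migrate_command(vm)]:
    print(" ".join(argv))
```

Keeping the commands as plain lists makes them easy to review with OSL before anything is run on the cluster.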