Difference between revisions of "SAC:Admin and Troubleshooting"

From OSGeo
(+VM hanging on OSUOSL)

Revision as of 03:51, 24 September 2011

Troubleshooting

VM hanging on OSUOSL

See OSL for how to open a ticket with OSUOSL support.

LDAP Server Down

The LDAP server runs on .220. If it needs to be restarted, run the following command as root:

/opt/fedora/slapd-ldapt/start-slapd
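The restart step can be wrapped in a small guard, sketched below. The function name `restart_ldap` and the `START_SCRIPT` variable are hypothetical and only for illustration. The sketch assumes the start script lives at the path given above, and it refuses to run unless invoked as root.

```shell
#!/bin/sh
# Hypothetical wrapper around the start script named on this page.
# START_SCRIPT is an illustrative variable; override it if the path differs.
START_SCRIPT="${START_SCRIPT:-/opt/fedora/slapd-ldapt/start-slapd}"

restart_ldap() {
    # The page says the restart must be done as root.
    if [ "$(id -u)" -ne 0 ]; then
        echo "restart_ldap: must be run as root" >&2
        return 1
    fi
    # Fail early with a clear message if the script is missing.
    if [ ! -x "$START_SCRIPT" ]; then
        echo "restart_ldap: start script not found: $START_SCRIPT" >&2
        return 2
    fi
    "$START_SCRIPT"
}

# Usage (as root on the LDAP host):
#   restart_ldap
```

The function returns a non-zero status when it cannot start the server, so it can be used from other scripts.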

Peer1 LDAP Server Hanging

After a power outage, such as the one on 2-20-07, slapd's database will need to be recovered:

sudo /usr/sbin/slapd_db_recover -h /var/lib/ldap/osgeo2
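Before running the recovery command above, it may be worth keeping a copy of the database directory. The following is a minimal sketch, not an established procedure for this server: the function name `backup_db_dir` and the backup location are hypothetical, and it assumes slapd is stopped so the files are not changing while they are archived.

```shell
#!/bin/sh
# Hypothetical helper: archive a database directory before recovery.
# Prints the path of the archive it created.
backup_db_dir() {
    db_dir="$1"        # e.g. /var/lib/ldap/osgeo2
    backup_root="$2"   # illustrative destination, e.g. /root/ldap-backups

    if [ ! -d "$db_dir" ]; then
        echo "backup_db_dir: no such directory: $db_dir" >&2
        return 1
    fi
    mkdir -p "$backup_root" || return 1

    # Timestamped archive name so repeated runs do not overwrite each other.
    archive="$backup_root/$(basename "$db_dir")-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$db_dir")" "$(basename "$db_dir")" || return 1
    echo "$archive"
}

# Usage (as root, with slapd stopped):
#   backup_db_dir /var/lib/ldap/osgeo2 /root/ldap-backups
```

If the recovery makes things worse, the archive can be unpacked to restore the directory to its state from before the recovery.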

MySQL Cleanup for Drupal

If a Drupal report says that a table has crashed and needs repair, log into MySQL and repair that table. For example, for the cache table:

REPAIR TABLE cache QUICK;
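When more than one table is reported as crashed, the statements can be generated rather than typed by hand. This is a sketch under assumptions: the function name `repair_sql` is hypothetical, the table names passed to it are whatever the Drupal report lists, and the database name in the usage comment is a placeholder. It emits a `CHECK TABLE` before each `REPAIR TABLE ... QUICK` so that the table's state is shown before the repair.

```shell
#!/bin/sh
# Hypothetical helper: print CHECK and REPAIR statements for each table name.
repair_sql() {
    for table in "$@"; do
        printf 'CHECK TABLE %s;\n' "$table"
        printf 'REPAIR TABLE %s QUICK;\n' "$table"
    done
}

# Usage: review the output first, then pipe it to the mysql client.
# The database name "drupal" is an assumption; substitute the real one.
#   repair_sql cache
#   repair_sql cache | mysql -u root -p drupal
```

Printing the statements instead of executing them directly leaves a chance to inspect what will run.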

Entire www.osgeo.org down (PEER1)

  • ISP/DNS problem: what to do? Do we need to call anyone?
  • Hardware reset: contact SAC:Primary Administrators. One option is to power-cycle osgeo.org from the UPS, using the "Reboot Immediate" item on the UPS.

TODO: Define rescue plan with responsible people - see also SAC:Primary Administrators

telascience blade(s) down

... reinstall stuff elsewhere ...

Contact User with Shell Access

If services or the operating system need restarting, or something else needs emergency attention, directly contact one of the following people with shell access:

PEER 1 Trouble Ticket Process

TODO: add details here or point to elsewhere?