GIS workstation setup tips

From OSGeo

Latest revision as of 12:43, 20 July 2013

A page to collect best practice tips for setting up a GIS workstation. The page is under ongoing construction.

Hardware

CPU

Multi-core CPUs are now the norm, and they come in many grades (models or variants). If you plan to run virtual machines, make sure to get a CPU with hardware virtualization extensions (Intel VT-x or AMD-V).
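
On Linux you can check for these extensions by looking for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. Below is a minimal Python sketch, assuming an x86 Linux system; the function name is hypothetical. The flag only shows that the CPU supports virtualization, and the feature may still need to be enabled in the BIOS/UEFI setup.

```python
import os


def has_virtualization(cpuinfo_text: str) -> bool:
    """Return True if a "flags" line lists vmx (Intel VT-x) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux the CPU feature list is on lines whose key is exactly "flags".
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = value.split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("Hardware virtualization extensions:", has_virtualization(f.read()))
```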

The high-end Intel i7 CPUs are the fastest (as of 2012 and for the foreseeable future), but bang for the buck drops off sharply. Because most GIS software does not use multiple threads or cores particularly well, getting many cores is not critical. As of late 2012 I'd suggest an AMD FX-8150 or an Intel i7-3820 as suitable high-end CPUs at sensible prices. The AMD is about 40% cheaper and 20% slower (whether 8 seconds vs 10 is worth the $$ is your call :-)
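
One standard way to see why extra cores help a single program less than you might expect is Amdahl's law: if only a fraction p of a job's run time can be spread across n cores, the ideal speedup is 1 / ((1 - p) + p / n). Below is a minimal Python sketch; the 50% parallel fraction is a hypothetical example, not a measurement of any GIS package. With p = 0.5 the speedup can never reach 2x, however many cores you buy.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only parallel_fraction of the run time can use extra cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical job where only half of the work can run in parallel.
for n in (1, 2, 4, 8):
    print(n, "cores ->", round(amdahl_speedup(0.5, n), 2), "x")
```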

Note that most FOSS GIS users run several applications at once, such as QGIS viewing PostGIS data, where QGIS and the PostgreSQL database server run side by side. In this sort of usage, where multiple programs (processes) run together, you can make full use of multiple cores for your GIS work (as long as something else, such as disk I/O, is not the bottleneck).
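
The same applies to your own batch jobs: independent tasks run as separate processes can be spread over all cores. Below is a minimal Python sketch using the standard library's concurrent.futures.ProcessPoolExecutor; cpu_heavy_task is a hypothetical stand-in for a real GIS job such as processing one raster tile.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy_task(n: int) -> int:
    """Hypothetical stand-in for one independent CPU-bound GIS job."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_parallel(sizes):
    # Each task runs in its own worker process, so the OS can schedule them on separate cores.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(cpu_heavy_task, sizes))


if __name__ == "__main__":
    print(run_in_parallel([1_000_000] * 4))
```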

RAM

The more the better, but it does depend on which applications you plan to use. Tools such as R, which work on data held in memory, benefit more from RAM than some more graphics-intensive applications.

Use a 64-bit operating system, otherwise anything over about 3 GB of memory is largely wasted (unavailable to applications). This is simplistic but realistic. Get at least 8 GB, preferably 16 GB, of memory.

Video Card

Quality dual screens are a big draw for GIS analysts: a map on one screen and a table on the other. If you mostly do GIS number crunching on the command line, you probably don't need a powerful card, unless you want to learn to program GPUs.

Any modern graphics card, even a cheap one, is likely to be more than adequate. The "better" cards are generally aimed at gaming, with fast 3D rendering of animations, a type of load that is seldom relevant to GIS users (though not always: GIS can be used for 3D visualisations, which can benefit from better cards).

Hard Drives

Plan for double what you actually think you need. Why? Backups and room to grow. SATA is by far the most cost-effective solution, with a good balance of speed, price and size. A 3 TB drive will cost $200 or less these days (late 2012).

Consider using a smaller, faster drive for your OS and larger, slower drives for read-only data storage.

In particular, consider an SSD to boot from and a large SATA drive to store data files. Putting swap on an SSD is also likely to help performance if swapping is required.
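
To check which of your drives is an SSD, Linux exposes a rotational flag for each block device under /sys/block (1 for spinning disks, 0 for non-rotational devices such as SSDs). Below is a minimal Python sketch, assuming a Linux system; the function name is hypothetical. Virtual devices (loop, zram) also appear there, and some USB enclosures may report the flag incorrectly, so treat the result as a hint.

```python
from pathlib import Path


def classify_drives(sys_block: str = "/sys/block") -> dict:
    """Map each block device name to 'SSD' or 'HDD' using the kernel's rotational flag."""
    result = {}
    root = Path(sys_block)
    if not root.is_dir():
        return result  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        flag = dev / "queue" / "rotational"
        if flag.is_file():
            # The kernel writes 1 for spinning disks and 0 for non-rotational devices.
            result[dev.name] = "HDD" if flag.read_text().strip() == "1" else "SSD"
    return result


if __name__ == "__main__":
    for name, kind in classify_drives().items():
        print(name, kind)
```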

RAID Drives

RAIDs are only good if:

  1. You run drive health monitoring tools on a regular basis (smartmontools), AND
  2. You know how to rebuild the array if it fails (mdadm).

They no longer need to be hardware based: Linux software RAID works fine, trading slightly slower performance for better-written code and tighter OS integration than is found in hardware solutions.
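
For the monitoring point, Linux MD RAID reports each array's state in /proc/mdstat, where a member-status field such as [UU] becomes [U_] when a disk is missing. Below is a minimal Python sketch that flags degraded arrays, assuming the usual /proc/mdstat layout; the function name is hypothetical. It complements, rather than replaces, smartctl from smartmontools and mdadm's own --monitor mode, which can send alerts.

```python
import re

# Member-status field at the end of an array's detail line, e.g. "[UU]" or "[U_]".
STATUS_RE = re.compile(r"\[([U_]+)\]\s*$")


def degraded_arrays(mdstat_text: str) -> list:
    """Return names of md arrays whose status field shows a missing member ('_')."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        if line.startswith("md") and " : " in line:
            current = line.split(" : ", 1)[0].strip()  # e.g. "md0"
            continue
        match = STATUS_RE.search(line)
        if current and match and "_" in match.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            print("Degraded arrays:", degraded_arrays(f.read()) or "none")
    except FileNotFoundError:
        print("No /proc/mdstat (not Linux, or md driver not loaded)")
```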

My addendum to the advice above: software RAID (MD RAID) on Linux works well. A two-disk RAID 0 setup will come close to doubling disk I/O for both reads and writes. But you have also roughly doubled the chance of losing your data: if either disk fails, all the stored data is lost. You decide whether the performance boost is worth the risk for your purposes.
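
To put a rough number on "doubled the chances", here is a minimal Python sketch, assuming each disk fails independently with the same probability over some period. The 3% annual figure is a hypothetical placeholder, not a measured rate. With two disks it gives about 5.9%, just under double a single disk's 3%.

```python
def raid0_loss_probability(p_disk: float, n_disks: int) -> float:
    """Chance that a RAID 0 array loses its data within a period.

    Assumes each disk fails independently with probability p_disk in that period;
    a stripe set is lost if ANY member fails, i.e. 1 - (1 - p)^n.
    """
    return 1.0 - (1.0 - p_disk) ** n_disks


P_DISK = 0.03  # hypothetical per-disk failure probability per year (placeholder)
print("single disk  :", P_DISK)
print("2-disk RAID 0:", round(raid0_loss_probability(P_DISK, 2), 4))
```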

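RAID 1 offers full redundancy against a single drive failure and can come close to doubling read I/O (Linux MD RAID 1 can serve different reads from different drives), but writes are no faster, since every write goes to both drives. The cost is overall capacity: the array is limited to the size of one drive, as the second is a full mirror.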
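
The capacity trade-off between the two levels can be sketched in a few lines of Python, assuming equal-sized drives (with mixed sizes the arithmetic differs); the function name is hypothetical. Two 3 TB drives give 6 TB as RAID 0 but only 3 TB as RAID 1.

```python
def usable_capacity_tb(drive_tb: float, n_drives: int, level: int) -> float:
    """Usable space of an array of n equal-sized drives (RAID 0 or RAID 1 only)."""
    if n_drives < 2:
        raise ValueError("this sketch assumes at least two drives")
    if level == 0:
        return drive_tb * n_drives  # striping: every drive contributes its full size
    if level == 1:
        return drive_tb  # mirroring: each drive holds a complete copy
    raise ValueError("only RAID 0 and RAID 1 are covered in this sketch")


for level in (0, 1):
    print(f"RAID {level} from two 3 TB drives:", usable_capacity_tb(3.0, 2, level), "TB")
```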

Hardware RAID is no longer the province of high-end servers: effective external eSATA/NAS storage units holding 4 to 8 SATA disks (several TB in total) are available from companies such as Arena, Stardom and Raidon for around USD 300. These are useful for GIS users who manage large vector or raster databases, offering good RAID support, hot swapping and automatic rebuilds. They won't compete with SSDs for sheer throughput, but they are much cheaper for large volumes.

Operating System

Are there any advantages to choose a specific operating system (OS)?

I've been using Linux at home and at work for several years, so I thought I would share some thoughts here. In my own humble opinion Debian/Ubuntu is the best Linux Operating System for new Linux users. I've found the community at Debian (via the mailing list) to be one of the most supportive of all the online communities I've been a part of. There are plenty of people on that list willing to share their knowledge. Ubuntu does a better job of packaging the Debian operating system, and they make support for hardware that requires proprietary drivers much easier. I also find that the Ubuntu software package repositories often contain software I need, which is not always in the Debian repository.

When I first was introduced to Linux I tried a few other distributions. This included ArkLinux, Fedora, OpenSuse and Gentoo. After some advice from my local Linux User Group I tried Debian. I don't think I'd ever have been successful with Linux if it hadn't been for Debian.

Another perspective, from another Linux GIS user: Debian is one of the purist Linux distributions; it is not ideal for new users, but it is a good server distribution. OpenSuse and Ubuntu have the best support for GIS applications. I use both and generally prefer OpenSuse: I find its GIS repository maintainers more responsive to questions, and its packages more up to date. Just my impression. Both are good choices.

Linux lets users "get their hands dirty", with much better support for scripting and command-line applications in general, which is often useful to a serious GIS user. It is also easier to build a complete GIS workstation from entirely free software on Linux than on Windows, thanks to the GIS package repositories that provide the software ready to go for your system (at least on OpenSuse and Ubuntu, with Fedora not far behind).

A couple of other things to note:

  • What most people think of as "Microsoft Windows" is actually an operating system, a desktop and a set of applications packaged together. In Linux these are all different components. Linux is the operating system, while the desktop is a separate component, such as GNOME or KDE. You can also choose different office software, such as OpenOffice or AbiWord.
  • All Linux distributions essentially build on the same operating system. The main distinction between them is the way they manage software installation, which is quite different from Microsoft Windows. Linux distributions can be divided into two main camps: one uses the Debian (.deb) format for installing and managing software, while the other uses the Red Hat (RPM) format. The other factors that distinguish a distribution are how often it releases new versions and how it handles "non-free" software. These two factors help to distinguish Ubuntu and Debian. Debian makes a new release whenever it is ready, not on a regular schedule, while Ubuntu makes regularly scheduled releases. Debian takes a somewhat "hostile" view of non-free software, while Ubuntu does not.
  • One of the main challenges of using Linux is hardware compatibility. Before you buy any hardware or peripherals (scanners, network interface cards, microphones, video cards) for a computer on which you plan to run Linux, make sure they are compatible with Linux. Many companies make Linux-specific drivers for their hardware, while others make hardware that doesn't play well with Linux. Almost all companies make hardware that works well with Microsoft's operating systems; this isn't the case with Linux. Don't buy hardware for Linux and expect it to "just work" unless you have done your homework.
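
To see which camp a given machine belongs to, current Debian, Ubuntu, Fedora and openSUSE releases describe themselves in /etc/os-release, whose ID and ID_LIKE fields name the distribution and the distributions it derives from. Below is a minimal Python sketch, assuming that file is present; the function names are hypothetical and the list of IDs checked is illustrative, not exhaustive. The Red Hat camp is reported as "rpm", after the RPM package format that openSUSE also uses.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from /etc/os-release, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"').strip("'")
    return info


def package_family(info: dict) -> str:
    """Classify a distribution as 'deb', 'rpm' or 'unknown' from ID and ID_LIKE."""
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    if any(i in ("debian", "ubuntu") for i in ids):
        return "deb"
    if any(i in ("rhel", "fedora", "centos", "suse", "opensuse") for i in ids):
        return "rpm"
    return "unknown"


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as f:
            print(package_family(parse_os_release(f.read())))
    except FileNotFoundError:
        print("no /etc/os-release found")
```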

The Sunburned Surveyor

32 vs 64 bit

If you have, or plan to have, 4 GB of RAM or more, a 64-bit OS is necessary to address all of the RAM correctly and use the system to its potential. On Windows this might mean some slightly buggy drivers and the need to run some 32-bit applications under its WoW64 compatibility layer, which handles most of them. On Linux you're pretty much taken care of at this point. Occasionally some applications aren't compiled for, or take longer to reach, 64-bit platforms (like Adobe Flash), but that trend is decreasing. Some compiling may be necessary to get what you want, but that is true on all systems if you want the latest patches and tools.
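
The 3 to 4 GB limit follows from simple arithmetic: a 32-bit address covers 2^32 bytes = 4 GiB, and part of that range is typically taken by the kernel or by hardware (device memory), which is why roughly 3 GB is usable in practice. Below is a minimal Python sketch of that arithmetic plus a check of the running Python build; the function names are hypothetical. platform.machine() reports the machine architecture, while sys.maxsize reflects the Python build, and the two can differ (e.g. 32-bit Python on a 64-bit OS).

```python
import platform
import sys


def address_space_gib(bits: int) -> float:
    """Size of a flat address space with the given pointer width, in GiB."""
    return 2 ** bits / 2 ** 30


def python_is_64bit() -> bool:
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
    return sys.maxsize > 2 ** 32


if __name__ == "__main__":
    print("machine architecture:", platform.machine())
    print("64-bit Python build:", python_is_64bit())
    print("32-bit address space:", address_space_gib(32), "GiB")
```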

Partitions

Tips at a glance

  • One separate partition for the OS
  • On GNU/Linux OSes an extra swap partition is usually created. The usual recommendation is to make it equal to, or double, the system's RAM.
  • At least one separate partition for data storage
  • If a second hard disk drive (attached directly to the motherboard) or a fast external drive is available, store data on it rather than on the drive that carries the OS. This keeps heavy GIS processing tasks from tying up the system disk, which would slow down the whole system and make other programs hard to use.
  • For multi-user access to projects, data can easily be exported/shared using, for example, the NFS network filesystem protocol
  • Create at least two partitions, / and /home, to make upgrades and file management easier.

Details

Before even installing an operating system on a computer, the hard disk drive needs to be partitioned. For practical as well as data safety/security reasons, it is recommended to divide the hard disk's available space into at least two partitions: one to carry the operating system itself and another to store data. The size of the OS partition depends on the requirements of the operating system to be installed. The size of the data partition(s) depends on the user's needs.

  • Which disk filesystem type?

There are several disk filesystem types, which are often OS-specific. More information can be found at Wikipedia: http://en.wikipedia.org/wiki/File_system

More about partitions on GNU/Linux

  • Which filesystem is better (i.e. safer/faster)?

Among the best-known disk filesystems on GNU/Linux are ext3 and XFS. ext3 seems to be faster than other types in several benchmarks, while XFS, although used mainly on servers, seems to achieve the best overall score. [Sources: http://www.debian-administration.org/articles/388, http://linuxgazette.net/122/piszcz.html ]

- Note: as of Ubuntu 8.10, JFS is also supported. Originally developed by IBM, it is reported to be faster than ext3.

  • What about Linux swap?

In addition, on a GNU/Linux operating system one usually defines a swap partition. It is suggested to give it as much space as the computer's RAM or, even better, double that. For example, if a system has 1 GB of RAM, the swap partition should be sized between 1 and 2 GB. [ More details at https://help.ubuntu.com/community/SwapFaq ]

- Note: this depends entirely on how you intend to use the machine. If you have a lot of RAM (16 GB) and want to always run everything in it, make the swap very small. Once you get past 2 GB there is little or no reason to increase the swap (except if you use suspend-to-disk, i.e. hibernation, which writes the contents of RAM to swap).
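
The two rules of thumb above can be combined into a small helper. This is a hypothetical heuristic, not an official formula, assuming the 2 GB cap from the note and that hibernation needs swap at least as large as RAM. It gives 2 GB for 1 GB of RAM, 2 GB for 16 GB of RAM, and 16 GB for 16 GB of RAM with hibernation.

```python
def suggested_swap_gb(ram_gb: float, hibernate: bool = False, cap_gb: float = 2.0) -> float:
    """Hypothetical swap-size heuristic combining the rules of thumb on this page.

    - Normally: up to twice the RAM, but rarely more than cap_gb.
    - Hibernation (suspend-to-disk) writes RAM to swap, so use at least the RAM size.
    """
    normal = min(2 * ram_gb, cap_gb)
    if hibernate:
        return max(ram_gb, normal)
    return normal


for ram in (1, 4, 16):
    print(f"{ram} GB RAM -> {suggested_swap_gb(ram)} GB swap "
          f"({suggested_swap_gb(ram, hibernate=True)} GB with hibernation)")
```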

LVM

Logical volume management (LVM) lets you arbitrarily span multiple drives of any size so that they look like one drive.

Pro:

  • You can keep shoving disks in as you need them and simply grow the existing mount points

Con:

  • Your data can get spread across a lot of drives, and if one in the middle goes bad it can be awkward to shuffle things around.
  • A little more difficult to set up initially.
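
Growing an existing volume onto a new disk is the usual LVM workflow: initialise the disk as a physical volume (pvcreate), add it to the volume group (vgextend), then extend the logical volume and its filesystem (lvextend --resizefs). Below is a minimal Python sketch that only prints the commands by default; the device, volume group and logical volume names are hypothetical placeholders. Running it for real needs root and overwrites whatever is on the new disk.

```python
import shlex
import subprocess


def lvm_grow_commands(new_disk: str, vg: str, lv: str) -> list:
    """Commands to add a whole new disk to a volume group and grow one logical volume."""
    return [
        ["pvcreate", new_disk],          # mark the disk as an LVM physical volume
        ["vgextend", vg, new_disk],      # add it to the existing volume group
        # use all newly free space and resize the filesystem on the LV as well
        ["lvextend", "--resizefs", "-l", "+100%FREE", f"/dev/{vg}/{lv}"],
    ]


def run(commands, dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires root; destructive for new_disk


if __name__ == "__main__":
    # Hypothetical names: new disk /dev/sdc, volume group "gisvg", logical volume "data".
    run(lvm_grow_commands("/dev/sdc", "gisvg", "data"))
```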

Backing up data

How often should one back up data?

How should one back up data?

  • simple method: tar cjvf
  • syncing method: using rsync
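
Both methods can be scripted. Below is a minimal Python sketch, assuming local paths; the function names are hypothetical. tar_backup uses the standard tarfile module to produce the same kind of bzip2-compressed archive as tar cjvf, and rsync_command only builds an rsync command line (-a preserves permissions and timestamps, --delete mirrors deletions from the source).

```python
import tarfile
import time
from pathlib import Path


def tar_backup(source_dir: str, dest_dir: str) -> Path:
    """Write a timestamped .tar.bz2 of source_dir into dest_dir (like `tar cjf`)."""
    src = Path(source_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(dest_dir) / f"{src.name}-{stamp}.tar.bz2"
    with tarfile.open(archive, "w:bz2") as tar:  # "w:bz2" = bzip2, the -j in cjf
        tar.add(src, arcname=src.name)
    return archive


def rsync_command(source_dir: str, dest: str, delete: bool = False) -> list:
    """Build an rsync mirror command; the trailing slash copies the directory's contents."""
    cmd = ["rsync", "-av", source_dir.rstrip("/") + "/", dest]
    if delete:
        cmd.insert(2, "--delete")  # also remove files that no longer exist in the source
    return cmd
```

Running `subprocess.run(rsync_command("/data/gis", "backuphost:/srv/gis"), check=True)` would perform the sync (with rsync installed); pointing the destination at another machine is one way to get an offsite copy.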

Offsite backup?

Other important issues

  • Use an Uninterruptible Power Supply (UPS) and configure it to trigger safe shutdowns on extended power failures.

Note: a good GIS system can be very similar to a good gaming system, so poke around the internet for what people use for those.

Related links

  • Discussion for creating this wiki-page at the GRASS-user mailing list:

http://n2.nabble.com/Tips-for-setting-up-an-new-FOSS-GEO-linux-box-tc1885845.html#none

  • Some "historical" discussions upon the subject can be retrieved by searching within the GRASS-user mailing list for the keyword "workstation".