Monday, May 2, 2011

Error: Too many open files

A couple of days ago, I ran into this error on CentOS 5.2.  As far as I know, this error can occur on any flavor of Linux.

This totally caught me by surprise as I had never seen this before.  I had just finished installing awstats and wanted to run an update for the first time.  The logs were gathered from 6 webservers and each webserver had about 225 logs.
The awstats log merging script tried to open all of these log files at once (225×6 = 1,350 logs) to merge them and read the data, but it kept crashing.  It crashed because the limit on the number of files that a process started from the shell is allowed to have open was set to 1024.

To check what your limits are just type in the following command:
ulimit -a
To change the number of files you are allowed to open, change the “open files” limit.  To do so, type in the following command:
ulimit -n 3000
This will set the limit to 3000 files for the current shell session and anything started from it.  You can set it to whatever number you need, though a non-root user cannot go above the hard limit.
After changing my “open files” limit, I was able to run my initial awstats update without any issues.
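If you want to see how much room you have before changing anything, here is a rough sketch that compares the soft limit (the one actually enforced) with the hard limit (the ceiling) and only raises the soft limit when it needs to.  The target of 3000 is just the number from my case, and the variable names are made up for illustration.

```shell
# Soft limit: what is enforced right now. Hard limit: the ceiling for it.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"

# Hypothetical target, taken from the 3000 used in this post.
wanted=3000

if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$wanted" ]; then
    # Nothing to do, the current limit is already high enough.
    echo "soft limit already allows $wanted open files"
elif [ "$hard" = "unlimited" ] || [ "$hard" -ge "$wanted" ]; then
    # Raising the soft limit up to the hard limit needs no special rights.
    ulimit -Sn "$wanted"
    echo "soft limit raised to $(ulimit -Sn)"
else
    # Going past the hard limit is only possible for root.
    echo "hard limit $hard is below $wanted; raising it needs root" >&2
fi
```

The change only lasts for the shell it is run in, so it has to be done in the same session (or script) that starts the awstats update.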

Wednesday, April 6, 2011

What's the difference between Terminating and Stopping an EC2 Instance?

Background Information

Amazon supports the ability to terminate or stop a running instance.  The ability to stop a running instance is only supported by instances that were launched with an EBS-based AMI.  Also, you cannot stop a Spot Instance.  There are distinct differences between stopping and terminating an instance.  It's important to properly understand the implications of each action.


Terminate Instance
When you terminate an EC2 instance, the instance will be shut down, the virtual machine that was provisioned for you will be permanently taken away, and you will no longer be charged for instance usage.  Any data that was stored locally on the instance will be lost.  Any attached EBS volumes will be detached but not deleted.  However, if you attach an EBS Snapshot to an instance at boot time, the default option in the Dashboard is to delete the attached EBS volume upon termination.

Tuesday, April 5, 2011

AWS - Steps to create AMI from a non EBS instance.

  1. Copy pk-MX7IVEMMKXYAMPDYTZK2LXQH45N2YI44.pem under /mnt folder.
  2. Copy cert-MX7IVEMMKXYAMPDYTZK2LXQH45N2YI44.pem under /mnt folder.
  3. Make directory /home/ec2 if not already present. If already present, skip step (4) & (5) below.
  4. Download Amazon ec2 ami tools from:  wget
     to /home/ec2 folder.
  5. Download Amazon ec2 api tools from: wget
     to /home/ec2 folder.
            You will see two folders under /home/ec2.
                        (a) ec2-ami-tools-1.3-34544
                        (b) ec2-api-tools-1.3-42584

Ec2 Starter

How To: Getting Started with Amazon EC2

Amazon EC2 is among the more potent items in Amazon's web services arsenal. You've probably heard of many of the other services such as S3 for storage and FPS for payments. EC2 is all about the "elastic compute cloud." In layman's terms, it's a server. In slightly less layman's terms, EC2 lets you easily run and manage many instances (like servers) and, given the proper software and configuration, have a scalable platform for your web application, outsource resource-intensive tasks to EC2, or do whatever else you would use a server farm for.
There are three different sizes of EC2 instances you can summon and they're all probably more powerful than the server currently running your blog. Unless you're offloading video processing or something intense to EC2, the default small instance with its 1.7GB of RAM and 160GB disk should be more than fine. It's just nice to know that if for any reason I need a farm of machines each with 15GB of RAM, I can get that easily.
EC2 has been around for a while but has gained interest in the last few weeks as Amazon released an elastic IP feature. One of the larger EC2 issues deals with data persistence on instances. There are many limitations with EC2 that make it difficult to use unless you carefully build around the EC2 architecture and don't just assume that you can move your app to EC2 flawlessly. If an instance crashes and you run it again, you'll lose data, and when the instance comes back up it will have a new IP, adding another hurdle with DNS issues. Fortunately, the elastic IP feature lets you assign a static IP address to your instances.
As the title of this article implies, this article is meant to be a beginner's look into tinkering with EC2. Just because you will be able to host a page on EC2 at the end of this article does not mean you should start using it as your only server. Many considerations need to be made when using EC2 to get around the data persistence issue. If your startup is looking to use EC2 as a scalable platform, fortunately there are many services that have already built stable systems on top of EC2, ready for your consumption: WeoCeo, Scalr and RightScale. Enough talk, shall we jump right in?
Note: Most of the information below (and more) is available in the EC2 API doc if you enjoy reading those things.

Saturday, January 22, 2011

Postgres - Administration

As with everything that contains valuable data, PostgreSQL databases should be backed up regularly. While the procedure is essentially simple, it is important to have a basic understanding of the underlying techniques and assumptions.
There are three fundamentally different approaches to backing up PostgreSQL data:
  • SQL dump
  • File system level backup
  • On-line backup
Each has its own strengths and weaknesses.

23.1. SQL Dump

The idea behind the SQL-dump method is to generate a text file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program pg_dump for this purpose. The basic usage of this command is:
pg_dump dbname > outfile
As you see, pg_dump writes its results to the standard output. We will see below how this can be useful.
pg_dump is a regular PostgreSQL client application (albeit a particularly clever one). This means that you can do this backup procedure from any remote host that has access to the database. But remember that pg_dump does not operate with special permissions. In particular, it must have read access to all tables that you want to back up, so in practice you almost always have to run it as a database superuser.
To specify which database server pg_dump should contact, use the command line options -h host and -p port. The default host is the local host or whatever your PGHOST environment variable specifies. Similarly, the default port is indicated by the PGPORT environment variable or, failing that, by the compiled-in default. (Conveniently, the server will normally have the same compiled-in default.)
As any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override this, either specify the -U option or set the environment variable PGUSER. Remember that pg_dump connections are subject to the normal client authentication mechanisms (which are described in Chapter 20).
Dumps created by pg_dump are internally consistent, that is, updates to the database while pg_dump is running will not be in the dump. pg_dump does not block other operations on the database while it is working. (Exceptions are those operations that need to operate with an exclusive lock, such as VACUUM FULL.)
Important: When your database schema relies on OIDs (for instance as foreign keys) you must instruct pg_dump to dump the OIDs as well. To do this, use the -o command line option.

23.1.1. Restoring the dump

The text files created by pg_dump are intended to be read in by the psql program. The general command form to restore a dump is
psql dbname < infile
where infile is what you used as outfile for the pg_dump command. The database dbname will not be created by this command; you must create it yourself from template0 before executing psql (e.g., with createdb -T template0 dbname). psql supports options similar to pg_dump for controlling the database server location and the user name. See psql's reference page for more information.
Not only must the target database already exist before starting to run the restore, but so must all the users who own objects in the dumped database or were granted permissions on the objects. If they do not, then the restore will fail to recreate the objects with the original ownership and/or permissions. (Sometimes this is what you want, but usually it is not.)
Once restored, it is wise to run ANALYZE on each database so the optimizer has useful statistics. An easy way to do this is to run vacuumdb -a -z to VACUUM ANALYZE all databases; this is equivalent to running VACUUM ANALYZE manually.
The ability of pg_dump and psql to write to or read from pipes makes it possible to dump a database directly from one server to another; for example:
pg_dump -h host1 dbname | psql -h host2 dbname

Important: The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added to template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0, as in the example above.

23.1.2. Using pg_dumpall

The above mechanism is cumbersome and inappropriate when backing up an entire database cluster. For this reason the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as users and groups. The basic usage of this command is:
pg_dumpall > outfile
The resulting dump can be restored with psql:
psql -f infile postgres
(Actually, you can specify any existing database name to start from, but if you are reloading in an empty cluster then postgres should generally be used.) It is always necessary to have database superuser access when restoring a pg_dumpall dump, as that is required to restore the user and group information.

23.1.3. Handling large databases

Since PostgreSQL allows tables larger than the maximum file size on your system, it can be problematic to dump such a table to a file, since the resulting file will likely be larger than the maximum size allowed by your system. Since pg_dump can write to the standard output, you can just use standard Unix tools to work around this possible problem.
Use compressed dumps. You can use your favorite compression program, for example gzip.
pg_dump dbname | gzip > filename.gz
Reload with
createdb dbname
gunzip -c filename.gz | psql dbname
or
cat filename.gz | gunzip | psql dbname
Use split. The split command allows you to split the output into pieces that are acceptable in size to the underlying file system. For example, to make chunks of 1 megabyte:
pg_dump dbname | split -b 1m - filename
Reload with
createdb dbname
cat filename* | psql dbname
Use the custom dump format. If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively. The following command dumps a database using the custom dump format:
pg_dump -Fc dbname > filename
A custom-format dump is not a script for psql, but instead must be restored with pg_restore. See the pg_dump and pg_restore reference pages for details.
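As a rough sketch of the selective restore mentioned above, the custom-format workflow could be wrapped in a few small shell functions. The function names and the database, table and file arguments are hypothetical placeholders; only the pg_dump and pg_restore options (-Fc, -l, -d, -t) come from the tools themselves, and the target database is assumed to exist already.

```shell
# Hypothetical helper names; all arguments are placeholders.

# Dump one database in the custom format (compressed when zlib is available).
# Usage: dump_custom dbname filename
dump_custom() {
    pg_dump -Fc "$1" > "$2"
}

# List the archive's table of contents without touching any database.
# Usage: list_dump filename
list_dump() {
    pg_restore -l "$1"
}

# Restore a single table from the archive into an existing database.
# Usage: restore_table dbname tablename filename
restore_table() {
    pg_restore -d "$1" -t "$2" "$3"
}
```

For example, dump_custom mydb mydb.dump followed by restore_table mydb orders mydb.dump would bring back only a table named orders, where mydb and orders are made-up names.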

Postgres login - how to log into a Postgresql database

Postgres login commands

If you are logged into the same computer that Postgres is running on you can use the following psql login command, specifying the database (mydb) and username (myuser):
psql -d mydb -U myuser
If you need to log into a Postgres database on a server named myhost, you can use this Postgres login command:
psql -h myhost -d mydb -U myuser
If for some reason you are not prompted for a password when issuing these commands, you can use the -W option, leading to these two command alternatives:
psql -d mydb -U myuser -W
psql -h myhost -d mydb -U myuser -W

Sunday, December 26, 2010

Using Elastic IP to Identify Internal Instances on Amazon EC2

Elastic IP

Amazon EC2 supports Elastic IP Addresses to implement the effect of having a static IP address for public servers running on EC2. You can point the Elastic IP at any of your EC2 instances, changing the active instance at any time, without changing the IP address seen by the public outside of EC2.
This is a valuable feature for things like web and email servers, especially if you need to replace a failing server or upgrade or downgrade the hardware capabilities of the server, but read on for an insiders’ secret way to use Elastic IP addresses for non-public servers.

Internal Servers

Not all servers should be publicly accessible. For example, you may have an internal EC2 instance which hosts your database server accessed by other application instances inside EC2. You want to architect your installation so that you can replace the database server (instance failure, resizing, etc) but you want to make it easy to get all your application servers to start using the new instance.
There are a number of design approaches which people have used to accomplish this, including:

Tuesday, December 14, 2010

MySQL Replication :-

This tutorial will go through the setup of MySQL database replication. I will also talk about how to get everything working smoothly again after a server crash, or if you wish to switch databases. I will try to explain what is going on behind the scenes for every step (something I've found missing from other tutorials). This is written specifically for MySQL 5.0 on CentOS 4, but should be very similar on other Linux distributions. It should also work this way with MySQL 4.x.

The theory

We have 2 servers, one of which is a Master and the other which is a Slave. We tell the Master that it should keep a log of every action performed on it. We tell the slave server that it should look at this log on the Master and whenever something new happens, it should do the same thing.
You should follow the instructions below with two console windows open - one for the Master and one for the Slave. Also note that I will capitalise the first letters of Master and Slave to indicate I am talking about the servers.

FTP :- vsftpd configuration in EC2 Servers.

# Example config file /etc/vsftpd/vsftpd.conf
# The default compiled in settings are fairly paranoid. This sample file
# loosens things up a bit, to make the ftp daemon more usable.
# Please see vsftpd.conf.5 for all compiled in defaults.
# READ THIS: This example file is NOT an exhaustive list of vsftpd options.
# Please read the vsftpd.conf.5 manual page to get a full idea of vsftpd's
# capabilities.
# Allow anonymous FTP? (Beware - allowed by default if you comment this out).
# Uncomment this to allow local users to log in.
# Uncomment this to enable any form of FTP write command.
# Default umask for local users is 077. You may wish to change this to 022,
# if your users expect that (022 is used by most other ftpd's)

Monday, December 13, 2010

Munin/Apache Plugin

Apache Plugin

The default install of Munin on Debian Lenny comes with 3 Apache plugins:
  • apache_processes
  • apache_accesses
  • apache_volume
The first one works out of the box, while the 2 others depend on the status module of Apache. As such, Apache needs to be configured to use this module. Moreover, ExtendedStatus must be enabled.

apache2 server config

First make sure that status module is enabled:
# a2enmod status
Then, edit /etc/apache2/mods-available/status.conf and add, just above <Location /server-status> :
ExtendedStatus On
Then, restart the service:
# /etc/init.d/apache2 restart
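To check that the extended counters are really there, one option is to look at the machine-readable status page (the ?auto form of /server-status) and pull out the "Total Accesses" and "Total kBytes" lines, which only appear with ExtendedStatus On. The sketch below parses a made-up sample so it runs anywhere; the URL in the comment assumes Apache listens on 127.0.0.1 port 80 and allows status requests from localhost.

```shell
# Sample of the machine-readable ?auto output (the values are made up).
sample='Total Accesses: 1234
Total kBytes: 5678
BusyWorkers: 3
IdleWorkers: 7'

# Against a live server this would be something like:
#   status=$(curl -s 'http://127.0.0.1/server-status?auto')
status=$sample

# Split each line on ": " and print the value of the two extended counters.
accesses=$(printf '%s\n' "$status" | awk -F': ' '/^Total Accesses/ {print $2}')
kbytes=$(printf '%s\n' "$status" | awk -F': ' '/^Total kBytes/ {print $2}')
echo "accesses=$accesses kbytes=$kbytes"
```

If both values come back empty against the real server, ExtendedStatus is most likely still off or the status location is not reachable.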

Monday, November 29, 2010

The Linux split command

Why use split?  Sometimes, you’ll want to split a file simply because it’s faster to run a command or script against the smaller pieces. Also, you may need to move a file from one system to another, and by breaking a 20 GB file into forty 500 megabyte pieces, it might be much easier to move the data around. You can also break it into chunks, write them to CDs and easily restore the original file when needed.
One option for dealing with huge files is to break them into more manageable chunks and move or process these chunks independently. The command to use for this is called split and it works with text or binary files. Split can divide files into chunks that contain a certain number of lines or a certain number of bytes.
The following will create a series of 200,000-line files, giving them the default names for split files – xaa, xab, xac and so on:
$ split -l 200000 largelogfile
The following example gives more meaningful names to each chunk and will result in a series of files called log_aa, log_ab, log_ac and so on:
$ split -l 200000 largelogfile log_
Splitting a binary file such as a movie or an mp3 is just as easy as splitting a text file. The following command will split the WMV file, movie1.wmv, into a series of 10 kilobyte chunks and name them wmv_aa, wmv_ab, wmv_ac and so on.
$ split -b 10k movie1.wmv wmv_
So now that you’ve split the file into many smaller ones, you can easily use the cat command to restore the chunks to the original text or binary file. Here is a quick example of breaking a file into smaller pieces and restoring it to the original:
$ -rw-r--r--  1 root root 47704 May  8 15:11 design.jpg
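A minimal round trip could look like the sketch below. It uses a scratch directory and a randomly generated stand-in file of the same size as design.jpg above; the names original.bin, restored.bin and the part_ prefix are made up for illustration.

```shell
# Work in a scratch directory so the chunk files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a hypothetical 47704-byte stand-in for the file being split.
head -c 47704 /dev/urandom > original.bin

# Split into 10 kilobyte chunks named part_aa, part_ab, part_ac and so on.
split -b 10k original.bin part_

# The shell expands part_* in sorted order, which matches split's naming,
# so cat glues the chunks back together in the right sequence.
cat part_* > restored.bin

# cmp -s exits 0 only if both files are byte-for-byte identical.
if cmp -s original.bin restored.bin; then
    echo "restored file matches the original"
fi

# Count the chunks that were produced.
chunks=$(ls part_* | wc -l | tr -d ' ')
echo "chunks: $chunks"
```

Comparing the restored file against the original (with cmp or a checksum) is a cheap way to make sure no chunk went missing before deleting anything.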
