Checking for intrusions in Apache Web server log analysis
By Brad Causey | Apr 22, 2009
It seems every new device, appliance and even desktop software program has the capability to generate logs or text-based data. There are a number of challenges associated with managing the onslaught of log data.
The first is centrally storing and gathering these logs; luckily, there are a number of available products for this. Logs are usually shipped off to a syslog, log management or SIM system that is centrally located in the network. So the big question is: How do you sift through Web server log data and find relevant security information?
Although many open source and commercial software applications perform some level of log analysis, they usually have one thing in common: regular expressions (regex). A regular expression is a string of characters that describes a search pattern, allowing nearly any scripting language or search tool to perform fast, advanced searches against large amounts of text data. There are a few variations of regex syntax; the variety most commonly used by scripting languages is the Perl-derivative regular expression. This family includes the regex implementations in the .NET framework, Python, Java, JavaScript and, of course, Perl. By using regex in combination with a scripting language or search tool, you can quickly and efficiently parse large amounts of data for meaningful information.
One of the most common log formats we tend to see issues in is Apache, or httpd. These Web server logs tend to hide a number of secrets that are vital to find, such as attack attempts, successful attack signatures, and even precursor activities to an impending attack.
We will focus on the use of regex with egrep. Egrep uses a very simple syntax for searching files and is present on nearly every Unix-like operating system in common use today. (Windows users can download a free version from a variety of sources.)
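Keep in mind that the core regex syntax used with egrep (alternation, grouping, character classes and quantifiers) also works in most programs and scripting languages that support regex. The reverse is not always true: egrep uses POSIX extended regular expressions, so some Perl-style shorthand, such as \d, is not part of its syntax.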
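As a rough sketch of that portability, the Python snippet below applies a pattern that uses only alternation, grouping and an escaped dot, so the same expression should also work with egrep (for example, egrep 'etc/passwd|cmd\.exe' access_log). The pattern, the sample log lines and the function name matching_lines are assumptions made up for illustration; they are not taken from a real log or from any particular tool.

```python
import re

# Hypothetical pattern: alternation and an escaped dot behave the same way
# in POSIX extended regex (egrep) and in Python's re module.
PATTERN = r"(etc/passwd|cmd\.exe)"

# Made-up sample entries, for illustration only.
SAMPLE_LINES = [
    '10.10.10.10 - - [10/Oct/2007:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.10.10.11 - - [10/Oct/2007:13:56:01 -0700] "GET /../../etc/passwd HTTP/1.0" 404 208',
]


def matching_lines(lines, pattern=PATTERN):
    """Return the lines that contain a match for the pattern, like egrep does."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


if __name__ == "__main__":
    for line in matching_lines(SAMPLE_LINES):
        print(line)
```

Like egrep, the sketch reports a line when the pattern matches anywhere in it, which is why it uses search rather than match.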
For this article, we'll look at Apache logs. But the concepts applied via egrep, regex and httpd logs can be used across hundreds of other platforms, tools and log types. Understanding what is dangerous and how to search for it is a great step toward recognizing security issues within your organization.
Step one: Web log format
To create expressions that analyze the contents of these logs, we need to understand the structure of a log entry. Apache writes a server access log, usually under /etc/httpd/logs and typically in a file named something like access_log; the exact location and name vary by distribution and configuration.
You can configure httpd (Apache) to send these logs to a syslog or SIM system; if so, your log format may differ from the default. By default, Apache writes one entry per line to access_log in the following format:
10.10.10.10 - frank [10/Oct/2007:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
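Let's break this down section by section. The first value, 10.10.10.10, is the client IP address; if HostnameLookups is enabled, the client's hostname is logged in its place. The hyphen that follows is the client identity reported by identd, which is rarely available and is shown as "-" when missing. The third value, frank, is the user ID of the person making the request, as determined by HTTP authentication. Next, we have the date and time stamp, 10/Oct/2007:13:55:36 -0700. This is important for correlation purposes.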
Next, we have the HTTP request line. This is especially helpful because it gives us details about what request the client made. In this case, GET /apache_pb.gif HTTP/1.0 indicates a request using the GET method and the HTTP/1.0 protocol, targeting the image file named apache_pb.gif that is located in the root of the Web server's document directory.
Finally, the server return code, 200, indicates the request was completed successfully. The last value, 2326, is the size in bytes of the object returned to the client for that request, not including the response headers.
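The breakdown above can be sketched as a single regular expression with one named group per field. This is a minimal illustration in Python that assumes the default log format shown earlier; the group names and the parse_entry helper are hypothetical labels chosen for readability, not part of Apache or egrep.

```python
import re

# One named group per field of the default access log entry.
# The group names are illustrative labels, not Apache terminology.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '  # client, identd, user ID
    r'\[(?P<time>[^\]]+)\] '                         # date and time stamp
    r'"(?P<request>[^"]*)" '                         # method, path, protocol
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'            # return code, bytes sent
)


def parse_entry(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    entry = parse_entry(
        '10.10.10.10 - frank [10/Oct/2007:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326'
    )
    print(entry)
```

The sketch deliberately keeps the request field simple: a request that contains a quotation mark, or a log written with a customized format, would need a more careful pattern.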



