Smart File Type Detection Using PHP

In most web applications today, there is a need to allow users to upload images, audio and video files. Sometimes we also need to block certain types of files from being uploaded, an executable file being an obvious example.

Security aside, one might also want to prevent users from misusing the upload facility, e.g. uploading copyrighted music files illegally and using the service to promote piracy! In this article, we’ll look into a few ways in which we can achieve this.

File type detection using extension and MIME types

I am not going to cover this in much detail because, after all, this is what we normally do when we want to restrict certain files. We simply read the MIME type reported for the file from $_FILES['myFile']['type'] and check whether it is one of the types we allow.

Or we might look at the last few characters of the file name and reject files ending with a certain extension. Unfortunately, these methods are hardly sufficient, as anyone can change the extension of a file to bypass the restriction. Furthermore, the MIME type in $_FILES is supplied by the browser and is not verified by PHP, and most browsers, if not all, determine the MIME type from the file's extension! Hence MIME types can be spoofed pretty easily too.
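To make the weakness concrete, here is a minimal sketch of the kind of extension check described above. The helper name has_allowed_extension and the list of allowed extensions are assumptions made up for this illustration; they are not part of PHP or of any library.

```php
<?php
// Hypothetical helper: a naive check that trusts the file name alone.
function has_allowed_extension($filename, array $allowed)
{
    // pathinfo() returns the text after the last dot in the name
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

// Assumed whitelist, for illustration only
$allowed = array('jpg', 'jpeg', 'png', 'gif');

var_dump(has_allowed_extension('holiday.JPG', $allowed)); // bool(true)
var_dump(has_allowed_extension('setup.exe', $allowed));   // bool(false)

// Renaming the very same executable is enough to get it through:
var_dump(has_allowed_extension('setup.jpg', $allowed));   // bool(true)
```

The check never looks at the contents of the file, which is exactly why it is so easy to fool.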

Let’s now explore some other ways that are harder to fool.

Using Magic Bytes

The best way to determine the file type is by examining the first few bytes of a file, referred to as “magic bytes”. Magic bytes are essentially signatures, ranging in length from 2 to 40 bytes, that usually sit in the file header and occasionally at the end of a file. There are several hundred types of files, and quite a few of them have several file signatures associated with them. Lists of known file signatures are available online.

Although signatures are not entirely consistent, they are our best bet for detecting file types reliably. This seemingly difficult task has been made really easy by a PECL extension called Fileinfo. As of PHP 5.3, Fileinfo is shipped with the main distribution and is enabled by default, so it is a robust and simple way to detect and restrict the types of files uploaded.

Let’s now see how we can detect a file type using Fileinfo:

$file = "/path/to/file";

// FILEINFO_MIME_TYPE (PHP 5.3+) returns only the MIME type, e.g. "image/jpeg".
// FILEINFO_MIME also appends the encoding, e.g. "image/jpeg; charset=binary",
// which would never match the plain strings in the switch below.

// procedural style:
$fhandle = finfo_open(FILEINFO_MIME_TYPE);
$mime_type = finfo_file($fhandle, $file); // e.g. gives "image/jpeg"

// object-oriented style:
$file_info = new finfo(FILEINFO_MIME_TYPE);
$mime_type = $file_info->file($file);     // e.g. gives "image/jpeg"

switch ($mime_type) {
	case "image/jpeg":
		// your actions go here...
		break;
}
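In an upload handler, this detection is usually combined with a list of accepted types. The following is a rough sketch rather than a definitive implementation: the helper name detect_allowed_type and the contents of the allowed list are made up for this example, and it assumes PHP 5.3 or later with Fileinfo enabled so that FILEINFO_MIME_TYPE is available.

```php
<?php
// Hypothetical helper: returns the detected MIME type if it is on the
// allowed list, or false otherwise (including when detection fails).
function detect_allowed_type($path, array $allowed)
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime  = $finfo->file($path); // inspects the file contents, not its name

    if ($mime === false) {
        return false;
    }
    return in_array($mime, $allowed, true) ? $mime : false;
}

// Usage inside an upload handler (field name "myFile" is an assumption):
// $type = detect_allowed_type(
//     $_FILES['myFile']['tmp_name'],
//     array('image/jpeg', 'image/png', 'image/gif')
// );
// if ($type === false) {
//     // reject the upload
// }
```

Checking the temporary file before moving it anywhere means a rejected upload never reaches its final location.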

Handling image uploads

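If you intend to allow only image uploads, then you can use the built-in getimagesize() function to check that the user is actually uploading an image file. This function returns false if the file is not recognised as an image. Keep in mind that it only inspects the file header, so a successful result means the file looks like an image, not that everything inside it is harmless.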

// Let's assume that the name attribute of the file input field you have used is "myFile"

$tempFile = $_FILES['myFile']['tmp_name'];  // path of the temp file created by PHP during upload
$imginfo_array = getimagesize($tempFile);   // returns false if not a valid image file

if ($imginfo_array !== false) {
    $mime_type = $imginfo_array['mime'];
    switch ($mime_type) {
        case "image/jpeg":
            // your actions go here...
            break;
    }
}
else {
    echo "This is not a valid image file";
}

Reading and interpreting magic bytes manually

If, for some reason, you are not able to install Fileinfo, you can still determine the file type manually by reading the first few bytes of a file and comparing them with the known magic bytes for the file type in question. This process involves some trial and error, because legitimate file formats may still use a few undocumented magic bytes, and valid files could then be rejected by your system. It is not impossible, though: a couple of years back, I was asked to work on a script that allowed only genuine mp3 files to be uploaded, and since we could not use Fileinfo, we resorted to this manual scanning. It took me a while to account for some of the undocumented magic bytes for mp3, but pretty soon I had a stable upload script running.
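As an illustration of the manual approach, here is a minimal sketch that compares the start of a file against a small table of well-known signatures. It is not the script mentioned above: the function name detect_by_magic_bytes and the $signatures table are assumptions for this example, and the table is deliberately incomplete. In particular, "ID3" only matches mp3 files that begin with an ID3v2 tag; mp3 files that start directly with audio frames are not covered here.

```php
<?php
// Hypothetical signature table: MIME type => known leading byte sequences.
// Incomplete on purpose; extend it for the formats you need to accept.
$signatures = array(
    'image/jpeg'      => array("\xFF\xD8\xFF"),
    'image/png'       => array("\x89PNG\r\n\x1A\n"),
    'image/gif'       => array("GIF87a", "GIF89a"),
    'application/pdf' => array("%PDF-"),
    'audio/mpeg'      => array("ID3"), // ID3v2-tagged mp3 files only
);

// Returns the matching MIME type, or false if no signature matches.
function detect_by_magic_bytes($path, array $signatures)
{
    $fh = fopen($path, 'rb'); // binary mode, so bytes are read untouched
    if ($fh === false) {
        return false;
    }
    $head = fread($fh, 16);   // longer than any signature in the table
    fclose($fh);

    if ($head === false || $head === '') {
        return false;
    }

    foreach ($signatures as $mime => $magics) {
        foreach ($magics as $magic) {
            // strncmp() is binary safe and compares only the signature length
            if (strncmp($head, $magic, strlen($magic)) === 0) {
                return $mime;
            }
        }
    }
    return false;
}
```

A file that matches none of the listed signatures is rejected, which is where the trial and error comes in: every legitimate variant you want to accept has to be added to the table.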

Before I end, I would like to leave you with a general word of caution: never call include() on a file that was uploaded. PHP code can very well be hidden inside a picture, and the picture would pass your file validation tests just fine, only to cause havoc when executed by the server.
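The sketch below shows why that caution matters. It builds a tiny GIF in memory, appends some PHP code to it, and runs the image check on the result. It uses getimagesizefromstring(), which assumes PHP 5.4 or later, purely so that the example does not need a file on disk; the variable names are made up for illustration.

```php
<?php
// A minimal 1x1 GIF, followed by PHP code appended after the image data.
$gif     = base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
$payload = $gif . '<?php echo "this would run if the file were included"; ?>';

// Same check as getimagesize(), but on a string instead of a file path
$info = getimagesizefromstring($payload);

var_dump($info !== false); // bool(true)
var_dump($info['mime']);   // string(9) "image/gif"

// The embedded code is still sitting in the "image":
var_dump(strpos($payload, '<?php') !== false); // bool(true)
```

The image check passes because only the header is examined, so the appended code goes unnoticed until something hands the file to the PHP interpreter.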