I haven’t written for a while, but better late than never... So, what is the story?
Web development has a recurring problem: uploaded files with malicious content, which can harm the server or the whole site. The most common protection is to restrict file types by checking the file extension and MIME type, and to allow only certain types to be uploaded. But in some situations that is not enough, because both the extension and the MIME type are supplied by the client and can be faked.
Let me start with a brief explanation of what a MIME type and a file extension are. A MIME type (Multipurpose Internet Mail Extensions) describes what kind of information a file contains. Web servers also use MIME types to tell the browser what kind of content they are sending in response to an HTTP request (the type is sent in the Content-Type HTTP header). Browsers do not always report the same MIME type for the same file, and one file format may have more than one MIME type in use (JPEG uploads may arrive as image/jpeg or image/pjpeg, for example).
Everybody knows what a file extension is. It serves a similar purpose to the MIME type, but it is used by the operating system to decide what kind of data the file holds and which program should open it (this is how Windows does it). Some operating systems (UNIX-like ones) do not rely on file extensions to determine the file type.
For more security we can check not only the MIME type and extension but also the file signature. Most file formats start with a characteristic byte sequence, usually within the first 20 bytes (it may be shorter than 20 bytes and is not always one consecutive run). Signatures are usually listed in two forms: hexadecimal and ASCII text. The ASCII form does not have to be meaningful (PDF files begin with the ASCII text “%PDF”, for example). To read the file header from a stream, you can use this C# code:
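// Requires: using System; using System.IO; using System.Text;
// The stream must be seekable (stream.CanSeek == true).
int bytesToRead = (int)Math.Min(stream.Length, 20);
byte[] bytes = new byte[bytesToRead];
// Read may return fewer bytes than requested, so keep the actual count.
int bytesRead = stream.Read(bytes, 0, bytesToRead);
StringBuilder hexBuilder = new StringBuilder(bytesRead * 2);
for (int i = 0; i < bytesRead; i++)
{
    // "X2" writes two hex digits per byte, so 0x0A becomes "0A" and zero
    // bytes stay in place; the result lines up with published signatures.
    hexBuilder.Append(bytes[i].ToString("X2"));
}
string hex = hexBuilder.ToString();
// Rewind so later code reads the stream from where it was before.
stream.Seek(-bytesRead, SeekOrigin.Current);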
Here is a simple way to put all this into your system. Describe “file” objects in an XML file; each one has an extension, a list of MIME types and a list of possible hex signatures that such a file may have. You can get this information from the internet (http://www.garykessler.net/library/file_sigs.html, http://mark0.net/soft-trid-deflist.html). I recommend checking signatures in more than one source. The XML file lists the “files” that the system will allow to be uploaded. Then parse the XML and load it into static objects that mirror the XML structure. The system allows an upload only when the file's extension, MIME type and hex signature all match one of these static “file” objects. This simple solution will keep unwanted files from being uploaded.
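The approach described above could be sketched roughly like this. It is only an illustration under a few assumptions: the XML layout (the files, file, mime and signature names), the AllowedFile class and the UploadFilter.IsAllowed method are all made up for this example, the XML is embedded as a string instead of being loaded from disk, and every signature is assumed to start at the very first byte of the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model for one allowed "file" entry from the XML.
public sealed class AllowedFile
{
    public string Extension { get; set; }
    public List<string> MimeTypes { get; set; }
    public List<string> Signatures { get; set; } // uppercase hex prefixes
}

public static class UploadFilter
{
    // Assumed XML layout; element and attribute names are illustrative only.
    private const string Definitions = @"
<files>
  <file extension='.pdf'>
    <mime>application/pdf</mime>
    <signature>25504446</signature>
  </file>
  <file extension='.png'>
    <mime>image/png</mime>
    <signature>89504E470D0A1A0A</signature>
  </file>
</files>";

    // Parsed once and kept in a static list.
    private static readonly List<AllowedFile> Allowed =
        XDocument.Parse(Definitions).Root.Elements("file")
            .Select(f => new AllowedFile
            {
                Extension = (string)f.Attribute("extension"),
                MimeTypes = f.Elements("mime")
                    .Select(m => m.Value.Trim()).ToList(),
                Signatures = f.Elements("signature")
                    .Select(s => s.Value.Trim().ToUpperInvariant()).ToList()
            })
            .ToList();

    // headerHex is the hex string of the file's first bytes,
    // two hex digits per byte.
    public static bool IsAllowed(string fileName, string mimeType, string headerHex)
    {
        string extension = Path.GetExtension(fileName);
        // All three checks must match the same "file" entry.
        return Allowed.Any(f =>
            string.Equals(f.Extension, extension, StringComparison.OrdinalIgnoreCase) &&
            f.MimeTypes.Contains(mimeType, StringComparer.OrdinalIgnoreCase) &&
            f.Signatures.Any(s =>
                headerHex.StartsWith(s, StringComparison.OrdinalIgnoreCase)));
    }
}
```

With these definitions, UploadFilter.IsAllowed("report.pdf", "application/pdf", "255044462D312E34") returns true, while an executable renamed to report.pdf (its header starts with 4D5A) is rejected even though the extension and MIME type look fine.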
While searching the internet about this issue I found a very simple and useful tool called TrIDNet, which identifies a file by its content.
I will be glad to hear your opinion about the problem.