To build our first antivirus, we first need to understand viruses. Every antivirus program has to update its database constantly to defend against new virus threats. Day by day, not only are our security systems getting smarter, but so are the viruses.
Let's look at some examples of viruses:
Polymorphic virus - This type of virus is hard to detect because it changes its signature every time it creates a copy of itself. It can take antivirus software days to detect a new variant of this virus type.
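To see why a changing signature defeats simple signature matching, here is a toy sketch. It assumes a naive signature is just the SHA-256 hash of a file's bytes and uses a one-byte XOR key as a stand-in for a mutation engine. The payload is a harmless placeholder string, and `xor_encode` and `signature` are hypothetical helpers.

```python
import hashlib

# Hypothetical, harmless payload standing in for a program's body.
PAYLOAD = b"print('hello from a harmless payload')"

def xor_encode(data: bytes, key: int) -> bytes:
    """Encode every byte with a one-byte XOR key (a toy 'mutation engine')."""
    return bytes(b ^ key for b in data)

def signature(data: bytes) -> str:
    """A naive signature: the SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()

# Each "replica" uses a different key, so its bytes (and hash) differ,
# even though decoding with its key restores the identical payload.
replicas = [(key, xor_encode(PAYLOAD, key)) for key in (0x11, 0x5A, 0xC3)]
sigs = {signature(body) for _, body in replicas}

for key, body in replicas:
    assert xor_encode(body, key) == PAYLOAD  # same behaviour after decoding

print(f"{len(replicas)} replicas, {len(sigs)} distinct signatures")
```

Real polymorphic engines are far more elaborate (they also mutate the decryptor code itself), but the effect on hash-based detection is the same: every copy looks new.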
Worm - A worm is a type of malware that replicates itself to other computers on the same network, consuming the bandwidth and computing resources of every host machine it infects.
There are more types of viruses out there.
WannaCrypt, or WannaCry, is one of the best-known virus attacks; this type of virus is called ransomware.
```python
import pickle

import joblib  # sklearn.externals.joblib was removed; use the standalone joblib package
import numpy as np
import pandas as pd
import sklearn.ensemble as ske
from sklearn import model_selection, tree, linear_model  # model_selection replaces the removed cross_validation module
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import GaussianNB
```
These viruses are created by malicious hackers to gain access to your computer and use its resources for their own purposes.
For malware analysis, there are two main approaches:
1. Static approach
2. Dynamic approach
The static approach is code-based: the antivirus software examines the program's code, without running it, to determine whether it is malicious or safe.
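As a small illustration of static analysis, the sketch below computes a few features from a file's raw bytes without executing it. The feature set is hypothetical. Byte entropy is included because packed or encrypted code tends to have entropy close to the 8-bits-per-byte maximum, and the `MZ` check reflects the magic bytes at the start of Windows executables.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def static_features(data: bytes) -> dict:
    """A few hypothetical static features, computed without running the program."""
    return {
        "size": len(data),
        "entropy": shannon_entropy(data),
        "has_mz_header": data[:2] == b"MZ",  # Windows PE files start with 'MZ'
    }

print(static_features(b"MZ" + b"\x00" * 62))
print(shannon_entropy(bytes(range(256))))  # every byte value once -> 8.0
```

In practice you would read the bytes with `open(path, "rb").read()` and feed features like these, alongside many others, to the classifier.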
The dynamic approach monitors what the software does while it is running to determine whether it is malicious.
So, let's start building our antivirus software using machine learning.
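We are going to use the following libraries to build our antivirus.
Pandas is for data analysis.
NumPy is used for working with the numerical arrays that hold the data. sklearn.ensemble provides ensemble classifiers such as random forests, and pickle (or joblib) saves the trained classifier as a byte string so it can be reused later.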
The first thing we need to do is load the dataset, in which we stored the features and labels of different programs as a CSV file on our local machine. Each row has one of two possible labels: legitimate or malicious. Then we print the total number of features per row.
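A minimal sketch of this loading step with pandas follows. The column names (`name`, `md5`, `legitimate` and the feature columns) and the 1 = legitimate / 0 = malicious encoding are assumptions, and an in-memory string stands in for the CSV file so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the CSV file on disk; all column names are hypothetical.
# With a real file you would call pd.read_csv("path/to/your_dataset.csv").
CSV_TEXT = """name,md5,SizeOfCode,NumberOfSections,Entropy,legitimate
a.exe,0001,4096,4,5.1,1
b.exe,0002,512,9,7.8,0
c.dll,0003,8192,5,4.9,1
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))

# Features: every column except the identifiers and the label.
X = data.drop(columns=["name", "md5", "legitimate"]).values
# Labels: 1 = legitimate, 0 = malicious (assumed encoding).
y = data["legitimate"].values

print(f"{X.shape[1]} features per row, {len(y)} samples")
```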
After that, we train multiple classifiers to determine which one works best for this particular dataset. Below I've given the whole Python program. Please install these dependencies to run the script.
You can use pip or Anaconda to install them.
So let's jump right into the program.
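As a rough sketch of the comparison step (not the author's exact script), the code below selects features, trains several classifiers, and keeps the one with the best held-out accuracy. It assumes synthetic data from `make_classification` in place of the real feature matrix, uses `train_test_split` from `model_selection` because the old `cross_validation` module no longer exists, and picks an arbitrary set of classifiers and hyperparameters.

```python
import sklearn.ensemble as ske
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real feature matrix X and labels y.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Keep features an ExtraTrees model rates at or above mean importance,
# fitted on training data only so no test information leaks in.
selector = SelectFromModel(ske.ExtraTreesClassifier(n_estimators=50, random_state=0))
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
print(f"{X_train_sel.shape[1]} of {X_train.shape[1]} features kept")

candidates = {
    "DecisionTree": tree.DecisionTreeClassifier(max_depth=10, random_state=0),
    "RandomForest": ske.RandomForestClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": ske.GradientBoostingClassifier(n_estimators=50, random_state=0),
    "GNB": GaussianNB(),
}

scores = {}
for name, clf in candidates.items():
    clf.fit(X_train_sel, y_train)
    scores[name] = clf.score(X_test_sel, y_test)  # accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")

best_name = max(scores, key=scores.get)
best_clf = candidates[best_name]
print("Best classifier:", best_name)
# Rows are true labels, columns are predicted labels.
print(confusion_matrix(y_test, best_clf.predict(X_test_sel)))
```

Comparing on a held-out split keeps the choice honest; with real malware data, the false-negative cell of the confusion matrix (malicious files predicted as legitimate) usually matters more than raw accuracy.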
This program saves the best classifier for us as a pickle file in the classifier directory. Now we will look at our main program, where we use the trained classifier to detect whether a program is malicious or not.
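A minimal sketch of the saving step follows. The file names `classifier.pkl` and `features.pkl` are hypothetical, a small random forest trained on synthetic data stands in for the best classifier, and the `classifier` directory is created under a temporary directory so the snippet does not touch your working tree.

```python
import os
import pickle
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A small model standing in for the best classifier chosen earlier.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
best_clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
# Hypothetical names of the features the model was trained on.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# The "classifier" directory from the text, created under a temp dir here.
out_dir = os.path.join(tempfile.mkdtemp(), "classifier")
os.makedirs(out_dir, exist_ok=True)

# joblib.dump serializes the fitted model to disk.
joblib.dump(best_clf, os.path.join(out_dir, "classifier.pkl"))
# Plain pickle is enough for a simple list of feature names.
with open(os.path.join(out_dir, "features.pkl"), "wb") as f:
    pickle.dump(feature_names, f)

print(sorted(os.listdir(out_dir)))
```

Saving the feature names alongside the model lets the main program build its input rows in the same column order the classifier was trained on.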
Here is the main program:
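As a hedged sketch of what such a main program might do: take a dictionary of features for one file, order them the way the model was trained, and map the prediction back to a label. Feature extraction from a real executable is omitted. `classify`, the feature names, and the label mapping are hypothetical, and a tiny model round-tripped through `pickle.dumps`/`pickle.loads` stands in for loading the saved classifier from disk.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

LABELS = {0: "malicious", 1: "legitimate"}  # assumed label encoding

def classify(features: dict, clf, feature_names: list) -> str:
    """Put the features in training order, predict, and return a text label."""
    row = np.array([[features[name] for name in feature_names]])
    return LABELS[int(clf.predict(row)[0])]

# Stand-in for loading classifier/classifier.pkl: train a tiny model and
# round-trip it through pickle bytes.
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names
blob = pickle.dumps(DecisionTreeClassifier(random_state=0).fit(X, y))
clf = pickle.loads(blob)

# Hypothetical features for one file; a real program would extract them
# from the executable itself.
sample = dict(zip(names, X[0]))
print(classify(sample, clf, names))
```

Only unpickle model files you created yourself: loading a pickle can execute arbitrary code, which is a poor property for an antivirus to ignore.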