Data mining

Collection of the dataset:
The HIV integrase inhibitors in the dataset are analogs of raltegravir. All active and non-active compounds are collected from the PubChem database. Compounds with IC50 values less than 50 are taken as active, and compounds with IC50 values greater than 100 are taken as non-active; compounds with IC50 values between these two cutoffs are assigned to neither class.
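The activity cutoffs above can be written as a small labeling function. This is a sketch: the source gives no IC50 unit, so the cutoffs are applied to raw values in whatever unit the PubChem records use, and the compound names and values in the usage lines are hypothetical.

```python
from typing import Optional

# Cutoffs from the text. The IC50 unit is not stated in the source, so the
# cutoffs are applied to raw values in whatever unit the records use.
ACTIVE_MAX = 50.0     # IC50 below this value -> active
INACTIVE_MIN = 100.0  # IC50 above this value -> non-active


def classify(ic50: float) -> Optional[str]:
    """Label a compound by its IC50; None means it falls between the cutoffs."""
    if ic50 < ACTIVE_MAX:
        return "active"
    if ic50 > INACTIVE_MIN:
        return "non-active"
    return None  # 50 <= IC50 <= 100: assigned to neither class


if __name__ == "__main__":
    # Hypothetical compounds and IC50 values, for illustration only.
    records = {"compound_1": 12.0, "compound_2": 75.0, "compound_3": 240.0}
    print({name: classify(v) for name, v in records.items()})
```

Note that both cutoffs are strict, so values of exactly 50 or 100 also fall into the excluded band.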

Generation of 2D .sdf files:
Go to
Paste the CID numbers of the collected compounds into the “Enter IDs” box, and make sure that SDF is selected in the “Choose a Format” box.
Then click the “Download” button. After a few moments (longer when many CIDs are entered), the 2D SDF file is ready to download.
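As a scripted alternative to the web form, PubChem's PUG REST service can return SDF records for a comma-separated list of CIDs (2D records by default). The sketch below assumes only that service's `compound/cid/.../SDF` URL pattern; the function names are illustrative, and the CIDs in the usage line are placeholders rather than compounds from this dataset.

```python
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sdf_url(cids):
    """Build a PUG REST URL that returns the given CIDs as one SDF file."""
    return f"{PUG_REST}/compound/cid/{','.join(str(c) for c in cids)}/SDF"


def download_sdf(cids, out_path):
    """Fetch the SDF for the CIDs and save it to out_path (needs network access)."""
    with urllib.request.urlopen(sdf_url(cids)) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())


if __name__ == "__main__":
    # Placeholder CIDs; substitute the CIDs collected for the dataset.
    print(sdf_url([2244, 3672]))
```

Very long CID lists can exceed URL length limits; the PUG REST documentation describes sending the CIDs in a POST body for that case.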
Generation of 3D .sdf files using Open Babel:
The 2D .sdf files are converted into 3D .sdf files using the Open Babel software. Load the 2D .sdf file into the Open Babel interface.
From the Open Babel options, select generate 3D coordinates and add hydrogens.
Specify the output file name and the location where it should be stored, then run the conversion.
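The same conversion can be run from the command line with the `obabel` program, where `-h` adds hydrogens and `--gen3d` generates 3D coordinates. The Python wrapper below is a sketch that assumes `obabel` is installed and on the PATH; the file names are hypothetical.

```python
import shutil
import subprocess


def obabel_3d_command(in_sdf, out_sdf):
    """Command line matching the GUI steps: add hydrogens, generate 3D coordinates."""
    return ["obabel", in_sdf, "-O", out_sdf, "-h", "--gen3d"]


def convert_to_3d(in_sdf, out_sdf):
    """Run Open Babel if it is on PATH; raise FileNotFoundError otherwise."""
    if shutil.which("obabel") is None:
        raise FileNotFoundError("obabel executable not found on PATH")
    subprocess.run(obabel_3d_command(in_sdf, out_sdf), check=True)


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run it.
    print(" ".join(obabel_3d_command("hiv_2d.sdf", "hiv_3d.sdf")))
```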
Generation of the MOE database file:
The 3D .sdf file is converted into an .mdb file by loading it into MOE and saving it with the .mdb extension.
AutoQSAR on the HIV dataset:
Use the AutoQSAR [ ] command to start AutoQuaSAR; the AutoQuaSAR panel opens with its default settings.

Select an analysis method in the Method area. The available methods are PLS, PCR, Binary, and GA-MLR, and the default is PCR. Change the analysis method to PLS.
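To show what the PLS method computes, here is a minimal single-response PLS (PLS1) fit using the NIPALS algorithm in NumPy. It is an illustration only, not MOE's or AutoQuaSAR's implementation, which may scale descriptors or choose the number of components differently; `X` stands for a descriptor matrix (one row per compound) and `y` for the activity values.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS.

    Returns (coef, x_mean, y_mean) so that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # centred residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                # unit weight vector
        t = Xr @ w                            # component scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        q = (yr @ t) / tt                     # y loading
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - q * t                       # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)    # coefficients on the original descriptors
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    """Predict activities for new descriptor rows."""
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

With as many components as linearly independent descriptors, PLS reproduces ordinary least squares; using fewer components projects the descriptors onto a few latent variables, which is why PLS copes with many correlated descriptors.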

Click ▼ next to the Original File field. A list of the database files stored in the working directory is displayed.

Select the .mdb or .sd file to be used as the original file.
To use a file outside the current working directory, select Others and choose the HIV .mdb file. The name of the work file is then displayed.

By default, the leftmost numeric field in the original database is treated as the Activity Field. Set the Activity Field to the pIC50 values....
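pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L. The source does not say how its pIC50 values were obtained or which unit its IC50 values use, so the sketch below assumes nanomolar input by default; the unit table is an assumption to adjust to the actual data.

```python
import math

# Assumed conversion factors from each IC50 unit to mol/L.
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50(ic50, unit="nM"):
    """Return pIC50 = -log10(IC50 in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * TO_MOLAR[unit])
```

For example, an IC50 of 50 nM corresponds to a pIC50 of about 7.30. Because a smaller IC50 gives a larger pIC50, the active compounds end up with the higher activity values.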

