An instance selection framework for mining data streams to predict antibody-feature function relationships on RV144 HIV vaccine recipients

Sarac, Ferdi and Seker, Huseyin (2016) An instance selection framework for mining data streams to predict antibody-feature function relationships on RV144 HIV vaccine recipients. In: Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 003356-003361. ISBN 9781509018970

Full text not available from this repository.
Official URL: https://doi.org/10.1109/SMC.2016.7844752

Abstract

Data streams are growing rapidly and constantly. Analysing rapidly changing data streams is difficult because the amount of data increases over time. Individual patient records provide a vital resource for health research that benefits society, such as understanding the association between the human immune system and viruses. As patient records grow constantly, data reduction techniques are needed to reduce the complexity of the data and the cost of data storage, and to enhance generalization performance. This study uses the concept of data stream mining to predict the effect of antibody features (IgGs) and primary Natural Killer (NK) cells' cytotoxic activities on RV144 vaccine recipients and to disclose the functional relationship between the immune system and the HIV virus. In order to apply data stream mining techniques, the data are manipulated to mimic a data stream. We propose a novel instance selection framework that identifies relevant and important instances and yields better results than using the entire data set. The RV144 vaccine data set contains 100 data samples, of which 20 are placebo samples and 80 are vaccine-injected samples. Each data sample has twenty antibody features, which consist of features related to IgG subclass and antigen specificity. To accomplish our goal, the data were randomly divided into four chunks, which were utilised for sequential random sampling of the data. In addition, a synthetic data set was created and divided into five chunks, similar to the RV144 data set. Each chunk is then added to the database sequentially, one at a time. However, instead of using the entire data set to select samples, we utilised one chunk at a time, and the most relevant and important instances of the incoming samples are selected before a new chunk of data arrives. Therefore, our framework reduces not only the size of the data set but also the cost of storage.
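
The chunk-at-a-time procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how "relevant and important" instances are scored, so `toy_score` below is a purely hypothetical placeholder, as are the function names, the number of instances kept per chunk, and the randomly generated stand-in data (which only mirrors the stated shape of 100 samples, 20 features, 20 placebo and 80 vaccine labels).

```python
import random


def split_into_chunks(samples, n_chunks, seed=0):
    """Shuffle the samples and deal them into n_chunks nearly equal chunks."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::n_chunks] for i in range(n_chunks)]


def select_from_chunk(chunk, score, keep):
    """Keep the `keep` highest-scoring instances of a single chunk."""
    return sorted(chunk, key=score, reverse=True)[:keep]


def stream_select(chunks, score, keep_per_chunk):
    """Process chunks one at a time, storing only the selected instances.

    Each chunk is reduced before the next one "arrives", so the full
    data set never has to be held in the stored collection.
    """
    retained = []
    for chunk in chunks:
        retained.extend(select_from_chunk(chunk, score, keep_per_chunk))
    return retained


# Synthetic stand-in data (NOT the RV144 measurements): 100 samples,
# 20 random features each, the first 20 labelled placebo, the rest vaccine.
data_rng = random.Random(42)
samples = [
    ([data_rng.gauss(0.0, 1.0) for _ in range(20)],
     "placebo" if i < 20 else "vaccine")
    for i in range(100)
]


def toy_score(sample):
    """Hypothetical relevance score: magnitude of the first feature."""
    features, _label = sample
    return abs(features[0])


chunks = split_into_chunks(samples, n_chunks=4)
retained = stream_select(chunks, toy_score, keep_per_chunk=10)
print(len(samples), "->", len(retained))
```

In this sketch the stored set grows by a fixed number of instances per chunk, which is one simple way to obtain the reduction in data set size and storage cost that the abstract describes; any real criterion would replace `toy_score`.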

Item Type: Book Section
Uncontrolled Keywords: reservoirs, predictive models, vaccines, data models, data mining, support vector machines, conferences
Subjects: B800 Medical Technology; G400 Computer Science
Department: Faculties > Engineering and Environment > Mathematics, Physics and Electrical Engineering
Depositing User: Becky Skoyles
Date Deposited: 10 Apr 2017 13:19
Last Modified: 12 Oct 2019 22:25
URI: http://nrl.northumbria.ac.uk/id/eprint/30417
