Data loss prevention based on text classification in controlled environments

Author
Kongsgård, Kyrre Wahl
Nordbotten, Nils Agne
Mancini, Federico
Engelstad, Paal E.
Date Issued
2016
Keywords
Data security
Classification
Permalink
http://hdl.handle.net/20.500.12242/602
https://publications.ffi.no/123456789/602
DOI
10.1007/978-3-319-49806-5_7
Collection
Articles
Description
Kongsgård, Kyrre Wahl; Nordbotten, Nils Agne; Mancini, Federico; Engelstad, Paal E. Data loss prevention based on text classification in controlled environments. Lecture Notes in Computer Science 2016; Volume 10063 LNCS, pp. 131-150
Abstract
Loss of sensitive data is a common problem with potentially severe consequences. When documents are categorized according to their sensitivity, security controls can be applied based on this classification. However, errors in the classification process may effectively result in information leakage. While automated classification techniques can be used to mitigate this risk, little work has been done to evaluate the effectiveness of such techniques when sensitive content has been transformed (e.g., a document can be summarized, rewritten, or have paragraphs copy-pasted into a new document). To better handle these more difficult data leaks, this paper proposes the use of controlled environments to detect misclassification. By monitoring the incoming information flow, the documents imported into a controlled environment can be used to better determine the sensitivity of the document(s) created within the same environment. Our evaluation results show that this approach, using techniques from machine learning and information retrieval, provides improved detection of incorrectly classified documents that have been subject to more complex data transformations.
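The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the documents imported into the environment carry integer sensitivity levels (higher means more sensitive), and it swaps in plain TF-IDF cosine similarity as a stand-in for the machine learning and information retrieval techniques the paper evaluates. The function names, the example texts, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (an assumption of this sketch).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def flag_misclassification(imported, new_text, claimed_level, threshold=0.3):
    """Compare a newly created document with the documents imported into
    the same environment.

    imported: list of (text, level) pairs, level a non-negative integer.
    Returns (flagged, inferred_level), where inferred_level is the highest
    level among imported documents similar enough to the new document.
    """
    docs = [tokenize(text) for text, _ in imported] + [tokenize(new_text)]
    vecs = tfidf_vectors(docs)
    new_vec = vecs[-1]
    inferred = 0  # hypothetical lowest level when nothing similar was imported
    for (_, level), vec in zip(imported, vecs):
        if cosine(vec, new_vec) >= threshold:
            inferred = max(inferred, level)
    return inferred > claimed_level, inferred


# Hypothetical documents imported into the controlled environment.
imported = [
    ("The convoy departs from the northern harbor at 0400 with fuel and munitions", 2),
    ("Cafeteria menu for this week includes soup and fresh bread", 0),
]
# A rewritten summary of the sensitive document, labeled as level 0.
summary = "Summary: fuel and munitions convoy leaves northern harbor at 0400"
flagged, inferred = flag_misclassification(imported, summary, claimed_level=0)
print(flagged, inferred)
```

Restricting the comparison to documents imported into the same environment keeps the candidate set small, which is what lets a simple similarity measure catch a summary or rewrite that shares vocabulary with a sensitive source. A real system would need a more robust model and a tuned threshold.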