LADEMU: a modular & continuous approach for generating labelled APT datasets from emulations

Authors
Gjerstad, Julie
Kadiric, Fikret
Grov, Gudmund
Kjellstadli, Espen Hammer
Asprusten, Markus Leira
Published
2023-01-26
Keywords
Packet processing systems
Big data
Data security
Permalink
http://hdl.handle.net/20.500.12242/3187
DOI
http://dx.doi.org/10.1109/BigData55660.2022.10020549
Collection
Articles
Description
2022 IEEE International Conference on Big Data. IEEE (Institute of Electrical and Electronics Engineers) 2023 ISBN 978-1-6654-8045-1.
Abstract
Development and evaluation of data-driven capabilities for both threat hunting and intrusion detection require high-quality and up-to-date datasets. The generation of such datasets poses multiple challenges, which has led to a general lack of suitable datasets for this domain. One such difficulty is the ability to correctly label each datapoint at a suitable level of granularity. In this paper, we argue that the challenges faced when labelling datasets can, to some degree, be decoupled from realistic emulations of up-to-date attacks and benign behaviours. We propose a modular labelling approach that can be combined with existing emulation platforms that provide the necessary details used for labelling. A proof-of-concept implementation is provided with our LADEMU (Labelled APT Datasets from EMUlations) tool, which is integrated with the MITRE CALDERA emulation platform and uses the GHOSTS framework for benign behaviour. LADEMU captures both host and network logs and labels them at a sufficient level of detail to separate the various attack steps. This provides dataset support for the development of data-driven APT, multi-step and kill-chain capabilities. As a case study, LADEMU is used to generate a labelled dataset from an intelligence-driven emulation plan of an advanced persistent threat (APT) group.
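The idea of decoupling labelling from emulation can be illustrated with a minimal sketch. It is not LADEMU's actual implementation: it assumes, purely for illustration, that the emulation platform reports each executed attack step with a host and a start and end time, and that a captured log event is labelled with a step when its host and timestamp fall inside that step's window. The names `AttackStep`, `LogEvent`, `label_event` and `label_events`, the field layout, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AttackStep:
    """One executed attack step as an emulation platform might report it (hypothetical schema)."""
    technique: str  # illustrative technique identifier
    host: str
    start: float    # seconds since epoch
    end: float


@dataclass(frozen=True)
class LogEvent:
    """One captured host or network log record (hypothetical schema)."""
    host: str
    timestamp: float
    message: str


def label_event(event: LogEvent, steps: List[AttackStep]) -> Optional[str]:
    """Return the technique of the first step whose host and time window match, else None."""
    for step in steps:
        if step.host == event.host and step.start <= event.timestamp <= step.end:
            return step.technique
    return None


def label_events(events: List[LogEvent], steps: List[AttackStep]) -> List[Tuple[LogEvent, str]]:
    """Pair every event with a step label, falling back to 'benign' when no step matches."""
    return [(event, label_event(event, steps) or "benign") for event in events]


# Illustrative data only; these values are assumptions, not taken from the paper.
steps = [
    AttackStep("T1059", "host-a", 100.0, 110.0),
    AttackStep("T1003", "host-a", 200.0, 205.0),
]
events = [
    LogEvent("host-a", 105.0, "process created"),
    LogEvent("host-a", 150.0, "browser request"),
    LogEvent("host-b", 202.0, "file opened"),
    LogEvent("host-a", 203.0, "memory read"),
]
labels = [label for _, label in label_events(events, steps)]
```

Time-window matching alone would also label benign activity that happens to occur on the same host during an attack step, so a practical labeller would likely need finer-grained details from the emulation platform, such as process identifiers or network flow endpoints.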