Large Scale Malware Analysis

Seminar	2 SWS / 5.0 ECTS
Organizers:	Mohammad Reza Norouzian and Bojan Kolosnjaji
Start:	2017-10-17

Important Dates

Kick-off Meeting: 05.07.2017, 4pm in room 01.08.033

Topic

Security companies report exponential growth in the number and variety of malicious executables and domains that need to be analyzed every day. To properly detect and analyze millions of samples, engineers need technologies from areas such as Big Data and Machine Learning/Data Mining. These technologies can help automate reverse engineering and malware analysis on a large scale, enabling malware analysts to focus their efforts and design countermeasures in a timely manner.

A growing number of papers from academia and industry address this direction, and we will study them in this seminar. The topic is relevant for both future security experts and data scientists/engineers.

Our papers are classified into 4 subareas:

1) Windows Malware Detection and Analysis

2) Malicious Web Pages and Domains

3) Malware Network Communication

4) Evasion of Machine Learning-based Detectors

The list of papers will be published soon...

News

The first seminar meeting has taken place. Here are the introductory slides, which include instructions for the seminar presentation and report. We will publish the presentation schedule very soon.
The kick-off meeting has taken place. If you were not present, no problem: you can still apply for the seminar by completing the other necessary registration steps. You can also look at the slides from the kick-off meeting here.

Registration

Students should attend the kick-off meeting on 05.07.2017 at 4pm in room 01.08.033 (as indicated on TUMonline).
After the kick-off meeting, the application should be sent by e-mail to Bojan Kolosnjaji. An application consists of a short CV indicating your knowledge and/or work experience related to the course (IT Security, Machine Learning, Data Mining, Math...). CVs must be sent by 14.07.2017. After that, we start with the selection.
Students do not need to register on TUMonline personally; this will be done by our chair. However, students must apply for the course through the matching system.

Prerequisites:

Must have: Basic IT Security

Nice to have: Machine Learning/Data Mining

Tasks for students:

Each student will be assigned two research papers. After studying the papers, each student is required to write a short report on them and give a 20-minute presentation followed by a discussion. The report is 14 pages in total in LNCS format; the submission deadline will be announced at the first seminar meeting.

Presentations are given at the seminar meetings.

Paper List:

Malware Detection and Analysis

1) Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability (taken)

http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/ndss2017_03B-2_Feng_paper.pdf

2) Automatic Application Identification from Billions of Files (taken)

https://pdfs.semanticscholar.org/82c9/de0e1f8534f1fb659f2bac32df7fc7b2f9bf.pdf

3) BitShred: Feature Hashing Malware for Scalable Triage and Semantic Analysis

http://www.cs.cmu.edu/afs/cs/Web/People/shobha/research/ccs116-jang.pdf

4) Automatically Inferring Malware Signatures for Anti-Virus Assisted Attacks (taken)

https://www.sec.cs.tu-bs.de/pubs/2017-asiaccs.pdf

5) Comprehensive Analysis and Detection of Flash-based Malware

https://www.sec.cs.tu-bs.de/pubs/2016b-dimva.pdf

Analyzing Network Data to Detect Intrusions, Malware Propagation and Communication

1) A Lustrum of Malware Network Communication: Evolution and Insights

http://astrolavos.gatech.edu/articles/sp17-candia.pdf

2) Catching Worms, Trojan Horses and PUPs: Unsupervised Detection of Silent Delivery Campaigns

http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/ndss2017_03B-5_Kwon_paper.pdf

3) Network Intrusion Detection Based on Semi-supervised Variational Auto-Encoder (taken)

https://link.springer.com/chapter/10.1007/978-3-319-66399-9_19

Malicious Domains and Web Pages

1) Detecting Malicious Domains via Graph Inference

http://link.springer.com/chapter/10.1007%2F978-3-319-11203-9_1

2) Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages (taken)

https://hal.archives-ouvertes.fr/hal-00727271/document

3) EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis

https://www.iseclab.org/papers/bilge-ndss11.pdf

4) Building a Dynamic Reputation System for DNS (taken)

https://www.usenix.org/legacy/event/sec10/tech/full_papers/Antonakakis.pdf

5) Gossip: Automatically Identifying Malicious Domains from Mailing List Discussions (taken)

https://www.cs.ucsb.edu/~vigna/publications/2017_AsiaCCS_gossip.pdf

Evasion and Poisoning

1) Poisoning Behavioral Malware Clustering

https://www.sec.cs.tu-bs.de/pubs/2014-aisec.pdf

2) Automatically Evading Classifiers (taken)

http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/automatically-evading-classifiers.pdf

3) When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors

http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/when-tree-falls-using-diversity-ensemble-classifiers-identify-evasion-malware-detectors.pdf

4) Adversarial Examples for Malware Detection

https://link.springer.com/chapter/10.1007/978-3-319-66399-9_4

5) Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection

https://www.sec.cs.tu-bs.de/pubs/2017-tdsc.pdf

Schedule:

Title	Speaker	Date
Kick-off meeting	Bojan Kolosnjaji, Mohammad Norouzian	05.07.2017
Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability	Bakri Bitar	09.01.2018
		09.01.2018
Automatically Inferring Malware Signatures for Anti-Virus Assisted Attacks	Ece Kubilay	16.01.2018
Network Intrusion Detection Based on Semi-supervised Variational Auto-Encoder	Muhammad Shoaib Khan	16.01.2018
Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages	Daniel Wessel	23.01.2018
Building a Dynamic Reputation System for DNS	Agastya Alfath	23.01.2018
Gossip: Automatically Identifying Malicious Domains from Mailing List Discussions	Yannick Gehring	30.01.2018
Automatically Evading Classifiers	Zaryab Khan	30.01.2018
Automatic Application Identification from Billions of Files	Cai Liu	06.02.2018
Detecting Malicious Domains via Graph Inference	Youdan Zhang	06.02.2018

Presentation Guidelines

Each student gives a presentation about the given paper. The time allotted for the presentation is 30 minutes, including discussion. We recommend taking 20 minutes for the actual presentation and leaving around 10 minutes for discussion. Presentations should be in the style of conference/workshop talks. A good presentation will:

give correct and accurately displayed information about the paper,
present all the important points of the paper,
contain an explanation that your fellow students can understand, especially of the method used and the results of the paper,
initiate a good discussion.