Large Scale Malware Analysis
Seminar | 2 SWS / 5.0 ECTS |
Organizers: | Mohammad Reza Norouzian and Bojan Kolosnjaji |
Start: | 2017-10-17 |
Important Dates
Kick-off Meeting: 05.07.2017, 4 pm, room 01.08.033
Topic
Security companies report exponential growth in the number and variety of malicious executables and domains that need to be analyzed on a daily basis. To properly detect and analyze millions of samples, engineers need to use technologies from areas such as Big Data and Machine Learning/Data Mining. These technologies can help automate reverse engineering and malware analysis on a large scale, enabling malware analysts to focus their efforts and design countermeasures in a timely manner.
There is an increasing number of papers from academia and industry in this direction and we will be studying them in this seminar. The topic of the seminar is very useful both for future security experts and data scientists/engineers.
Our papers are classified into 4 subareas:
1) Windows Malware Detection and Analysis
2) Malicious Web Pages and Domains
3) Malware Network Communication
4) Evasion of Machine Learning-based Detectors
The list of papers will be published soon...
News
- First seminar meeting is finished. Here are the introductory slides that include instructions for seminar presentation and report. Very soon we will publish the presentation schedule.
- Kick-off meeting took place. If you were not present, no problem. You can still apply for the seminar by executing the other necessary registration steps. Furthermore, you can look at the slides from the kick-off meeting here.
Registration
- Students should attend the kick-off meeting on 05.07.2017 at 4 pm in room 01.08.033 (as indicated on TUMonline).
- After the kick-off meeting, the application should be sent by e-mail to Bojan Kolosnjaji. An application consists of a short CV indicating your knowledge and/or work experience related to the course (IT Security, Machine Learning, Data Mining, Math...). CVs must be sent by 14.07.2017! After that we start with the selection.
- Students do not need to register on TUMonline personally; this will be done by our chair. However, students must apply for the course through the matching system.
Prerequisites:
Must have: Basic IT Security
Nice to have: Machine Learning/Data Mining
Tasks for students:
Each student will be assigned two research papers. After studying the papers, each student is required to write a short report on them and give a 20-minute presentation followed by a discussion. The report is 14 pages in total in LNCS format, and the submission deadline will be announced at the first seminar meeting.
Presentations are given at the seminar meetings.
Paper List:
Malware Detection and Analysis
1) Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability (taken)
http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/ndss2017_03B-2_Feng_paper.pdf
2) Automatic Application Identification from Billions of Files (taken)
https://pdfs.semanticscholar.org/82c9/de0e1f8534f1fb659f2bac32df7fc7b2f9bf.pdf
3) BitShred: Feature Hashing Malware for Scalable Triage and Semantic Analysis
http://www.cs.cmu.edu/afs/cs/Web/People/shobha/research/ccs116-jang.pdf
4) Automatically Inferring Malware Signatures for Anti-Virus Assisted Attacks (taken)
https://www.sec.cs.tu-bs.de/pubs/2017-asiaccs.pdf
5) Comprehensive Analysis and Detection of Flash-based Malware
https://www.sec.cs.tu-bs.de/pubs/2016b-dimva.pdf
Analyzing Network Data to Detect Intrusions, Malware Propagation and Communication
1) A Lustrum of Malware Network Communication: Evolution and Insights
http://astrolavos.gatech.edu/articles/sp17-candia.pdf
2) Catching Worms, Trojan Horses and PUPs: Unsupervised Detection of Silent Delivery Campaigns
http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/ndss2017_03B-5_Kwon_paper.pdf
3) Network Intrusion Detection Based on Semi-supervised Variational Auto-Encoder (taken)
https://link.springer.com/chapter/10.1007/978-3-319-66399-9_19
Malicious Domains and Web Pages
1) Detecting Malicious Domains via Graph Inference
http://link.springer.com/chapter/10.1007%2F978-3-319-11203-9_1
2) Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages (taken)
https://hal.archives-ouvertes.fr/hal-00727271/document
3) EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis
https://www.iseclab.org/papers/bilge-ndss11.pdf
4) Building a Dynamic Reputation System for DNS (taken)
https://www.usenix.org/legacy/event/sec10/tech/full_papers/Antonakakis.pdf
5) Gossip: Automatically Identifying Malicious Domains from Mailing List Discussions (taken)
https://www.cs.ucsb.edu/~vigna/publications/2017_AsiaCCS_gossip.pdf
Evasion and Poisoning
1) Poisoning Behavioral Malware Clustering
https://www.sec.cs.tu-bs.de/pubs/2014-aisec.pdf
2) Automatically Evading Classifiers (taken)
3) When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors
4) Adversarial Examples for Malware Detection
https://link.springer.com/chapter/10.1007/978-3-319-66399-9_4
5) Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection
https://www.sec.cs.tu-bs.de/pubs/2017-tdsc.pdf
Schedule:
Title | Speaker | Date |
Kick-off meeting | Bojan Kolosnjaji, Mohammad Norouzian | 05.07.2017 |
Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability | Bakri Bitar | 09.01.2018 |
Automatically Inferring Malware Signatures for Anti-Virus Assisted Attacks | Ece Kubilay | 16.01.2018 |
Network Intrusion Detection Based on Semi-supervised Variational Auto-Encoder | Muhammad Shoaib Khan | |
Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages | Daniel Wessel | 23.01.2018 |
Building a Dynamic Reputation System for DNS | Agastya Alfath | |
Gossip: Automatically Identifying Malicious Domains from Mailing List Discussions | Yannick Gehring | 30.01.2018 |
Automatically Evading Classifiers | Zaryab Khan | |
Automatic Application Identification from Billions of Files | Cai Liu | 06.02.2018 |
Detecting Malicious Domains via Graph Inference | Youdan Zhang | |
Presentation Guidelines
Each student gives a presentation on the assigned paper. The time allotted for the presentation is 30 minutes, including discussion. We recommend taking 20 minutes for the actual presentation and leaving around 10 minutes for discussion. Presentations should be in the style of conference/workshop talks. A good presentation will:
- convey the content of the paper correctly and accurately,
- present all the important points of the paper,
- explain the paper in a way your fellow students can understand, especially the method used and the results,
- initiate a good discussion.