Directory of MSR Datasets

Total out of 196 results shown
Greenlight: Highlighting TensorFlow APIs Energy Footprint

Saurabhsingh Rajput, M. Kechagia, Federica Sarro, Tushar Sharma

DOI: 10.1145/3643991.3644894

Publication year: 2024

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Deep learning (DL) models are being widely deployed in real-world applications, but their usage remains computationally intensive and energy-hungry. While prior work has examined model-level energy...

Other Data
FAIR Score: 58.33%
A Four-Dimension Gold Standard Dataset for Opinion Mining in Software Engineering

Md Rakibul Islam, Md. Fazle Rabbi, Jo Youngeun, A. I. Champa, Ethan Young, Camden Wilson, Gavin Scott, M. Zibran

DOI: 10.1145/3643991.3644893

Publication year: 2024

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: We present the first four-dimension gold standard dataset to advance opinion mining focused on the software engineering domain. Through a well-defined sampling and annotation strategy leveraging mu...

Developer Metrics
FAIR Score: 54.17%
DATAR: A Dataset for Tracking App Releases

Yasaman Abedini, Mohammad Hadi Hajihosseini, Abbas Heydarnoori

DOI: 10.1145/3643991.3644892

Publication year: 2024

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: Android apps continuously evolve to meet user expectations and thrive in the competitive environment of app stores. Hence, making informed decisions is crucial for the success of upcoming releases....

Version Control
FAIR Score: 70.83%
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars

Christian Birchler, Cyrill Rohrbach, Timo Kehrer, Sebastiano Panichella

DOI: 10.1145/3643991.3644891

Publication year: 2024

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Developing tools in the context of autonomous systems [22], [24], such as self-driving cars (SDCs), is time-consuming and costly since researchers and practitioners rely on expensive computing hard...

Other Data
FAIR Score: 70.83%
A Dataset of Microservices-based Open-Source Projects

Dario Amoroso d'Aragona, Alexander Bakhtin, Xiaozhou Li, Ruoyu Su, Lauren Adams, Ernesto Aponte, Francis Boyle, Patrick Boyle, Rachel Koerner, Joseph Lee, Fangchao Tian, Yuqing Wang, Jesse Nyyssölä, Ernesto Quevedo, Shahidur Md Rahaman, Amr S. Abdelfattah, Mika Mäntylä, Tomás Cerný, Davide Taibi

DOI: 10.1145/3643991.3644890

Publication year: 2024

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Researchers in the microservices community often resort to demonstrating the impact of their proposed advancements on custom-made microservices projects. This is a possible source of bias that can...

Version Control
FAIR Score: 58.33%
P3: A Dataset of Partial Program Patches

Dirk Beyer, Lars Grunske, Matthias Kettl, Marian Lingsch-Rosenfeld, Moeketsi Raselimo

DOI: 10.1145/3643991.3644889

Publication year: 2024

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Identifying and fixing bugs in programs remains a challenge and is one of the most time-consuming tasks in software development. But even after a bug is identified, and a fix has been proposed by a...

Software Issues
FAIR Score: 35.42%
The PIPr Dataset of Public Infrastructure as Code Programs

Daniel Sokolowski, David Spielmann, Guido Salvaneschi

DOI: 10.1145/3643991.3644888

Publication year: 2024

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: With Programming Languages Infrastructure as Code (PL-IaC), developers implement IaC programs in popular imperative programming languages like Python and TypeScript. Such programs generate the decl...

Other Data
FAIR Score: 75%
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads

Ramtin Ehsani, M. M. Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee

DOI: 10.1145/3643991.3644887

Publication year: 2024

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. ...

Other Data
FAIR Score: 27.08%
MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations

Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang

DOI: 10.1145/3643991.3644886

Publication year: 2024

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: We constructed a newly large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projec...

Software Issues
FAIR Score: 27.08%
AW4C: A Commit-Aware C Dataset for Actionable Warning Identification

Zhipeng Liu, Meng Yan, Zhipeng Gao, Dong Li, Xiaohong Zhang, Dan Yang

DOI: 10.1145/3643991.3644885

Publication year: 2024

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Excessive non-actionable warnings generated by static program analysis tools can hinder developers from utilizing these tools effectively. Leveraging learning-based approaches for actionable warnin...

Software Issues
FAIR Score: 58.33%
MalwareBench: Malware samples are not enough

Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams

DOI: 10.1145/3643991.3644883

Publication year: 2024

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: The prevalent use of third-party components in modern software development, rapid modernization, and digitization have significantly amplified the risk of software supply chain attacks. Popular lar...

Software Issues
FAIR Score: %
DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks

Mojtaba Mostafavi Ghahfarokhi, Arash Asgari, Mohammad Abolnejadian, Abbas Heydarnoori

DOI: 10.1145/3643991.3644882

Publication year: 2024

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Jupyter notebooks have become indispensable tools for data analysis and processing in various domains. However, despite their widespread use, there is a notable research gap in understanding and an...

Version Control
FAIR Score: 75%
SATDAUG - A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt

E. Sutoyo, Andrea Capiluppi

DOI: 10.1145/3643991.3644880

Publication year: 2024

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Self-admitted technical debt (SATD) refers to a form of technical debt in which developers explicitly acknowledge and document the existence of technical shortcuts, workarounds, or temporary soluti...

Software Issues
FAIR Score: 70.83%
BugsPHP: A dataset for Automated Program Repair in PHP

K. D. Pramod, W.T.N. De Silva, W.U.K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake

DOI: 10.1145/3643991.3644878

Publication year: 2024

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research...

Software Issues
FAIR Score: 27.08%
Bidirectional Paper-Repository Tracing in Software Engineering

Daniel Garijo, Miguel Arroyo, Esteban Gonzalez, Christoph Treude, Nicola Tarocco

DOI: 10.1145/3643991.3644876

Publication year: 2024

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: While computer science papers frequently include their associated code repositories, establishing a clear link between papers and their corresponding implementations may be challenging due to the n...

Other Data
FAIR Score: 70.83%
TestDossier: A Dataset of Tested Values Automatically Extracted from Test Execution

André Hora

DOI: 10.1145/3643991.3644875

Publication year: 2024

Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package

Abstract: Real-world test suites are often complex and may have thousands of test cases. In this scenario, it is not easy to spot what values are actually covered by the tests. Having access to every tested ...

Software Issues
FAIR Score: 70.83%
A Dataset of Atoms of Confusion in the Android Open Source Project

Davi Tabosa, Oton Pinheiro, Lincoln S. Rocha, Windson Viana

DOI: 10.1145/3643991.3644874

Publication year: 2024

Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps

Abstract: Ensuring the readability and comprehension of source code is key for effective software maintenance and evolution, particularly in tasks involving bug fixing, refactoring, and optimization. Previou...

Software Issues
FAIR Score: 66.67%
Curated Email-Based Code Reviews Datasets

Mingzhao Liang, Wachiraphan Charoenwet, Patanamon Thongtanunam

DOI: 10.1145/3643991.3644872

Publication year: 2024

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research focused on tool-based code reviews (e.g. a Gerrit code review too...

Developer Metrics
FAIR Score: 54.17%
TrickyBugs: A Dataset of Corner-case Bugs in Plausible Programs

Kaibo Liu, Yudong Han, Yiyang Liu, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Gang Huang, Yun Ma

DOI: 10.1145/3643991.3644870

Publication year: 2024

Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps

Abstract: We call a program that passes existing tests but still contains bugs as a buggy plausible program. Bugs in such a program can bypass the testing environment and enter the production environment, ca...

Software Issues
FAIR Score: 70.83%
PlayMyData: a curated dataset of multi-platform video games

Andrea D’Angelo, Claudio Di Sipio, Cristiano Politowski, Riccardo Rubei

DOI: 10.1145/3643991.3644869

Publication year: 2024

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Being predominant in digital entertainment for decades, video games have been recognized as valuable software artifacts by the software engineering (SE) community just recently. Such an acknowledgm...

Other Data
FAIR Score: 75%
Dataset: Copy-based Reuse in Open Source Software

Mahmoud Jahanshahi, A. Mockus

DOI: 10.1145/3643991.3644868

Publication year: 2023

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some stu...

Version Control
FAIR Score: %
A dataset of GitHub Actions workflow histories

Guillaume Cardoen, Tom Mens, Alexandre Decan

DOI: 10.1145/3643991.3644867

Publication year: 2024

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: GitHub Actions is the de facto workflow automation tool for GitHub repositories. Its popularity has increased dramatically over the recent years, opening up opportunities for empirical studies rela...

Version Control
FAIR Score: 75%
AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis

Jordan Samhi, Marco Alecci, Tégawendé F. Bissyandé, Jacques Klein

DOI: 10.1145/3643991.3644866

Publication year: 2023

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Android app developers extensively employ code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it can be challenging for static an...

Other Data
FAIR Score: 70.83%
Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language

Anisha Islam, Kalvin Eng, Abram Hindle

DOI: 10.1145/3643991.3644865

Publication year: 2024

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Pure Data (PD), a data-flow based visual programming language utilized for music and sound synthesis, remains underexplored in software engineering research. Existing literature fails to address th...

Semantic Metrics
FAIR Score: 70.83%
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code

M. Weyssow, Claudio Di Sipio, Davide Di Ruscio, H. Sahraoui

DOI: 10.1145/3643991.3644864

Publication year: 2023

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a ...

Semantic Metrics
FAIR Score: 75%
AndroZoo: A Retrospective with a Glimpse into the Future

Marco Alecci, Pedro Jesús Ruiz Jiménez, Kevin Allix, Tégawendé F. Bissyandé, Jacques Klein

DOI: 10.1145/3643991.3644863

Publication year: 2024

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: In 2016, we released AndroZoo, a continuously expanding dataset of Android applications that aggregates apps from various sources, including the official Google Play app market. As of today, AndroZ...

Other Data
FAIR Score: %
GitHub OSS Governance File Dataset

Yibo Yan, Seth Frey, Amy Zhang, V. Filkov, Likang Yin

DOI: 10.1109/MSR59073.2023.00089

Publication year: 2023

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Open-source Software (OSS) has become a valuable resource in both industry and academia over the last few decades. Despite the innovative structures they develop to support the projects, OSS projec...

Developer Metrics
FAIR Score: 75%
Defectors: A Large, Diverse Python Dataset for Defect Prediction

Parvez Mahbub, Ohiduzzaman Shuvo, M. M. Rahman

DOI: 10.1109/MSR59073.2023.00085

Publication year: 2023

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Defect prediction has been a popular research topic where machine learning (ML) and deep learning (DL) have found numerous applications. However, these ML/DL-based defect prediction models are ofte...

Software Issues
FAIR Score: 75%
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, R. Scandariato

DOI: 10.1109/MSR59073.2023.00084

Publication year: 2023

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources....

Semantic Metrics
FAIR Score: 58.33%
Snapshot Testing Dataset

Emily Bui, H. Rocha

DOI: 10.1109/MSR59073.2023.00081

Publication year: 2023

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Snapshot testing is a form of software testing that is focused on visual components by highlighting any code changes when compared to a previously stored state. This quick and simple method of test...

Software Issues
FAIR Score: 79.17%
PyMigBench: A Benchmark for Python Library Migration

Mohayeminul Islam, Ajay Kumar Jha, Sarah Nadi, Ildar Akhmetov

DOI: 10.1109/MSR59073.2023.00075

Publication year: 2023

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Developers heavily rely on Application Programming Interfaces (APIs) from libraries to build their projects. However, libraries might become obsolete, or new libraries with better APIs might become...

Other Data
FAIR Score: 58.33%
A Dataset of Bot and Human Activities in GitHub

Natarajan Chidambaram, Alexandre Decan, T. Mens

DOI: 10.1109/MSR59073.2023.00070

Publication year: 2023

Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps

Abstract: Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort intensive and error-prone tasks. To understand and study how these bots are used, state-of-the-...

Version Control
FAIR Score: 75%
DACOS—A Manually Annotated Dataset of Code Smells

Himesh Nandani, M. Saad, Tushar Sharma

DOI: 10.1109/MSR59073.2023.00067

Publication year: 2023

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Researchers apply machine-learning techniques for code smell detection to counter the subjectivity of many code smells. Such approaches need a large, manually annotated dataset for training and ben...

Software Issues
FAIR Score: 75%
PENTACET data - 23 Million Contextual Code Comments and 250,000 SATD comments

Murali Sridharan, Leevi Rantala, M. Mäntylä

DOI: 10.1109/MSR59073.2023.00063

Publication year: 2023

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: Most Self-Admitted Technical Debt (SATD) research utilizes explicit SATD features such as ‘TODO’ and ‘FIXME’ for SATD detection. A closer look reveals several SATD research uses simple SATD (‘Easy ...

Semantic Metrics
FAIR Score: 79.17%
DocMine: A Software Documentation-Related Dataset of 950 GitHub Repositories

Akhila Sri Manasa Venigalla, S. Chimalakonda

DOI: 10.1109/MSR59073.2023.00062

Publication year: 2023

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: Software documentation is one of the critical aspects of a software project, that could support multiple tasks throughout the software development life-cycle. There is extensive research on underst...

Version Control
FAIR Score: 75%
SecretBench: A Dataset of Software Secrets

S. Basak, L. Neil, Bradley Reaves, Laurie A. Williams

DOI: 10.1109/MSR59073.2023.00053

Publication year: 2023

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: According to GitGuardian’s monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) increased two-fold in 2021 compared to 2020, totaling more than six mil...

Other Data
FAIR Score: %
HasBugs - Handpicked Haskell Bugs

Leonhard Applis, Annibale Panichella

DOI: 10.1109/MSR59073.2023.00040

Publication year: 2023

Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package

Abstract: We present HasBugs, an extensible and manually-curated dataset of real-world 25 Haskell Bugs from 6 open source repositories. We provide a faulty, tested, and fixed version of each bug in our datas...

Software Issues
FAIR Score: 79.17%
Semantically-enriched Jira Issue Tracking Data

Themistoklis G. Diamantopoulos, Dimitrios Nastos, A. Symeonidis

DOI: 10.1109/MSR59073.2023.00039

Publication year: 2023

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Current state of practice dictates that software developers host their projects online and employ project management systems to monitor the development of product features, keep track of bugs, and ...

Software Issues
FAIR Score: 79.17%
microSecEnD: A Dataset of Security-Enriched Dataflow Diagrams for Microservice Applications

S. Schneider, Tufan Özen, Michael Chen, R. Scandariato

DOI: 10.1109/MSR59073.2023.00030

Publication year: 2023

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: Dataflow diagrams (DFDs) are useful resources in securing applications since they show a software system’s architecture and allow assessing architectural security and weaknesses. Enriching them wit...

Other Data
FAIR Score: 66.67%
GIRT-Data: Sampling GitHub Issue Report Templates

Nafiseh Nikeghbal, Amir Hossein Kargaran, A. Heydarnoori, Hinrich Schütze

DOI: 10.1109/MSR59073.2023.00026

Publication year: 2023

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: GitHub’s issue reports provide developers with valuable information that is essential to the evolution of a software development project. Contributors can use these reports to perform software engi...

Other Data
FAIR Score: 27.08%
NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python

Ratnadira Widyasari, Zhou Yang, Ferdian Thung, Sheng Qin Sim, Fiona Wee, Camellia Lok, Jack Phan, Haodi Qi, Constance Tan, Qijin Tay, David Lo

DOI: 10.1109/MSR59073.2023.00022

Publication year: 2023

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, ther...

Version Control
FAIR Score: 54.17%
PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages

Wenxin Jiang, Nicholas Synovic, Purvish Jajal, Taylor R. Schorlemmer, Arav Tewari, Bhavesh Pareek, G. Thiruvathukal, James C. Davis

DOI: 10.1109/MSR59073.2023.00021

Publication year: 2023

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: Due to the cost of developing and training deep learning models from scratch, machine learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks. PTM r...

Semantic Metrics
FAIR Score: %
DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing

Chengjie Lu, T. Yue, Sajid Ali

DOI: 10.1109/MSR59073.2023.00020

Publication year: 2023

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: With the rapid development of autonomous driving systems (ADSs), testing ADSs under various environmental conditions has become a key method to ensure the successful deployment of ADS in the real w...

Other Data
FAIR Score: 79.17%
ManyTypes4TypeScript: A Comprehensive TypeScript Dataset for Sequence-Based Type Inference

Kevin Jesse, Premkumar T. Devanbu

DOI: 10.1145/3524842.3528507

Publication year: 2022

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating machine-learning models for sequence-based type inference in TypeScript. The dataset includes over 9 ...

Semantic Metrics
FAIR Score: 79.17%
TSSB-3M: Mining single statement bugs at massive scale

Cedric Richter, H. Wehrheim

DOI: 10.1145/3524842.3528505

Publication year: 2022

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: Single statement bugs are one of the most important ingredients in the evaluation of modern bug detection and automatic program repair methods. By affecting only a single statement, single statemen...

Software Issues
FAIR Score: 79.17%
TwinDroid: A Dataset of Android app System call traces and Trace Generation Pipeline

Asma Razagallah, R. Khoury, Jean-Baptiste Poulet

DOI: 10.1145/3524842.3528502

Publication year: 2022

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: System call traces are an invaluable source of information about a program's runtime behavior and can be particularly useful for malware detection in Android apps. However, the paucity of publicly avai...

Other Data
FAIR Score: 75%
The General Index of Software Engineering Papers

Zeinab Abou Khalil, Stefano Zacchiroli

DOI: 10.1145/3524842.3528494

Publication year: 2022

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: We introduce the General Index of Software Engineering Papers, a dataset of fulltext-indexed papers from the most prominent scientific venues in the field of Software Engineering. The dataset inclu...

Other Data
FAIR Score: 75%
AndroOBFS: Time-tagged Obfuscated Android Malware Dataset with Family Information

Saurabh Kumar, Debadatta Mishra, Biswabandan Panda, S. Shukla

DOI: 10.1145/3524842.3528493

Publication year: 2022

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: With the large-scale adaptation of Android OS and ever-increasing contributions in the Android application space, Android has become the number one target of malware writers. In recent years, a lar...

Software Issues
FAIR Score: 66.67%
A Time Series-Based Dataset of Open-Source Software Evolution

B. L. Sousa, Mariza Bigonha, K. Ferreira, G. Franco

DOI: 10.1145/3524842.3528492

Publication year: 2022

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: Software evolution is the process of developing, maintaining, and updating software systems. It is known that the software systems tend to increase their complexity and size over their evolution to...

Software Evolution
FAIR Score: %
A Large-scale Dataset of (Open Source) License Text Variants

Stefano Zacchiroli

DOI: 10.1145/3524842.3528491

Publication year: 2022

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive, the largest public...

Other Data
FAIR Score: 79.17%
ReCover: a Curated Dataset for Regression Testing Research

Francesco Altiero, A. Corazza, S. Martino, A. Peron, L. L. L. Starace

DOI: 10.1145/3524842.3528490

Publication year: 2022

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: It is recognized in the literature that finding representative data to conduct regression testing research is non-trivial. In our experience within this field, existing datasets are often affected ...

Software Issues
FAIR Score: 66.67%
The Unsolvable Problem or the Unheard Answer? A Dataset of 24,669 Open-Source Software Conference Talks

Kimberly Truong, Courtney Miller, Bogdan Vasilescu, Christian Kästner

DOI: 10.1145/3524842.3528488

Publication year: 2022

Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package

Abstract: Talks at practitioner-focused open-source software conferences are a valuable source of information for software engineering researchers. They provide a pulse of the community and are valuable sour...

Other Data
FAIR Score: 58.33%
SOSum: A Dataset of Stack Overflow Post Summaries

Bonan Kou, Yifeng Di, Muhao Chen, Tianyi Zhang

DOI: 10.1145/3524842.3528487

Publication year: 2022

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Stack Overflow (SO) is becoming an indispensable part of modern software development workflow. However, given the limited time, attention, and memory capacity of programmers, navigating SO posts an...

Semantic Metrics
FAIR Score: 27.08%
An Alternative Issue Tracking Dataset of Public Jira Repositories

Lloyd Montgomery, C. Luders, W. Maalej

DOI: 10.1145/3524842.3528486

Publication year: 2022

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: Organisations use issue tracking systems (ITSs) to track and document their projects' work in units called issues. This style of documentation encourages evolutionary refinement, as each issue can ...

Software Issues
FAIR Score: 79.17%
Vul4J: A Dataset of Reproducible Java Vulnerabilities Geared Towards the Study of Program Repair Techniques

Quang-Cuong Bui, R. Scandariato, N. E. D. Ferreyra

DOI: 10.1145/3524842.3528482

Publication year: 2022

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: In this work we present Vul4j, a Java vulnerability dataset where each vulnerability is associated to a patch and, most importantly, to a Proof of Vulnerability (PoV) test case. We analyzed 1803 fi...

Software Issues
FAIR Score: 27.08%
FixJS: A Dataset of Bug-fixing JavaScript Commits

Viktor Csuvik, László Vidács

DOI: 10.1145/3524842.3528480

Publication year: 2022

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: The field of Automated Program Repair (APR) has received increasing attention in recent years both from the academic world and from leading IT companies. Its main goal is to repair software bugs au...

Software Issues
FAIR Score: 58.33%
The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

Melanie Warrick, Samuel F. Rosenblatt, Jean-Gabriel Young, Amanda Casari, Laurent Hébert-Dufresne, J. Bagrow

DOI: 10.1145/3524842.3528479

Publication year: 2022

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists t...

Developer Metrics
FAIR Score: 58.33%
A Versatile Dataset of Agile Open Source Software Projects

Vali Tawosi, A. Al-Subaihin, Rebecca Moussa, Federica Sarro

DOI: 10.1145/3524842.3528029

Publication year: 2022

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: Agile software development is nowadays a widely adopted practise in both open-source and industrial software projects. Agile teams typically heavily rely on issue management tools to document new i...

Software Issues
FAIR Score: 27.08%
ECench: An Energy Bug Benchmark of Ethereum Client Software

Jinyoung Kim, Misoo Kim, Eunseok Lee

DOI: 10.1145/3524842.3528028

Publication year: 2022

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: With the introduction of smart contracts, Ethereum has become one of the most popular blockchain networks. In the wake of its popularity, an increasing number of Ethereum-based software have been de...

Software Issues
FAIR Score: 27.08%
TriggerZoo: A Dataset of Android Applications Automatically Infected with Logic Bombs

Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein

DOI: 10.1145/3524842.3528020

Publication year: 2022

Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps

Abstract: Many Android apps analyzers rely, among other techniques, on dynamic analysis to monitor their runtime behavior and detect potential security threats. However, malicious developers use subtle, thou... Read more

Software Issues
FAIR Score: %
DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research

Keerthana Muthu Subash, L. P. Kumar, Srinivas Vadlamani, Preetha Chatterjee, Olga Baysal

DOI: 10.1145/3524842.3528018

Publication year: 2022

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such as Discord facilitate live ... Read more

Developer Metrics
FAIR Score: 62.5%
Dataset: Dependency Networks of Open Source Libraries Available Through CocoaPods, Carthage and Swift PM

Kristiina Rahkema, Dietmar Pfahl

DOI: 10.1145/3524842.3528016

Publication year: 2022

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: Third party libraries are used to integrate existing solutions for common problems and help speed up development. The use of third party libraries, however, can carry risks, for example through vul... Read more

Semantic Metrics
FAIR Score: 79.17%
Constructing Dataset of Functionally Equivalent Java Methods Using Automated Test Generation Techniques

Yoshiki Higo, S. Matsumoto, S. Kusumoto, Kazuya Yasuda

DOI: 10.1145/3524842.3528015

Publication year: 2022

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Since programming languages offer a wide variety of grammars, desired functions can be implemented in a variety of ways. We consider that there is a large amount of source code that has different i... Read more

Semantic Metrics
FAIR Score: 75%
METHODS2TEST: A dataset of focal methods mapped to test cases

Michele Tufano, Shao Kun Deng, Neel Sundaresan, Alexey Svyatkovskiy

DOI: 10.1145/3524842.3528009

Publication year: 2022

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages of development and prevent regressions. Machine learning has e... Read more

Software Issues
FAIR Score: 27.08%
The Unexplored Treasure Trove of Phabricator Code Reviews

Gunnar Kudrjavets, Nachiappan Nagappan, Ayushi Rastogi

DOI: 10.1145/3524842.3528005

Publication year: 2022

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: Phabricator is a modern code collaboration tool used by popular projects like FreeBSD and Mozilla. However, unlike the other well-known code review environments, such as Gerrit or GitHub, there is ... Read more

Developer Metrics
FAIR Score: 54.17%
DaSEA - A Dataset for Software Ecosystem Analysis

Petya Buchkova, Joakim Hey Hinnerskov, Kasper Olsen, R. Pfeiffer

DOI: 10.1145/3524842.3528004

Publication year: 2022

Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package

Abstract: Software package managers facilitate reuse and rapid construction of software systems. Since evermore software is distributed via package managers, researchers and practitioners require explicit da... Read more

Version Control
FAIR Score: 75%
GitDelver Enterprise Dataset (GDED): An Industrial Closed-source Dataset for Socio-Technical Research

Nicolas Riquet, Xavier Devroey, B. Vanderose

DOI: 10.1145/3524842.3528003

Publication year: 2022

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Conducting socio-technical software engineering research on closed-source software is difficult as most organizations do not want to give access to their code repositories. Most experiments and pub... Read more

Developer Metrics
FAIR Score: 79.17%
SLNET: A Redistributable Corpus of 3rd-party Simulink Models

S. L. Shrestha, Shafiul Azam Chowdhury, Christoph Csallner

DOI: 10.1145/3524842.3528001

Publication year: 2022

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: MATLAB/Simulink is widely used for model-based design. Engineers create Simulink models and compile them to embedded code, often to control safety-critical cyber-physical systems in automotive, aer... Read more

Other Data
FAIR Score: 79.17%
ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

Hossein Keshavarz, M. Nagappan

DOI: 10.1145/3524842.3527996

Publication year: 2022

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: In this paper, we present ApacheJIT, a large dataset for Just-In-Time (JIT) defect prediction. ApacheJIT consists of clean and bug-inducing software changes in 14 popular Apache projects. ApacheJIT... Read more

Software Issues
FAIR Score: 79.17%
EqBench: A Dataset of Equivalent and Non-equivalent Program Pairs

Sahar Badihi, Yi Li, J. Rubin

DOI: 10.1109/MSR52588.2021.00084

Publication year: 2021

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Equivalence checking techniques help establish whether two versions of a program exhibit the same behavior. The majority of popular techniques for formally proving/refuting equivalence are evaluate... Read more

Other Data
FAIR Score: 45.83%
GE526: A Dataset of Open-Source Game Engines

Dheeraj Vagavolu, Vartika Agrahari, S. Chimalakonda, Akhila Sri Manasa Venigalla

DOI: 10.1109/MSR52588.2021.00083

Publication year: 2021

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: Game engines are frameworks that provide a platform for developers to build games with an interface tailored to handle the complexity of game development. Though there is extensive empirical resea... Read more

Other Data
FAIR Score: 33.33%
Andror2: A Dataset of Manually-Reproduced Bug Reports for Android apps

Tyler Wendland, Jingyang Sun, Junayed Mahmud, S M Hasan Mansur, Steven Huang, Kevin Moran, J. Rubin, M. Fazzini

DOI: 10.1109/MSR52588.2021.00082

Publication year: 2021

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: Software maintenance constitutes a large portion of the software development lifecycle. To carry out maintenance tasks, developers often need to understand and reproduce bug reports. As such, there... Read more

Software Issues
FAIR Score: 79.17%
Apache Software Foundation Incubator Project Sustainability Dataset

Likang Yin, Zhiyuan Zhang, Qi Xuan, V. Filkov

DOI: 10.1109/MSR52588.2021.00081

Publication year: 2021

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Open Source Software success and sustainability is critically important for the digital infrastructure as OSS is used broadly and yet 83+% of such projects fail. To increase chances of success many... Read more

Developer Metrics
FAIR Score: 75%
QScored: A Large Dataset of Code Smells and Quality Metrics

Tushar Sharma, Marouane Kessentini

DOI: 10.1109/MSR52588.2021.00080

Publication year: 2021

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: Code quality aspects such as code smells and code quality metrics are widely used in exploratory and empirical software engineering research. In such studies, researchers spend a substantial amount... Read more

Software Issues
FAIR Score: 79.17%
ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-based Type Inference

A. Mir, Evaldas Latoskinas, Georgios Gousios

DOI: 10.1109/MSR52588.2021.00079

Publication year: 2021

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type inference. The dataset contains a total of 5,382 Python projects with more than 869K type annotat... Read more

Semantic Metrics
FAIR Score: 79.17%
Andromeda: A Dataset of Ansible Galaxy Roles and Their Evolution

R. Opdebeeck, Ahmed Zerouali, Coen De Roover

DOI: 10.1109/MSR52588.2021.00078

Publication year: 2021

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Cloud-native applications increasingly provision infrastructure resources programmatically through Infrastructure as Code (IaC) scripts. These scripts have in turn become the subject of empirical s... Read more

Other Data
FAIR Score: 54.17%
Search4Code: Code Search Intent Classification Using Weak Supervision

N. Rao, Chetan Bansal, Joe Guan

DOI: 10.1109/MSR52588.2021.00077

Publication year: 2021

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Developers use search for various tasks such as finding code, documentation, debugging information, etc. In particular, web search is heavily used by developers for finding code examples and snippe... Read more

Semantic Metrics
FAIR Score: 27.08%
AndroCT: Ten Years of App Call Traces in Android

Wen Li, Xiaoqin Fu, Haipeng Cai

DOI: 10.1109/MSR52588.2021.00076

Publication year: 2021

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Data-driven approaches have proven to be promising in mobile software analysis, yet these approaches rely on sizable and quality datasets. For Android app analysis in particular, there have been se... Read more

Other Data
FAIR Score: 79.17%
The Wonderless Dataset for Serverless Computing

Nafise Eskandani, G. Salvaneschi

DOI: 10.1109/MSR52588.2021.00075

Publication year: 2021

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Function as a Service (FaaS) has grown in popularity in recent years, with an increasing number of applications following the Serverless computing model. Serverless computing supports out of the bo... Read more

Version Control
FAIR Score: 79.17%
Sampling Projects in GitHub for MSR Studies

Ozren Dabić, Emad Aghajani, G. Bavota

DOI: 10.1109/MSR52588.2021.00074

Publication year: 2021

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: Almost every Mining Software Repositories (MSR) study requires, as first step, the selection of the subject software repositories. These repositories are usually collected from hosting services lik... Read more

Other Data
FAIR Score: 79.17%
A Traceability Dataset for Open Source Systems

Mouna Hammoudi, Christoph Mayr-Dorn, A. Mashkoor, Alexander Egyed

DOI: 10.1109/MSR52588.2021.00073

Publication year: 2021

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Software engineers use requirement-to-method trace matrices to indicate the methods implementing different system requirements. Requirement-to-method trace matrices pinpoint the exact method implem... Read more

Other Data
FAIR Score: 75%
KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle

L. Quaranta, Fabio Calefato, F. Lanubile

DOI: 10.1109/MSR52588.2021.00072

Publication year: 2021

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the researc... Read more

Other Data
FAIR Score: 79.17%
Duets: A Dataset of Reproducible Pairs of Java Library-Clients

Thomas Durieux, César Soto-Valero, B. Baudry

DOI: 10.1109/MSR52588.2021.00071

Publication year: 2021

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce Duets, a new dataset of software libraries and... Read more

Software Issues
FAIR Score: 27.08%
Denchmark: A Bug Benchmark of Deep Learning-related Software

Misoo Kim, Youngkyoung Kim, Eunseok Lee

DOI: 10.1109/MSR52588.2021.00070

Publication year: 2021

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: A growing interest in deep learning (DL) has instigated a concomitant rise in DL-related software (DLSW). Therefore, the importance of DLSW quality has emerged as a vital issue. Simultaneously, res... Read more

Software Issues
FAIR Score: 27.08%
AndroidCompass: A Dataset of Android Compatibility Checks in Code Repositories

Sebastian Nielebock, Paul Blockhaus, J. Krüger, F. Ortmeier

DOI: 10.1109/MSR52588.2021.00069

Publication year: 2021

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: Many developers and organizations implement apps for Android, the most widely used operating system for mobile devices. Common problems developers face are the various hardware devices, customized ... Read more

Other Data
FAIR Score: 79.17%
AndroZooOpen: Collecting Large-scale Open Source Android Apps for the Research Community

Pei Liu, Li Li, Yanjie Zhao, Xiaoyu Sun, J. Grundy

DOI: 10.1145/3379597.3387503

Publication year: 2020

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: It is critical for research to have an open, well-curated, representative set of apps for analysis. We present a collection of open-source Android apps collected from several sources, including Git... Read more

Other Data
FAIR Score: %
A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries

Jiahao Fan, Yi Li, Shaohua Wang, T. Nguyen

DOI: 10.1145/3379597.3387501

Publication year: 2020

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: We collected a large C/C++ code vulnerability dataset from open-source Github projects, namely Big-Vul. We crawled the public Common Vulnerabilities and Exposures (CVE) database and CVE-related sou... Read more

Software Issues
FAIR Score: 27.08%
A Dataset and an Approach for Identity Resolution of 38 Million Author IDs extracted from 2B Git Commits

Tanner Fry, Tapajit Dey, Andrey Karnauch, A. Mockus

DOI: 10.1145/3379597.3387500

Publication year: 2020

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: The data collected from open source projects provide means to model large software ecosystems, but often suffer from data quality issues, specifically, multiple author identification strings in cod... Read more

Developer Metrics
FAIR Score: %
A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits

A. Mockus, D. Spinellis, Zoe Kotti, G. J. Dusing

DOI: 10.1145/3379597.3387499

Publication year: 2020

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: In order to understand the state and evolution of the entirety of open source software we need to get a handle on the set of distinct software projects. Most of open source projects presently utili... Read more

Developer Metrics
FAIR Score: %
A Dataset of Dockerfiles

Jordan Henkel, C. Bird, Shuvendu K. Lahiri, T. Reps

DOI: 10.1145/3379597.3387498

Publication year: 2020

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In t... Read more

Other Data
FAIR Score: 66.67%
Hall-of-Apps: The Top Android Apps Metadata Archive

Laura Bello-Jiménez, Camilo Escobar-Velásquez, Anamaria Mojica-Hanke, S. Cortés-Fernández, Mario Linares-Vásquez

DOI: 10.1145/3379597.3387497

Publication year: 2020

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: The amount of Android apps available for download is constantly increasing, exerting a continuous pressure on developers to publish outstanding apps. Google Play (GP) is the default distribution ch... Read more

Other Data
FAIR Score: 75%
A Dataset for GitHub Repository Deduplication

D. Spinellis, Zoe Kotti, A. Mockus

DOI: 10.1145/3379597.3387496

Publication year: 2020

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: GitHub projects can be easily replicated through the site's fork process or through a Git clone-push sequence. This is a problem for empirical software engineering, because it can lead to skewed re... Read more

Software Issues
FAIR Score: 79.17%
A Dataset of Enterprise-Driven Open Source Software

D. Spinellis, Zoe Kotti, Konstantinos Kravvaritis, Georgios Theodorou, Panos Louridas

DOI: 10.1145/3379597.3387495

Publication year: 2020

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on o... Read more

Version Control
FAIR Score: 79.17%
GitterCom - A Dataset of Open Source Developer Communications in Gitter

Esteban Parra, A. Ellis, S. Haiduc

DOI: 10.1145/3379597.3387494

Publication year: 2020

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: Team communication is essential for the development of modern software systems. For distributed software development teams, such as those found in many open source projects, this communication usua... Read more

Developer Metrics
FAIR Score: 27.08%
Software-related Slack Chats with Disentangled Conversations

Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, L. Pollock

DOI: 10.1145/3379597.3387493

Publication year: 2020

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: More than ever, developers are participating in public chat communities to ask and answer software development questions. With over ten million daily active users, Slack is one of the most popular ... Read more

Developer Metrics
FAIR Score: 58.33%
A Mixed Graph-Relational Dataset of Socio-technical Interactions in Open Source Systems

Usman Ashraf, Christoph Mayr-Dorn, Alexander Egyed, Sebastiano Panichella

DOI: 10.1145/3379597.3387492

Publication year: 2020

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: Several researchers have studied that developers contributing to open source systems tend to self-organize in “emerging” teams. The structure of these latent teams has a significant impact on softw... Read more

Developer Metrics
FAIR Score: 75%
How Often Do Single-Statement Bugs Occur? The ManySStuBs4J Dataset

Rafael-Michael Karampatsis, Charles Sutton

DOI: 10.1145/3379597.3387491

Publication year: 2019

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: Program repair is an important but difficult software engineering problem. One way to achieve acceptable performance is to focus on classes of simple bugs, such as bugs with single statement fixes,... Read more

Software Issues
FAIR Score: 79.17%
Employing Contribution and Quality Metrics for Quantifying the Software Development Process

Themistoklis G. Diamantopoulos, Michail D. Papamichail, Thomas Karanikiotis, Kyriakos C. Chatzidimitriou, A. Symeonidis

DOI: 10.1145/3379597.3387490

Publication year: 2020

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: The full integration of online repositories in contemporary software development promotes remote work and collaboration. Apart from the apparent benefits, online repositories offer a deluge of dat... Read more

Version Control
FAIR Score: 75%
On the Shoulders of Giants: A New Dataset for Pull-based Development Research

Xunhui Zhang, Ayushi Rastogi, Yue Yu

DOI: 10.1145/3379597.3387489

Publication year: 2020

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: Pull-based development is a widely adopted paradigm for collaboration in distributed software development, attracting eyeballs from both academia and industry. To better study pull-based developmen... Read more

Version Control
FAIR Score: 79.17%
TestRoutes: A Manually Curated Method Level Dataset for Test-to-Code Traceability

András Kicsi, László Vidács, T. Gyimóthy

DOI: 10.1145/3379597.3387488

Publication year: 2020

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: High test-to-code traceability can be an important aspect of quality assurance and can contribute to bug localization and code maintenance. Several existing techniques and a considerable effort fr... Read more

Version Control
FAIR Score: 70.83%
20-MAD - 20 Years of Issues and Commits of Mozilla and Apache Development

Maëlick Claes, M. Mäntylä

DOI: 10.1145/3379597.3387487

Publication year: 2020

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: Data of long-lived and high profile projects is valuable for research on successful software engineering in the wild. Having a dataset with different linked software repositories of such projects, ... Read more

Version Control
FAIR Score: 56.25%
Dataset of Video Game Development Problems

Cristiano Politowski, Fábio Petrillo, G. Ullmann, Josias de Andrade Werly, Yann-Gaël Guéhéneuc

DOI: 10.1145/3379597.3387486

Publication year: 2020

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: Different from traditional software development, there is little information about the software-engineering process and techniques in video-game development. One popular way to share knowledge amo... Read more

Other Data
FAIR Score: 27.08%
LogChunks: A Data Set for Build Log Analysis

C. Brandt, Annibale Panichella, A. Zaidman, M. Beller

DOI: 10.1145/3379597.3387485

Publication year: 2020

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: Build logs are textual by-products that a software build process creates, often as part of its Continuous Integration (CI) pipeline. Build logs are a paramount source of information for developers ... Read more

Software Issues
FAIR Score: 79.17%
JTeC: A Large Collection of Java Test Classes for Test Code Analysis and Processing

Federico Coró, Roberto Verdecchia, Emilio Cruciani, Breno Miranda, A. Bertolino

DOI: 10.1145/3379597.3387484

Publication year: 2020

Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package

Abstract: The recent push towards test automation and test-driven development continues to scale up the dimensions of test code that needs to be maintained, analysed, and processed side-by-side with product... Read more

Software Issues
FAIR Score: 79.17%
An Empirical History of Permission Requests and Mistakes in Open Source Android Apps

Gian Luca Scoccia, Anthony S Peruma, Virginia Pujols, Ben Christians, Daniel E. Krutz

DOI: 10.1109/MSR.2019.00090

Publication year: 2019

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Android applications (apps) rely upon proper permission usage to ensure that the user's privacy and security are adequately protected. Unfortunately, developers frequently misuse app permissions in... Read more

Software Issues
FAIR Score: 4.17%
RapidRelease - A Dataset of Projects and Issues on Github with Rapid Releases

Saket Joshi, S. Chimalakonda

DOI: 10.1109/MSR.2019.00088

Publication year: 2019

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: In the recent years, there has been a surge in the adoption of agile development model and continuous integration (CI) in software development. Recent trends have reduced average release cycle leng... Read more

Software Evolution
FAIR Score: 79.17%
A Benchmark of Data Loss Bugs for Android Apps

O. Riganelli, M. Mobilio, D. Micucci, L. Mariani

DOI: 10.1109/MSR.2019.00087

Publication year: 2019

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: Android apps must be able to deal with both stop events, which require immediately stopping the execution of the app without losing state information, and start events, which require resuming the e... Read more

Software Issues
FAIR Score: 35.42%
Boa Meets Python: A Boa Dataset of Data Science Software in Python Language

Sumon Biswas, Md Johirul Islam, Yijia Huang, Hridesh Rajan

DOI: 10.1109/MSR.2019.00086

Publication year: 2019

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: The popularity of Python programming language has surged in recent years due to its increasing usage in Data Science. The availability of Python repositories in Github presents an opportunity for m... Read more

Version Control
FAIR Score: 4.17%
SeSaMe: A Data Set of Semantically Similar Java Methods

Marius Kamp, Patrick Kreutzer, M. Philippsen

DOI: 10.1109/MSR.2019.00079

Publication year: 2019

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: In the past, techniques for detecting similarly behaving code fragments were often only evaluated with small, artificial oracles or with code originating from programming competitions. Such code fr... Read more

Semantic Metrics
FAIR Score: 70.83%
RmvDroid: Towards A Reliable Android Malware Dataset with App Metadata

Haoyu Wang, Junjun Si, Hao Li, Yao Guo

DOI: 10.1109/MSR.2019.00067

Publication year: 2019

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: A large number of research studies have been focused on detecting Android malware in recent years. As a result, a reliable and large-scale malware dataset is essential to build effective malware cl... Read more

Software Issues
FAIR Score: %
A Dataset of Non-Functional Bugs

Aida Radu, Sarah Nadi

DOI: 10.1109/MSR.2019.00066

Publication year: 2019

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: While several researchers have published bug data sets in the past, there has been less focus on bugs related to non-functional requirements. Non-functional requirements describe the quality attrib... Read more

Software Issues
FAIR Score: 27.08%
A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software

Serena Elisa Ponta, H. Plate, A. Sabetta, M. Bezzi, Cédric Dangremont

DOI: 10.1109/MSR.2019.00064

Publication year: 2019

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of sof... Read more

Software Issues
FAIR Score: 27.08%
The Maven Dependency Graph: A Temporal Graph-Based Representation of Maven Central

Amine Benelallam, Nicolas Harrand, César Soto-Valero, B. Baudry, Olivier Barais

DOI: 10.1109/MSR.2019.00060

Publication year: 2019

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: The Maven Central Repository provides an extraordinary source of data to understand complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository in... Read more

Version Control
FAIR Score: 79.17%
A Panel Data Set of Cryptocurrency Development Activity on GitHub

Rijnard van Tonder, Asher Trockman, Claire Le Goues

DOI: 10.1109/MSR.2019.00037

Publication year: 2019

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: Cryptocurrencies are a significant development in recent years, featuring in global news, the financial sector, and academic research. They also hold a significant presence in open source developme... Read more

Other Data
FAIR Score: 79.17%
GreenSource: A Large-Scale Collection of Android Code, Tests and Energy Metrics

Rui Rua, Marco Couto, J. Saraiva

DOI: 10.1109/MSR.2019.00035

Publication year: 2019

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: This paper presents the GreenSource infrastructure: a large body of open source code, executable Android applications, and curated dataset containing energy code metrics. The dataset contains energ... Read more

Other Data
FAIR Score: 4.17%
GreenHub Farmer: Real-World Data for Android Energy Mining

Hugo Matalonga, Bruno Cabral, F. C. Filho, Marco Couto, Rui Pereira, S. Sousa, J. Fernandes

DOI: 10.1109/MSR.2019.00034

Publication year: 2019

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: As mobile devices are supporting more and more of our daily activities, it is vital to widen their battery up-time as much as possible. In fact, according to the Wall Street Journal, 9/10 users suf... Read more

Other Data
FAIR Score: 18.75%
The Software Heritage Graph Dataset: Public Software Development Under One Roof

Antoine Pietri, D. Spinellis, Stefano Zacchiroli

DOI: 10.1109/MSR.2019.00030

Publication year: 2019

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Software Heritage is the largest existing public archive of software source code and accompanying development history: it currently spans more than five billion unique source code files and one bil... Read more

Version Control
FAIR Score: 79.17%
A Data Set of Program Invariants and Error Paths

Dirk Beyer

DOI: 10.1109/MSR.2019.00026

Publication year: 2019

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: The analysis of correctness proofs and counterexamples of program source code is an important way to gain insights into methods that could make it easier in the future to find invariants to prove a... Read more

Software Issues
FAIR Score: 79.17%
A Dataset of Parametric Cryptographic Misuses

A. Wickert, Michael Reif, Michael Eichberg, Anam Dodhy, M. Mezini

DOI: 10.1109/MSR.2019.00023

Publication year: 2019

Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps

Abstract: Cryptographic APIs (Crypto APIs) provide the foundations for the development of secure applications. Unfortunately, most applications do not use Crypto APIs securely and end up being insecure, e.g.... Read more

Software Issues
FAIR Score: %
Cleaning StackOverflow for Machine Translation

Musfiqur Rahman, Peter C. Rigby, Dharani Palani, T. Nguyen

DOI: 10.1109/MSR.2019.00021

Publication year: 2019

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel c... Read more

Semantic Metrics
FAIR Score: %
Semantic Source Code Models Using Identifier Embeddings

V. Efstathiou, D. Spinellis

DOI: 10.1109/MSR.2019.00015

Publication year: 2019

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: The emergence of online open source repositories in the recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of softwar... Read more

Semantic Metrics
FAIR Score: 75%
Documented Unix Facilities over 48 Years

D. Spinellis

DOI: 10.1145/3196398.3196476

Publication year: 2018

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: The documented Unix facilities data set provides the details regarding the evolution of 15596 unique facilities through 93 versions of Unix over a period of 48 years. It is based on the manual tran... Read more

Software Evolution
FAIR Score: 27.08%
A Multi-level Dataset of Linux Kernel Patchwork

Yulin Xu, Minghui Zhou

DOI: 10.1145/3196398.3196475

Publication year: 2018

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: In many open source software projects (e.g., the Linux kernel), people contribute by sending code patches to the community. The community evaluates these contributions and decides whether to integr... Read more

Software Evolution
FAIR Score: 75%
Bugs.jar: A Large-Scale, Diverse Dataset of Real-World Java Bugs

Ripon K. Saha, Yingjun Lyu, Wing Lam, H. Yoshida, M. Prasad

DOI: 10.1145/3196398.3196473

Publication year: 2018

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: We present Bugs.jar, a large-scale dataset for research in automated debugging, patching, and testing of Java programs. Bugs.jar is comprised of 1,158 bugs and patches, drawn from 8 large, popular ... Read more

Software Issues
FAIR Score: 27.08%
CROP: Linking Code Reviews to Source Code Changes

M. Paixão, J. Krinke, Donggyun Han, M. Harman

DOI: 10.1145/3196398.3196466

Publication year: 2018

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: Code review has been widely adopted by both industrial and open source software development communities. Research in code review is highly dependent on real-world data, and although existing resear... Read more

Developer Metrics
FAIR Score: 4.17%
npm-Miner: An Infrastructure for Measuring the Quality of the npm Registry

Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Themistoklis G. Diamantopoulos, Michail Tsapanos, A. Symeonidis

DOI: 10.1145/3196398.3196465

Publication year: 2018

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for ... Read more

Version Control
FAIR Score: 70.83%
Public Git Archive: A Big Code Dataset for All

Vadim Markovtsev, Waren Long

DOI: 10.1145/3196398.3196464

Publication year: 2018

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-con... Read more

Version Control
FAIR Score: 27.08%
A Graph-Based Dataset of Commit History of Real-World Android apps

F. Geiger, I. Malavolta, L. Pascarella, Fabio Palomba, Dario Di Nucci, Alberto Bacchelli

DOI: 10.1145/3196398.3196460

Publication year: 2018

Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps

Abstract: Obtaining a good dataset to conduct empirical studies on the engineering of Android apps is an open challenge. To start tackling this challenge, we present AndroidTimeMachine, the first, self-contain... Read more

Version Control
FAIR Score: 18.75%
Developer Interaction Traces Backed by IDE Screen Recordings from Think Aloud Sessions

A. Yamashita, Fábio Petrillo, Foutse Khomh, Yann-Gaël Guéhéneuc

DOI: 10.1145/3196398.3196457

Publication year: 2018

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: There are two well-known difficulties to test and interpret methodologies for mining developer interaction traces: first, the lack of enough large datasets needed by mining or machine learning appr... Read more

Other Data
FAIR Score: 79.17%
Structured Information on State and Evolution of Dockerfiles on GitHub

Gerald Schermann, Sali Zumberi, Jürgen Cito

DOI: 10.1145/3196398.3196456

Publication year: 2018

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: Docker containers are standardized, self-contained units of applications, packaged with their dependencies and execution environment. The environment is defined in a Dockerfile that specifies the s... Read more

Other Data
FAIR Score: 27.08%
A Dataset of Duplicate Pull-Requests in GitHub

Yue Yu, Zhixing Li, Gang Yin, Tao Wang, Huaimin Wang

DOI: 10.1145/3196398.3196455

Publication year: 2018

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: In GitHub, the pull-based development model enables community contributors to collaborate in a more efficient way. However, the distributed and parallel characteristics of this model pose a potenti... Read more

Software Issues
FAIR Score: 27.08%
VulinOSS: A Dataset of Security Vulnerabilities in Open-Source Systems

Antonios Gkortzis, Dimitris Mitropoulos, D. Spinellis

DOI: 10.1145/3196398.3196454

Publication year: 2018

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Examining the different characteristics of open-source software in relation to security vulnerabilities, can provide the research community with findings that can lead to the development of more se... Read more

Software Issues
FAIR Score: 27.08%
A Gold Standard for Emotion Annotation in Stack Overflow

Nicole Novielli, Fabio Calefato, F. Lanubile

DOI: 10.1145/3196398.3196453

Publication year: 2018

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: Software developers experience and share a wide range of emotions throughout a rich ecosystem of communication channels. A recent trend that has emerged in empirical software engineering studies is... Read more

Semantic Metrics
FAIR Score: 27.08%
JBench: A Dataset of Data Races for Concurrency Testing

Jian Gao, Xin Yang, Yu Jiang, Han Liu, Weiliang Ying, Xian Zhang

DOI: 10.1145/3196398.3196451

Publication year: 2018

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: Race detection is increasingly popular, both in the academic research and in industrial practice. However, there is no specialized and comprehensive dataset of the data race, making it difficult to... Read more

Software Issues
FAIR Score: 27.08%
50K-C: A Dataset of Compilable, and Compiled, Java Projects

Pedro Martins, Rohan Achar, C. Lopes

DOI: 10.1145/3196398.3196450

Publication year: 2018

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: We provide a repository of 50,000 compilable Java projects. Each project in this dataset comes with references to all the dependencies required to compile it, the resulting bytecode, and the script... Read more

Version Control
FAIR Score: 4.17%
Word Embeddings for the Software Engineering Domain

V. Efstathiou, Christos Chatzilenas, D. Spinellis

DOI: 10.1145/3196398.3196448

Publication year: 2018

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: The software development process produces vast amounts of textual data expressed in natural language. Outcomes from the natural language processing community have been adapted in software engineeri... Read more

Semantic Metrics
FAIR Score: 75%
A Data Set of OCL Expressions on GitHub

Jeroen Noten, J. Mengerink, Alexander Serebrenik

DOI: 10.1109/MSR.2017.52

Publication year: 2017

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: In model driven engineering (MDE), meta-models are the central artifacts. As a complement, the Object Constraint Language (OCL) is a language used to express constraints and operations on meta-mode... Read more

Other Data
FAIR Score: 27.08%
Rediscovery Datasets: Connecting Duplicate Reports

Mefta Sadat, A. Bener, A. Miranskyy

DOI: 10.1109/MSR.2017.50

Publication year: 2017

Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets

Abstract: The same defect can be rediscovered by multiple clients, causing unplanned outages and leading to reduced customer satisfaction. In the case of popular open source software, high volume of defects ... Read more

Software Issues
FAIR Score: 83.33%
A Dataset for Dynamic Discovery of Semantic Changes in Version Controlled Software Histories

Chenguang Zhu, Yi Li, J. Rubin, M. Chechik

DOI: 10.1109/MSR.2017.49

Publication year: 2017

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Over the last few years, researchers proposed several semantic history slicing approaches that identify the set of semantically-related commits implementing a particular software functionality. How... Read more

Version Control
FAIR Score: 27.08%
An Extensive Dataset of UML Models in GitHub

G. Robles, Truong Ho-Quang, R. Hebig, M. Chaudron, M. A. Fernández

DOI: 10.1109/MSR.2017.48

Publication year: 2017

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: The Unified Modeling Language (UML) is widely taught in academia and has good acceptance in industry. However, there is not an ample dataset of UML diagrams publicly available. Our aim is to offer ... Read more

Other Data
FAIR Score: 4.17%
Continuous Defect Prediction: The Idea and a Related Dataset

L. Madeyski, M. Kawalerowicz

DOI: 10.1109/MSR.2017.46

Publication year: 2017

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: We would like to present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. Our dataset is currently a set of more than 11 million data row... Read more

Software Issues
FAIR Score: 27.08%
A Dataset of Scratch Programs: Scraped, Shaped and Scored

Efthimia Aivaloglou, F. Hermans, J. Moreno-León, G. Robles

DOI: 10.1109/MSR.2017.45

Publication year: 2017

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Scratch is increasingly popular, both as an introductory programming language and as a research target in the computing education research field. In this paper, we present a dataset of 250K recent ... Read more

Other Data
FAIR Score: 27.08%
Software Evolution and Quality Data from Controlled, Multiple, Industrial Case Studies

A. Yamashita, S. Amirhossein Abtahizadeh, Foutse Khomh, Yann-Gaël Guéhéneuc

DOI: 10.1109/MSR.2017.44

Publication year: 2017

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: A main difficulty to study the evolution and quality of real-life software systems is the effect of moderator factors, such as: programming skill, type of maintenance task, and learning effect. Exp... Read more

Software Evolution
FAIR Score: 79.17%
Data Sets: The Circle of Life in Ruby Hosting, 2003-2015

Megan Squire

DOI: 10.1145/2901739.2903509

Publication year: 2016

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Studying software repositories and hosting services can provide valuable insights into the behaviors of large groups of software developers and their projects. Traditionally, most analysis of metad... Read more

Software Evolution
FAIR Score: 8.33%
AndroZoo: Collecting Millions of Android Apps for the Research Community

Kevin Allix, Tégawendé F. Bissyandé, Jacques Klein, Y. L. Traon

DOI: 10.1145/2901739.2903508

Publication year: 2016

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: We present a growing collection of Android Applications collected from several sources, including the official Google Play app market. Our dataset, AndroZoo, currently contains more than three mill... Read more

Other Data
FAIR Score: 18.75%
A Dataset of Simplified Syntax Trees for C#

Sebastian Proksch, Sven Amann, Sarah Nadi, M. Mezini

DOI: 10.1145/2901739.2903507

Publication year: 2016

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: In this paper, we present a curated collection of 2833 C# solutions taken from Github. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting ... Read more

Other Data
FAIR Score: 4.17%
MUBench: A Benchmark for API-Misuse Detectors

Sven Amann, Sarah Nadi, H. Nguyen, T. Nguyen, M. Mezini

DOI: 10.1145/2901739.2903506

Publication year: 2016

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Over the last few years, researchers proposed a multitude of automated bug-detection approaches that mine a class of bugs that we call API misuses. Evaluations on a variety of software products sho... Read more

Software Issues
FAIR Score: 27.08%
The Emotional Side of Software Developers in JIRA

Marco Ortu, Alessandro Murgia, Giuseppe Destefanis, Parastou Tourani, R. Tonelli, M. Marchesi, Bram Adams

DOI: 10.1145/2901739.2903505

Publication year: 2016

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter... Read more

Developer Metrics
FAIR Score: 4.17%
Mining the Modern Code Review Repositories: A Dataset of People, Process and Product

Xin Yang, R. Kula, Norihiro Yoshida, Hajimu Iida

DOI: 10.1145/2901739.2903504

Publication year: 2016

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: In this paper, we present a collection of Modern Code Review data for five open source projects. The data showcases mined data from both an integrated peer review system and source code repositorie... Read more

Developer Metrics
FAIR Score: 31.25%
Multi-extract and Multi-level Dataset of Mozilla Issue Tracking History

Jiaxin Zhu, Minghui Zhou, Hong Mei

DOI: 10.1145/2901739.2903502

Publication year: 2016

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Many studies analyze issue tracking repositories to understand and support software development. To facilitate the analyses, we share a Mozilla issue tracking dataset covering a 15-year history. Th... Read more

Version Control
FAIR Score: 27.08%
A Dataset of Open-Source Android Applications

Daniel E. Krutz, Mehdi Mirakhorli, Samuel A. Malachowsky, Andres Ruiz, Jacob Peterson, Andrew Filipski, Jared Smith

DOI: 10.1109/MSR.2015.79

Publication year: 2015

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: Android has grown to be the world's most popular mobile platform with apps that are capable of doing everything from checking sports scores to purchasing stocks. In order to assist researchers and ... Read more

Version Control
FAIR Score: 8.33%
A Dataset of High Impact Bugs: Manually-Classified Issue Reports

M. Ohira, Yutaro Kashiwa, Yosuke Yamatani, Hayato Yoshiyuki, Yoshiya Maeda, Nachai Limsettho, K. Fujino, Hideaki Hata, Akinori Ihara, Ken-ichi Matsumoto

DOI: 10.1109/MSR.2015.78

Publication year: 2015

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: The importance of supporting test and maintenance activities in software development has been increasing, since recent software systems have become large and complex. Although in the field of Minin... Read more

Software Issues
FAIR Score: %
A Data Set for Social Diversity Studies of GitHub Teams

Bogdan Vasilescu, Alexander Serebrenik, V. Filkov

DOI: 10.1109/MSR.2015.77

Publication year: 2015

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: Like any other team-oriented activity, the software development process is affected by social diversity in the programmer teams. The effect of team diversity can be significant, but also complex, e... Read more

Developer Metrics
FAIR Score: %
Generating the Blueprints of the Java Ecosystem

Vassilios Karakoidas, Dimitris Mitropoulos, P. Louridas, Georgios Gousios, D. Spinellis

DOI: 10.1109/MSR.2015.76

Publication year: 2015

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Examining a large number of software artifacts can provide the research community with data regarding quality and design. We present a dataset obtained by statically analyzing 22730 jar files taken... Read more

Version Control
FAIR Score: 27.08%
A Dataset for API Usage

A. Sawant, Alberto Bacchelli

DOI: 10.1109/MSR.2015.75

Publication year: 2015

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: An Application Programming Interface (API) provides a specific set of functionalities to a developer. The main aim of an API is to encourage the reuse of already existing functionality. There has b... Read more

Version Control
FAIR Score: 54.17%
An Architectural Evolution Dataset

M. Wermelinger, Y. Yu

DOI: 10.1109/MSR.2015.74

Publication year: 2015

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: A good evolution process and a good architecture can greatly support the maintainability of long-lived, large software systems. We present AREVOL, a dataset for the empirical study of architectural... Read more

Other Data
FAIR Score: 4.17%
The Firefox Temporal Defect Dataset

Mayy Habayeb, A. Miranskyy, Syed Shariyar Murtaza, Leotis Buchanan, A. Bener

DOI: 10.1109/MSR.2015.73

Publication year: 2015

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: The bug tracking repositories of software projects capture initial defect (bug) reports and the history of interactions among developers, testers, and customers. Extracting and mining information f... Read more

Software Issues
FAIR Score: %
A Novel Industry Grade Dataset for Fault Prediction Based on Model-Driven Developed Automotive Embedded Software

Harald Altinger, S. Siegl, Y. Dajsuren, F. Wotawa

DOI: 10.1109/MSR.2015.72

Publication year: 2015

Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool

Abstract: In this paper, we present a novel industry dataset on static software and change metrics for Matlab/Simulink models and their corresponding auto-generated C source code. The data set comprises data... Read more

Software Issues
FAIR Score: %
Dataset of Developer-Labeled Commit Messages

Andreas Mauczka, Florian Brosch, Christian Schanes, T. Grechenig

DOI: 10.1109/MSR.2015.71

Publication year: 2015

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: Current research on change classification centers around automated and semi-automated approaches which are based on evaluation by either the researchers themselves or external experts. In most case... Read more

Semantic Metrics
FAIR Score: %
Fuse: A Reproducible, Extendable, Internet-Scale Corpus of Spreadsheets

Titus Barik, Kevin Lubick, Justin Smith, John Slankas, E. Murphy-Hill

DOI: 10.1109/MSR.2015.70

Publication year: 2015

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Spreadsheets are perhaps the most ubiquitous form of end-user programming software. This paper describes a corpus, called Fuse, containing 2,127,284 URLs that return spreadsheets (and their HTTP se... Read more

Other Data
FAIR Score: 4.17%
Landfill: An Open Dataset of Code Smells with Public Evaluation

Fabio Palomba, Dario Di Nucci, Michele Tufano, G. Bavota, Rocco Oliveto, D. Poshyvanyk, A. D. Lucia

DOI: 10.1109/MSR.2015.69

Publication year: 2015

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Code smells are symptoms of poor design and implementation choices that may hinder code comprehension and possibly increase change- and fault-proneness of source code. Several techniques have been ... Read more

Software Issues
FAIR Score: %
The MetricsGrimoire Database Collection

Jesus M. Gonzalez-Barahona, G. Robles, Daniel Izquierdo-Cortazar

DOI: 10.1109/MSR.2015.68

Publication year: 2015

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: The Metrics Grimoire system is composed of a set of tools designed to retrieve data from repositories related to software development. Their aim is to produce organized databases suitable for easy ... Read more

Version Control
FAIR Score: %
StORMeD: Stack Overflow Ready Made Data

Luca Ponzanelli, Andrea Mocci, Michele Lanza

DOI: 10.1109/MSR.2015.67

Publication year: 2015

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: Stack Overflow is the de facto Question and Answer (Q&A) website for developers, and it has been used in many approaches by software engineering researchers to mine useful data. However, the conten... Read more

Semantic Metrics
FAIR Score: 4.17%
A Dataset of the Activity of the Git Super-repository of Linux in 2012

D. Germán, Bram Adams, A. Hassan

DOI: 10.1109/MSR.2015.66

Publication year: 2015

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: This dataset documents the activity in the public portion of the git Super-repository of the Linux kernel during 2012. In a distributed version control system, such as git, the Super-repository is ... Read more

Version Control
FAIR Score: %
The Debsources Dataset: Two Decades of Debian Source Code Metadata

Stefano Zacchiroli

DOI: 10.1109/MSR.2015.65

Publication year: 2015

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: We present the Debsources Dataset: distribution metadata and source code metrics spanning two decades of Free and Open Source Software (FOSS) history, seen through the lens of the Debian distributi... Read more

Software Evolution
FAIR Score: 4.17%
A Repository with 44 Years of Unix Evolution

D. Spinellis

DOI: 10.1109/MSR.2015.64

Publication year: 2015

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: The evolution of the Unix operating system is made available as a version-control repository, covering the period from its inception in 1972 as a five thousand line kernel, to 2015 as a widely-used... Read more

Software Evolution
FAIR Score: 70.83%
Understanding software evolution: the maisqual ant data set

B. Baldassari, P. Preux

DOI: 10.1145/2597073.2597136

Publication year: 2014

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Software engineering is a maturing discipline which has seen many drastic advances in the last years. However, some studies still point to the lack of rigorous and mathematically grounded methods t... Read more

Software Evolution
FAIR Score: %
OpenHub: a scalable architecture for the analysis of software quality attributes

G. Farah, Juan Sebastian Tejada, Darío Correal

DOI: 10.1145/2597073.2597135

Publication year: 2014

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: There is currently a vast array of open source projects available on the web, and although they are searchable by name or description in the search engines, there is no way to search for projects b... Read more

Version Control
FAIR Score: %
A dataset for maven artifacts and bug patterns found in them

V. Saini, Hitesh Sajnani, Joel Ossher, C. Lopes

DOI: 10.1145/2597073.2597134

Publication year: 2014

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: In this paper, we present data downloaded from Maven, one of the most popular component repositories. The data includes the binaries of 186,392 components, along with source code for 161,025. We id... Read more

Software Issues
FAIR Score: %
A dataset of clone references with gaps

Hiroaki Murakami, Yoshiki Higo, S. Kusumoto

DOI: 10.1145/2597073.2597133

Publication year: 2014

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: This paper introduces a new dataset of clone references, which is a set of correct clones consisting of their locational information with their gapped lines. Bellon's dataset is one of widely used ... Read more

Software Issues
FAIR Score: %
Models of OSS project meta-information: a dataset of three forges

James R. Williams, D. D. Ruscio, N. Matragkas, Juri Di Rocco, D. Kolovos

DOI: 10.1145/2597073.2597132

Publication year: 2014

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: The process of selecting open-source software (OSS) for adoption is not straightforward as it involves exploring various sources of information to determine the quality, maturity, activity, and use... Read more

Version Control
FAIR Score: 27.08%
Gentoo package dependencies over time

Remco Bloemen, C. Amrit, S. Kuhlmann, Gonzalo Ordóñez‐Matamoros

DOI: 10.1145/2597073.2597131

Publication year: 2014

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Open source distributions such as Gentoo need to accurately track dependency relations between software packages in order to install working systems. To do this, Gentoo has a carefully authored dat... Read more

Software Evolution
FAIR Score: %
A green miner's dataset: mining the impact of software change on energy consumption

Chenlei Zhang, Abram Hindle

DOI: 10.1145/2597073.2597130

Publication year: 2014

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: With the advent of mobile computing, the responsibility of software developers to update and ship energy efficient applications has never been more pronounced. Green mining attempts to address this... Read more

Other Data
FAIR Score: 27.08%
FLOSS 2013: a survey dataset about free software contributors: challenges for curating, sharing, and combining

G. Robles, L. Reina, Alexander Serebrenik, Bogdan Vasilescu, Jesus M. Gonzalez-Barahona

DOI: 10.1145/2597073.2597129

Publication year: 2014

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: In this data paper we describe a data set obtained by means of performing an on-line survey to over 2,000 Free Libre Open Source Software (FLOSS) contributors. The survey includes questions related... Read more

Developer Metrics
FAIR Score: %
Generating duplicate bug datasets

A. Lazar, Sarah Ritchey, Bonita Sharif

DOI: 10.1145/2597073.2597128

Publication year: 2014

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Automatic identification of duplicate bug reports is an important research problem in the mining software repositories field. This paper presents a collection of bug datasets collected, cleaned and... Read more

Software Issues
FAIR Score: %
A code clone oracle

Daniel E. Krutz, Wei Le

DOI: 10.1145/2597073.2597127

Publication year: 2014

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Code clones are functionally equivalent code segments. Detecting code clones is important for determining bugs, fixes and software reuse. Code clone detection is also essential for developing fas... Read more

Software Issues
FAIR Score: %
Lean GHTorrent: GitHub data on demand

Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, A. Zaidman

DOI: 10.1145/2597073.2597126

Publication year: 2014

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: In recent years, GitHub has become the largest code host in the world, with more than 5M developers collaborating across 10M repositories. Numerous popular open source projects (such as Ruby on Rai... Read more

Version Control
FAIR Score: %
Kataribe: a hosting service of historage repositories

Kenji Fujiwara, Hideaki Hata, Erina Makihara, Yusuke Fujihara, Naoki Nakayama, Hajimu Iida, Ken-ichi Matsumoto

DOI: 10.1145/2597073.2597125

Publication year: 2014

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: In the research of Mining Software Repositories, code repository is one of the core source since it contains the product of software development. Code repository stores the versions of files, and m... Read more

Version Control
FAIR Score: %
A dataset of feature additions and feature removals from the Linux kernel

L. Passos, K. Czarnecki

DOI: 10.1145/2597073.2597124

Publication year: 2014

Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer

Abstract: This paper describes a dataset of feature additions and removals in the Linux kernel evolution history, spanning over seven years of kernel development. Features, in this context, denote configurab... Read more

Version Control
FAIR Score: %
The bug catalog of the maven ecosystem

Dimitris Mitropoulos, Vassilios Karakoidas, P. Louridas, Georgios Gousios, D. Spinellis

DOI: 10.1145/2597073.2597123

Publication year: 2014

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Examining software ecosystems can provide the research community with data regarding artifacts, processes, and communities. We present a dataset obtained from the Maven central repository ecosystem... Read more

Software Issues
FAIR Score: 27.08%
A dataset for pull-based development research

Georgios Gousios, A. Zaidman

DOI: 10.1145/2597073.2597122

Publication year: 2014

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Pull requests form a new method for collaborating in distributed software development. To study the pull request distributed development model, we constructed a dataset of almost 900 projects and 3...

Version Control
FAIR Score: 27.08%
INVocD: Identifier name vocabulary dataset

Simon Butler, M. Wermelinger, Y. Yu, H. Sharp

DOI: 10.1109/MSR.2013.6624056

Publication year: 2013

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: INVocD is a database of the identifier name declarations and vocabulary found in 60 FLOSS Java projects where the source code structure is recorded and the identifier name vocabulary is made direct...

Semantic Metrics
FAIR Score: 39.58%
A dataset for evaluating identifier splitters

D. Binkley, Dawn J Lawrie, L. Pollock, Emily Hill, K. Vijay-Shanker

DOI: 10.1109/MSR.2013.6624055

Publication year: 2013

Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type

Abstract: Software engineering and evolution techniques have recently started to exploit the natural language information in source code. A key step in doing so is splitting identifiers into their constituen...

Semantic Metrics
FAIR Score: 4.17%
A historical dataset of software engineering conferences

Bogdan Vasilescu, Alexander Serebrenik, T. Mens

DOI: 10.1109/MSR.2013.6624051

Publication year: 2013

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: The Mining Software Repositories community typically focuses on data from software configuration management tools, mailing lists, and bug tracking repositories to uncover interesting and actionable...

Other Data
FAIR Score: 27.08%
An unabridged source code dataset for research in software reuse

Werner Janjic, Oliver Hummel, M. Schumacher, C. Atkinson

DOI: 10.1109/MSR.2013.6624047

Publication year: 2013

Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information

Abstract: This paper describes a large, unabridged data-set of Java source code gathered and shared as part of the Merobase Component Finder project of the Software-Engineering Group at the University of Man...

Version Control
FAIR Score: 4.17%
Apache-affiliated Twitter screen names: A dataset

Megan Squire

DOI: 10.1109/MSR.2013.6624043

Publication year: 2013

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: This paper describes a new dataset containing Twitter screen names for members of the projects affiliated with the Apache Software Foundation (ASF). The dataset includes the confirmed Twitter scree...

Developer Metrics
FAIR Score: 4.17%
Project roles in the Apache Software Foundation: A dataset

Megan Squire

DOI: 10.1109/MSR.2013.6624042

Publication year: 2013

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: This paper outlines the steps in the creation and maintenance of a new dataset listing leaders of the various projects of the Apache Software Foundation (ASF). Included in this dataset are differen...

Developer Metrics
FAIR Score: 4.17%
The GHTorrent dataset and tool suite

Georgios Gousios

DOI: 10.1109/MSR.2013.6624034

Publication year: 2013

Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information

Abstract: During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-q...

Version Control
FAIR Score: %
A network of Rails: A graph dataset of Ruby on Rails and associated projects

Patrick Wagstrom, C. Jergensen, A. Sarma

DOI: 10.1109/MSR.2013.6624033

Publication year: 2013

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Software projects, whether open source, proprietary, or a combination thereof, rarely exist in isolation. Rather, most projects build on a network of people and ideas from dozens, hundreds, or even...

Developer Metrics
FAIR Score: 27.08%
A historical dataset for the Gnome ecosystem

M. Goeminne, Maëlick Claes, T. Mens

DOI: 10.1109/MSR.2013.6624032

Publication year: 2013

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: We present a dataset of the open source software ecosystem Gnome from a social point of view. We have collected historical data about the contributors to all Gnome projects stored on git.gnome.org,...

Software Evolution
FAIR Score: 4.17%
The Maven repository dataset of metrics, changes, and dependencies

S. Raemaekers, A. van Deursen, Joost Visser

DOI: 10.1109/MSR.2013.6624031

Publication year: 2013

Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present

Abstract: We present the Maven Dependency Dataset (MDD), containing metrics, changes and dependencies of 148,253 jar files. Metrics and changes have been calculated at the level of individual methods, classe...

Version Control
FAIR Score: 64.58%
The Eclipse and Mozilla defect tracking dataset: A genuine dataset for mining bug information

Ahmed Lamkanfi, Javier Pérez, S. Demeyer

DOI: 10.1109/MSR.2013.6624028

Publication year: 2013

Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity

Abstract: The analysis of bug reports is an important subfield within the mining software repositories community. It explores the rich data available in defect tracking systems to uncover interesting and act...

Software Issues
FAIR Score: 27.08%
Apache commits: Social network dataset

Alexander C. MacLean, C. Knutson

DOI: 10.1109/MSR.2013.6624020

Publication year: 2013

Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github

Abstract: Building non-trivial software is a social endeavor. Therefore, understanding the social network of developers is key to the study of software development organizations. We present a graph represent...

Developer Metrics
FAIR Score: %
A dataset from change history to support evaluation of software maintenance tasks

Bogdan Dit, Andrew Holtzhauer, D. Poshyvanyk, Huzefa H. Kagdi

DOI: 10.1109/MSR.2013.6624019

Publication year: 2013

Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets

Abstract: Approaches that support software maintenance need to be evaluated and compared against existing ones, in order to demonstrate their usefulness in practice. However, oftentimes the lack of well-esta...

Software Evolution
FAIR Score: 4.17%
Who does what during a code review? Datasets of OSS peer review repositories

Kazuki Hamasaki, R. Kula, Norihiro Yoshida, Ana Erika Camargo Cruz, Kenji Fujiwara, Hajimu Iida

DOI: 10.1109/MSR.2013.6624003

Publication year: 2013

Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality

Abstract: We present four datasets that are focused on the general roles of OSS peer review members. With data mined from both an integrated peer review system and code source repositories, our rich datasets...

Developer Metrics
FAIR Score: %
Gerrit software code review data from Android

M. Mukadam, C. Bird, Peter C. Rigby

DOI: 10.1109/MSR.2013.6624002

Publication year: 2013

Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis

Abstract: Over the past decade, a number of tools and systems have been developed to manage various aspects of the software development lifecycle. Until now, tool supported code review, an important aspect o...

Developer Metrics
FAIR Score: 27.08%