Saurabhsingh Rajput, M. Kechagia, Federica Sarro, Tushar Sharma
Publication year: 2024
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Deep learning (DL) models are being widely deployed in real-world applications, but their usage remains computationally intensive and energy-hungry. While prior work has examined model-level energy...
Md Rakibul Islam, Md. Fazle Rabbi, Jo Youngeun, A. I. Champa, Ethan Young, Camden Wilson, Gavin Scott, M. Zibran
Publication year: 2024
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: We present the first four-dimension gold standard dataset to advance opinion mining focused on the software engineering domain. Through a well-defined sampling and annotation strategy leveraging mu...
Yasaman Abedini, Mohammad Hadi Hajihosseini, Abbas Heydarnoori
Publication year: 2024
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: Android apps continuously evolve to meet user expectations and thrive in the competitive environment of app stores. Hence, making informed decisions is crucial for the success of upcoming releases....
Christian Birchler, Cyrill Rohrbach, Timo Kehrer, Sebastiano Panichella
Publication year: 2024
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Developing tools in the context of autonomous systems [22], [24], such as self-driving cars (SDCs), is time-consuming and costly since researchers and practitioners rely on expensive computing hard...
Dario Amoroso d'Aragona, Alexander Bakhtin, Xiaozhou Li, Ruoyu Su, Lauren Adams, Ernesto Aponte, Francis Boyle, Patrick Boyle, Rachel Koerner, Joseph Lee, Fangchao Tian, Yuqing Wang, Jesse Nyyssölä, Ernesto Quevedo, Shahidur Md Rahaman, Amr S. Abdelfattah, Mika Mäntylä, Tomás Cerný, Davide Taibi
Publication year: 2024
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Researchers in the microservices community often resort to demonstrating the impact of their proposed advancements on custom-made microservices projects. This is a possible source of bias that can...
Dirk Beyer, Lars Grunske, Matthias Kettl, Marian Lingsch-Rosenfeld, Moeketsi Raselimo
Publication year: 2024
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Identifying and fixing bugs in programs remains a challenge and is one of the most time-consuming tasks in software development. But even after a bug is identified, and a fix has been proposed by a...
Daniel Sokolowski, David Spielmann, Guido Salvaneschi
Publication year: 2024
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: With Programming Languages Infrastructure as Code (PL-IaC), developers implement IaC programs in popular imperative programming languages like Python and TypeScript. Such programs generate the decl...
Ramtin Ehsani, M. M. Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee
Publication year: 2024
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. ...
Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang
Publication year: 2024
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: We constructed a new large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projec...
Zhipeng Liu, Meng Yan, Zhipeng Gao, Dong Li, Xiaohong Zhang, Dan Yang
Publication year: 2024
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Excessive non-actionable warnings generated by static program analysis tools can hinder developers from utilizing these tools effectively. Leveraging learning-based approaches for actionable warnin...
Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams
Publication year: 2024
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: The prevalent use of third-party components in modern software development, rapid modernization, and digitization have significantly amplified the risk of software supply chain attacks. Popular lar...
Mojtaba Mostafavi Ghahfarokhi, Arash Asgari, Mohammad Abolnejadian, Abbas Heydarnoori
Publication year: 2024
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Jupyter notebooks have become indispensable tools for data analysis and processing in various domains. However, despite their widespread use, there is a notable research gap in understanding and an...
E. Sutoyo, Andrea Capiluppi
Publication year: 2024
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Self-admitted technical debt (SATD) refers to a form of technical debt in which developers explicitly acknowledge and document the existence of technical shortcuts, workarounds, or temporary soluti...
K. D. Pramod, W.T.N. De Silva, W.U.K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake
Publication year: 2024
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research...
Daniel Garijo, Miguel Arroyo, Esteban Gonzalez, Christoph Treude, Nicola Tarocco
Publication year: 2024
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: While computer science papers frequently include their associated code repositories, establishing a clear link between papers and their corresponding implementations may be challenging due to the n...
André Hora
Publication year: 2024
Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package
Abstract: Real-world test suites are often complex and may have thousands of test cases. In this scenario, it is not easy to spot what values are actually covered by the tests. Having access to every tested ...
Davi Tabosa, Oton Pinheiro, Lincoln S. Rocha, Windson Viana
Publication year: 2024
Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps
Abstract: Ensuring the readability and comprehension of source code is key for effective software maintenance and evolution, particularly in tasks involving bug fixing, refactoring, and optimization. Previou...
Mingzhao Liang, Wachiraphan Charoenwet, Patanamon Thongtanunam
Publication year: 2024
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research focused on tool-based code reviews (e.g. a Gerrit code review too...
Kaibo Liu, Yudong Han, Yiyang Liu, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Gang Huang, Yun Ma
Publication year: 2024
Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps
Abstract: We call a program that passes existing tests but still contains bugs a buggy plausible program. Bugs in such a program can bypass the testing environment and enter the production environment, ca...
Andrea D’Angelo, Claudio Di Sipio, Cristiano Politowski, Riccardo Rubei
Publication year: 2024
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Being predominant in digital entertainment for decades, video games have been recognized as valuable software artifacts by the software engineering (SE) community just recently. Such an acknowledgm...
Mahmoud Jahanshahi, A. Mockus
Publication year: 2023
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some stu...
Guillaume Cardoen, Tom Mens, Alexandre Decan
Publication year: 2024
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: GitHub Actions is the de facto workflow automation tool for GitHub repositories. Its popularity has increased dramatically over the recent years, opening up opportunities for empirical studies rela...
Jordan Samhi, Marco Alecci, Tégawendé F. Bissyandé, Jacques Klein
Publication year: 2023
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Android app developers extensively employ code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it can be challenging for static an...
Anisha Islam, Kalvin Eng, Abram Hindle
Publication year: 2024
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Pure Data (PD), a data-flow based visual programming language utilized for music and sound synthesis, remains underexplored in software engineering research. Existing literature fails to address th...
M. Weyssow, Claudio Di Sipio, Davide Di Ruscio, H. Sahraoui
Publication year: 2023
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a ...
Marco Alecci, Pedro Jesús Ruiz Jiménez, Kevin Allix, Tégawendé F. Bissyandé, Jacques Klein
Publication year: 2024
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: In 2016, we released AndroZoo, a continuously expanding dataset of Android applications that aggregates apps from various sources, including the official Google Play app market. As of today, AndroZ...
Yibo Yan, Seth Frey, Amy Zhang, V. Filkov, Likang Yin
DOI: 10.1109/MSR59073.2023.00089
Publication year: 2023
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Open-source Software (OSS) has become a valuable resource in both industry and academia over the last few decades. Despite the innovative structures they develop to support the projects, OSS projec...
Parvez Mahbub, Ohiduzzaman Shuvo, M. M. Rahman
DOI: 10.1109/MSR59073.2023.00085
Publication year: 2023
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Defect prediction has been a popular research topic where machine learning (ML) and deep learning (DL) have found numerous applications. However, these ML/DL-based defect prediction models are ofte...
Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, R. Scandariato
DOI: 10.1109/MSR59073.2023.00084
Publication year: 2023
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources....
Emily Bui, H. Rocha
DOI: 10.1109/MSR59073.2023.00081
Publication year: 2023
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Snapshot testing is a form of software testing that is focused on visual components by highlighting any code changes when compared to a previously stored state. This quick and simple method of test...
Mohayeminul Islam, Ajay Kumar Jha, Sarah Nadi, Ildar Akhmetov
DOI: 10.1109/MSR59073.2023.00075
Publication year: 2023
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Developers heavily rely on Application Programming Interfaces (APIs) from libraries to build their projects. However, libraries might become obsolete, or new libraries with better APIs might become...
Natarajan Chidambaram, Alexandre Decan, T. Mens
DOI: 10.1109/MSR59073.2023.00070
Publication year: 2023
Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps
Abstract: Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort-intensive and error-prone tasks. To understand and study how these bots are used, state-of-the-...
Himesh Nandani, M. Saad, Tushar Sharma
DOI: 10.1109/MSR59073.2023.00067
Publication year: 2023
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Researchers apply machine-learning techniques for code smell detection to counter the subjectivity of many code smells. Such approaches need a large, manually annotated dataset for training and ben...
Murali Sridharan, Leevi Rantala, M. Mäntylä
DOI: 10.1109/MSR59073.2023.00063
Publication year: 2023
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: Most Self-Admitted Technical Debt (SATD) research utilizes explicit SATD features such as ‘TODO’ and ‘FIXME’ for SATD detection. A closer look reveals several SATD research uses simple SATD (‘Easy ...
Akhila Sri Manasa Venigalla, S. Chimalakonda
DOI: 10.1109/MSR59073.2023.00062
Publication year: 2023
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: Software documentation is one of the critical aspects of a software project that could support multiple tasks throughout the software development life-cycle. There is extensive research on underst...
S. Basak, L. Neil, Bradley Reaves, Laurie A. Williams
DOI: 10.1109/MSR59073.2023.00053
Publication year: 2023
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: According to GitGuardian’s monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) increased two-fold in 2021 compared to 2020, totaling more than six mil...
Leonhard Applis, Annibale Panichella
DOI: 10.1109/MSR59073.2023.00040
Publication year: 2023
Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package
Abstract: We present HasBugs, an extensible and manually-curated dataset of 25 real-world Haskell bugs from 6 open source repositories. We provide a faulty, tested, and fixed version of each bug in our datas...
Themistoklis G. Diamantopoulos, Dimitrios Nastos, A. Symeonidis
DOI: 10.1109/MSR59073.2023.00039
Publication year: 2023
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Current state of practice dictates that software developers host their projects online and employ project management systems to monitor the development of product features, keep track of bugs, and ...
S. Schneider, Tufan Özen, Michael Chen, R. Scandariato
DOI: 10.1109/MSR59073.2023.00030
Publication year: 2023
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: Dataflow diagrams (DFDs) are useful resources in securing applications since they show a software system’s architecture and allow assessing architectural security and weaknesses. Enriching them wit...
Nafiseh Nikeghbal, Amir Hossein Kargaran, A. Heydarnoori, Hinrich Schütze
DOI: 10.1109/MSR59073.2023.00026
Publication year: 2023
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: GitHub’s issue reports provide developers with valuable information that is essential to the evolution of a software development project. Contributors can use these reports to perform software engi...
Ratnadira Widyasari, Zhou Yang, Ferdian Thung, Sheng Qin Sim, Fiona Wee, Camellia Lok, Jack Phan, Haodi Qi, Constance Tan, Qijin Tay, David Lo
DOI: 10.1109/MSR59073.2023.00022
Publication year: 2023
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, ther...
Wenxin Jiang, Nicholas Synovic, Purvish Jajal, Taylor R. Schorlemmer, Arav Tewari, Bhavesh Pareek, G. Thiruvathukal, James C. Davis
DOI: 10.1109/MSR59073.2023.00021
Publication year: 2023
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: Due to the cost of developing and training deep learning models from scratch, machine learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks. PTM r...
Chengjie Lu, T. Yue, Sajid Ali
DOI: 10.1109/MSR59073.2023.00020
Publication year: 2023
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: With the rapid development of autonomous driving systems (ADSs), testing ADSs under various environmental conditions has become a key method to ensure the successful deployment of ADS in the real w...
Kevin Jesse
Publication year: 2022
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating machine-learning models for sequence-based type inference in TypeScript. The dataset includes over 9 ...
Cedric Richter, H. Wehrheim
Publication year: 2022
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: Single statement bugs are one of the most important ingredients in the evaluation of modern bug detection and automatic program repair methods. By affecting only a single statement, single statemen...
Asma Razagallah, R. Khoury, Jean-Baptiste Poulet
Publication year: 2022
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: System call traces are an invaluable source of information about a program's runtime behavior and can be particularly useful for malware detection in Android apps. However, the paucity of publicly avai...
Zeinab Abou Khalil, Stefano Zacchiroli
Publication year: 2022
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: We introduce the General Index of Software Engineering Papers, a dataset of fulltext-indexed papers from the most prominent scientific venues in the field of Software Engineering. The dataset inclu...
Saurabh Kumar, Debadatta Mishra, Biswabandan Panda, S. Shukla
Publication year: 2022
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: With the large-scale adoption of Android OS and ever-increasing contributions in the Android application space, Android has become the number one target of malware writers. In recent years, a lar...
B. L. Sousa, Mariza Bigonha, K. Ferreira, G. Franco
Publication year: 2022
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: Software evolution is the process of developing, maintaining, and updating software systems. It is known that the software systems tend to increase their complexity and size over their evolution to...
Stefano Zacchiroli
Publication year: 2022
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive, the largest public...
Francesco Altiero, A. Corazza, S. Martino, A. Peron, L. L. L. Starace
Publication year: 2022
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: It is recognized in the literature that finding representative data to conduct regression testing research is non-trivial. In our experience within this field, existing datasets are often affected ...
Kimberly Truong, Courtney Miller, Bogdan Vasilescu, Christian Kästner
Publication year: 2022
Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package
Abstract: Talks at practitioner-focused open-source software conferences are a valuable source of information for software engineering researchers. They provide a pulse of the community and are valuable sour...
Bonan Kou, Yifeng Di, Muhao Chen, Tianyi Zhang
Publication year: 2022
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Stack Overflow (SO) is becoming an indispensable part of modern software development workflow. However, given the limited time, attention, and memory capacity of programmers, navigating SO posts an...
Lloyd Montgomery, C. Lüders, W. Maalej
Publication year: 2022
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: Organisations use issue tracking systems (ITSs) to track and document their projects' work in units called issues. This style of documentation encourages evolutionary refinement, as each issue can ...
Quang-Cuong Bui, R. Scandariato, N. E. D. Ferreyra
Publication year: 2022
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: In this work we present Vul4j, a Java vulnerability dataset where each vulnerability is associated to a patch and, most importantly, to a Proof of Vulnerability (PoV) test case. We analyzed 1803 fi...
Viktor Csuvik, László Vidács
Publication year: 2022
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: The field of Automated Program Repair (APR) has received increasing attention in recent years both from the academic world and from leading IT companies. Its main goal is to repair software bugs au...
Melanie Warrick, Samuel F. Rosenblatt, Jean-Gabriel Young, Amanda Casari, Laurent Hébert-Dufresne, J. Bagrow
Publication year: 2022
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists t...
Vali Tawosi, A. Al-Subaihin, Rebecca Moussa, Federica Sarro
Publication year: 2022
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: Agile software development is nowadays a widely adopted practice in both open-source and industrial software projects. Agile teams typically heavily rely on issue management tools to document new i...
Jinyoung Kim, Misoo Kim, Eunseok Lee
Publication year: 2022
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: With the introduction of smart contracts, Ethereum has become one of the most popular blockchain networks. In the wake of its popularity, an increasing number of Ethereum-based software have been de...
Jordan Samhi, Tégawendé F. Bissyandé, Jacques Klein
Publication year: 2022
Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps
Abstract: Many Android app analyzers rely, among other techniques, on dynamic analysis to monitor their runtime behavior and detect potential security threats. However, malicious developers use subtle, thou...
Keerthana Muthu Subash, L. P. Kumar, Srinivas Vadlamani, Preetha Chatterjee, Olga Baysal
Publication year: 2022
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such as Discord facilitate live ...
Kristiina Rahkema, Dietmar Pfahl
Publication year: 2022
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: Third party libraries are used to integrate existing solutions for common problems and help speed up development. The use of third party libraries, however, can carry risks, for example through vul...
Yoshiki Higo, S. Matsumoto, S. Kusumoto, Kazuya Yasuda
Publication year: 2022
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Since programming languages offer a wide variety of grammars, desired functions can be implemented in a variety of ways. We consider that there is a large amount of source code that has different i...
Michele Tufano, Shao Kun Deng, Neel Sundaresan, Alexey Svyatkovskiy
Publication year: 2022
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages of development and prevent regressions. Machine learning has e...
Gunnar Kudrjavets, Nachiappan Nagappan, Ayushi Rastogi
Publication year: 2022
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: Phabricator is a modern code collaboration tool used by popular projects like FreeBSD and Mozilla. However, unlike the other well-known code review environments, such as Gerrit or GitHub, there is ...
Petya Buchkova, Joakim Hey Hinnerskov, Kasper Olsen, R. Pfeiffer
Publication year: 2022
Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package
Abstract: Software package managers facilitate reuse and rapid construction of software systems. Since ever more software is distributed via package managers, researchers and practitioners require explicit da...
Nicolas Riquet, Xavier Devroey, B. Vanderose
Publication year: 2022
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Conducting socio-technical software engineering research on closed-source software is difficult as most organizations do not want to give access to their code repositories. Most experiments and pub...
S. L. Shrestha, Shafiul Azam Chowdhury, Christoph Csallner
Publication year: 2022
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: MATLAB/Simulink is widely used for model-based design. Engineers create Simulink models and compile them to embedded code, often to control safety-critical cyber-physical systems in automotive, aer...
Hossein Keshavarz, M. Nagappan
Publication year: 2022
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: In this paper, we present ApacheJIT, a large dataset for Just-In-Time (JIT) defect prediction. ApacheJIT consists of clean and bug-inducing software changes in 14 popular Apache projects. ApacheJIT...
Sahar Badihi, Yi Li, J. Rubin
DOI: 10.1109/MSR52588.2021.00084
Publication year: 2021
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Equivalence checking techniques help establish whether two versions of a program exhibit the same behavior. The majority of popular techniques for formally proving/refuting equivalence are evaluate...
Dheeraj Vagavolu, Vartika Agrahari, S. Chimalakonda, Akhila Sri Manasa Venigalla
DOI: 10.1109/MSR52588.2021.00083
Publication year: 2021
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: Game engines are frameworks that provide a platform for developers to build games with an interface tailored to handle the complexity of game development. Though there is extensive empirical resea...
Tyler Wendland, Jingyang Sun, Junayed Mahmud, S M Hasan Mansur, Steven Huang, Kevin Moran, J. Rubin, M. Fazzini
DOI: 10.1109/MSR52588.2021.00082
Publication year: 2021
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: Software maintenance constitutes a large portion of the software development lifecycle. To carry out maintenance tasks, developers often need to understand and reproduce bug reports. As such, there...
Likang Yin, Zhiyuan Zhang, Qi Xuan, V. Filkov
DOI: 10.1109/MSR52588.2021.00081
Publication year: 2021
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Open Source Software success and sustainability is critically important for the digital infrastructure as OSS is used broadly and yet 83+% of such projects fail. To increase chances of success many...
Tushar Sharma, Marouane Kessentini
DOI: 10.1109/MSR52588.2021.00080
Publication year: 2021
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: Code quality aspects such as code smells and code quality metrics are widely used in exploratory and empirical software engineering research. In such studies, researchers spend a substantial amount...
A. Mir, Evaldas Latoskinas, Georgios Gousios
DOI: 10.1109/MSR52588.2021.00079
Publication year: 2021
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type inference. The dataset contains a total of 5,382 Python projects with more than 869K type annotat...
R. Opdebeeck, Ahmed Zerouali, Coen De Roover
DOI: 10.1109/MSR52588.2021.00078
Publication year: 2021
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Cloud-native applications increasingly provision infrastructure resources programmatically through Infrastructure as Code (IaC) scripts. These scripts have in turn become the subject of empirical s...
N. Rao, Chetan Bansal, Joe Guan
DOI: 10.1109/MSR52588.2021.00077
Publication year: 2021
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Developers use search for various tasks such as finding code, documentation, debugging information, etc. In particular, web search is heavily used by developers for finding code examples and snippe...
Wen Li, Xiaoqin Fu, Haipeng Cai
DOI: 10.1109/MSR52588.2021.00076
Publication year: 2021
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Data-driven approaches have proven to be promising in mobile software analysis, yet these approaches rely on sizable and quality datasets. For Android app analysis in particular, there have been se... Read more
Nafise Eskandani, G. Salvaneschi
DOI: 10.1109/MSR52588.2021.00075
Publication year: 2021
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Function as a Service (FaaS) has grown in popularity in recent years, with an increasing number of applications following the Serverless computing model. Serverless computing supports out of the bo... Read more
Ozren Dabić, Emad Aghajani, G. Bavota
DOI: 10.1109/MSR52588.2021.00074
Publication year: 2021
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: Almost every Mining Software Repositories (MSR) study requires, as first step, the selection of the subject software repositories. These repositories are usually collected from hosting services lik... Read more
Mouna Hammoudi, Christoph Mayr-Dorn, A. Mashkoor, Alexander Egyed
DOI: 10.1109/MSR52588.2021.00073
Publication year: 2021
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Software engineers use requirement-to-method trace matrices to indicate the methods implementing different system requirements. Requirement-to-method trace matrices pinpoint the exact method implem... Read more
L. Quaranta, Fabio Calefato, F. Lanubile
DOI: 10.1109/MSR52588.2021.00072
Publication year: 2021
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the researc... Read more
Thomas Durieux, César Soto-Valero, B. Baudry
DOI: 10.1109/MSR52588.2021.00071
Publication year: 2021
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce Duets, a new dataset of software libraries and... Read more
Misoo Kim, Youngkyoung Kim, Eunseok Lee
DOI: 10.1109/MSR52588.2021.00070
Publication year: 2021
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: A growing interest in deep learning (DL) has instigated a concomitant rise in DL-related software (DLSW). Therefore, the importance of DLSW quality has emerged as a vital issue. Simultaneously, res... Read more
Sebastian Nielebock, Paul Blockhaus, J. Krüger, F. Ortmeier
DOI: 10.1109/MSR52588.2021.00069
Publication year: 2021
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: Many developers and organizations implement apps for Android, the most widely used operating system for mobile devices. Common problems developers face are the various hardware devices, customized ... Read more
Pei Liu, Li Li, Yanjie Zhao, Xiaoyu Sun, J. Grundy
Publication year: 2020
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: It is critical for research to have an open, well-curated, representative set of apps for analysis. We present a collection of open-source Android apps collected from several sources, including Git... Read more
Jiahao Fan, Yi Li, Shaohua Wang, T. Nguyen
Publication year: 2020
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: We collected a large C/C++ code vulnerability dataset from open-source GitHub projects, namely Big-Vul. We crawled the public Common Vulnerabilities and Exposures (CVE) database and CVE-related sou... Read more
Tanner Fry, Tapajit Dey, Andrey Karnauch, A. Mockus
Publication year: 2020
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: The data collected from open source projects provide means to model large software ecosystems, but often suffer from data quality issues, specifically, multiple author identification strings in cod... Read more
A. Mockus, D. Spinellis, Zoe Kotti, G. J. Dusing
Publication year: 2020
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: In order to understand the state and evolution of the entirety of open source software we need to get a handle on the set of distinct software projects. Most of open source projects presently utili... Read more
Jordan Henkel, C. Bird, Shuvendu K. Lahiri, T. Reps
Publication year: 2020
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In t... Read more
Laura Bello-Jiménez, Camilo Escobar-Velásquez, Anamaria Mojica-Hanke, S. Cortés-Fernández, Mario Linares-Vásquez
Publication year: 2020
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: The amount of Android apps available for download is constantly increasing, exerting a continuous pressure on developers to publish outstanding apps. Google Play (GP) is the default distribution ch... Read more
D. Spinellis, Zoe Kotti, A. Mockus
Publication year: 2020
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: GitHub projects can be easily replicated through the site's fork process or through a Git clone-push sequence. This is a problem for empirical software engineering, because it can lead to skewed re... Read more
D. Spinellis, Zoe Kotti, Konstantinos Kravvaritis, Georgios Theodorou, Panos Louridas
Publication year: 2020
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on o... Read more
Esteban Parra, A. Ellis, S. Haiduc
Publication year: 2020
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: Team communication is essential for the development of modern software systems. For distributed software development teams, such as those found in many open source projects, this communication usua... Read more
Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, L. Pollock
Publication year: 2020
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: More than ever, developers are participating in public chat communities to ask and answer software development questions. With over ten million daily active users, Slack is one of the most popular ... Read more
Usman Ashraf, Christoph Mayr-Dorn, Alexander Egyed, Sebastiano Panichella
Publication year: 2020
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: Several researchers have studied that developers contributing to open source systems tend to self-organize in “emerging” teams. The structure of these latent teams has a significant impact on softw... Read more
Rafael-Michael Karampatsis, Charles Sutton
Publication year: 2019
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: Program repair is an important but difficult software engineering problem. One way to achieve acceptable performance is to focus on classes of simple bugs, such as bugs with single statement fixes,... Read more
Themistoklis G. Diamantopoulos, Michail D. Papamichail, Thomas Karanikiotis, Kyriakos C. Chatzidimitriou, A. Symeonidis
Publication year: 2020
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: The full integration of online repositories in contemporary software development promotes remote work and collaboration. Apart from the apparent benefits, online repositories offer a deluge of dat... Read more
Xunhui Zhang, Ayushi Rastogi, Yue Yu
Publication year: 2020
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: Pull-based development is a widely adopted paradigm for collaboration in distributed software development, attracting eyeballs from both academia and industry. To better study pull-based developmen... Read more
András Kicsi, László Vidács, T. Gyimóthy
Publication year: 2020
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: High test-to-code traceability can be an important aspect of quality assurance and can contribute to bug localization and code maintenance. Several existing techniques and a considerable effort fr... Read more
Maëlick Claes, M. Mäntylä
Publication year: 2020
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: Data of long-lived and high profile projects is valuable for research on successful software engineering in the wild. Having a dataset with different linked software repositories of such projects, ... Read more
Cristiano Politowski, Fábio Petrillo, G. Ullmann, Josias de Andrade Werly, Yann-Gaël Guéhéneuc
Publication year: 2020
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: Different from traditional software development, there is little information about the software-engineering process and techniques in video-game development. One popular way to share knowledge amo... Read more
C. Brandt, Annibale Panichella, A. Zaidman, M. Beller
Publication year: 2020
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: Build logs are textual by-products that a software build process creates, often as part of its Continuous Integration (CI) pipeline. Build logs are a paramount source of information for developers ... Read more
Federico Coró, Roberto Verdecchia, Emilio Cruciani, Breno Miranda, A. Bertolino
Publication year: 2020
Topic 2 - Terms: source, test, bug, provide, code, analysis, community, method, study, package
Abstract: The recent push towards test automation and test-driven development continues to scale up the dimensions of test code that needs to be maintained, analysed, and processed side-by-side with pro-duct... Read more
Gian Luca Scoccia, Anthony S Peruma, Virginia Pujols, Ben Christians, Daniel E. Krutz
Publication year: 2019
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Android applications (apps) rely upon proper permission usage to ensure that the user's privacy and security are adequately protected. Unfortunately, developers frequently misuse app permissions in... Read more
Saket Joshi, S. Chimalakonda
Publication year: 2019
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: In the recent years, there has been a surge in the adoption of agile development model and continuous integration (CI) in software development. Recent trends have reduced average release cycle leng... Read more
O. Riganelli, M. Mobilio, D. Micucci, L. Mariani
Publication year: 2019
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: Android apps must be able to deal with both stop events, which require immediately stopping the execution of the app without losing state information, and start events, which require resuming the e... Read more
Sumon Biswas, Md Johirul Islam, Yijia Huang, Hridesh Rajan
Publication year: 2019
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: The popularity of the Python programming language has surged in recent years due to its increasing usage in Data Science. The availability of Python repositories in GitHub presents an opportunity for m... Read more
Marius Kamp, Patrick Kreutzer, M. Philippsen
Publication year: 2019
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: In the past, techniques for detecting similarly behaving code fragments were often only evaluated with small, artificial oracles or with code originating from programming competitions. Such code fr... Read more
Haoyu Wang, Junjun Si, Hao Li, Yao Guo
Publication year: 2019
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: A large number of research studies have been focused on detecting Android malware in recent years. As a result, a reliable and large-scale malware dataset is essential to build effective malware cl... Read more
Aida Radu, Sarah Nadi
Publication year: 2019
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: While several researchers have published bug data sets in the past, there has been less focus on bugs related to non-functional requirements. Non-functional requirements describe the quality attrib... Read more
Serena Elisa Ponta, H. Plate, A. Sabetta, M. Bezzi, Cédric Dangremont
Publication year: 2019
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of sof... Read more
Amine Benelallam, Nicolas Harrand, César Soto-Valero, B. Baudry, Olivier Barais
Publication year: 2019
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: The Maven Central Repository provides an extraordinary source of data to understand complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository in... Read more
Rijnard van Tonder, Asher Trockman, Claire Le Goues
Publication year: 2019
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: Cryptocurrencies are a significant development in recent years, featuring in global news, the financial sector, and academic research. They also hold a significant presence in open source developme... Read more
Rui Rua, Marco Couto, J. Saraiva
Publication year: 2019
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: This paper presents the GreenSource infrastructure: a large body of open source code, executable Android applications, and curated dataset containing energy code metrics. The dataset contains energ... Read more
Hugo Matalonga, Bruno Cabral, F. C. Filho, Marco Couto, Rui Pereira, S. Sousa, J. Fernandes
Publication year: 2019
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: As mobile devices are supporting more and more of our daily activities, it is vital to widen their battery up-time as much as possible. In fact, according to the Wall Street Journal, 9/10 users suf... Read more
Antoine Pietri, D. Spinellis, Stefano Zacchiroli
Publication year: 2019
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Software Heritage is the largest existing public archive of software source code and accompanying development history: it currently spans more than five billion unique source code files and one bil... Read more
Dirk Beyer
Publication year: 2019
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: The analysis of correctness proofs and counterexamples of program source code is an important way to gain insights into methods that could make it easier in the future to find invariants to prove a... Read more
A. Wickert, Michael Reif, Michael Eichberg, Anam Dodhy, M. Mezini
Publication year: 2019
Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps
Abstract: Cryptographic APIs (Crypto APIs) provide the foundations for the development of secure applications. Unfortunately, most applications do not use Crypto APIs securely and end up being insecure, e.g.... Read more
Musfiqur Rahman, Peter C. Rigby, Dharani Palani, T. Nguyen
Publication year: 2019
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel c... Read more
V. Efstathiou, D. Spinellis
Publication year: 2019
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: The emergence of online open source repositories in the recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of softwar... Read more
D. Spinellis
Publication year: 2018
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: The documented Unix facilities data set provides the details regarding the evolution of 15596 unique facilities through 93 versions of Unix over a period of 48 years. It is based on the manual tran... Read more
Yulin Xu, Minghui Zhou
Publication year: 2018
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: In many open source software projects (e.g., the Linux kernel), people contribute by sending code patches to the community. The community evaluates these contributions and decides whether to integr... Read more
Ripon K. Saha, Yingjun Lyu, Wing Lam, H. Yoshida, M. Prasad
Publication year: 2018
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: We present Bugs.jar, a large-scale dataset for research in automated debugging, patching, and testing of Java programs. Bugs.jar is comprised of 1,158 bugs and patches, drawn from 8 large, popular ... Read more
M. Paixão, J. Krinke, Donggyun Han, M. Harman
Publication year: 2018
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: Code review has been widely adopted by both industrial and open source software development communities. Research in code review is highly dependent on real-world data, and although existing resear... Read more
Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Themistoklis G. Diamantopoulos, Michail Tsapanos, A. Symeonidis
Publication year: 2018
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for ... Read more
Vadim Markovtsev, Waren Long
Publication year: 2018
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-con... Read more
F. Geiger, I. Malavolta, L. Pascarella, Fabio Palomba, Dario Di Nucci, Alberto Bacchelli
Publication year: 2018
Topic 4 - Terms: code, bug, repository, github, tool, study, activity, source, technique, apps
Abstract: Obtaining a good dataset to conduct empirical studies on the engineering of Android apps is an open challenge. To start tackling this challenge, we present AndroidTimeMachine, the first, self-contain... Read more
A. Yamashita, Fábio Petrillo, Foutse Khomh, Yann-Gaël Guéhéneuc
Publication year: 2018
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: There are two well-known difficulties to test and interpret methodologies for mining developer interaction traces: first, the lack of enough large datasets needed by mining or machine learning appr... Read more
Gerald Schermann, Sali Zumberi, Jürgen Cito
Publication year: 2018
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: Docker containers are standardized, self-contained units of applications, packaged with their dependencies and execution environment. The environment is defined in a Dockerfile that specifies the s... Read more
Yue Yu, Zhixing Li, Gang Yin, Tao Wang, Huaimin Wang
Publication year: 2018
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: In GitHub, the pull-based development model enables community contributors to collaborate in a more efficient way. However, the distributed and parallel characteristics of this model pose a potenti... Read more
Antonios Gkortzis, Dimitris Mitropoulos, D. Spinellis
Publication year: 2018
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Examining the different characteristics of open-source software in relation to security vulnerabilities can provide the research community with findings that can lead to the development of more se... Read more
Nicole Novielli, Fabio Calefato, F. Lanubile
Publication year: 2018
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: Software developers experience and share a wide range of emotions throughout a rich ecosystem of communication channels. A recent trend that has emerged in empirical software engineering studies is... Read more
Jian Gao, Xin Yang, Yu Jiang, Han Liu, Weiliang Ying, Xian Zhang
Publication year: 2018
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: Race detection is increasingly popular, both in the academic research and in industrial practice. However, there is no specialized and comprehensive dataset of the data race, making it difficult to... Read more
Pedro Martins, Rohan Achar, C. Lopes
Publication year: 2018
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: We provide a repository of 50,000 compilable Java projects. Each project in this dataset comes with references to all the dependencies required to compile it, the resulting bytecode, and the script... Read more
V. Efstathiou, Christos Chatzilenas, D. Spinellis
Publication year: 2018
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: The software development process produces vast amounts of textual data expressed in natural language. Outcomes from the natural language processing community have been adapted in software engineeri... Read more
Jeroen Noten, J. Mengerink, Alexander Serebrenik
Publication year: 2017
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: In model driven engineering (MDE), meta-models are the central artifacts. As a complement, the Object Constraint Language (OCL) is a language used to express constraints and operations on meta-mode... Read more
Mefta Sadat, A. Bener, A. Miranskyy
Publication year: 2017
Topic 11 - Terms: source, defect, present, developer, android, test, security, development, project, datasets
Abstract: The same defect can be rediscovered by multiple clients, causing unplanned outages and leading to reduced customer satisfaction. In the case of popular open source software, high volume of defects ... Read more
Chenguang Zhu, Yi Li, J. Rubin, M. Chechik
Publication year: 2017
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Over the last few years, researchers proposed several semantic history slicing approaches that identify the set of semantically-related commits implementing a particular software functionality. How... Read more
G. Robles, Truong Ho-Quang, R. Hebig, M. Chaudron, M. A. Fernández
Publication year: 2017
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: The Unified Modeling Language (UML) is widely taught in academia and has good acceptance in industry. However, there is not an ample dataset of UML diagrams publicly available. Our aim is to offer ... Read more
L. Madeyski, M. Kawalerowicz
Publication year: 2017
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: We would like to present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. Our dataset is currently a set of more than 11 million data row... Read more
Efthimia Aivaloglou, F. Hermans, J. Moreno-León, G. Robles
Publication year: 2017
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Scratch is increasingly popular, both as an introductory programming language and as a research target in the computing education research field. In this paper, we present a dataset of 250K recent ... Read more
A. Yamashita, S. Amirhossein Abtahizadeh, Foutse Khomh, Yann-Gaël Guéhéneuc
Publication year: 2017
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: A main difficulty to study the evolution and quality of real-life software systems is the effect of moderator factors, such as: programming skill, type of maintenance task, and learning effect. Exp... Read more
Megan Squire
Publication year: 2016
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Studying software repositories and hosting services can provide valuable insights into the behaviors of large groups of software developers and their projects. Traditionally, most analysis of metad... Read more
Kevin Allix, Tégawendé F. Bissyandé, Jacques Klein, Y. L. Traon
Publication year: 2016
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: We present a growing collection of Android Applications collected from several sources, including the official GooglePlay app market. Our dataset, AndroZoo, currently contains more than three mill... Read more
Sebastian Proksch, Sven Amann, Sarah Nadi, M. Mezini
Publication year: 2016
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: In this paper, we present a curated collection of 2833 C# solutions taken from GitHub. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting ... Read more
Sven Amann, Sarah Nadi, H. Nguyen, T. Nguyen, M. Mezini
Publication year: 2016
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Over the last few years, researchers proposed a multitude of automated bug-detection approaches that mine a class of bugs that we call API misuses. Evaluations on a variety of software products sho... Read more
Marco Ortu, Alessandro Murgia, Giuseppe Destefanis, Parastou Tourani, R. Tonelli, M. Marchesi, Bram Adams
Publication year: 2016
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter... Read more
Xin Yang, R. Kula, Norihiro Yoshida, Hajimu Iida
Publication year: 2016
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: In this paper, we present a collection of Modern Code Review data for five open source projects. The data showcases mined data from both an integrated peer review system and source code repositorie... Read more
Jiaxin Zhu, Minghui Zhou, Hong Mei
Publication year: 2016
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Many studies analyze issue tracking repositories to understand and support software development. To facilitate the analyses, we share a Mozilla issue tracking dataset covering a 15-year history. Th... Read more
Daniel E. Krutz, Mehdi Mirakhorli, Samuel A. Malachowsky, Andres Ruiz, Jacob Peterson, Andrew Filipski, Jared Smith
Publication year: 2015
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: Android has grown to be the world's most popular mobile platform with apps that are capable of doing everything from checking sports scores to purchasing stocks. In order to assist researchers and ... Read more
M. Ohira, Yutaro Kashiwa, Yosuke Yamatani, Hayato Yoshiyuki, Yoshiya Maeda, Nachai Limsettho, K. Fujino, Hideaki Hata, Akinori Ihara, Ken-ichi Matsumoto
Publication year: 2015
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: The importance of supporting test and maintenance activities in software development has been increasing, since recent software systems have become large and complex. Although in the field of Minin... Read more
Bogdan Vasilescu, Alexander Serebrenik, V. Filkov
Publication year: 2015
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: Like any other team-oriented activity, the software development process is affected by social diversity in the programmer teams. The effect of team diversity can be significant, but also complex, e... Read more
Vassilios Karakoidas, Dimitris Mitropoulos, P. Louridas, Georgios Gousios, D. Spinellis
Publication year: 2015
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Examining a large number of software artifacts can provide the research community with data regarding quality and design. We present a dataset obtained by statically analyzing 22730 jar files taken... Read more
A. Sawant, Alberto Bacchelli
Publication year: 2015
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: An Application Programming Interface (API) provides a specific set of functionalities to a developer. The main aim of an API is to encourage the reuse of already existing functionality. There has b... Read more
M. Wermelinger, Y. Yu
Publication year: 2015
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: A good evolution process and a good architecture can greatly support the maintainability of long-lived, large software systems. We present AREVOL, a dataset for the empirical study of architectural... Read more
Mayy Habayeb, A. Miranskyy, Syed Shariyar Murtaza, Leotis Buchanan, A. Bener
Publication year: 2015
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: The bug tracking repositories of software projects capture initial defect (bug) reports and the history of interactions among developers, testers, and customers. Extracting and mining information f...
Harald Altinger, S. Siegl, Y. Dajsuren, F. Wotawa
Publication year: 2015
Topic 8 - Terms: study, project, model, development, based, repository, source, issue, code, tool
Abstract: In this paper, we present a novel industry dataset on static software and change metrics for Matlab/Simulink models and their corresponding auto-generated C source code. The data set comprises data...
Andreas Mauczka, Florian Brosch, Christian Schanes, T. Grechenig
Publication year: 2015
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: Current research on change classification centers around automated and semi-automated approaches which are based on evaluation by either the researchers themselves or external experts. In most case...
Titus Barik, Kevin Lubick, Justin Smith, John Slankas, E. Murphy-Hill
Publication year: 2015
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Spreadsheets are perhaps the most ubiquitous form of end-user programming software. This paper describes a corpus, called Fuse, containing 2,127,284 URLs that return spreadsheets (and their HTTP se...
Fabio Palomba, Dario Di Nucci, Michele Tufano, G. Bavota, Rocco Oliveto, D. Poshyvanyk, A. D. Lucia
Publication year: 2015
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Code smells are symptoms of poor design and implementation choices that may hinder code comprehension and possibly increase change- and fault-proneness of source code. Several techniques have been ...
Jesus M. Gonzalez-Barahona, G. Robles, Daniel Izquierdo-Cortazar
Publication year: 2015
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: The Metrics Grimoire system is composed of a set of tools designed to retrieve data from repositories related to software development. Their aim is to produce organized databases suitable for easy ...
Luca Ponzanelli, Andrea Mocci, Michele Lanza
Publication year: 2015
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: Stack Overflow is the de facto Question and Answer (Q&A) website for developers, and it has been used in many approaches by software engineering researchers to mine useful data. However, the conten...
D. Germán, Bram Adams, A. Hassan
Publication year: 2015
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: This dataset documents the activity in the public portion of the git Super-repository of the Linux kernel during 2012. In a distributed version control system, such as git, the Super-repository is ...
Stefano Zacchiroli
Publication year: 2015
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: We present the Debsources Dataset: distribution metadata and source code metrics spanning two decades of Free and Open Source Software (FOSS) history, seen through the lens of the Debian distributi...
D. Spinellis
Publication year: 2015
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: The evolution of the Unix operating system is made available as a version-control repository, covering the period from its inception in 1972 as a five thousand line kernel, to 2015 as a widely-used...
B. Baldassari, P. Preux
Publication year: 2014
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Software engineering is a maturing discipline which has seen many drastic advances in the last years. However, some studies still point to the lack of rigorous and mathematically grounded methods t...
G. Farah, Juan Sebastian Tejada, Darío Correal
Publication year: 2014
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: There is currently a vast array of open source projects available on the web, and although they are searchable by name or description in the search engines, there is no way to search for projects b...
V. Saini, Hitesh Sajnani, Joel Ossher, C. Lopes
Publication year: 2014
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: In this paper, we present data downloaded from Maven, one of the most popular component repositories. The data includes the binaries of 186,392 components, along with source code for 161,025. We id...
Hiroaki Murakami, Yoshiki Higo, S. Kusumoto
Publication year: 2014
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: This paper introduces a new dataset of clone references, which is a set of correct clones consisting of their locational information with their gapped lines. Bellon's dataset is one of widely used ...
James R. Williams, D. D. Ruscio, N. Matragkas, Juri Di Rocco, D. Kolovos
Publication year: 2014
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: The process of selecting open-source software (OSS) for adoption is not straightforward as it involves exploring various sources of information to determine the quality, maturity, activity, and use...
Remco Bloemen, C. Amrit, S. Kuhlmann, Gonzalo Ordóñez‐Matamoros
Publication year: 2014
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Open source distributions such as Gentoo need to accurately track dependency relations between software packages in order to install working systems. To do this, Gentoo has a carefully authored dat...
Chenlei Zhang, Abram Hindle
Publication year: 2014
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: With the advent of mobile computing, the responsibility of software developers to update and ship energy efficient applications has never been more pronounced. Green mining attempts to address this...
G. Robles, L. Reina, Alexander Serebrenik, Bogdan Vasilescu, Jesus M. Gonzalez-Barahona
Publication year: 2014
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: In this data paper we describe a data set obtained by means of performing an on-line survey to over 2,000 Free Libre Open Source Software (FLOSS) contributors. The survey includes questions related...
A. Lazar, Sarah Ritchey, Bonita Sharif
Publication year: 2014
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Automatic identification of duplicate bug reports is an important research problem in the mining software repositories field. This paper presents a collection of bug datasets collected, cleaned and...
Daniel E. Krutz, Wei Le
Publication year: 2014
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Code clones are functionally equivalent code segments. Detecting code clones is important for determining bugs, fixes and software reuse. Code clone detection is also essential for developing fas...
Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, A. Zaidman
Publication year: 2014
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: In recent years, GitHub has become the largest code host in the world, with more than 5M developers collaborating across 10M repositories. Numerous popular open source projects (such as Ruby on Rai...
Kenji Fujiwara, Hideaki Hata, Erina Makihara, Yusuke Fujihara, Naoki Nakayama, Hajimu Iida, Ken-ichi Matsumoto
Publication year: 2014
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: In the research of Mining Software Repositories, code repository is one of the core sources since it contains the product of software development. Code repository stores the versions of files, and m...
L. Passos, K. Czarnecki
Publication year: 2014
Topic 10 - Terms: repository, issue, information, system, source, project, github, file, evolution, developer
Abstract: This paper describes a dataset of feature additions and removals in the Linux kernel evolution history, spanning over seven years of kernel development. Features, in this context, denote configurab...
Dimitris Mitropoulos, Vassilios Karakoidas, P. Louridas, Georgios Gousios, D. Spinellis
Publication year: 2014
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Examining software ecosystems can provide the research community with data regarding artifacts, processes, and communities. We present a dataset obtained from the Maven central repository ecosystem...
Georgios Gousios, A. Zaidman
Publication year: 2014
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Pull requests form a new method for collaborating in distributed software development. To study the pull request distributed development model, we constructed a dataset of almost 900 projects and 3...
Simon Butler, M. Wermelinger, Y. Yu, H. Sharp
Publication year: 2013
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: INVocD is a database of the identifier name declarations and vocabulary found in 60 FLOSS Java projects where the source code structure is recorded and the identifier name vocabulary is made direct...
D. Binkley, Dawn J Lawrie, L. Pollock, Emily Hill, K. Vijay-Shanker
Publication year: 2013
Topic 1 - Terms: project, code, source, developer, study, api, model, system, technique, type
Abstract: Software engineering and evolution techniques have recently started to exploit the natural language information in source code. A key step in doing so is splitting identifiers into their constituen...
Bogdan Vasilescu, Alexander Serebrenik, T. Mens
Publication year: 2013
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: The Mining Software Repositories community typically focuses on data from software configuration management tools, mailing lists, and bug tracking repositories to uncover interesting and actionable...
Werner Janjic, Oliver Hummel, M. Schumacher, C. Atkinson
Publication year: 2013
Topic 13 - Terms: source, system, evolution, code, build, set, project, developer, comment, information
Abstract: This paper describes a large, unabridged data-set of Java source code gathered and shared as part of the Merobase Component Finder project of the Software-Engineering Group at the University of Man...
Megan Squire
Publication year: 2013
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: This paper describes a new dataset containing Twitter screen names for members of the projects affiliated with the Apache Software Foundation (ASF). The dataset includes the confirmed Twitter scree...
Megan Squire
Publication year: 2013
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: This paper outlines the steps in the creation and maintenance of a new dataset listing leaders of the various projects of the Apache Software Foundation (ASF). Included in this dataset are differen...
Georgios Gousios
Publication year: 2013
Topic 5 - Terms: github, project, source, vulnerability, apps, development, tool, repository, present, information
Abstract: During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-q...
Patrick Wagstrom, C. Jergensen, A. Sarma
Publication year: 2013
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Software projects, whether open source, proprietary, or a combination thereof, rarely exist in isolation. Rather, most projects build on a network of people and ideas from dozens, hundreds, or even...
M. Goeminne, Maëlick Claes, T. Mens
Publication year: 2013
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: We present a dataset of the open source software ecosystem Gnome from a social point of view. We have collected historical data about the contributors to all Gnome projects stored on git.gnome.org,...
S. Raemaekers, A. Deursen, Joost Visser
Publication year: 2013
Topic 9 - Terms: repository, bug, source, project, code, information, method, developer, system, present
Abstract: We present the Maven Dependency Dataset (MDD), containing metrics, changes and dependencies of 148,253 jar files. Metrics and changes have been calculated at the level of individual methods, classe...
Ahmed Lamkanfi, Javier Pérez, S. Demeyer
Publication year: 2013
Topic 7 - Terms: bug, project, repository, report, developer, code, study, single, source, activity
Abstract: The analysis of bug reports is an important subfield within the mining software repositories community. It explores the rich data available in defect tracking systems to uncover interesting and act...
Alexander C. MacLean, C. Knutson
Publication year: 2013
Topic 6 - Terms: code, repository, source, project, datasets, file, analysis, developer, language, github
Abstract: Building non-trivial software is a social endeavor. Therefore, understanding the social network of developers is key to the study of software development organizations. We present a graph represent...
Bogdan Dit, Andrew Holtzhauer, D. Poshyvanyk, Huzefa H. Kagdi
Publication year: 2013
Topic 12 - Terms: project, source, code, development, clone, set, bug, github, os, datasets
Abstract: Approaches that support software maintenance need to be evaluated and compared against existing ones, in order to demonstrate their usefulness in practice. However, oftentimes the lack of well-esta...
Kazuki Hamasaki, R. Kula, Norihiro Yoshida, Ana Erika Camargo Cruz, Kenji Fujiwara, Hajimu Iida
Publication year: 2013
Topic 3 - Terms: code, review, project, source, present, change, set, github, repository, quality
Abstract: We present four datasets that are focused on the general roles of OSS peer review members. With data mined from both an integrated peer review system and code source repositories, our rich datasets...
M. Mukadam, C. Bird, Peter C. Rigby
Publication year: 2013
Topic 0 - Terms: code, source, project, library, tool, android, metric, bug, test, analysis
Abstract: Over the past decade, a number of tools and systems have been developed to manage various aspects of the software development lifecycle. Until now, tool supported code review, an important aspect o...