Datasets
AGORA Projects
Several project lists can be used to populate AGORA. All of them are available in the projectlists folder of the AGORA Elasticsearch Client repository. The lists are described below:
- test_projects.txt: list of 10 test projects to be used when first installing the service in order to check that everything works as expected
- 100_most_starred_projects.txt: list of the 100 most starred Java projects on GitHub
- 3000_most_starred_projects.txt: list of the 3000 most starred Java projects on GitHub
- projects.txt: the list of the 3000 most starred Java projects on GitHub, cleaned up by removing forked projects and invalid projects (e.g. projects without Java code)
When setting up the service, one can initially use the list of test projects. After that, one can clear the index (by issuing a delete_index command) and populate it using either one of the lists shown above or any other list of projects.
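When preparing a custom list of projects, it can help to check the file before populating the index. The following is a minimal sketch, not part of the AGORA Elasticsearch Client: it assumes that a project list is a plain-text file with one project entry per line (the actual format of the files in projectlists may differ), and the function name load_project_list is hypothetical.

```python
from pathlib import Path


def load_project_list(path):
    """Return the unique, non-empty entries of a project list file.

    Assumption: the file holds one project entry per line. Entries are
    kept in their original order; blank lines and repeated entries are
    skipped.
    """
    seen = set()
    projects = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry or entry in seen:
            continue  # skip blank lines and duplicates
        seen.add(entry)
        projects.append(entry)
    return projects
```

For example, calling load_project_list on a local copy of test_projects.txt would return its entries as a Python list, which makes it easy to confirm how many projects are about to be added.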
The service at http://agora.ee.auth.gr currently contains the projects listed in projects.txt.
Dataset
AGORA has been submitted to the SoftwareX Journal, where it is currently under review. This page contains the dataset used for the evaluation in a component reuse scenario that is presented in the manuscript. The dataset includes the list of projects contained in AGORA when the tests were performed, as well as the list of queries that were issued to the service.
The dataset is available here.