FUSIX

FUSIX Overview

FUSIX is a natural-language code search library for feature location. It grew out of research and is designed to support source code maintenance activities. To do so, it combines information retrieval techniques with semantic information available in source code and version control systems. Some features of FUSIX:

 

• Distinct Data Sources

Retrieve semantic information from source code and version control systems to enhance the search.
Combine and filter these sources to achieve the best result.

• Coarse and Fine Granularity

FUSIX locates source code components at file-level and method-level granularity.

• Natural Language Queries

Locate interesting source code components using natural language, Google-like queries.

 

Feature location is the task of finding the source code that implements specific functionality in a software system. A common approach is to match the textual information in source code against a query using Information Retrieval (IR) techniques. To address the paucity of meaningful terms in source code, alternative, relevant descriptions of the code, such as change-sets, can be leveraged. The FUSIX library illustrates this with the ACIR approach: it associates change-sets with the code they touch and uses the change-set descriptions as the lexicon for that code. ACIR applies a vector space model (VSM) with TF*IDF weighting to the resulting bag of words for each code element. It can work on all change-sets or on only the most recent change-sets, at class-level or method-level granularity.
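The idea can be illustrated with a minimal, self-contained sketch. It is not FUSIX's implementation: the class and method names (AcirSketch, ChangeSet, buildDocuments, rank) are hypothetical, the tokenizer is deliberately naive (no stemming or stop-word removal), and the IDF formula is one common variant chosen here as an assumption. It concatenates the descriptions of the change-sets that touched each file, weights terms with TF*IDF, and ranks files by cosine similarity to the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: these names are hypothetical, not the FUSIX API. */
final class AcirSketch {

    /** A change-set: its description plus the files it touched. */
    static final class ChangeSet {
        final String description;
        final List<String> files;

        ChangeSet(String description, String... files) {
            this.description = description;
            this.files = Arrays.asList(files);
        }
    }

    /** Concatenates, per file, the descriptions of every change-set that touched it. */
    static Map<String, String> buildDocuments(List<ChangeSet> history) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (ChangeSet cs : history) {
            for (String file : cs.files) {
                docs.merge(file, cs.description, (old, add) -> old + " " + add);
            }
        }
        return docs;
    }

    /** Lower-cases the text and counts its alphanumeric tokens (the bag of words). */
    static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Turns raw counts into TF*IDF weights; terms absent from the corpus are dropped. */
    static Map<String, Double> weigh(Map<String, Integer> counts,
                                     Map<String, Integer> docFreq, int docCount) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Integer df = docFreq.get(e.getKey());
            if (df == null) continue;
            double idf = Math.log((double) docCount / df) + 1.0; // assumed IDF variant
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    /** Cosine similarity between two sparse vectors; 0 if either is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the file names ordered from most to least similar to the query. */
    static List<String> rank(Map<String, String> docs, String query) {
        Map<String, Map<String, Integer>> bags = new LinkedHashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            Map<String, Integer> bag = bagOfWords(e.getValue());
            bags.put(e.getKey(), bag);
            for (String term : bag.keySet()) docFreq.merge(term, 1, Integer::sum);
        }
        Map<String, Double> queryVector = weigh(bagOfWords(query), docFreq, docs.size());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : bags.entrySet()) {
            scores.put(e.getKey(), cosine(queryVector, weigh(e.getValue(), docFreq, docs.size())));
        }
        List<String> ranked = new ArrayList<>(bags.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked;
    }
}
```

Given three invented change-sets, "Fix login timeout on session expiry" touching Auth.java and Session.java, "Add PDF export for reports" touching Report.java, and "Refactor session cache" touching Session.java, the query "export report as pdf" ranks Report.java first, because only its change-set description shares terms with the query.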

 

The demo below lets you try ACIR on a set of systems, selecting the query, the granularity of the analysis, and whether to use all change-sets or just the most recent one. It also lets you use a ‘baseline’ technique, in which the same VSM and TF*IDF technique is applied to the lexicon in the code itself (identifiers and comments), so that you can compare the two approaches. Likewise, it allows you to try the ‘hybrid’ technique, in which the findings from both techniques are merged, typically to give better results.
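The text does not say how the hybrid technique merges the two sets of findings, so the following is only one plausible sketch, not FUSIX's method. It assumes each technique yields a similarity score per component in the range 0 to 1 (as cosine similarity does) and combines them with a weighted sum. The names HybridSketch, combine, rank and changeSetWeight are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a weighted-sum merge is an assumption, not the FUSIX algorithm. */
final class HybridSketch {

    /** Combines two score maps; a component missing from one map scores 0 there. */
    static Map<String, Double> combine(Map<String, Double> changeSetScores,
                                       Map<String, Double> baselineScores,
                                       double changeSetWeight) {
        Set<String> components = new HashSet<>(changeSetScores.keySet());
        components.addAll(baselineScores.keySet());
        Map<String, Double> combined = new HashMap<>();
        for (String component : components) {
            double fromChangeSets = changeSetScores.getOrDefault(component, 0.0);
            double fromBaseline = baselineScores.getOrDefault(component, 0.0);
            combined.put(component,
                    changeSetWeight * fromChangeSets + (1.0 - changeSetWeight) * fromBaseline);
        }
        return combined;
    }

    /** Orders components by descending score, breaking ties by name. */
    static List<String> rank(Map<String, Double> scores) {
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return ranked;
    }
}
```

With equal weights, a component that both techniques find moderately relevant can outrank one that only a single technique finds, which is one way merged findings could improve on either technique alone.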

 

Try FUSIX Demo

(may take a minute to load)


 

Set Up and Running of the Tool

UI

The FUSIX UI lets users search for a particular component within source code. It presents a set of search options to choose from. When all change-set descriptions are selected, FUSIX concatenates them into the document that is searched for each component. The components that the library finds are listed in ranked order, from most to least relevant.

The UI comprises the following options:

Query - What the user is searching for.

Project - One of the 8 projects used in this study.

Granularity - Two levels can be chosen. With 'File', whole files are checked. With 'Method', only the methods are checked to find where the functionality is implemented.

Recentness - 'Recent' uses the most recent change sets; 'Historical' uses all change sets.

App type - This is where you select or mix the code text (comments and identifiers) and the change set descriptions. 'Base type' is the baseline: comments and identifiers, whose textual tokens are concatenated to build the document. 'Plus' is where both are combined.
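To make the Recentness option concrete, here is a small sketch of how the searchable document for one component might be assembled from its change-set descriptions. This is an illustration under assumptions, not FUSIX code: RecentnessSketch, buildDocument and keepMostRecent are hypothetical names, and how many change sets count as 'Recent' is left as a parameter because the text does not fix it.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of the FUSIX API. */
final class RecentnessSketch {

    /**
     * Builds the document for one component from its change-set descriptions,
     * ordered oldest first. Only the last keepMostRecent descriptions are used;
     * pass Integer.MAX_VALUE to keep the full history ('Historical').
     */
    static String buildDocument(List<String> descriptionsOldestFirst, int keepMostRecent) {
        int size = descriptionsOldestFirst.size();
        int from = Math.max(0, size - Math.max(0, keepMostRecent));
        return String.join(" ", descriptionsOldestFirst.subList(from, size));
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList(
                "Add session cache",
                "Fix login timeout",
                "Refactor session expiry");
        // 'Recent' with a window of one change set
        System.out.println(buildDocument(history, 1));
        // 'Historical': every change set, concatenated
        System.out.println(buildDocument(history, Integer.MAX_VALUE));
    }
}
```

A shorter window keeps the document focused on what the component does now, while the full history gives the query more vocabulary to match.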

 

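To create a custom search corpus:

// Executor that runs corpus creation and searches off the calling thread
ExecutorService executor = ... ;

// Configure the source directory, the index directory and the search options
Corpus<List<String>> corpus = Configurations.builder()
		.srcDir(Paths.get(...))
		.indexDir(Paths.get(...))
		.granularity(Granularity.FILE)
		.recentness(Recentness.RECENT)
		.filtered()
		.source(Source.VCS)
		.build();

// Submit corpus creation; the Future completes when the task has finished
Future<List<String>> creation = executor.submit(corpus.create());

To search the corpus:

String query = ... ;
Future<Set<Component>> search = executor.submit(corpus.search(query));

// get() blocks until the search has finished
Set<Component> components = search.get();
components.forEach(System.out::println);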
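Because executor.submit returns a typed Future in the snippets above, corpus.create() and corpus.search(query) appear to be Callable tasks. The sketch below shows the surrounding executor lifecycle that those snippets leave out: creating the executor, waiting with a timeout, and shutting down. It uses a stand-in task rather than FUSIX classes, and ExecutorSketch, standInTask and runWithTimeout are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: a stand-in task replaces the FUSIX corpus calls. */
final class ExecutorSketch {

    /** Stand-in for a long-running task such as corpus creation or a search. */
    static Callable<List<String>> standInTask() {
        return () -> Arrays.asList("Report.java", "Session.java");
    }

    /** Runs the task on its own thread and waits at most the given number of seconds. */
    static List<String> runWithTimeout(Callable<List<String>> task, long seconds)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<List<String>> future = executor.submit(task);
            // get() rethrows a task failure wrapped in ExecutionException
            return future.get(seconds, TimeUnit.SECONDS);
        } finally {
            // Always release the worker thread, even on timeout or failure
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        runWithTimeout(standInTask(), 5).forEach(System.out::println);
    }
}
```

A timeout is a design choice worth considering here, since building an index over a large repository can take a while and an unbounded get() would block the caller indefinitely.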