If you are a student, I would be glad to supervise your diploma/course project on one of the following topics (in no particular order). You may also find additional topics here: PainOfOOP, SQM, PMBA, OSBP.
Object-Oriented Benchmark for JVM:
SPECjbb 2015 is a de facto industry standard for benchmarking the performance of Java Virtual Machines (JVMs). However, there are two problems with this benchmark: 1) it's not open source and 2) it's rather old. For example, it doesn't cover the Stream API at all. We can create a new benchmark (a collection of Java programs) that would test the performance of a JVM with a strong focus on its object-oriented features, in particular dynamic dispatch and object allocation.
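To make the idea more tangible, here is a minimal sketch of what a single probe in such a benchmark might look like. It is only an illustration, not SPECjbb code and not a proposed design: the names `Shape`, `Square`, `Rect`, and `DispatchProbe` are hypothetical, and the timing is deliberately crude. A real benchmark would most likely rely on a harness such as JMH to control warm-up and dead-code elimination.

```java
import java.util.function.LongSupplier;

// Hypothetical types, invented for this sketch only.
interface Shape {
    long area();
}

final class Square implements Shape {
    private final long side;
    Square(long side) { this.side = side; }
    @Override public long area() { return side * side; }
}

final class Rect implements Shape {
    private final long w;
    private final long h;
    Rect(long w, long h) { this.w = w; this.h = h; }
    @Override public long area() { return w * h; }
}

final class DispatchProbe {
    // Creates a fresh object per iteration and calls area() through the
    // interface, so each iteration asks for one allocation and one
    // dynamically dispatched call (the JIT may still optimize them away).
    static long allocateAndDispatch(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            Shape s = (i % 2 == 0) ? new Square(i) : new Rect(i, 2);
            sum += s.area();
        }
        return sum;
    }

    // Wall-clock time of one run, in nanoseconds.
    static long timeNanos(LongSupplier body) {
        long start = System.nanoTime();
        long result = body.getAsLong();
        long elapsed = System.nanoTime() - start;
        // Touch the result so the computation cannot be trivially discarded.
        if (result == Long.MIN_VALUE) {
            System.out.println(result);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Early rounds mostly measure interpretation and JIT warm-up.
        for (int round = 0; round < 5; round++) {
            long ns = timeNanos(() -> allocateAndDispatch(1_000_000));
            System.out.println("round " + round + ": " + ns + " ns");
        }
    }
}
```

The numbers printed by such a loop are not reliable measurements; the sketch only shows which two costs (allocation and virtual calls) the benchmark is meant to isolate.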
Fast In-Memory Graph Manager, in Rust:
In EO, an experimental object-oriented programming language, any program can be
represented as a directed graph, where nodes are objects and the edges between them
are attributed. The SODG format, which we introduced to the EO compiler in 2023,
implements exactly such a representation. REO is the virtual machine, implemented
in Rust, that compiles and executes SODG. Now, we
need a fast in-memory manager of the graph, which would enable adding new nodes
and edges, removing them, and attaching data to some nodes. The manager should be implemented
as an independent Rust library.
Complexity: Middle; Effort: Middle; Stack: Rust
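The operations listed above (adding and removing nodes and edges, attaching data) can be sketched as follows. The sketch is written in Java purely for illustration; the library itself is meant to be written in Rust. The class `TinyGraph` and all of its method names are hypothetical and do not reflect the SODG format or the REO API. It also makes no attempt to be fast, which is the actual point of the project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory graph: vertices are integers, every edge
// carries a label, and a vertex may hold an optional byte payload.
final class TinyGraph {
    private final Map<Integer, Map<String, Integer>> edges = new HashMap<>();
    private final Map<Integer, byte[]> payloads = new HashMap<>();

    // Adds a vertex; does nothing if it already exists.
    void add(int v) {
        edges.putIfAbsent(v, new HashMap<>());
    }

    // Adds (or redirects) the labeled edge that leaves 'from'.
    void bind(int from, int to, String label) {
        requireVertex(from);
        requireVertex(to);
        edges.get(from).put(label, to);
    }

    // Removes the labeled edge, if present.
    void unbind(int from, String label) {
        requireVertex(from);
        edges.get(from).remove(label);
    }

    // Removes a vertex, its payload, and all edges that touch it.
    void remove(int v) {
        requireVertex(v);
        edges.remove(v);
        payloads.remove(v);
        for (Map<String, Integer> out : edges.values()) {
            out.values().removeIf(target -> target == v);
        }
    }

    // Attaches a copy of the bytes to an existing vertex.
    void put(int v, byte[] bytes) {
        requireVertex(v);
        payloads.put(v, bytes.clone());
    }

    // Returns the payload of a vertex, or null if it has none.
    byte[] data(int v) {
        requireVertex(v);
        return payloads.get(v);
    }

    // Follows one labeled edge; returns null when there is no such edge.
    Integer kid(int from, String label) {
        requireVertex(from);
        return edges.get(from).get(label);
    }

    private void requireVertex(int v) {
        if (!edges.containsKey(v)) {
            throw new IllegalArgumentException("No vertex " + v);
        }
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        g.add(0);
        g.add(1);
        g.bind(0, 1, "foo");
        g.put(1, new byte[] {42});
        System.out.println(g.kid(0, "foo")); // prints 1
        g.remove(1);
        System.out.println(g.kid(0, "foo")); // prints null
    }
}
```

Nested hash maps keep the sketch short, but a fast implementation would probably prefer a flatter, more cache-friendly layout; choosing and measuring that layout is part of the task.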
A Comparison of Java vs. C++ Memory Management Performance:
There are different memory management strategies in C++ and Java. While C++
expects programmers to allocate and release memory blocks explicitly, Java
relies on a background process of garbage collection. It would be interesting
to set up a benchmarking experiment and compare the performance of object
allocations in these two languages. It may be assumed that Java will be faster,
but no experimental evidence of this has been collected so far (to my
knowledge).
Complexity: High; Effort: Middle; Stack: C++, Java
A Study of Object-Oriented Optimizations in Clang:
Since the invention of OOP, a number of compile-time and run-time optimizations
have been proposed by researchers, including object inlining, method specialization,
inline caching, object combining, and others. To the best of our knowledge,
there is no benchmark that would check the presence of such optimizations
in a C++ compiler and their effectiveness. We may create such a benchmark
and then analyze Clang.
Complexity: Middle; Effort: Middle; Stack: C++
Study of C++ Virtual Tables Popularity:
Dynamic dispatch is one of the two primary sources of performance inefficiencies
in object-oriented programming (along with on-heap object allocation). It would
be interesting to study the runtime behavior of a number of C++ programs to
find out how many method calls are dispatched through virtual tables
versus those that are statically bound. This study may help compiler designers
understand the importance of devirtualization. We recently published
a similar study for Java.
Complexity: High; Effort: High; Stack: C++
Evaluation of Performance of malloc() in Different Operating Systems:
The heap is the primary storage for variable-sized memory blocks in modern
operating systems and virtual machines. Allocating a slice of bytes in the heap
and then releasing it back is a time-consuming operation, requiring several
hundred CPU cycles. However, the exact number of cycles it takes to allocate
and free memory chunks in different virtual machines and OSs remains unclear,
even though some folklore studies exist.
We suggest studying this subject, performing experiments on a sufficiently large
number of testing platforms, summarizing and analyzing the results, and then
publishing a research paper. Such an analysis might assist creators of
programming languages and compilers in making better design decisions.
Complexity: High; Effort: High; Stack: C++
Study of Preferences of Programmers for Object Extension:
In object-oriented programming, additional functionality can be added to classes
using inheritance, decoration, composition, or by simply expanding existing
classes with new code or methods. It is commonly believed that most
programmers, especially those with over 10 years of practical coding
experience, prefer decoration or composition, as these methods typically result
in superior design. To validate this assumption, we propose conducting a survey
among a sizeable group of programmers. We’ll present them with various code
snippets and ask them to choose a method for modifications. The findings from
our research might offer insights to designers of new programming languages
about programmers’ perceptions of OOP.
Complexity: Low; Effort: Middle; Stack: Java
Study of Data Presence in Design Patterns:
In object-oriented programming, many design patterns are recommended for use.
It’s commonly believed that if programmers adhere to these patterns in their
code, the code quality will improve due to clearer design. We hypothesize that
most design patterns emphasize the encapsulation of behaviors rather than data.
In other words, the objects participating in design patterns are
typically “dataless” objects. We propose studying this subject through a
Systematic Literature Review (SLR) of existing literature on design patterns,
aiming to either confirm or refute our hypothesis. The results of our research
might be useful for compiler and programming language designers, prompting them
to treat objects differently if they are dataless.
Complexity: Low; Effort: Middle; Stack: Java
SQLite Backend for Factbase:
Factbase is an existing open source NoSQL database engine with a LISP-ish query
interface, which is implemented in pure Ruby. Its backend has to be replaced by
SQLite, in order to make the system faster.
Complexity: High; Effort: High; Stack: Ruby
REPL for EO:
EO is a compiled object-oriented programming language. We need to create a REPL
for it, enabling its interactive use. The implementation language is not
important, but something as simple as Bash or Python would be preferred.
Complexity: Middle; Effort: Middle; Stack: JavaScript
JSmith:
CSmith is a famous tool for random C code generation. We want to create a
similar tool, but for the Java language. It should be an open source command
line tool, generating Java code according to provided configuration params.
We already created a draft (you will have to implement its entire functionality).
Complexity: Middle; Effort: Low; Stack: Java
CaM Dataset:
CaM is an open source dataset of 800K Java classes collected automatically from GitHub, using Python. We need to enrich it with additional metrics, perform a study of relationships between metrics, and then publish a research paper with conclusions.
Complexity: Middle; Effort: Middle; Stack: Python
EOdoc:
EO is an experimental object-oriented programming language. We need to create an automated documentation generator for it, which will read source code files and generate HTML pages with the documentation, similar to how javadoc and rustdoc work.
Complexity: Low; Effort: Low; Stack: JavaScript
EOfmt:
It would be nice to create an auto-formatter for the EO programming language. Similar to rustfmt, it should take an .eo file and reformat it.
Complexity: Low; Effort: Low; Stack: JavaScript
Style Checker of EOLANG Programs (EOlint):
Similar to cpplint and pylint, the EO language needs a linter: a command line tool that would take a collection of .eo files as input and emit complaints about their style, anti-patterns, high complexity, low readability, etc.
Complexity: Middle; Effort: Middle; Stack: JavaScript OR Java
Repository: objectionary/lints
0PDD:
0PDD is a hosted open source GitHub chatbot that helps programmers decompose their tasks on the fly, via special TODO markers in their source code, known as "puzzles". We need to make it use machine learning in order to prioritize the backlogs of GitHub repositories. You will need to use Ruby and its existing ML frameworks, most probably rumale.
Complexity: High; Effort: Middle; Stack: Ruby
Xembly:
Xembly, created in 2013, is an imperative language for XML manipulations and a Java library that implements the language. Even though the language and the library work, they have never been compared with other libraries for performance and usability. We can perform such a comparison and then publish an academic paper about it.
Complexity: Low; Effort: Middle; Stack: Java
Port Xembly to JavaScript/Python/C++/C#:
Xembly, created in 2013, is an imperative language for XML manipulations and a Java library that implements the language. Currently, there are only two implementations of Xembly: in Java and Ruby. It would be great to port it to a few other programming languages, like C#, C++, Python, JavaScript, Go, and maybe others.
Complexity: Moderate; Effort: Middle; Stack: Any
Type Inference for EO:
There are no types in EO, an experimental object-oriented programming language. It would be interesting to create a type inference subsystem, which would "guess" types of objects (we call them "formas") with high enough precision. Similar type inference systems exist, for example, for Python (pytype) and JavaScript (flow).
Complexity: High; Effort: High; Stack: Java
Simple-XSL:
XSLT is a functional language for XML transformations. It is widely used in web development, ETL pipelines, and even in language design (the EO-to-Java compiler is written in XSLT). However, the complexity of the language is often a barrier for programmers not familiar with XML. We may create a new language, with exactly the same grammar and semantics as XSLT 3.0, but with a simpler syntax. A good example of such an approach is HAML, which is a simplified version of HTML.
Complexity: Middle; Effort: High; Stack: any
IntelliJ IDEA Plugin for EO:
A few years ago, we created an EO plugin for IntelliJ IDEA. It highlights the syntax of EO and helps detect syntax errors. However, its functionality is pretty limited: it doesn't support code completion, compilation, debugging, and many other features a user would expect. It would be nice to improve the plugin and release an updated version with all the needed features.
Complexity: Middle; Effort: Middle; Stack: Java
CTFE for EO:
EO is an object-oriented programming language, where everything is an object, including arithmetic operators. It would be interesting to create a compile-time function executor as a command line tool. It would take an EO program and replace some of its expressions with constants. For example, 2.plus 2 would be replaced with the 4 literal, thus making the program faster.
Complexity: Middle; Effort: Low; Stack: Java
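The replacement of 2.plus 2 with 4 is an instance of constant folding, which might be sketched like this. Everything in the sketch is an assumption made for illustration: the mini-AST (`Expr`, `Num`, `Ref`, `Call`) is not the EO compiler's data model, the class `ConstantFolder` is hypothetical, and only `plus` is taken from the text above, while `minus` and `times` are assumed to behave analogously.

```java
// Hypothetical mini-AST, invented for this sketch only.
interface Expr {
}

final class Num implements Expr {
    final long value;
    Num(long value) { this.value = value; }
    @Override public String toString() { return Long.toString(value); }
}

// A name whose value is not known at compile time.
final class Ref implements Expr {
    final String name;
    Ref(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

// An expression of the form "receiver.method arg".
final class Call implements Expr {
    final Expr receiver;
    final String method;
    final Expr arg;
    Call(Expr receiver, String method, Expr arg) {
        this.receiver = receiver;
        this.method = method;
        this.arg = arg;
    }
    @Override public String toString() {
        return receiver + "." + method + " " + arg;
    }
}

final class ConstantFolder {
    // Folds bottom-up: children first, then the call itself if both
    // operands turned out to be numeric literals.
    static Expr fold(Expr e) {
        if (!(e instanceof Call)) {
            return e;
        }
        Call c = (Call) e;
        Expr r = fold(c.receiver);
        Expr a = fold(c.arg);
        if (r instanceof Num && a instanceof Num) {
            long x = ((Num) r).value;
            long y = ((Num) a).value;
            switch (c.method) {
                case "plus":  return new Num(x + y);
                case "minus": return new Num(x - y);
                case "times": return new Num(x * y);
                default:      break; // unknown method: leave the call as is
            }
        }
        return new Call(r, c.method, a);
    }

    public static void main(String[] args) {
        Expr simple = new Call(new Num(2), "plus", new Num(2));
        System.out.println(fold(simple)); // prints 4
        Expr partial = new Call(
            new Ref("x"), "plus", new Call(new Num(2), "times", new Num(3))
        );
        System.out.println(fold(partial)); // prints x.plus 6
    }
}
```

The second example shows the partial case: the inner expression is folded into a literal, while the outer call survives because x is unknown at compile time.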
XSL linter:
In our projects, we have many XSL documents. They are essentially XML documents and we validate their formatting using xcop. However, we don't use any XSL static analyzer. The problem is that there is none on the market. Only a prototype exists, which is not so easy to use. We may take this prototype, use it as a basis, and create our own XSL linter.
Complexity: Middle; Effort: Middle; Stack: Java
Goto Eliminator for C/C++:
There is a goto statement in the C/C++ programming languages, which can be used freely anywhere in the code. The existence of goto statements often makes code impossible to translate to EO, even though we have a goto object there. It would be nice to have an automated tool that would take C/C++ code as input and generate new C/C++ code without goto statements. This paper may be relevant.
Complexity: High; Effort: High; Stack: any
EO Runtime Libraries:
At the moment, the EO programming language has a very limited set of objects in its runtime library. It would be nice to add more objects, in order to enable more convenient usage of the language. The following groups of objects are needed most:
- eo-files: open, close, read, write, delete and other file operations via syscalls
- eo-dom: to build and traverse XML documents, using DOM and XPath
- eo-sax: to scan XML documents, using SAX
- eo-http: to send and receive HTTP
- eo-time: to parse, print, and manage date/time
- eo-exec: to interact with the command line (exec and spawn)
- eo-unicode: to manage UTF-8/16 strings
- eo-base64: to encode and decode Base64/Base58 strings
- eo-sprintf: to simulate sprintf
- eo-pgsql: to interact with PostgreSQL
- eo-xembly: to modify XML using Xembly language
- eo-json: to parse and print JSON documents
Complexity: Middle; Effort: Middle; Stack: Java
Graph Algorithms in EO:
We have a simple benchmark of three programming languages (C++, Java, and EO) in this repository. However, only one algorithm is implemented in EO. It would be great to implement the other three too and then run the benchmark. Then, it would be interesting to summarize the results and publish an academic paper about it.
Complexity: Middle; Effort: Middle; Stack: Java, C++