@yegor256
Last active November 20, 2024 14:25
student-projects.md

If you are a student, I would be glad to supervise your diploma/course project, on one of the following topics (in no particular order). Also, you may find additional topics here: PainOfOOP, SQM, PMBA, OSBP.

Object-Oriented Benchmark for JVM:
SPECjbb 2015 is a de facto industry standard for benchmarking the performance of Java Virtual Machines (JVMs). However, there are two problems with this benchmark: 1) it's not open source and 2) it's rather old. For example, it doesn't cover the Stream API at all. We can create a new benchmark (a collection of Java programs) that would test the performance of a JVM with a strong focus on its object-oriented features, in particular dynamic dispatch and object allocation.

Fast In-Memory Graph Manager, in Rust:
In EO, an experimental object-oriented programming language, any program can be represented as a directed graph, where nodes are objects and the edges between them carry attributes. The SODG format, which we introduced to the EO compiler in 2023, implements exactly such a representation. REO is the virtual machine that we implemented in Rust to compile SODG and execute it. Now, we need a fast in-memory manager of the graph, which would enable adding new nodes and edges, removing them, and attaching data to some nodes. The manager should be implemented as an independent Rust library.
Complexity: Middle; Effort: Middle; Stack: Rust
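
To make the required operations concrete, here is a minimal sketch in Python. It only illustrates the interface (add a node, add or remove an edge, attach data, remove a node); the real library is expected to be written in Rust and to be much faster. The class name `Graph`, the method names, and the choice of integer node ids with string edge labels are assumptions made for this example, not part of SODG or REO.

```python
from typing import Dict, Optional


class Graph:
    """Toy in-memory directed graph (hypothetical API, for illustration only)."""

    def __init__(self) -> None:
        # node id -> {edge label -> target node id}
        self.edges: Dict[int, Dict[str, int]] = {}
        # node id -> bytes attached to the node
        self.data: Dict[int, bytes] = {}

    def add(self, v: int) -> None:
        """Add node v; adding an existing node changes nothing."""
        self.edges.setdefault(v, {})

    def bind(self, v1: int, v2: int, label: str) -> None:
        """Add edge v1 -> v2; an existing edge with the same label is replaced."""
        if v1 not in self.edges or v2 not in self.edges:
            raise KeyError("both nodes must exist before binding")
        self.edges[v1][label] = v2

    def unbind(self, v1: int, label: str) -> None:
        """Remove the edge that leaves v1 under the given label."""
        del self.edges[v1][label]

    def put(self, v: int, payload: bytes) -> None:
        """Attach data to node v."""
        if v not in self.edges:
            raise KeyError("node does not exist")
        self.data[v] = payload

    def kid(self, v: int, label: str) -> Optional[int]:
        """Return the node reached from v by the given label, or None."""
        return self.edges[v].get(label)

    def remove(self, v: int) -> None:
        """Remove node v, its data, and every edge that points to it."""
        del self.edges[v]
        self.data.pop(v, None)
        for out in self.edges.values():
            for label in [lb for lb, target in out.items() if target == v]:
                del out[label]


if __name__ == "__main__":
    g = Graph()
    g.add(0)
    g.add(1)
    g.bind(0, 1, "foo")
    g.put(1, b"\x2a")
    print(g.kid(0, "foo"))
```

Note that `remove` in this sketch scans all edges, which is linear in the size of the graph; a fast manager would likely need a reverse index or a different storage layout.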

A Comparison of Java vs. C++ Memory Management Performance:
There are different memory management strategies in C++ and Java. While C++ expects programmers to allocate and release memory blocks explicitly, Java relies on a background process of garbage collection. It would be interesting to set up a benchmarking experiment and compare the performance of object allocations in these two languages. It may be assumed that Java will be faster, but no experimental evidence of this has been collected so far (to my knowledge).
Complexity: High; Effort: Middle; Stack: C++, Java

A Study of Object-Oriented Optimizations in Clang:
Since the invention of OOP, a number of compile-time and run-time optimizations have been proposed by researchers, including object inlining, method specialization, inline caching, object combining, and others. To the best of our knowledge, there is no benchmark that would check the presence of such optimizations in a C++ compiler and their effectiveness. We may create such a benchmark and then analyze Clang.
Complexity: Middle; Effort: Middle; Stack: C++

Study of C++ Virtual Tables Popularity:
Dynamic dispatch is one of the two primary sources of performance inefficiencies in object-oriented programming (along with on-heap object allocation). It would be interesting to study the runtime behavior of a number of C++ programs to find out how many method calls are dispatched through virtual tables versus how many are statically bound. This study may help compiler designers understand the importance of devirtualization. We recently published a similar study for Java.
Complexity: High; Effort: High; Stack: C++

Evaluation of Performance of malloc() in Different Operating Systems:
Heap is the primary storage for variable-sized memory blocks in modern operating systems and virtual machines. Allocating a slice of bytes in the heap and then releasing it back is a time-consuming operation, requiring several hundred CPU cycles. However, the exact number of cycles it takes to allocate and free memory chunks in different virtual machines and OSs remains unclear, even though some folklore studies exist. We suggest studying this subject, performing experiments on a sufficiently large number of testing platforms, summarizing and analyzing the results, and then publishing a research paper. Such an analysis might assist creators of programming languages and compilers in making better design decisions.
Complexity: High; Effort: High; Stack: C++

Study of Preferences of Programmers for Object Extension:
In object-oriented programming, additional functionality can be added to classes using inheritance, decoration, composition, or by simply expanding existing classes with new code or methods. It is commonly believed that most programmers, especially those with over 10 years of practical coding experience, prefer decoration or composition, as these methods typically result in superior design. To validate this assumption, we propose conducting a survey among a sizeable group of programmers. We’ll present them with various code snippets and ask them to choose a method for modifications. The findings from our research might offer insights to designers of new programming languages about programmers’ perceptions of OOP.
Complexity: Low; Effort: Middle; Stack: Java

Study of Data Presence in Design Patterns:
In object-oriented programming, many design patterns are recommended for use. It’s commonly believed that if programmers adhere to these patterns in their code, the code quality will improve due to clearer design. We hypothesize that most design patterns emphasize the encapsulation of behaviors rather than data. In other words, the objects participating in design patterns are typically “dataless” objects. We propose studying this subject through a Systematic Literature Review (SLR) of existing literature on design patterns, aiming to either confirm or refute our hypothesis. The results of our research might be useful for compiler and programming language designers, prompting them to treat objects differently if they are dataless.
Complexity: Low; Effort: Middle; Stack: Java

SQLite Backend for Factbase:
Factbase is an existing open source NoSQL database engine with a LISP-ish query interface, which is implemented in pure Ruby. Its backend has to be replaced by SQLite, in order to make the system faster.
Complexity: High; Effort: High; Stack: Ruby

REPL for EO:
EO is a compiled object-oriented programming language. We need to create a REPL for it, enabling its interactive usage. The implementation language is not important, but something as simple as Bash or Python would be preferred.
Complexity: Middle; Effort: Middle; Stack: JavaScript

JSmith:
CSmith is a famous tool for random C code generation. We want to create a similar tool, but for the Java language. It should be a command line open source tool, generating Java code according to provided configuration params. We already created a draft (you will have to implement its entire functionality).
Complexity: Middle; Effort: Low; Stack: Java

CaM Dataset:
CaM is an open source dataset of 800K Java classes collected automatically from GitHub, using Python. We need to enrich it with additional metrics, perform a study of relationships between metrics, and then publish a research paper with conclusions.
Complexity: Middle; Effort: Middle; Stack: Python

EOdoc:
EO is an experimental object-oriented programming language. We need to create an automated documentation generator for it, which will read source code files and generate HTML pages with the documentation, similar to how javadoc and rustdoc work.
Complexity: Low; Effort: Low; Stack: JavaScript

EOfmt:
Would be nice to create an auto-formatter for the EO programming language. Similar to rustfmt, it should take an .eo file and reformat it.
Complexity: Low; Effort: Low; Stack: JavaScript

Style Checker of EOLANG Programs (EOlint):
Similar to cpplint and pylint, the EO language needs a linter: a command line tool that would take a collection of .eo files as an input and emit complaints about their style, their anti-patterns, high complexity, low readability, etc.
Complexity: Middle; Effort: Middle; Stack: JavaScript OR Java
Repository: objectionary/lints

0PDD:
0PDD is a hosted open source GitHub chatbot that helps programmers decompose their tasks on the fly, via special TODO markers in their source code, known as "puzzles". We need to make it use Machine Learning in order to prioritize the backlogs of GitHub repositories. You will need to use Ruby and its existing ML frameworks, most probably rumale.
Complexity: High; Effort: Middle; Stack: Ruby

Xembly:
Xembly, created in 2013, is an imperative language for XML manipulations and a Java library that implements the language. Even though the language and the library work, they have never been compared with other libraries for performance and usability. We can perform such a comparison and then publish an academic paper about it.
Complexity: Low; Effort: Middle; Stack: Java

Port Xembly to JavaScript/Python/C++/C#:
Xembly, created in 2013, is an imperative language for XML manipulations and a Java library that implements the language. Currently, there are only two implementations of Xembly: in Java and Ruby. Would be great to port it to a few other programming languages, like C#, C++, Python, JavaScript, Go, and maybe others.
Complexity: Moderate; Effort: Middle; Stack: Any

Type Inference for EO:
There are no types in EO, an experimental object-oriented programming language. It would be interesting to create a type inference subsystem, which would "guess" types of objects (we call them "formas") with high enough precision. Similar type inference systems exist, for example, for Python (pytype) and JavaScript (flow).
Complexity: High; Effort: High; Stack: Java

Simple-XSL:
XSLT is a functional language for XML transformations. It is widely used in web development, ETL pipelines, and even in language design (the EO-to-Java compiler is written in XSLT). However, the complexity of the language is often a barrier for programmers not familiar with XML. We may create a new language, with exactly the same grammar and semantics as XSLT 3.0, but with a simpler syntax. A good example of such an approach is HAML, which offers a simpler syntax for writing HTML.
Complexity: Middle; Effort: High; Stack: any

IntelliJ IDEA Plugin for EO:
A few years ago, we created an EO plugin for IntelliJ IDEA. It highlights the syntax of EO and helps detect syntax errors. However, its functionality is pretty limited: it doesn't support code completion, compilation, debugging, and many other features a user would expect. Would be nice to improve the plugin and release its updated version with all the needed features.
Complexity: Middle; Effort: Middle; Stack: Java

CTFE for EO:
EO is an object-oriented programming language, where everything is an object, including arithmetic operators. Would be interesting to create a compile-time function executor as a command line tool. It would take an EO program and replace some of its expressions with constants. For example, 2.plus 2 would be replaced with the 4 literal, thus making the program faster.
Complexity: Middle; Effort: Low; Stack: Java
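
The replacement of 2.plus 2 with 4 is an instance of constant folding. Below is a minimal sketch of the idea in Python, over a toy expression tree. The node types (`Num`, `Ref`, `Call`) and the table of foldable methods are assumptions made for this example; they do not reflect the real EO syntax tree or the EO runtime.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int  # an integer literal, e.g. 2


@dataclass(frozen=True)
class Ref:
    name: str  # an object whose value is unknown at compile time


@dataclass(frozen=True)
class Call:
    target: "Expr"  # the object the method is taken from
    method: str     # e.g. "plus"
    arg: "Expr"     # the single argument


Expr = Union[Num, Ref, Call]

# Methods that are assumed to be pure and safe to evaluate at compile time.
FOLDABLE = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "times": lambda a, b: a * b,
}


def fold(expr: Expr) -> Expr:
    """Replace every foldable call on two literals with a literal, bottom-up."""
    if isinstance(expr, Call):
        target = fold(expr.target)
        arg = fold(expr.arg)
        op = FOLDABLE.get(expr.method)
        if op is not None and isinstance(target, Num) and isinstance(arg, Num):
            return Num(op(target.value, arg.value))
        # Not foldable: keep the call, but with already folded children.
        return Call(target, expr.method, arg)
    return expr


if __name__ == "__main__":
    # 2.plus 2 becomes the literal 4
    print(fold(Call(Num(2), "plus", Num(2))))
```

Calls that involve an unknown object or a method outside the table are left in place, so `x.plus (2.times 3)` would only have its inner part folded.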

XSL linter:
In our projects, we have many XSL documents. They are essentially XML documents and we validate their formatting using xcop. However, we don't use any XSL static analyzer. The problem is that there is none on the market. Only a prototype exists, which is not so easy to use. We may take this prototype, use it as a basis, and create our own XSL linter.
Complexity: Middle; Effort: Middle; Stack: Java

Goto Eliminator for C/C++:
There is a goto statement in the C/C++ programming languages, which can be used freely in any place of the code. The existence of goto statements often makes code impossible to translate to EO, even though we have a goto object over there. It would be nice to have an automated tool, which would take C/C++ code as an input and generate new C/C++ code without goto statements. This paper may be relevant.
Complexity: High; Effort: High; Stack: any

EO Runtime Libraries:
At the moment, the EO programming language has a very limited set of objects in its runtime library. Would be nice to add more objects, in order to enable more convenient usage of the language. The following groups of objects are of the highest necessity:

  • eo-files: open, close, read, write, delete and other file operations via syscalls
  • eo-dom: to build and traverse XML documents, using DOM and XPath
  • eo-sax: to scan XML documents, using SAX
  • eo-http: to send and receive HTTP
  • eo-time: to parse, print, and manage date/time
  • eo-exec: to interact with the command line (exec and spawn)
  • eo-unicode: to manage UTF-8/16 strings
  • eo-base64: to encode and decode Base64/Base58 strings
  • eo-sprintf: to simulate sprintf
  • eo-pgsql: to interact with PostgreSQL
  • eo-xembly: to modify XML using Xembly language
  • eo-json: to parse and print JSON documents

Complexity: Middle; Effort: Middle; Stack: Java

Graph Algorithms in EO:
We have a simple benchmark of three programming languages (C++, Java, and EO) in this repository. However, only one algorithm is implemented in EO. Would be great to implement the other three too and then run the benchmark. Then, would be interesting to summarize the results and publish an academic paper about it.
Complexity: Middle; Effort: Middle; Stack: Java, C++
