This is a growing list of project ideas that may be good for a budding programmer to try. The tasks vary in difficulty and length, but I try to keep them language agnostic.
Create a rolodex program, which stores contact information for people.
- Use a database to store the data.
- What database will you use? SQL or NoSQL? For SQL, which one: Postgres, MySQL, SQLite, or something else? For NoSQL: CouchDB, MongoDB, or something else?
- What will the schema be?
- What are the possible tables? And what are the relationships among them?
- What are the primary keys for the tables?
- What are the foreign keys for the tables?
- Use VCard format to store data.
- How to handle a person with multiple Twitter handles?
- How do we differentiate distinct people with identical names?
- How will you handle loading information into the system? Displaying information?
- How to handle duplicate entries?
- Ostensibly, merging duplicates together would be best, but if this is done, how will we "undo" merging?
- If we delete entries, will these be "soft deletes" (i.e., a column in the table with a `deleted` flag indicating the row has been deleted and should be ignored) or hard deletes (removing the entries from the database altogether)?
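To make the schema and soft-delete questions concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table names (`person`, `twitter_handle`), the column names, and the helper functions are all hypothetical choices for illustration, not the one right answer; the point is only that a separate handle table allows several handles per person, a surrogate primary key keeps two people with identical names distinct, and a `deleted` flag makes deletion reversible.

```python
import sqlite3

# Hypothetical schema: one row per person, one row per Twitter handle.
SCHEMA = """
CREATE TABLE person (
    id      INTEGER PRIMARY KEY,         -- surrogate key: names need not be unique
    name    TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0   -- soft-delete flag (0 = live, 1 = deleted)
);
CREATE TABLE twitter_handle (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),  -- foreign key
    handle    TEXT NOT NULL
);
"""

def open_db(path=":memory:"):
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_person(conn, name, handles=()):
    """Insert a person plus any number of handles; return the new id."""
    cur = conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
    person_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO twitter_handle (person_id, handle) VALUES (?, ?)",
        [(person_id, h) for h in handles],
    )
    return person_id

def soft_delete(conn, person_id):
    """Mark the row as deleted without removing it."""
    conn.execute("UPDATE person SET deleted = 1 WHERE id = ?", (person_id,))

def restore(conn, person_id):
    """Undo a soft delete."""
    conn.execute("UPDATE person SET deleted = 0 WHERE id = ?", (person_id,))

def live_people(conn):
    """Return (id, name) for every person not flagged as deleted."""
    return conn.execute(
        "SELECT id, name FROM person WHERE deleted = 0 ORDER BY id"
    ).fetchall()

def handles_for(conn, person_id):
    """Return all handles stored for one person, in insertion order."""
    rows = conn.execute(
        "SELECT handle FROM twitter_handle WHERE person_id = ? ORDER BY id",
        (person_id,),
    )
    return [handle for (handle,) in rows]
```

Every read query has to remember the `deleted = 0` filter, which is the main cost of soft deletes; a hard delete avoids that but cannot be undone.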
Extend this program to include a person's CV. This would be a list of jobs a person has had, employed by either organizations or people.
Each entry may or may not have start/end dates. They may be partial dates (e.g., only a year, or only a month and year). There may be only one date given.
- What about extending this to include CV/resume information for individuals?
- How will you handle fragments of a person's resume? Or incomplete information? (E.g., sometimes we only have years when a person started/ended a job, or a month & year)
- We also want to store citations, for where we got the data about these relationships.
- What new tables are needed in the database? What are the relationships?
- How will we render information for a person? For an organization?
- We may be interested in asking, "Who has been employed by X at time t?" How would we render this information?
- How will we store information for an employer? For a job?
- How will we enter information into this system? How to handle bulk information?
- Consider what happens to the rows in these new tables when we merge duplicate people together. How do we handle "undoing" a merge?
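One way to think about partial dates and the "employed by X at time t" question is to treat each partial date as a range of possible days. The sketch below is an assumption about how this could be modelled, not a prescribed design: `PartialDate` and `possibly_employed` are hypothetical names, and a missing start or end date is treated as "unknown, so anything is possible".

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class PartialDate:
    """A date known to the year, to the month, or to the day."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None   # ignored unless month is also given

    def earliest(self) -> date:
        """First calendar day consistent with what we know."""
        if self.month is None:
            return date(self.year, 1, 1)
        return date(self.year, self.month, self.day or 1)

    def latest(self) -> date:
        """Last calendar day consistent with what we know."""
        if self.month is None:
            return date(self.year, 12, 31)
        last_day = calendar.monthrange(self.year, self.month)[1]
        return date(self.year, self.month, self.day or last_day)

def possibly_employed(start: Optional[PartialDate],
                      end: Optional[PartialDate],
                      t: date) -> bool:
    """True if the job could have been held on day t.

    A missing start or end is treated as unbounded (an assumption).
    """
    lo = start.earliest() if start else date.min
    hi = end.latest() if end else date.max
    return lo <= t <= hi
```

Note that this answers "could have been employed", not "was employed": a job recorded as 2015 to February 2017 matches any day from 2015-01-01 through 2017-02-28. Whether to render such matches differently from exact ones is one of the design questions above.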
Write a scraper to pull this information down from various websites. For example, Politico's "Influencer" newsletter will give us information about when people enter new jobs, leave positions, etc.; this gives us a steady stream of information, in a fairly adequate format. We can set up an email account for our bot and check it daily for the Politico email.
Create a library to generate UUIDs.
This should include creating a UUID (specifically a v1, v2, v3, or v4 UUID), deleting a UUID, comparing two UUIDs, parsing a string into a UUID, and printing a string version of the UUID.
The implementation should try to be memory efficient (i.e., a 128-bit memory footprint, not, say, a 36-character string).
- What other methods are needed? What are their contracts?
- How to determine which version a UUID object is?
- What unit tests are needed?
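As a starting point, here is a rough sketch that stores the UUID as a single 128-bit integer and reads the version out of it. The class name `Uuid` and its methods are hypothetical, and only v4 creation is shown. One caveat: a Python `int` carries object overhead, so this illustrates the 128-bit representation rather than achieving a true 16-byte footprint; in a lower-level language the same layout would fit in two 64-bit words.

```python
import os
import string

class Uuid:
    """A UUID held as one integer in the range [0, 2**128)."""
    __slots__ = ("value",)

    def __init__(self, value: int):
        if not 0 <= value < (1 << 128):
            raise ValueError("UUID value must fit in 128 bits")
        self.value = value

    @classmethod
    def parse(cls, text: str) -> "Uuid":
        """Parse the canonical 8-4-4-4-12 hexadecimal form."""
        groups = text.split("-")
        if [len(g) for g in groups] != [8, 4, 4, 4, 12]:
            raise ValueError(f"not a canonical UUID string: {text!r}")
        digits = "".join(groups)
        if any(c not in string.hexdigits for c in digits):
            raise ValueError(f"non-hex character in UUID string: {text!r}")
        return cls(int(digits, 16))

    @classmethod
    def random_v4(cls) -> "Uuid":
        """Create a version 4 (random) UUID."""
        value = int.from_bytes(os.urandom(16), "big")
        value = (value & ~(0xF << 76)) | (0x4 << 76)   # version nibble = 4
        value = (value & ~(0x3 << 62)) | (0x2 << 62)   # variant bits = 10
        return cls(value)

    @property
    def version(self) -> int:
        """The version lives in the top 4 bits of the third group."""
        return (self.value >> 76) & 0xF

    def __eq__(self, other) -> bool:
        return isinstance(other, Uuid) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)

    def __str__(self) -> str:
        h = f"{self.value:032x}"
        return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"
```

A round trip such as `Uuid.parse(str(u)) == u` is a natural first unit test, followed by malformed inputs (wrong group lengths, non-hex characters) that should be rejected.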
Write a script to count the number of words in a file. Not only this, allow the user to ask for additional information. First a couple definitions:
Definition 1. A contraction is a word or number with an apostrophe followed by at least one letter appended to it, i.e., it looks like "<letter>'<letter>" or "<digit>'<letter>". (End of definition)
Definition 2. Given a list of numbers, its summary statistics are the tuple: (min, 25th percentile, median, 75th percentile, max, mean, standard deviation, length of input list). (End of definition)
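Definition 2 can be sketched with Python's standard `statistics` module. Two details are assumptions on my part, since the definition leaves them open: percentiles use the "inclusive" interpolation method, and the standard deviation is the sample one (`statistics.stdev`), which is why at least two values are required. The names `Summary` and `summarize` are hypothetical.

```python
import statistics
from typing import NamedTuple, Sequence

class Summary(NamedTuple):
    minimum: float
    q1: float        # 25th percentile
    median: float
    q3: float        # 75th percentile
    maximum: float
    mean: float
    stdev: float     # sample standard deviation (an assumption)
    count: int

def summarize(values: Sequence[float]) -> Summary:
    """Compute the summary statistics tuple for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # quantiles(n=4) returns the three cut points between the quartiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return Summary(
        min(values), q1, median, q3, max(values),
        statistics.mean(values), statistics.stdev(values), len(values),
    )
```

For example, `summarize([1, 2, 3, 4, 5])` gives quartiles 2.0, 3.0, and 4.0 with a mean of 3. Other quantile methods give different quartiles on small inputs, so the script should document which one it uses.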
The user may ask for:
- Count the number of contractions, provide summary statistics
- Count the sentence length, provide summary statistics
- Count the number of sentences in each paragraph, plus summary statistics
- Count the number of words in each paragraph, plus summary statistics
- Number of words between each contraction, plus summary statistics
- What are the top N most common words? (Count "they're" as "they", in this case.)
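A few of these requests can be sketched with a single regular expression built from Definition 1. This is only one possible tokenization, and it makes assumptions: words are runs of ASCII letters and digits, only the straight apostrophe `'` counts, and the function names are hypothetical. Sentence and paragraph splitting are left out.

```python
import re
from collections import Counter

# A word, optionally followed by an apostrophe and at least one letter.
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")

def tokens(text):
    """Split text into words, keeping contractions in one piece."""
    return WORD.findall(text)

def is_contraction(token):
    """Per Definition 1, a token containing an apostrophe is a contraction."""
    return "'" in token

def count_contractions(text):
    return sum(1 for t in tokens(text) if is_contraction(t))

def gaps_between_contractions(text):
    """Number of words strictly between each pair of consecutive contractions."""
    positions = [i for i, t in enumerate(tokens(text)) if is_contraction(t)]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

def top_words(text, n):
    """Top n words, counting a contraction as its base word ("they're" -> "they")."""
    bases = (t.split("'")[0].lower() for t in tokens(text))
    return Counter(bases).most_common(n)
```

The list returned by `gaps_between_contractions` is exactly the kind of input the summary statistics from Definition 2 are meant for.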
The VCard 4.0 specification is worth consulting.