Modified Text Processing kata

As a developer who writes blog posts, I want a tool that helps me better understand the text I am writing. For that, I need to know the following:

  1. What are the most common words used in the text?

  2. How many characters does the text have?

When a text is passed on the command line, the program should show the results on the console.

Iteration 1.

  • Count the words in the text.

    > analyze Hello
    > These are the top 1 most used words:
    > 1 hello
    

Iteration 2.

  • Rank the words from most used to least used.

Example:

> analyze Hello, this is an example for you to practice. You should grab this text and make it as your test case
> These are the top 10 most used words:
> 1 you
> 2 this
> 3 your
> 4 to
> 5 text
> 6 test
> 7 should
> 8 practice
> 9 make
> 10 it
> The text contains 21 words.
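
One possible way to reproduce the ranking above is sketched here in Python. The function name `rank_words`, the default limit of 10, and the tokenizing regex are assumptions made for illustration. The tie-break (words with the same count listed in reverse alphabetical order) is inferred from the sample output; the kata does not state it.

```python
import re
from collections import Counter


def rank_words(text, limit=10):
    """Return (top pairs, total word count) for the given text.

    Hypothetical helper: the name, the limit and the regex are assumptions.
    """
    # Lowercase the text and keep runs of letters, digits and apostrophes,
    # so "Hello," becomes "hello" and "practice." becomes "practice".
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    # Sort by count, then by word, both descending. The word tie-break is
    # inferred from the example output, not specified by the kata.
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)
    return ranked[:limit], len(words)
```

Called with the example sentence, this sketch returns the ten words listed above in the same order, together with a total of 21 words.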

Iteration 3.

  • Show the frequency of each word.

Example:

> analyze G A T T A C A
> These are the top 4 most used words:
> 1 A (3)
> 2 T (2)
> 3 G (1)
> 4 C (1)
> The text contains 7 words.

Iteration 4.

  • Limit the number of words shown with the --max parameter, for example: analyze G A T T A C A --max=2

Example:

> analyze G A T T A C A --max=2
> These are the top 2 most used words:
> 1 A (3)
> 2 T (2)
> The text contains 7 words.
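
A minimal sketch of reading `--max` with Python's standard `argparse` module follows. The names `parse_args` and `max_words` are hypothetical, and the default of 10 is an assumption based on the Iteration 2 example, which shows ten words when no limit is given.

```python
import argparse


def parse_args(argv):
    """Parse the text words and the optional --max limit from argv."""
    parser = argparse.ArgumentParser(prog="analyze")
    # Every bare argument is treated as one word of the text.
    parser.add_argument("words", nargs="+", help="the text to analyze")
    # Default of 10 is an assumption taken from the Iteration 2 example.
    parser.add_argument("--max", dest="max_words", type=int, default=10,
                        help="maximum number of words to show")
    return parser.parse_args(argv)


args = parse_args(["G", "A", "T", "T", "A", "C", "A", "--max=2"])
# args.words is the list of seven letters and args.max_words is 2.
```

In a real entry point the list would come from `sys.argv[1:]`; the ranked words can then be cut with a slice such as `ranked[:args.max_words]`.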

Iteration 5.

  • Don't show infrequent words. Indicate the minimum frequency to show using the --minfreq parameter.

Example:

> analyze G A T T A C A --minfreq=3
> These are the top 1 most used words:
> 1 A (3)
> The text contains 7 words.
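
The example suggests two details: the header reports how many words are actually shown after filtering, while the total still counts every word in the text. A small Python sketch under those assumptions follows; `frequent_words` and `render` are hypothetical names, and the tie-break in the sort is inferred from the earlier examples.

```python
from collections import Counter


def frequent_words(words, minfreq=1):
    """Return (pairs with count >= minfreq, total number of words)."""
    counts = Counter(words)
    # Count descending, then word descending (tie-break inferred from the examples).
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)
    shown = [(word, count) for word, count in ranked if count >= minfreq]
    # The total is taken before filtering, as in the example output.
    return shown, len(words)


def render(shown, total):
    """Build the report lines in the format used by the examples."""
    lines = [f"These are the top {len(shown)} most used words:"]
    lines += [f"{position} {word} ({count})"
              for position, (word, count) in enumerate(shown, start=1)]
    lines.append(f"The text contains {total} words.")
    return lines
```

For instance, `render(*frequent_words("G A T T A C A".split(), minfreq=3))` produces the three lines shown in the example.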

Iteration 6.

  • Exclude specific words from the results using the --noshow parameter (hint: use --noshow=[return constructor readonly class private public]).

Example:

> analyze G A T T A C A --noshow=[T C G]
> These are the top 1 most used words:
> 1 A (3)
> The text contains 7 words.
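
Because the bracketed list contains spaces, an unquoted `--noshow=[T C G]` typically reaches the program as several separate arguments (`--noshow=[T`, `C`, `G]`). The sketch below shows one way to reassemble it in Python. The function name `split_noshow` is hypothetical, and the splitting behaviour is an assumption about the shell; since square brackets are glob characters, some shells may require the argument to be quoted, which the sketch also handles.

```python
def split_noshow(argv):
    """Separate the text words from the words listed in --noshow=[...]."""
    prefix = "--noshow=["
    text_words, excluded = [], []
    inside_list = False
    for token in argv:
        if token.startswith(prefix):
            inside_list = True
            token = token[len(prefix):]
        if inside_list:
            if token.endswith("]"):
                # Closing bracket found: the list ends with this token.
                inside_list = False
                token = token[:-1]
            # split() also covers the quoted form, where the whole list
            # arrives as one argument such as "T C G".
            excluded.extend(token.split())
        else:
            text_words.append(token)
    return text_words, excluded
```

With the arguments from the example, this returns the seven letters as the text and `["T", "C", "G"]` as the excluded words, so the total still counts 7 words while only `A` is shown.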