Forked from philschmid/search_query_instructions.txt
Created
January 30, 2025 17:01
Classify user search queries as either "Good Google Search Query" or "Bad Google Search Query" based on their likelihood of yielding relevant and helpful results from Google Search.
Input: User search query (text string).
Output: Classification label:
* Good Google Search Query: The query is likely to be effectively answered by Google Search.
* Bad Google Search Query: The query is unlikely to be effectively answered by Google Search. Further categorize "Bad" queries into subtypes for better understanding and classifier training (optional but highly recommended):
  * Chit-Chat/Conversational/Social
  * Personal/Subjective/Opinion-Based (Un-searchable)
  * Vague/Ambiguous/Lacking Specificity
  * Too Specific/Niche/Personal (Unlikely to be Indexed)
  * Computational/Task-Oriented (Beyond Search)
  * Commands/Actions (Not Search Intent)
  * Incomplete/Missing Information (Crucial Context Missing)
  * Misleading/Nonsensical/Gibberish
  * Better Suited for LLMs (Reasoning, Creative Tasks, Complex Synthesis)
  * Math Problems (Beyond Simple Arithmetic)
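One way to make the output format above machine-checkable is to encode the two labels and the ten subtypes as enumerations. The following is a minimal Python sketch, not part of the original instructions; the names `Label`, `BadSubtype`, and `Classification` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    GOOD = "Good Google Search Query"
    BAD = "Bad Google Search Query"


class BadSubtype(Enum):
    # Values are copied from the subtype list above.
    CHIT_CHAT = "Chit-Chat/Conversational/Social"
    PERSONAL_SUBJECTIVE = "Personal/Subjective/Opinion-Based (Un-searchable)"
    VAGUE = "Vague/Ambiguous/Lacking Specificity"
    TOO_SPECIFIC = "Too Specific/Niche/Personal (Unlikely to be Indexed)"
    COMPUTATIONAL = "Computational/Task-Oriented (Beyond Search)"
    COMMAND = "Commands/Actions (Not Search Intent)"
    INCOMPLETE = "Incomplete/Missing Information (Crucial Context Missing)"
    NONSENSICAL = "Misleading/Nonsensical/Gibberish"
    LLM_TASK = "Better Suited for LLMs (Reasoning, Creative Tasks, Complex Synthesis)"
    MATH = "Math Problems (Beyond Simple Arithmetic)"


@dataclass(frozen=True)
class Classification:
    """One classified query (hypothetical container, for illustration)."""

    query: str
    label: Label
    subtype: Optional[BadSubtype] = None  # optional, and only for Bad queries

    def __post_init__(self) -> None:
        # A subtype only makes sense when the label is "Bad".
        if self.label is Label.GOOD and self.subtype is not None:
            raise ValueError("Good queries must not carry a Bad subtype")
```

Because the subtype is described as optional, the sketch allows a Bad classification without one but rejects a Good classification that carries one.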
## Definitions:
Definition of "Good Google Search Queries":
A "Good Google Search Query" is defined as a query that:
* Expresses a clear and specific information need. The user is actively seeking factual information or answers to a question.
* Targets objective and factual information readily available and indexable on the web. The information sought is likely to be present in Google's search index.
* Is well-formed and uses appropriate keywords relevant to the information need. The query is grammatically sound and employs terms a search engine can effectively understand.
* Is specific enough to retrieve relevant results but not overly narrow or niche, ensuring some results are likely to be found. It strikes a balance between specificity and general searchability.
* Is non-personal and non-private in nature. It does not seek access to personal data or confidential information.
* Is intended to find information, not to engage in conversation, perform computational tasks directly within the search bar, or express opinions/emotions.
* Needs factual, up-to-date information to be answered precisely.
Examples of "Good Google Search Queries":
* "weather forecast in Paris"
* "history of the Roman Empire"
* "best recipes for chocolate chip cookies"
* "symptoms of seasonal allergies"
* "compare iPhone 14 vs Samsung Galaxy S23"
Definition of "Bad Google Search Queries" (Categorized):
"Bad Google Search Queries" are queries that are unlikely to yield satisfactory results from Google Search, either because they are better suited to other systems or because they are simply not effective queries. They fall into the following categories:
1. Chit-chat/Conversational Queries: Queries that are more akin to casual conversation starters and lack a specific information need.
* Examples: "How are you doing today?", "Tell me a funny joke please", "What's your favorite color?".
* Reason for "Bad": Google Search is not designed for conversational interactions.
2. Nonsensical/Poorly Formed Queries: Queries that are grammatically incorrect, contain random words or typos, or are incoherent and lack clear meaning.
* Examples: "asdfghjkl", "blue elephant table run", "what is the the to".
* Reason for "Bad": Difficult for Google to parse and understand the intended meaning.
3. Misleading/Factually Incorrect Premise Queries: Queries based on false information, misconceptions, or illogical premises.
* Examples: "Is the earth flat evidence?", "How to build a perpetual motion machine easily?", "Do vaccines cause autism proof?".
* Reason for "Bad": While Google might provide results, the underlying premise is flawed, and the user's intent might be misguided in the context of factual information retrieval.
4. Too Specific/Niche Queries (Low Search Volume/Unlikely to be Indexed): Queries that are extremely narrow, specific, or refer to very niche topics unlikely to have publicly available, indexed information.
* Examples: "average height of pine trees in my backyard in zip code 90210", "specific serial number lookup for an obscure product", "internal company document search".
* Reason for "Bad": Google's index may not contain information at this level of extreme specificity or for very low-interest topics.
5. Personal/Private Information Seeking Queries: Queries attempting to access personal data, private information, or confidential details.
* Examples: "my bank account balance", "private email address of John Doe", "social security number lookup".
* Reason for "Bad": Google Search is not designed to access or reveal private information.
6. Queries with Missing Essential Information: Queries that lack crucial details necessary to provide a meaningful and relevant answer.
* Examples: "What is the capital?", "How much does it cost?", "Translate this phrase". (These do not specify which capital, which item, or which phrase and target language.)
* Reason for "Bad": Ambiguity makes it difficult for Google to understand the specific information need.
7. Task/Computational/Action Requests (Within Search Bar Context): Queries that are requests to perform a calculation, set a timer, or execute an action directly within the search bar interface, rather than seeking information.
* Examples: "2 + 2 =", "set timer for 10 minutes", "remind me to buy milk tomorrow".
* Reason for "Bad" (in terms of information search): While Google Search can sometimes handle these, they are not primarily information-seeking queries in the traditional sense. They might be better handled by assistants or other interfaces.
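As a rough illustration of category 7, the sketch below flags the three example queries using a regular expression for bare arithmetic and a prefix check for timer and reminder requests. This is a deliberately narrow heuristic written for this document, not a description of any real classifier; the function name `looks_like_task_request`, the pattern, and the prefix list are all assumptions.

```python
import re

# Bare integer arithmetic such as "2 + 2 =" (the trailing "=" is optional).
ARITHMETIC = re.compile(r"^\d+(\s*[-+*/]\s*\d+)+\s*=?$")

# Hypothetical prefixes taken from the category 7 examples above.
TASK_PREFIXES = ("set timer", "remind me")


def looks_like_task_request(query: str) -> bool:
    """Return True if the query resembles a calculation or action request."""
    q = query.strip().lower()
    return bool(ARITHMETIC.match(q)) or q.startswith(TASK_PREFIXES)


if __name__ == "__main__":
    for q in ["2 + 2 =", "set timer for 10 minutes", "weather forecast in Paris"]:
        print(q, "->", looks_like_task_request(q))
```

A pattern list like this only catches surface forms, so it would at most serve as a cheap pre-filter before the full procedure is applied.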
## Procedure:
Step 1: Initial Assessment - Information Intent?
* Question: Is the query primarily intended to find information, answers, explanations, or solutions using a search engine?
* If YES: Proceed to Step 2.
* If NO: Classify as Bad Google Search Query. Go to Step 5 to determine the "Bad" query subtype.
Step 2: Keyword Specificity and Searchability
* Question: Does the query contain specific and relevant keywords related to a searchable topic? Is the topic something likely to be documented or discussed on the web?
* If YES: Proceed to Step 3.
* If NO (Keywords too vague or topic unsearchable): Classify as Bad Google Search Query. Go to Step 5 to determine the "Bad" query subtype (likely Vague/Ambiguous or Too Specific/Niche/Personal).
Step 3: Appropriate Specificity for Google Search
* Question: Is the query's specificity balanced? Is it specific enough to narrow down results but not so specific that it becomes overly niche and yields no results?
* If YES: Proceed to Step 4.
* If NO (Too broad or too narrow): Classify as Bad Google Search Query. Go to Step 5 to determine the "Bad" query subtype (likely Vague/Ambiguous or Too Specific/Niche/Personal).
Step 4: Query Type and Suitability for Google Search
* Question: Is the query primarily seeking factual information, definitions, how-tos, comparisons, or objective analysis? Or, if it's a question requiring reasoning or creative output, is it still something Google Search can provide relevant background information for?
* If YES (Generally fact-based or seeks searchable information): Classify as Good Google Search Query.
* If NO (Primarily chit-chat, personal opinion, computational task, command, etc.): Classify as Bad Google Search Query. Go to Step 5 to determine the "Bad" query subtype.
Step 5: (For Bad Google Search Queries Only) Determine "Bad" Query Subtype
If you classified the query as "Bad Google Search Query" in any of the previous steps, determine the most appropriate subtype based on the following descriptions:
* Chit-Chat/Conversational/Social: Look for greetings, social inquiries, jokes, or conversational starters. (e.g., "Hi Google how are you?", "Tell me something funny")
* Personal/Subjective/Opinion-Based: Look for questions about personal feelings, opinions, subjective choices, or self-evaluation. (e.g., "Should I break up with my boyfriend?", "What color looks best on me?")
* Vague/Ambiguous: Look for queries lacking specific nouns, verbs, or context; overly general terms. (e.g., "things to do", "help", "information").
* Too Specific/Niche/Personal: Look for queries referencing highly specific, likely non-public information or extremely niche topics. (e.g., "My doctor's appointment schedule", "The serial number of my toothbrush").
* Computational/Task-Oriented: Look for requests to solve equations, translate languages, set timers, or perform complex calculations (beyond simple arithmetic Google Search can handle). (e.g., "Solve integral of x^2", "Translate this to Klingon").
* Commands/Actions: Look for verbs indicating device control or application operations. (e.g., "Play music on Spotify", "Send email to John").
* Incomplete/Missing Information: Look for queries with placeholders, missing locations, undefined pronouns, or crucial context missing. (e.g., "Best pizza near", "How to fix it", "Cheapest flights to").
* Misleading/Nonsensical/Gibberish: Look for random characters, incoherent phrases, or queries based on false or nonsensical premises. (e.g., "asdfghjkl", "Why is water dry?", "Purple flying cats").
* Better Suited for LLMs: Look for queries that require creative writing, complex summarization, in-depth explanation, nuanced comparison, or generation of novel content rather than just finding existing information links. (e.g., "Write a short story about a talking dog", "Explain quantum physics to a child", "Compare pros and cons of electric cars vs. gasoline cars in detail").
* Math Problems (Beyond Simple Arithmetic): Look for complex mathematical equations, proofs, or calculations requiring specialized mathematical solvers. (e.g., "Prove Riemann Hypothesis", "Calculate eigenvalues of this matrix").
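Steps 1 to 4 amount to a chain of yes/no checks that stops at the first "no". The following is a minimal sketch of that control flow, assuming a human or a model has already answered each step's question (the booleans below). `StepAnswers`, `run_procedure`, and `LIKELY_SUBTYPES` are hypothetical names, and the sketch does not choose the Step 5 subtype itself; it only records the hints the procedure gives for Steps 2 and 3.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

GOOD = "Good Google Search Query"
BAD = "Bad Google Search Query"

# Subtype hints stated in the procedure for a "NO" at Step 2 or Step 3.
LIKELY_SUBTYPES = {
    2: ("Vague/Ambiguous", "Too Specific/Niche/Personal"),
    3: ("Vague/Ambiguous", "Too Specific/Niche/Personal"),
}


@dataclass(frozen=True)
class StepAnswers:
    has_information_intent: bool    # Step 1
    has_searchable_keywords: bool   # Step 2
    has_balanced_specificity: bool  # Step 3
    is_fact_seeking: bool           # Step 4


def run_procedure(answers: StepAnswers) -> Tuple[str, Optional[int]]:
    """Return (label, failed_step); failed_step is None for Good queries."""
    checks = [
        answers.has_information_intent,
        answers.has_searchable_keywords,
        answers.has_balanced_specificity,
        answers.is_fact_seeking,
    ]
    for step, passed in enumerate(checks, start=1):
        if not passed:
            # The first "NO" ends the procedure; Step 5 then picks a subtype.
            return BAD, step
    return GOOD, None


if __name__ == "__main__":
    label, failed = run_procedure(StepAnswers(True, False, True, True))
    print(label, failed, LIKELY_SUBTYPES.get(failed, ()))
```

Returning the failed step alongside the label keeps the information Step 5 needs to narrow down the subtype.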
## Examples:
| Query | Classification | Bad Query Subtype (if applicable) | Rationale |
|-------|----------------|-----------------------------------|-----------|
| "What is the capital of France?" | Good Google Search Query | N/A | Clear information intent, specific keywords, searchable topic. |
| "How are you feeling today?" | Bad Google Search Query | Chit-Chat/Conversational/Social | Conversational, not information seeking. |
| "My favorite color" | Bad Google Search Query | Personal/Subjective/Opinion-Based | Subjective opinion, not searchable fact. |
| "Weather" | Good Google Search Query (could be improved, but Google handles it well) | N/A (borderline Vague, but Google infers location) | Broad, but Google effectively handles location-based weather searches. |
| "My diary from July 12th, 2023" | Bad Google Search Query | Too Specific/Niche/Personal | Highly personal and not publicly indexed. |
| "Solve x^2 + 3x - 4 = 0" | Bad Google Search Query | Computational/Task-Oriented (beyond simple) | Math problem requiring a solver, not just information retrieval (though simple equations are sometimes solved directly by Google). |
| "Play Despacito on Spotify" | Bad Google Search Query | Commands/Actions | Device command, not search intent. |
| "Best restaurant near" | Bad Google Search Query | Incomplete/Missing Information | Missing location information. |
| "asdfjkl;" | Bad Google Search Query | Misleading/Nonsensical/Gibberish | Gibberish, no coherent meaning. |
| "Write a poem about a cat wearing shoes" | Bad Google Search Query | Better Suited for LLMs | Creative writing task better suited to LLM generation. Google Search could find existing cat poems, but the intent is to generate new content. |
| "Calculate the integral of sin(x)/x from 0 to infinity" | Bad Google Search Query | Math Problems (Beyond Simple Arithmetic) | Complex mathematical calculation. |
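The rows of this table can double as a small regression set for any classifier built from these guidelines. The sketch below transcribes the queries and their top-level labels into `LABELED_EXAMPLES` and scores a classifier against them; the `accuracy` helper and the `classify` callable are hypothetical and stand in for whatever model or rule set is under test.

```python
from typing import Callable, List, Tuple

# (query, expected top-level label), transcribed from the table above.
LABELED_EXAMPLES: List[Tuple[str, str]] = [
    ("What is the capital of France?", "Good"),
    ("How are you feeling today?", "Bad"),
    ("My favorite color", "Bad"),
    ("Weather", "Good"),
    ("My diary from July 12th, 2023", "Bad"),
    ("Solve x^2 + 3x - 4 = 0", "Bad"),
    ("Play Despacito on Spotify", "Bad"),
    ("Best restaurant near", "Bad"),
    ("asdfjkl;", "Bad"),
    ("Write a poem about a cat wearing shoes", "Bad"),
    ("Calculate the integral of sin(x)/x from 0 to infinity", "Bad"),
]


def accuracy(
    classify: Callable[[str], str],
    examples: List[Tuple[str, str]],
) -> float:
    """Fraction of examples for which classify(query) equals the expected label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for query, expected in examples if classify(query) == expected)
    return correct / len(examples)


if __name__ == "__main__":
    # Trivial baseline: label every query as Bad.
    print(accuracy(lambda query: "Bad", LABELED_EXAMPLES))
```

A baseline that always answers "Bad" scores 9/11 on these rows, because only two of the eleven examples are labeled Good; a useful classifier should beat that figure.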
## Important Notes:
* Context Matters: Consider the overall context of search queries if available.
* Ambiguity Resolution: If a query is borderline, lean towards the classification that best reflects its *primary* intent.
* Consistency: Strive for consistency in your classifications. If multiple people are classifying, ensure clear understanding and agreement on the guidelines.
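When several people classify the same queries, their agreement can be measured rather than assumed. The notes above do not prescribe a metric; one common choice is Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. The helper `cohen_kappa` below is a hypothetical, dependency-free sketch of that formula.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same queries in order."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)

    # Observed agreement: share of queries given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    annotator_1 = ["Good", "Good", "Bad", "Bad"]
    annotator_2 = ["Good", "Bad", "Bad", "Bad"]
    print(cohen_kappa(annotator_1, annotator_2))
```

In the usage example the annotators agree on three of four queries (0.75) while chance agreement is 0.5, so kappa is 0.5; values near 1 indicate that the guidelines are being applied consistently.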