| No. | Category | Issue Description | Implications | Business Explanation |
| --- | --- | --- | --- | --- |
| 1 | Environment & Dependency Control | Limited control over Python versions, no native support for venv or conda, and minimal environment isolation. | Difficult to pin exact library versions, which is critical for reproducibility, model portability, and CI/CD (see the pinning sketch below the table). | Lack of precise control can lead to inconsistent results between development and production, increasing project risk and time to market. |
| 2 | Production Deployment for Edge/Embedded | Databricks is not optimized for producing lightweight, portable, or edge-deployable code and models. | Inference pipelines and final models deployed on edge or embedded systems need tight memory and runtime control, which Databricks does not support well. | The platform is not well suited to projects targeting IoT or low-power environments, which could delay adoption in real-world applications. |
| 3 | Real-Time Resource Monitoring | Lacks real-time CPU/memory/IO/GPU usage metrics at the notebook or job level. | Hard to optimize memory-heavy jobs or identify inefficient stages; no fine-grained profiling tools are built in. | Without visibility into resource usage, costs may spike unexpectedly and debugging performance issues becomes harder. |
| 4 | Debugging & Observability | Logs are fragmented across notebook outputs, cluster logs, and the Spark UI. | Poor root-cause analysis, especially for intermittent issues or dependency failures. | Engineers spend more time tracing issues, slowing delivery and increasing operational costs. |
| 5 | Deployment Architecture Sprawl | With Databricks layered on top of foundational cloud services (AWS/GCP/Azure), production code and infrastructure become split across multiple environments. | Increases complexity, reduces maintainability, and creates more surfaces for bugs, especially in lean teams with limited DevOps expertise. | Managing infrastructure across multiple layers adds overhead, making operations more expensive and fragile. |
| 6 | Cost Governance & Transparency | Databricks uses its own DBU billing model, and costs are not reported at a fine-grained per-user/job/table level. | Cost prediction is difficult; tracking usage back to business units or teams requires custom tagging and external dashboards (see the tagging sketch below the table). | Hard to forecast or explain spend to finance and business leaders, which can lead to cost overruns or poor ROI clarity. |
| 7 | Not Suitable as a Data Historian | Databricks is not designed as a real-time historian or long-term time-series archival platform (e.g., for OT/industrial/IoT data). | You will need another system such as InfluxDB, BigQuery, TimescaleDB, or a native cloud data lake to act as a source-of-truth historian. | Additional tools and infrastructure must be integrated, increasing the total cost of ownership. |
| 8 | Fragmented Developer Experience | Notebooks are powerful but do not fully support local development, continuous sync with IDEs, or edge-based prototyping pipelines. | For organizations working across laptops, workstations, cloud environments, and Databricks, syncing and testing code across platforms becomes a challenge. | Developers lose productivity to context switching and non-portable workflows, increasing development cycle times. |
| 9 | Learning Curve & Team Scalability | Databricks has a steep learning curve: understanding Spark, clusters, notebooks, Delta Lake, and MLflow is non-trivial, especially for small, lean teams. | Smaller teams may not have the bandwidth to manage Databricks efficiently, especially in production environments. | Upskilling takes time and money, and teams may avoid the platform due to perceived complexity. |
| 10 | Version Control Challenges | Version control is limited to notebook-level Git integration (Repos), with no built-in support for pull requests, branching workflows, or inline reviews. | Tracking changes across large projects is hard; merge conflicts arise easily and workflows are not as robust as code-first workflows in GitHub/GitLab/Bitbucket. | Increases the chance of bugs and regressions in collaborative projects, slowing development velocity. |
| 11 | Workflow Orchestration | Lacks built-in DAG views, conditional branching, retries, and fine-grained task handling. | Makes complex pipeline management hard; not suitable for MLOps/ETL with dozens of interdependent tasks. | Advanced pipelines need extra tools, adding cost and architectural burden (see the orchestration sketch below the table). |
| 12 | Unstructured Data Handling | Databricks is Spark-first and structured-data-first; handling of audio, video, images, or PDFs is inefficient and awkward. | Use cases involving speech, computer vision, or multi-modal data suffer. | Limits platform usability in modern AI applications that use diverse data types. |
| 13 | Cluster Lifecycle Friction | Cluster startup time is non-trivial (1–5 minutes), and auto-termination can kill sessions during breaks. | Interrupts workflow in interactive settings and causes frustration during exploration. | User frustration can result in lower adoption or reduced productivity. |
| 14 | Model Deployment Features | MLflow model serving is simple but lacks deep MLOps support (no drift detection, traffic routing, or inference analytics). | Needs external systems for robust production deployment (see the serving wrapper sketch below the table). | Production readiness of ML models becomes harder to achieve, delaying real-world impact. |
| 15 | SQL Performance | Serverless SQL can be slow for certain queries, and pricing is not as competitive as Snowflake for BI workloads. | Can become a bottleneck in analytics-heavy organizations. | May need to invest in additional BI tools, increasing platform sprawl and cost. |
| 16 | Documentation Gaps | Docs can be outdated, especially when newer cloud provider features are added or deprecated in the underlying platform. | Confusion around configurations, features, and integration points (e.g., cloud-native auth, secrets, networking). | Project timelines can slip due to reliance on support and experimentation. |
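
On the dependency-control point (row 1), exact package versions can at least be pinned per cluster, even though this falls short of true venv/conda-style isolation. Below is a minimal sketch assuming the Databricks Libraries REST API (`/api/2.0/libraries/install`) and a personal access token; the host, token, cluster ID, and package pins are placeholders, not values from this gist.

```python
# Minimal sketch: pin exact package versions on an existing cluster via the
# Databricks Libraries REST API. Host, token, cluster_id, and the pins below
# are placeholders -- adjust them for your workspace.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder
CLUSTER_ID = "<cluster-id>"                                         # placeholder

# Exact pins you want every run on this cluster to use.
PINNED_PACKAGES = ["pandas==2.1.4", "scikit-learn==1.4.2", "mlflow==2.11.3"]

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": CLUSTER_ID,
        "libraries": [{"pypi": {"package": pkg}} for pkg in PINNED_PACKAGES],
    },
    timeout=30,
)
resp.raise_for_status()
print("Requested install of pinned libraries:", PINNED_PACKAGES)
```

Notebook-scoped installs (`%pip install`) are another common workaround, but those pins have to be re-applied on every cluster restart, which is part of the reproducibility concern above.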
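For cost governance (row 6), the usual workaround is to attach custom tags at cluster creation so spend can be grouped by team or project in billing exports and dashboards. A minimal sketch assuming the Clusters REST API (`/api/2.0/clusters/create`); the host, token, runtime version, node type, and tag values are all placeholders.

```python
# Minimal sketch: create a cluster with custom_tags so DBU and cloud spend can
# be attributed to a team/project/cost center in cost reports. All values are
# placeholders; runtime versions and node types are workspace-specific.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

cluster_spec = {
    "cluster_name": "feature-engineering-team-a",   # placeholder
    "spark_version": "14.3.x-scala2.12",            # placeholder runtime
    "node_type_id": "i3.xlarge",                    # placeholder node type
    "num_workers": 2,
    "autotermination_minutes": 30,
    # Tags that cost dashboards and cloud billing exports can group by.
    "custom_tags": {
        "team": "data-science",
        "project": "churn-model",
        "cost_center": "CC-1234",
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
    timeout=30,
)
resp.raise_for_status()
print("Created cluster:", resp.json().get("cluster_id"))
```

Even with consistent tagging, translating tagged usage into per-team cost still requires an external dashboard, which is the overhead the row describes.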
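For orchestration (row 11), teams often drive Databricks jobs from an external scheduler to get DAG views, retries, and branching. The sketch below assumes Apache Airflow 2.4+ with the `apache-airflow-providers-databricks` package, a configured `databricks_default` connection, and existing job IDs; all of these are illustrative placeholders, not part of the original gist.

```python
# Minimal sketch: orchestrate Databricks jobs from Apache Airflow to get DAG
# views, retries, and explicit task dependencies on top of the platform.
# Assumes apache-airflow-providers-databricks is installed and a
# "databricks_default" connection exists; job IDs are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="databricks_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                         # Airflow 2.4+ parameter
    catchup=False,
    default_args={
        "retries": 3,                          # fine-grained retry handling
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    ingest = DatabricksRunNowOperator(
        task_id="ingest",
        databricks_conn_id="databricks_default",
        job_id=111,                            # placeholder Databricks job ID
    )
    train = DatabricksRunNowOperator(
        task_id="train",
        databricks_conn_id="databricks_default",
        job_id=222,                            # placeholder Databricks job ID
    )
    ingest >> train                            # explicit task dependency
```

This buys the missing orchestration features, but it is exactly the extra tooling and architectural burden the row warns about.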
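For model deployment (row 14), built-in serving does not capture inference statistics, so a common pattern is to wrap the registered model and log inputs yourself. The sketch below assumes a model registered under a placeholder name and loaded with `mlflow.pyfunc.load_model`; the drift check is deliberately simplistic and stands in for a proper external monitoring system.

```python
# Minimal sketch: wrap a registered MLflow model with basic inference logging
# as a stand-in for the drift detection / inference analytics that built-in
# serving lacks. The model URI is a placeholder; the "drift" signal here is
# just per-feature input means printed for later review.
import mlflow.pyfunc
import pandas as pd

MODEL_URI = "models:/churn-model/Production"  # placeholder registered model

model = mlflow.pyfunc.load_model(MODEL_URI)


def predict_with_logging(batch: pd.DataFrame):
    """Score a batch and record simple input statistics for later drift review."""
    # Log per-feature means; a real setup would persist these to a metrics
    # store and compare them against the training distribution.
    feature_means = batch.mean(numeric_only=True).to_dict()
    print("input feature means:", feature_means)

    return model.predict(batch)
```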