Mohit Rajput MR901
SNo,Category,Issue Description,Implications,Business Explanation
1,Environment & Dependency Control,"Limited control over Python versions, no native support for venv or conda, and minimal environment isolation.","Difficult to pin down exact library versions, which is critical for reproducibility, model portability, and CI/CD.","Lack of precise control may lead to inconsistent results between development and production, increasing project risk and time to market."
2,Production Deployment for Edge/Embedded,"Databricks is not optimized for producing lightweight, portable, or edge-deployable code and models.",Inference pipelines and final models deployed on edge or embedded systems need tight memory and runtime control—something Databricks doesn’t support well.,"The platform isn't suitable for projects targeting IoT or low-power environments, which could delay adoption in real-world applications."
3,Real-Time Resource Monitoring,Lack of real-time CPU/memory/IO/GPU usage metrics at the notebook or job level.,Hard
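The dependency-control gap in row 1 is usually mitigated by pinning exact package versions so an environment can be recreated elsewhere. A minimal, stdlib-only sketch of capturing the current environment's pins (the function name is illustrative):

```python
# Sketch: emit "name==version" pins for every installed distribution,
# similar to `pip freeze`, so dev and prod environments can be matched.
# Stdlib only; `freeze_environment` is an illustrative name.
from importlib import metadata


def freeze_environment() -> list[str]:
    """Return sorted 'name==version' lines for the current environment."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    )


if __name__ == "__main__":
    for line in freeze_environment():
        print(line)
```

Writing these pins to a `requirements.txt` checked into version control gives the reproducibility that the table flags as hard to achieve natively.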
SNo,Advantages,Why it matters
1,Unified Data & AI Platform,"Combines analytics, engineering, and ML into one consistent interface across multiple clouds."
2,Optimized Spark Runtime,High-performance distributed compute engine with better throughput than open-source Spark.
3,Delta Lake & Time Travel,"Brings reliability, ACID compliance, and versioning to the data lake."
4,MLflow Native Integration,"Seamlessly tracks experiments and model versions, and manages the model registry."
5,Collaboration Features,Built-in sharing and permissions support team collaboration.
6,Security & Governance,"RBAC, Unity Catalog, and audit logging enable enterprise-grade governance."
7,Multi-cloud & Interoperability,"Works across GCP, AWS, and Azure with a consistent experience."
8,AutoML (Basic),Allows non-experts to get started quickly with modeling.
9,Scalability for Large Workloads,Scales easily for petabyte-scale processing and large model training.
MR901 / databrick_evaluation_role_wise_suitability.csv
Created August 18, 2025 05:39
Databricks Suitability Matrix: Role and Domain-wise practicality
SNo,Role,Domain,Best Tools/Platform,Why Use Databricks,Limitations in using Databricks,Recommendation on Databricks
1,Data Scientist,Structured Tabular Data,"Databricks, BigQuery, Redshift, VMs, Local Env","Seamless data lake integration, Delta Lake, good AutoML pipelines, strong at aggregations and joins","Less useful for low-latency modeling, slow to deploy fine-tuned models to prod",Strongly recommended as a primary tool for structured data prototyping and experimentation.
2,Data Scientist,"Time Series (sensor, finance)","Azure Data Explorer, Prophet, Darts, GluonTS, GCP AI Forecasting, Kafka Streams",Can handle large-scale time series processing and feature engineering,Lacks ready-to-use TS-specific models and operational forecasting toolkits,"Partially recommended as a side tool for feature generation, not final modeling."
3,Data Scientist,Signal Processing,"MATLAB, SciPy, Local Env, Librosa, Wavelet Toolbox",Useful for distributing raw signal data processing at scale,"Not natively signal-processing frie
Metric,Description,Ideal value
Startup Time,Time to get an environment up and running for experimentation,Fast
Hardware Dependency,"Need for specialized hardware, e.g. GPU/TPU",Low
Scalability,Ability to scale across distributed datasets and compute,High
Runtime Observability,"Availability of monitoring tools (CPU, memory, logs, errors)",High
Environment Control,"Ability to use a specific OS, language, and packages",High
Portability,Ease of taking the solution to another platform or to the edge,High
Interoperability,"Ability to integrate with other tools in the stack (e.g., CI/CD, GCP, MLflow)",High
Latency Suitability,Suited for real-time / low-latency inference needs,High
Storage Flexibility,"Ability to handle and work with diverse data formats (structured, unstructured, etc.)",High
SNo,Domain,Tool/Platform,Startup Time,Hardware Dependency,Scalability,Runtime Observability,Environment Control,Portability,Interoperability,Latency Suitability,Storage Flexibility,Prod. Pipeline Readiness
1,Structured Tabular Data,Databricks,Fast,Low,High,Medium,Medium,Medium,High,High,High,High
,,BigQuery + VMs Runtime,Fast,Low,High,High,High,High,High,High,High,High
2,Time Series,Databricks,Medium,Medium,High,Medium,Medium,Medium,High,Medium,High,Medium
,,AWS Timestream + VMs Python + Grafana,Fast,Medium,Medium,High,High,High,High,High,High,High
3,Signal Processing,Databricks,Medium,High,Medium,Medium,Low,Medium,Medium,Medium,Medium,Medium
,,MATLAB / VMs Python Runtime,Fast,Medium,Medium,High,High,High,High,High,Medium,High
4,Computer Vision,Databricks,Slow,High,High,Medium,Low,Medium,High,Medium,Low,Medium
,,VMs GPU Dev (PyTorch + Jupyter),Fast,Medium,High,High,High,High,High,Medium,Medium,High
5,NLP,Databricks,Medium,Medium,High,Medium,Medium,Medium,High,Medium,High,Medium
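One way to read this matrix against the metric ideals listed earlier is a simple fit score per tool: the share of metrics whose rating matches the ideal value. The scoring rule and variable names below are assumptions added for illustration, not part of the evaluation:

```python
# Illustrative scoring of a matrix row against the "Ideal value" column
# of the metrics table. The equal-weight rule here is an assumption.
IDEALS = {
    "Startup Time": "Fast",
    "Hardware Dependency": "Low",
    "Scalability": "High",
    "Runtime Observability": "High",
    "Environment Control": "High",
    "Portability": "High",
    "Interoperability": "High",
    "Latency Suitability": "High",
    "Storage Flexibility": "High",
}


def fit_score(ratings: dict[str, str]) -> float:
    """Fraction of metrics whose rating hits its ideal value."""
    hits = sum(ratings[m] == ideal for m, ideal in IDEALS.items())
    return hits / len(IDEALS)


# Row 1 of the matrix: Databricks on structured tabular data.
databricks_tabular = {
    "Startup Time": "Fast", "Hardware Dependency": "Low",
    "Scalability": "High", "Runtime Observability": "Medium",
    "Environment Control": "Medium", "Portability": "Medium",
    "Interoperability": "High", "Latency Suitability": "High",
    "Storage Flexibility": "High",
}
```

With this rule, Databricks on structured tabular data hits 6 of the 9 metric ideals; weighting metrics by project priorities would be a natural refinement.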