Summary of Why Most Data Projects Fail & How to Avoid It • Jesse Anderson • GOTO 2023
Main Ideas and Concepts
- High Failure Rate of Data Projects:
The failure rate for data projects is alarmingly high: only 15% reach production, a figure Anderson attributes to Gartner and says matches his own consulting experience.
- Misconceptions About Technology:
Failures are commonly blamed on the choice of technology or programming language (e.g., Snowflake vs. Databricks, Python vs. Java), but these choices are not the primary reason projects fail.
- Importance of Asking the Right Questions:
Successful data projects require clear answers to fundamental questions: Who, What, When, Where, and How.
- Team Composition:
The right mix of personnel is crucial:
- Data Scientists: Typically have a mathematical background and some coding skills, but are not software engineers.
- Data Engineers: Software engineers who specialize in data and understand best practices.
- Operations Engineers: Responsible for operationalizing data frameworks and ensuring system reliability.
- Balanced Team Ratios:
Organizations often fail when they have an imbalance in team composition, such as too many Data Scientists and not enough Data Engineers.
- Clear Business Value:
Projects must have a clear path to value creation, avoiding vague goals like "doing AI." A focus on attainable, clear objectives is essential.
- Timelines and Expectations:
Establish realistic timelines for project milestones. Both overpromising and underestimating project duration can lead to failure.
- Data Processing Architecture:
Decisions about where data will be processed (cloud, on-premises, or hybrid) are critical and should be made carefully to avoid using inappropriate technologies.
- Execution Focus:
Teams should concentrate on a limited number of objectives (1-3) to avoid spreading themselves too thin and losing efficiency.
- Understanding Value:
Teams must articulate the business value of their projects, ideally aiming for a 10x return on investment (ROI) to justify expenses.
- Addressing Gaps:
Organizations need to identify gaps in both technology and personnel. Not all challenges stem from technology; often, they are related to organizational structure and communication.
- Seeking Help:
Organizations should consider external help (outsourcing, consulting) when needed, but recognize that problems won't resolve themselves without proactive effort.
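The 10x target in the "Understanding Value" point can be made concrete with a little arithmetic. The sketch below is an illustration only: the function names, the value-divided-by-cost definition of the multiple, and the dollar figures are assumptions made here, not something stated in the talk.

```python
def value_multiple(estimated_value: float, estimated_cost: float) -> float:
    """Return how many times over the project pays back its cost.

    Assumption: the "10x" target is read as estimated business value
    divided by estimated total cost (team, infrastructure, licences).
    """
    if estimated_cost <= 0:
        raise ValueError("estimated_cost must be positive")
    return estimated_value / estimated_cost


def meets_target(estimated_value: float, estimated_cost: float,
                 target_multiple: float = 10.0) -> bool:
    """Check the multiple against the target (10x by default)."""
    return value_multiple(estimated_value, estimated_cost) >= target_multiple


# Hypothetical figures: a project costing 250,000 that is expected to
# produce 2,000,000 in value has a multiple of 8, short of a 10x target.
print(value_multiple(2_000_000, 250_000))
print(meets_target(2_000_000, 250_000))
```

Writing the estimate down this way forces the team to name both numbers, which is the point of the exercise: a project that cannot state its expected value has no clear path to value creation.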
Methodology / Instructions
- Before Starting a Data Project:
Ensure answers to the following questions are clear:
- Who: Identify the right team members and their roles.
- What: Define the specific objectives and value creation path.
- When: Establish a realistic timeline for project milestones.
- Where: Decide on the architecture for data processing.
- How: Develop a clear execution plan focused on 1-3 key objectives.
- Focus on Team Composition:
Ensure a balanced ratio of Data Scientists, Data Engineers, and Operations Engineers to support project success.
- Communicate Value:
Articulate the business value of data initiatives clearly to stakeholders, aiming for a 10x ROI.
- Evaluate External Assistance:
Be open to seeking external help to address persistent challenges, but ensure there is a concerted effort to resolve issues.
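The pre-project questions above can be turned into a simple checklist that is filled in before work starts. This is a minimal sketch under stated assumptions: the `ProjectPlan` fields, the `readiness_gaps` helper, and the exact role names are hypothetical and chosen here for illustration. Only the five questions, the three roles, and the limit of 1-3 objectives come from the summary.

```python
from dataclasses import dataclass, field

# Roles named in the summary; the lowercase spelling is an assumption.
REQUIRED_ROLES = {"data scientist", "data engineer", "operations engineer"}


@dataclass
class ProjectPlan:
    """Hypothetical container for answers to the five questions."""
    who: list[str] = field(default_factory=list)   # roles on the team
    what: str = ""                                 # objective and path to value
    when: str = ""                                 # timeline for milestones
    where: str = ""                                # cloud, on-premises, or hybrid
    how: list[str] = field(default_factory=list)   # key objectives to execute


def readiness_gaps(plan: ProjectPlan) -> list[str]:
    """Return a list of unanswered or weakly answered questions."""
    gaps = []
    missing = REQUIRED_ROLES - {role.lower() for role in plan.who}
    if missing:
        gaps.append("who: missing roles: " + ", ".join(sorted(missing)))
    if not plan.what.strip():
        gaps.append("what: no objective or value path defined")
    if not plan.when.strip():
        gaps.append("when: no timeline defined")
    if not plan.where.strip():
        gaps.append("where: no processing architecture chosen")
    if not 1 <= len(plan.how) <= 3:
        gaps.append(f"how: expected 1-3 objectives, got {len(plan.how)}")
    return gaps


# Hypothetical example: no operations engineer, no timeline, too many objectives.
plan = ProjectPlan(
    who=["Data Scientist", "Data Engineer"],
    what="Reduce customer churn",
    where="cloud",
    how=["churn model", "dashboard", "alerting", "data catalog"],
)
for gap in readiness_gaps(plan):
    print(gap)
```

An empty result means every question has at least a nominal answer. It does not judge whether the answers are good, which still needs human review.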
Speakers/Sources Featured
- Jesse Anderson (main speaker)
Notable Quotes
— 01:15 — « Only 15% of data projects succeed; they actually get into production. »
— 02:10 — « Technology is just one small piece; it's an important piece, but still, there are other parts that are much more important with data. »
— 08:42 — « None of them [data science, data engineering, operations] is more important; each one is equally important to get this value chain done. »
— 09:26 — « It looks like a house falling in on itself all the time. »
— 10:40 — « We need to have a clear, attainable path to value creation; if we don't have this, we will lose. »
Category
Educational