It took me 10 years to learn about the different types of data quality checks; I'll teach it to you in 5 minutes:

1. Check table constraints

The goal is to ensure your table's structure is what you expect:
* Uniqueness
* Not null
* Enum check
* Referential integrity

Ensuring the table's constraints is an excellent way to cover your data quality bases.

2. Check business criteria

Work with the subject matter expert to understand what data users check for:
* Min/Max permitted value
* Order of events check
* Data format check, e.g., check for the presence of the '$' symbol

Business criteria catch data quality issues specific to your data/business.

3. Table schema checks

Schema checks ensure that no inadvertent schema changes have happened:
* Using an incorrect transformation function that leads to a different data type
* Upstream schema changes

4. Anomaly detection

Metrics change over time; ensure it's not due to a bug.
* Check the percentage change of metrics over time
* Use simple percentage change across runs
* Use standard deviation checks to ensure values are within the "normal" range

Detecting value deviations over time is critical for business metrics (revenue, etc.)

5. Data distribution checks

Ensure your data size remains similar over time.
* Ensure the row counts remain similar across days
* Ensure critical segments of data remain similar in size over time

Distribution checks ensure you aren't losing data to faulty joins/filters.

6. Reconciliation checks

Check that your output has the same number of entities as your input.
* Check that your output didn't lose data due to buggy code

7. Audit logs

Log the number of rows input and output for each "transformation step" in your pipeline.
* Having a log of the number of rows going in & coming out is crucial for debugging
* Audit logs can also help you answer business questions

Debugging data questions? Look at the audit log to see where data duplication/dropping happens.

DQ warning levels: Make sure that your data quality checks are tagged with appropriate warning levels (e.g., INFO, DEBUG, WARN, ERROR, etc.). Based on the criticality of the check, you can block the pipeline.

Get started with the business and constraint checks, adding more only as needed. Before you know it, your data quality will skyrocket!

(A minimal code sketch of checks #1 and #6 follows below.)

Good Luck!

-

Like this thread? Read about the types of data quality checks in detail here 👇
https://lnkd.in/eBdmNbKE

Please let me know what you think in the comments below. Also, follow me for more actionable data content.

#data #dataengineering #dataquality
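Not part of the original thread, but as a hedged illustration of check #1 (constraints) and check #6 (reconciliation): a minimal pandas sketch. The table and column names (order_id, status, etc.) are made up for the example.

```python
# Minimal sketch (made-up column names) of two check types from the post:
# a table-constraint check (#1) and a reconciliation check (#6).
import pandas as pd

def check_constraints(df: pd.DataFrame) -> list:
    """Return a list of constraint violations found in the orders table."""
    failures = []
    if df["order_id"].duplicated().any():                                # uniqueness
        failures.append("order_id is not unique")
    if df["customer_id"].isna().any():                                   # not null
        failures.append("customer_id contains nulls")
    if not df["status"].isin({"placed", "shipped", "delivered"}).all():  # enum check
        failures.append("status has values outside the allowed enum")
    return failures

def check_reconciliation(source: pd.DataFrame, output: pd.DataFrame) -> list:
    """Warn if the output lost or duplicated entities relative to the input."""
    n_in, n_out = source["order_id"].nunique(), len(output)
    return [] if n_in == n_out else [f"entity count changed: {n_in} in vs {n_out} out"]

orders = pd.DataFrame({
    "order_id": [1, 2, 2],
    "customer_id": [10, None, 12],
    "status": ["placed", "refunded", "shipped"],
})
deduped = orders.drop_duplicates("order_id")
for problem in check_constraints(orders) + check_reconciliation(orders, deduped):
    print("WARN:", problem)
```

In practice each failure would carry a severity (WARN vs ERROR, as the post suggests) so you can decide whether it should block the pipeline.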
-
A Quick Plan/Approach For CISOs to Address AI Fast.

As a CISO/CEO, you have to stay on top of new ideas, risks, and opportunities to grow and protect the business. As we all keep hearing and seeing, LLM/AI usage is increasing every day.

This past week my inbox has been full of one question: How do I actually protect my company's data when using AI tools?

Over the last 9 years I have been working on, involved with, and creating LLM/AI cyber and business programs, and as a CISO I have been slowly integrating ideas about AI/cyber operations, data protection, and business.

Here are five AI privacy practices that I have found really work and that I recommend to clients, partners, and peers. I group them into three clear areas: Mindset, Mechanics, and Maintenance.

1. Mindset: Build AI Privacy Into the Culture

Privacy isn't just a checklist, it's a behavior.

Practice #1: Treat AI like a junior employee with no NDA.
Before you drop anything into ChatGPT, Copilot, or any other AI tool, stop and ask: Would I tell this to a freelancer I just hired five minutes ago? That's about the level of control you have once your data is in a cloud-based AI system. This simple mental filter keeps teams from oversharing sensitive client or company info.

Practice #2: Train people before they use the tool, not after.
Too many companies slap a "responsible AI use" policy into the employee handbook and call it a day. That's no good. Instead, run short, focused training on how to use AI responsibly, especially around data privacy.

2. Mechanics: Make Privacy Part of the System

Practice #3: Use privacy-friendly AI tools or self-host when possible.
Do your research. For highly sensitive work, explore open-source LLMs or self-hosted solutions like private GPTs or on-prem language models. It's a heavier lift, but you control the environment.

Practice #4: Classify your data before using AI.
Have a clear, documented data classification policy. Label what's confidential, internal, public, or restricted, and give guidance on what can and can't be included in AI tools. Some organizations embed DLP tools into browser extensions or email clients to prevent slip-ups. (A tiny illustrative sketch of this kind of screen follows below.)

3. Maintenance: Keep It Tight Over Time

Practice #5: Audit AI usage regularly.
People get busy. Policies get ignored. That's why you need a regular cadence (quarterly is a good place to start) where you review logs, audit prompts, and check who's using what.

AI is evolving fast, and privacy expectations are only getting tighter.

What other ways are you using LLM/AI in your organization?
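As a rough, hypothetical sketch of Practice #4 (my illustration, not the author's setup, and no substitute for a real DLP product): a pre-prompt screen that flags obviously restricted content before it goes into an AI tool.

```python
# Rough sketch of a pre-prompt screen: flag text that matches restricted-data patterns
# before it is pasted into ChatGPT, Copilot, etc. Patterns here are illustrative only.
import re

RESTRICTED_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "classification_label": re.compile(r"\b(confidential|internal only|restricted)\b", re.I),
}

def screen_prompt(text: str) -> list:
    """Return the names of restricted-data patterns found in the prompt text."""
    return [name for name, pattern in RESTRICTED_PATTERNS.items() if pattern.search(text)]

prompt = "Summarize this CONFIDENTIAL memo. Client SSN: 123-45-6789."
hits = screen_prompt(prompt)
if hits:
    print("Blocked before sending to the AI tool:", ", ".join(hits))
```

A screen like this only catches the obvious cases; the classification policy and training in the post are what make it effective.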
-
SMBs are facing a critical challenge: how to maximize efficiency, connectivity, and communication without massive resources. The answer? Strategic AI implementation.

Many small business owners tell me they're intimidated by AI. But the truth is you don't need to overhaul your entire operation overnight. The most successful AI adoptions I've seen follow these six straightforward steps:

1️⃣ Identify Immediate Needs: Look for quick wins where AI can make an immediate impact. Customer response automation is often the perfect starting point because it delivers instant value while freeing your team for higher-value work.

2️⃣ Choose User-Friendly Tools: The best AI solutions integrate seamlessly with your existing technology stack. Don't force your team to learn entirely new systems. Find tools that enhance what you're already using.

3️⃣ Start Small, Scale Gradually: Begin with focused implementations in 1-2 key areas. This builds confidence, demonstrates value, and creates organizational momentum before expanding.

4️⃣ Measure and Adjust Continuously: Set clear KPIs from the start. Monitor performance religiously and be ready to refine your AI configurations to optimize results.

5️⃣ Invest in Team Education: The most overlooked success factor? Proper training. When your team understands both the "how" and "why" behind AI tools, adoption rates soar.

6️⃣ Look Beyond Automation: While efficiency gains are valuable, the real competitive advantage comes from AI-driven insights. Let the technology reveal patterns in your business processes and customer behaviors that inform better strategic decisions.

The bottom line: AI adoption doesn't require disruption. The most effective approaches complement your existing workflows, enabling incremental improvements that compound over time.

What's been your experience implementing AI in your business? I'd love to hear what's working (or not) for you in the comments below.

#SmallBusiness #AI #BusinessStrategy #DigitalTransformation
-
AI didn't take my job... It helped me get promoted.

It also came up with that hook. 🤖

Now that I know you are invested in learning about AI, I want to show you some tools I am using to maximize my efficiency in my new fractional remote role.

Like most people, I used to juggle countless apps, lists, and notes, only to end my day feeling overwhelmed and underproductive. But integrating AI into my workflow has completely changed the game.

Here are the 3 tools that I currently use the most:

Notion.so : Organize & Streamline
Notion: Effortlessly structures my ideas, projects, and plans in one cohesive space.

UseMotion.com : Prioritize & Schedule
Motion: Transforms chaotic task lists into clear, prioritized schedules, reducing stress and boosting productivity. The automatic scheduling and rescheduling of tasks directly into my Google Calendar has been incredibly powerful. In the next couple of weeks I plan on integrating several more members of my Heron Labs team into the app as well, so that all of our projects and tasks are immediately visible to each other. No more back-and-forth emails trying to schedule a call. (A note here: Google Calendar recently rolled out a new native feature for scheduling calls based on your calendar that works really well too.)

ChatGPT.com : Create & Inspire
ChatGPT: Fuels my creativity, quickly turning rough ideas into polished content and captivating visuals.

By delegating routine decisions and overcoming creative roadblocks with AI, I've been able to assume more professional responsibilities without sacrificing personal family time or neglecting my farm/homestead chores.

Time is my most valuable resource. AI tools help me spend it wisely.

How are you leveraging AI to optimize your time?
-
The more I used AI to build, the less magical it felt.

After spending weeks trying to rebuild our platform with Claude/ChatGPT, I realized a hard truth: What started as a productivity hack turned into a full-time prompt engineering job.

Don't get me wrong, AI is mind-blowing at what it does. But you have to think of it the right way: Even with recent updates like Claude Code + Sonnet 3.7, AI still needs your architectural vision and continuous feedback to deliver production-quality results.

I recently shared my experience with AI coding, and dozens of engineers and leaders jumped in with their own battle-tested insights (original post in comments). These were the strongest insights and recommendations:

- Design the architecture first, then let AI do the implementation following YOUR architecture.
- Instead of asking AI to build the whole car at once, ask it to build individual parts separately.
- Iterate and guide: Aiming for perfection on the first prompt response will not work.
- Start with AI-written unit tests to ensure your expectations align with the output (hello TDD 😉; a small test-first sketch follows below).
- And finally, remember that AI can only code as well as you can architect and explain.

These insights are from engineering professionals who are making AI work instead of creating more work. The gap between using AI and using it WELL is wider than most realize, and the teams closing this gap are the ones getting real results.

What's your most effective technique for integrating AI into your development process?

—

Big thanks to Antons Kumkovs, Michael Rollins, Winner Emetuche, Drew Adams, Domingo Gómez García, Ryan Booth, Michael Fanous, David Cornelson, Youssef El Hassani, and Michael L. whose comments are included in the carousel 👇
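To make the test-first bullet concrete, here's a small hypothetical sketch (mine, not from the thread): the unit tests encode the expected behavior up front, and the AI-generated implementation is only accepted once they pass.

```python
# Test-first sketch: the tests below pin down expectations before any
# AI-written implementation of normalize_username() is accepted.
import unittest

def normalize_username(raw: str) -> str:
    # In the workflow above, this body is what the AI iterates on until the tests pass.
    return raw.strip().lower().replace(" ", "_")

class TestNormalizeUsername(unittest.TestCase):
    def test_strips_whitespace_and_lowercases(self):
        self.assertEqual(normalize_username("  Ada Lovelace "), "ada_lovelace")

    def test_is_idempotent(self):
        once = normalize_username("Grace Hopper")
        self.assertEqual(normalize_username(once), once)

if __name__ == "__main__":
    unittest.main()
```

Whether you or the AI drafts the tests, the point is the same: the expectations exist before the implementation, so you're grading the output instead of hoping.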
-
Choosing between Snowflake and Databricks? It’s not just a tech decision; it’s a strategic one. Here’s a quick breakdown:

Snowflake shines with structured data and SQL analytics. If you’re in finance or healthcare, this could be your go-to. It’s built for speed and efficiency in handling structured data.

On the flip side, Databricks is all about flexibility and performance, especially for big data and real-time analytics. Retail and energy sectors often find it more suited to their needs.

But here’s the kicker: your choice should hinge on your specific use case. What type of data are you working with? How much control do you need? (Trust me, these questions matter more than you think.)

From my experience, I’ve seen organizations struggle when they don’t align their data strategy with their business goals. For instance, a healthcare company I worked with initially chose Databricks for its flexibility but later realized they needed the structured approach that Snowflake offered. They ended up spending more time and resources than necessary.

Data analytics isn’t just about the tools; it’s about how those tools fit into your overall strategy. So, before you make a decision, take a step back and assess your needs.

What’s your experience with these platforms? Have you found one to be more beneficial than the other? I’d love to hear your thoughts.

If you found this post helpful, consider sharing it with your network.

#DataAnalytics #Snowflake #Databricks
-
$250k/mo burn. 3 months runway left. No VC cash.

"You don't need a miracle, you need a plan."

That's what I told a startup founder who hired me as a CFO. Runway increased from 3 to 9 months. Here is how:

▶️ The crisis:
• Baseline burn: $250k/month
• Cash in bank: $750k → 3 months of runway
• No VC lifeline: out of the question for now.
• Goal: buy time to hit 6+ months runway and qualify for non-dilutive capital.
• Acknowledging that this situation is a failure in terms of planning.

▶️ The playbook: 4 levers to pull

We attacked burn from all angles: cost cuts, cash flow optimization, revenue acceleration, and non-dilutive financing.

1. Cost reduction: saved $80k/month
Why? Fixed costs are the easiest to control quickly.
Tactics:
• Cloud infrastructure (savings: $25k/month):
=> Renegotiated AWS commit discounts (locked in 3-year terms for 40% savings).
• Software stack (savings: $15k/month):
=> Audited 35 tools. Cut duplicate/redundant apps.
=> Demanded 20% discounts from vendors by threatening cancellations (yes, dirty).
• Team restructuring (savings: $40k/month):
=> Reduced headcount by 12% (underperforming roles).
=> Shifted to contractors in lower-cost regions.
=> Paused all non-critical hires.

2. Better payment terms: unlocked $20k/month in cash flow
Why? Stretch payables without damaging relationships.
Tactics:
• Vendor negotiations:
=> Extended Net-30 to Net-60 terms with 4 key vendors.
• Customer collections:
=> Hired a part-time collections specialist to chase late payments (>30 days).

3. Faster sales cycles: added $20k/month in revenue
Why? Speed = cash.
Tactics:
• Removed friction:
=> Cut demo steps from 3 calls to 1.
=> Launched a self-service “Start Now” plan (no sales call, 14-day trial).
• Upsold existing customers:
=> Targeted inactive users with a “reactivation” campaign (12% converted to paid add-ons).

4. Non-dilutive financing: added $300k in cash
Why? Buy runway without giving up equity.
Tactics:
• Revenue-based financing:
=> Secured $200k at an 8% fee (repay 5% of monthly revenue until 1.4x repaid).
• AR factoring:
=> Sold $100k of outstanding invoices (90% advance rate, 3% fee).

▶️ Results
• New monthly burn: $130k/month (48% reduction).
• Cash balance after 3 months: $750k (initial) − $390k (3 months of burn at the reduced rate) + $300k (financing) = $660k
• Extended runway: $660k / $130k = 5+ months → 9+ months with financing.

▶️ Key takeaways for founders
• Cut fast, cut deep: Focus on high-impact fixed costs first (cloud, payroll, SaaS tools).
• Cash flow > accounting profit: Stretch payables, pull forward receivables.
• Simplify to accelerate: Remove friction in sales, pricing, and onboarding.
• Get creative with financing: Revenue-based loans, prepayments, and AR factoring buy runway.

You don’t need a miracle — you need a plan.

If you’re staring down a single-digit runway, DM me. Let’s fix this.
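A tiny sketch of the runway arithmetic in the results section (my code; the figures are the post's):

```python
# Runway math from the post: burn cut from $250k to $130k/month, $300k of financing raised.
starting_cash, old_burn, new_burn = 750_000, 250_000, 130_000
months_elapsed, financing = 3, 300_000

cash_now = starting_cash - new_burn * months_elapsed + financing  # 750k - 390k + 300k = 660k
print(starting_cash / old_burn)        # 3.0 months of runway before the plan
print(round(cash_now / new_burn, 1))   # ~5.1 further months at the reduced burn
```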
-
Last week I spent 6 hours debugging with AI. Then I tried this approach and fixed it in 10 minutes.

The Dark Room Problem: AI is like a person trying to find an exit in complete darkness. Without visibility, it's just guessing at solutions. Each failed attempt teaches us nothing new.

The solution? Strategic debug statements. Here's exactly how:

1. The Visibility Approach
- Insert logging checkpoints throughout the code
- Illuminate exactly where things go wrong
- Transform random guesses into guided solutions

2. Two Ways to Implement:

Method #1: The Automated Fix
- Open your Cursor AI's .cursorrules file
- Add: "ALWAYS insert debug statements if an error keeps recurring"
- Let the AI automatically illuminate the path

Method #2: The Manual Approach
- Explicitly request debug statements from AI
- Guide it to critical failure points
- Maintain precise control over the debugging process

Pro tip: Combine both methods for best results.

Why use both? Rules files lose effectiveness in longer conversations. The manual approach gives you backup when that happens. Double the visibility, double the success.

Remember: You wouldn't search a dark room with your eyes closed. Don't let your AI debug that way either.

—

Enjoyed this? 2 quick things:
- Follow along for more
- Share with 2 teammates who need this

P.S. The best insights go straight to your inbox (link in bio)
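Here's a minimal sketch of what those logging checkpoints can look like (my own hypothetical example, not from the post): each step reports its inputs and outputs, so a failure points at a specific line instead of leaving you, or the AI, guessing in the dark.

```python
# Checkpoint logging: every stage reports what it received and produced,
# so a failure is localized instead of being a guess in a dark room.
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def parse_prices(raw_rows):
    log.debug("parse_prices: %d raw rows, first=%r", len(raw_rows), raw_rows[:1])
    prices = []
    for i, row in enumerate(raw_rows):
        try:
            prices.append(float(row.strip().lstrip("$")))
        except ValueError:
            log.error("parse_prices: row %d is unparseable: %r", i, row)
            raise
    log.debug("parse_prices: parsed %d prices, total=%.2f", len(prices), sum(prices))
    return prices

try:
    parse_prices(["$10.50", " 3.25", "$oops"])
except ValueError:
    pass  # the ERROR log above already pinpoints row 2 as the culprit
```

Whether these lines come from a .cursorrules instruction (Method #1) or an explicit request (Method #2), the payoff is the same: the next fix attempt is grounded in what actually happened.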
-
About five years ago, I had a junior engineer on my team who was brilliant but struggled so much he was about to get a low performance review. Let’s call him Anthony.

He was fresh out of college and eager to prove himself, but his code reviews often came back with extensive feedback. The root of the issue wasn’t his intelligence or effort; it was his approach.

Anthony had this habit of jumping straight into the deep end. He wanted his code to be optimized, elegant, and perfect from day one. But in that pursuit, he often got stuck, either over-engineering a solution or ending up with something too complex to debug. Deadlines were slipping, and his confidence was taking a hit.

One day, during a particularly rough code review, I pulled him aside and shared a principle that had profoundly shaped my own career: “Make it work, make it right, make it fast.”

I explained it like this:
1. Make it work – First, solve the problem. Forget about how pretty or efficient your code is. Focus on meeting the acceptance criteria. If it doesn’t work, nothing else matters.
2. Make it right – Once it works, step back. Refactor the code, and make it clean, modular, and maintainable. Code is for humans who’ll work with it in the future.
3. Make it fast – Finally, if performance is critical, optimize. But don’t sacrifice clarity or maintainability for marginal speed gains.

The next sprint, he followed this approach on a tricky API integration task. When we reviewed his work, the difference was night and day. Not only had he delivered on time, but the code was a joy to read. Even he admitted it was the least stressful sprint he’d had in months.

Six months later, Anthony came to me and said, “That principle you shared, it’s changed everything. Thank you for pulling me aside that day.”

Today, Anthony is a senior engineer leading his team, mentoring others, and applying the same principle that once helped him. We’re still on good terms, though he moved to another org.

Sometimes, the most impactful advice is the simplest. As engineers, we often get caught up in trying to do everything perfectly all at once. But stepping back and breaking it into manageable steps can make all the difference.
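A toy illustration of the first two steps (my example, not Anthony's actual task): the same logic first in its "make it work" form, then "made right"; "make it fast" would only come later, if profiling demanded it.

```python
# "Make it work": a first pass that just meets the acceptance criteria.
def total_v1(items):
    t = 0
    for i in items:
        if i["qty"] > 0:
            t = t + i["qty"] * i["price"]
    return t

# "Make it right": same behavior, now named, typed, and easy for the next person to read.
from typing import Iterable

def order_total(items: Iterable[dict]) -> float:
    """Sum the cost of all line items with a positive quantity."""
    return sum(item["qty"] * item["price"] for item in items if item["qty"] > 0)

items = [{"qty": 2, "price": 3.5}, {"qty": 0, "price": 99.0}]
assert total_v1(items) == order_total(items) == 7.0
```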
-
It took me 6 years to land my first Data Science job. Here's how you can do it in (much) less time 👇

1️⃣ 𝗣𝗶𝗰𝗸 𝗼𝗻𝗲 𝗰𝗼𝗱𝗶𝗻𝗴 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 — 𝗮𝗻𝗱 𝘀𝘁𝗶𝗰𝗸 𝘁𝗼 𝗶𝘁.

I learned SQL and Python at the same time...
... thinking that it would make me a better Data Scientist.

But I was wrong. Learning two languages at once was counterproductive. I ended up being mediocre at both languages & mastering none.

𝙇𝙚𝙖𝙧𝙣 𝙛𝙧𝙤𝙢 𝙢𝙮 𝙢𝙞𝙨𝙩𝙖𝙠𝙚: Master one language before moving on to the next. I recommend SQL, as it is most commonly required.

———
How do you know if you've mastered SQL? You can
✔ Do multi-level queries with CTEs and window functions (a quick example follows this post)
✔ Use advanced JOINs, like cartesian joins or self-joins
✔ Read error messages and debug your queries
✔ Write complex but optimized queries
✔ Design and build ETL pipelines
———

2️⃣ 𝗟𝗲𝗮𝗿𝗻 𝗦𝘁𝗮𝘁𝗶𝘀𝘁𝗶𝗰𝘀 𝗮𝗻𝗱 𝗵𝗼𝘄 𝘁𝗼 𝗮𝗽𝗽𝗹𝘆 𝗶𝘁

As a Data Scientist, you 𝘯𝘦𝘦𝘥 to know Statistics. Don't skip the foundations!

Start with the basics:
↳ Descriptive Statistics
↳ Probability + Bayes' Theorem
↳ Distributions (e.g. Binomial, Normal, etc.)

Then move to intermediate topics like
↳ Inferential Statistics
↳ Time series modeling
↳ Machine Learning models

But you likely won't need advanced topics like
𝙭 Deep Learning
𝙭 Computer Vision
𝙭 Large Language Models

3️⃣ 𝗕𝘂𝗶𝗹𝗱 𝗽𝗿𝗼𝗱𝘂𝗰𝘁 & 𝗯𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝘀𝗲𝗻𝘀𝗲

For me, this was the hardest skill to build, because it was so different from coding skills.

The most important skills for a Data Scientist are:
↳ Understand how data informs business decisions
↳ Communicate insights in a convincing way
↳ Learn to ask the right questions

𝙇𝙚𝙖𝙧𝙣 𝙛𝙧𝙤𝙢 𝙢𝙮 𝙚𝙭𝙥𝙚𝙧𝙞𝙚𝙣𝙘𝙚: Studying for Product Manager interviews really helped. I love the book Cracking the Product Manager Interview. I read this book 𝘵𝘸𝘪𝘤𝘦 before landing my first job.

𝘗𝘚: 𝘞𝘩𝘢𝘵 𝘦𝘭𝘴𝘦 𝘥𝘪𝘥 𝘐 𝘮𝘪𝘴𝘴 𝘢𝘣𝘰𝘶𝘵 𝘣𝘳𝘦𝘢𝘬𝘪𝘯𝘨 𝘪𝘯𝘵𝘰 𝘋𝘢𝘵𝘢 𝘚𝘤𝘪𝘦𝘯𝘤𝘦?

Repost ♻️ if you found this useful.
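Not from the original post, but to make the first checklist item concrete, here's a small sketch (made-up table, run via Python's sqlite3; SQLite supports window functions from version 3.25): a CTE feeding a window function.

```python
# A CTE + window function, the kind of query the SQL mastery checklist points at.
import sqlite3  # needs an SQLite build >= 3.25 for window functions

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (rep TEXT, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('ana', 'west', 120), ('ana', 'west', 80),
        ('bo',  'west', 150), ('cy',  'east', 150);
""")

query = """
WITH rep_totals AS (                -- multi-level query via a CTE
    SELECT rep, region, SUM(amount) AS total
    FROM sales
    GROUP BY rep, region
)
SELECT rep, region, total,
       RANK() OVER (PARTITION BY region ORDER BY total DESC) AS region_rank  -- window function
FROM rep_totals
ORDER BY region, region_rank;
"""
for row in conn.execute(query):
    print(row)  # e.g. ('cy', 'east', 150.0, 1), ('ana', 'west', 200.0, 1), ...
```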