What do we do?
We gather and process machine learning training data for AI applications worldwide, serving cutting-edge AI businesses as well as Fortune 500 companies. We count Amazon, Sony, and Portugal Ventures amongst our investors and are proud to be one of the fastest-growing companies in the AI field.
How do we do it?
DefinedCrowd’s culture is about our four core values: Trust, Innovation, Passion, and Creativity. We like to think that we are a multi-talented, quirky and hard-working group dedicated to building a great platform, making our customers and community happy, and making our employees feel at home.
How can you help?
We are currently looking for talented new members across the world to join this energetic, hardworking and fun team in our Seattle headquarters, our R&D centers in Lisbon and Porto, or our office in Tokyo. In this role, you will:
- Ensure the quality and integrity of the data provided to end users
- Develop reliable, easy-to-modify data processing pipelines
- Stay aware of other departments' data needs
- Adapt and extend our data infrastructure as our operational services grow
- Build and improve infrastructure for data science and analytics
What do we offer?
- The opportunity to learn industry best practices
- Flexible working conditions
- International and diverse teams
- Fresh fruit and a healthy working environment
What are we looking for?
- A background in Computer Science or a related field
- At least 3 years’ experience programming in one or more languages such as Java, Python, C++, or C#
- At least 2 years’ experience developing and maintaining batch and streaming ETL data pipelines
- Experience with big data technologies such as Hive, Spark, Pig, Presto, or Impala
- Knowledge of messaging systems such as RabbitMQ, ActiveMQ, or Kafka
- Experience working with both SQL and NoSQL databases
- Experience with data cleaning and visualization
- Experience ensuring data quality and integrity for multiple end users
- Proactive, fast learner, and good communicator
- Experience in data warehouse modeling
- Experience in the data governance and data lineage domains
- Experience serving and maintaining data for a BI solution
- Experience building infrastructure for self-service data science projects
- Theoretical knowledge of the Machine Learning field