Vigilant is on a mission to unlock the potential of public data by taking denormalized and disaggregated public datasets and making them accessible for search and analysis at scale. Vigilant's backend engineering team builds out innovative tools for mining thousands of public databases, implements robust and flexible pipelines for wrangling unpredictable data, and creates intuitive APIs to connect the data layer with multiple end-user products. As a Data Engineer, you will not only assist in maintaining existing pipelines but will also undertake longer-term projects to improve data ingestion along correctness, latency, and cost dimensions.
We punch well above our weight: we're a small but growing company that provides critical tools for some of the largest financial institutions, media companies, and political campaigns and organizations in the world. We're building exciting things and tackling some difficult problems, and we'd love to have you come work with us.
We hope most or all of these things are true of you:
- You’re comfortable writing generalized data transformation tools, along with thorough documentation, for use by engineers and developers on the team.
- You can write and/or update models based on database-derived datasets.
- You’re able to spin up instances of common databases with Docker and configure production tables, indices, and access controls.
- You’re comfortable getting hands-on with complicated data without relying on it being pre-processed or cleaned.
- You’re excited about working to improve public data access and legibility.
We hope you'll have experience with some or all of these things:
- Building and maintaining robust and scalable end-to-end ETL/ELT pipelines capable of ingesting data from a range of sources including database dumps, API endpoints, massive Excel/CSV files, and sources in less friendly formats such as PDFs and HTML.
- Building analytical systems to rapidly identify and recover from changes in unpredictable third-party data sources.
- Building with and using both SQL and NoSQL databases, and identifying the appropriate use cases for each. Experience with PostgreSQL and Elasticsearch is preferred.
- Advanced database administration, including cluster migration, access control management, and handling version upgrades.
- Communicating technical concepts effectively to stakeholders both within and outside of engineering.
- Conducting data quality assurance and coverage analysis, and answering specific business queries as needed.
Here's some of what you'll be doing:
- Assist in the ongoing refinement of the data layer roadmap and help reconcile long-term technical direction with short-term projects that arise in response to engineering or business needs.
- Maintain and improve existing data ingestion tools and infrastructure, including substantial work with custom implementations.
- Establish and develop naming conventions and consistent schemas to ensure clarity and usability in the aggregated data.
- Assist with data warehouse integration into various products and applications.