OCR job from recruiter
If you would be interested, please let me know at h******[email protected] as
early as possible.
Job Title: Data Scientist with Machine Learning and Natural Language
Processing experience
Company: BITS
Task 1: Extend NIST Scientific Text Extraction System
Description of Tasks
I. Implement a distributed PDF-to-image conversion subsystem that converts
pages of scientific articles to individual images.
II. Implement a distributed optical character recognition (OCR)-based text
extraction subsystem that extracts text from images of individual pages and
prepares it for further processing by error-correction, machine learning,
and natural language systems.
III. Develop installation scripts for developed system and required tools
to facilitate installation on Linux virtual machines.
IV. Configure Linux virtual machine images with developed system and
necessary software tools and libraries for deployment in a distributed
virtualized system such as cloud computing.
V. Develop system documentation for deployment, maintenance, and
operation.
Deliverables:
The deliverables for the tasks under Task 1 are:
1. PDF to Image Conversion subsystem in Python, using ImageMagick to
perform the actual image conversion. Distribution of computation is
implemented using Redis and Thoonk to create a distributed job queue in
which a publisher node enters PDF ids from a fileserver into the job queue
for distributed worker nodes to fetch and convert into images. Images
should be returned to the fileserver as completed work units in zip files
containing the images. The subsystem should be fault tolerant and include
the necessary error handling and logging to disk to allow for uninterrupted
operation over long periods of time. Failure to perform an image conversion
should not prevent the system from continuing, nor should information about
the failure be lost.
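The fault-tolerance requirement in the deliverable above can be sketched with a minimal in-process stand-in for the Redis/Thoonk job queue: a publisher enters PDF ids, a worker drains the queue, and a failed conversion is recorded and logged rather than stopping the run. The queue class and `convert()` hook are illustrative only, not Thoonk's real API.

```python
import json
import logging
from collections import deque

logging.basicConfig(level=logging.INFO)

class JobQueue:
    """In-process stand-in for the distributed job queue (illustrative)."""
    def __init__(self):
        self.jobs = deque()
        self.failures = []          # failure records survive the run

    def publish(self, pdf_id):      # publisher node enters PDF ids
        self.jobs.append(pdf_id)

    def work(self, convert):        # worker node drains the queue
        completed = []
        while self.jobs:
            pdf_id = self.jobs.popleft()
            try:
                completed.append(convert(pdf_id))
            except Exception as exc:
                # A failure must not stop the system, and must be recorded.
                record = {"pdf_id": pdf_id, "error": str(exc)}
                self.failures.append(record)
                logging.error("conversion failed: %s", json.dumps(record))
        return completed

def fake_convert(pdf_id):
    """Placeholder for the real ImageMagick-based conversion."""
    if pdf_id == "bad.pdf":
        raise RuntimeError("unreadable PDF")
    return pdf_id + ".zip"

queue = JobQueue()
for pdf in ["a.pdf", "bad.pdf", "b.pdf"]:
    queue.publish(pdf)
done = queue.work(fake_convert)
```

In the real system the deque would be replaced by a Redis-backed queue, but the shape of the worker loop, and the rule that failures are logged and kept, is the same.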
2. OCR-based Text Extraction subsystem in Python, using OCRopus to extract
the text from the image files. Distribution of computation is implemented
using Redis and Thoonk to create a distributed job queue in which a
publisher node enters work unit identifiers (work units are generated
during image conversion) from a fileserver into the job queue for
distributed worker nodes to fetch and process. Extracted text should be
added to the zip file-based work unit and sent back to the fileserver. The
subsystem should be fault tolerant and include the necessary error handling
and logging to disk to allow for uninterrupted operation over long periods
of time. Failure to perform a text extraction should not prevent the system
from continuing, nor should information about the failure be lost.
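The zip-file work unit described above can be sketched with the standard-library `zipfile` module: a unit arrives as a zip of page images, and the worker appends one text entry per page before sending the unit back. The file names and the `run_ocr()` stub are illustrative; the real system would call OCRopus.

```python
import io
import zipfile

def add_extracted_text(unit_bytes, run_ocr):
    """Return new work-unit bytes with a .txt entry per page image."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(unit_bytes)) as src, \
         zipfile.ZipFile(out, "w") as dst:
        for name in src.namelist():
            data = src.read(name)
            dst.writestr(name, data)             # keep the original image
            if name.endswith(".png"):
                text = run_ocr(data)             # OCRopus in the real system
                dst.writestr(name[:-4] + ".txt", text)
    return out.getvalue()

# Build a fake incoming work unit with two page images.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("page1.png", b"fake image 1")
    z.writestr("page2.png", b"fake image 2")

result = add_extracted_text(buf.getvalue(),
                            lambda img: "text of %d bytes" % len(img))
with zipfile.ZipFile(io.BytesIO(result)) as z:
    names = sorted(z.namelist())
```

Keeping the images alongside the extracted text in one archive means a work unit stays self-describing as it moves between the fileserver and the workers.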
3. Command-line installation scripts in Python that make use of existing
packaging and distribution facilities associated with Linux and Python
libraries when available.
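An installation script of the kind deliverable 3 asks for can be sketched as a planner that prefers the platform's own packaging facilities: system tools come from the distribution's package manager when one is found, Python libraries from pip. The package names below are examples, not the real dependency list, and the script only builds the commands rather than running them.

```python
import shutil

# Example dependencies; the real list would come from the developed system.
SYSTEM_PACKAGES = ["imagemagick", "redis-server"]
PYTHON_PACKAGES = ["redis", "thoonk"]

def plan_install(which=shutil.which):
    """Return the commands an installer would run, without executing them."""
    commands = []
    if which("apt-get"):                      # Debian/Ubuntu hosts
        commands.append(["apt-get", "install", "-y"] + SYSTEM_PACKAGES)
    commands.append(["pip", "install"] + PYTHON_PACKAGES)
    return commands

# Pretend we are on a Debian host where apt-get is available.
plan = plan_install(which=lambda name: "/usr/bin/" + name)
```

Separating planning from execution also makes the script easy to test on machines where installing packages is not allowed.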
4. A Linux virtual machine image, compatible with the existing VMware-based
infrastructure, that has been configured for rapid deployment. The VM image
should contain a current patched version of Linux with the developed code
and its prerequisites installed via the installation script previously
developed.
5. System documentation, in Microsoft Word, for the entire Scientific
Text Extraction System. Documentation shall include an overview of the
architecture, data flow, use of and integration with Redis, deployment,
maintenance, and operation of the application.
Task 2: Develop Graphical User Interface for Computational Soft Materials
Workbench for Multiscale Modeling.
I. Working with the MML-specified prototype workbench code, extend the
existing C++-based GUI to design and implement menu bar items and dialog
boxes that can be connected to MML-specified libraries and tools.
Deliverables:
The deliverables for the tasks under Task 2 are:
1. A prototype of the workbench that can be used to illustrate key user
interface concepts. It consists of menus for ZENO and Help, a toolbar, an
interface for Python, and dialogs for the Amorphous Builder, Trajectory
Analysis Tool, LAMMPS and GROMACS simulations, Coarse-Mapping Tool,
Coarse-Grain Structure Tool, Coarse-Grained Force Field Assignment, and
ZENO.
Task 3: Develop Computational Soft Materials Workbench for Multiscale
Modeling.
I. Develop core application components.
II. Connect GUI to algorithms and tools.
III. Develop visualization of molecular structures.
IV. Implement facilities for reading and writing files in atomistic
formats.
V. Implement classes to interface to algorithms and tools for molecular
modeling.
VI. Create functionality for Molecular Modeling workflows.
VII. Write documentation for workbench system.
Deliverables:
The deliverables for the tasks under Task 3 are:
1. Identified APIs for GUI, data conversion, molecular visualization,
and extensibility. Implement a class library to integrate with these APIs.
2. Interface classes to connect functionality to menus for ZENO and Help,
a toolbar, an interface for Python, and multiple dialogs (Amorphous
Builder, Trajectory Analysis Tool, LAMMPS and GROMACS simulations,
Coarse-Mapping Tool, Coarse-Grain Structure Tool, Coarse-Grained Force
Field Assignment, and ZENO).
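The interface-class idea in the deliverable above can be sketched as a small dispatch pattern: each tool is wrapped in a class with a common `run()` entry point, and menu items dispatch through a registry, so the GUI layer never calls a tool directly. The class and menu names are illustrative stand-ins for the real workbench tools (and sketched in Python rather than the workbench's C++).

```python
class ToolInterface:
    """Common entry point every tool wrapper implements."""
    def run(self, *args):
        raise NotImplementedError

class ZenoInterface(ToolInterface):
    def run(self, *args):
        return "zeno:" + ",".join(args)          # would invoke ZENO

class TrajectoryAnalysisInterface(ToolInterface):
    def run(self, *args):
        return "trajectory:" + ",".join(args)    # would invoke the tool

# Registry mapping menu items to tool interfaces.
MENU = {
    "ZENO/Run": ZenoInterface(),
    "Tools/Trajectory Analysis": TrajectoryAnalysisInterface(),
}

def on_menu_select(item, *args):
    """Called by the GUI when a menu item is activated."""
    return MENU[item].run(*args)

result = on_menu_select("ZENO/Run", "molecule.pdb")
```

Routing every menu action through one registry keeps the GUI decoupled from the tools, so adding a dialog means adding one interface class and one registry entry.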
3. 3D visualization, rotation, zoom, selection of individual elements,
and display lists of molecular structures. Visualization of grouping of
highlighted elements into coarse grained elements.
4. Functionality to read and write atomistic data in a variety of domain
formats: CML, PDB, XYZ, LAMMPS (Data and Input), GROMACS (Data, Input, and
Trajectory), Coarse Grain (Mapping and Force Field Table).
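Of the formats listed, XYZ is simple enough to sketch here: the first line is the atom count, the second a comment, then one "element x y z" line per atom. This minimal reader/writer shows the shape of the read/write facilities; the real subsystem would also cover CML, PDB, LAMMPS, and GROMACS.

```python
def write_xyz(atoms, comment=""):
    """Serialize a list of (element, x, y, z) tuples to XYZ text."""
    lines = [str(len(atoms)), comment]
    for element, x, y, z in atoms:
        lines.append("%s %.6f %.6f %.6f" % (element, x, y, z))
    return "\n".join(lines) + "\n"

def read_xyz(text):
    """Parse XYZ text back into (element, x, y, z) tuples."""
    lines = text.splitlines()
    count = int(lines[0])           # line 1: atom count; line 2: comment
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()
        atoms.append((element, float(x), float(y), float(z)))
    return atoms

water = [("O", 0.0, 0.0, 0.0),
         ("H", 0.757, 0.586, 0.0),
         ("H", -0.757, 0.586, 0.0)]
roundtrip = read_xyz(write_xyz(water, "water molecule"))
```

A round-trip test like this (write, then read back, then compare) is the natural check for every format the subsystem supports.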
5. Classes to interface with molecular modeling algorithms (Amorphous
Builder and Coarse-Graining) and molecular modeling tools (LAMMPS, GROMACS,
Coarse-Grained Structure Building Tool, ZENO, and Trajectory Analysis
Tool).
6. Workflow Functionality that supports a variety of molecular
calculations and computations.
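The workflow functionality in deliverable 6 can be sketched as an ordered list of steps, each consuming the previous step's output, so different molecular calculations can be composed from the same building blocks. The step functions here are trivial placeholders for the real tools.

```python
def run_workflow(steps, data):
    """Run each step on the previous step's output and return the result."""
    for step in steps:
        data = step(data)
    return data

# A toy "build then analyze" workflow on a list of 2D coordinates:
# the builder scales coordinates, the analyzer reduces them to a number.
build = lambda coords: [(x * 2, y * 2) for x, y in coords]
analyze = lambda coords: sum(x + y for x, y in coords)

result = run_workflow([build, analyze], [(1, 2), (3, 4)])
```

The same driver could chain, say, the Amorphous Builder into a LAMMPS run into the Trajectory Analysis Tool, with each tool's interface class acting as one step.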
7. Documentation of workbench, creation of GUI Help Menus and web pages.
QUALIFICATIONS OF CONTRACTOR KEY PERSONNEL
All contractor personnel working under this task order shall be designated
as Key Personnel. All Contractor Key Personnel working under this task order
must meet the following minimum qualifications.
• Minimum of 5 years of experience with a scripting language, such as
Python, JavaScript, Perl, or PHP
• Minimum of 5 years experience with system languages such as C or
C++
• Minimum of 5 years experience with Agile Methodologies, such as
XP or SCRUM
• Minimum of 5 years experience with a combination of SQL and NoSQL
databases
• Minimum of 5 years experience developing web applications with
HTML5, CSS3, JavaScript, jQuery, and Web 2.0 technologies
• Minimum of 3 years experience developing RESTful interfaces
• Minimum of 1 year experience setting up virtual machines and
installing and making Debian packages
• Minimum of 5 years experience in developing graphical user
interfaces
• Minimum of 5 years experience with XML technologies