Architecture
source{d} Community Edition (CE) provides a frictionless experience for trying source{d} for Code Analysis.
Technical Architecture
The sourced binary, a single CLI binary written in Go, is the user's main interaction mechanism with source{d} CE. It is also the only piece (other than Docker) that the user needs to explicitly download on their machine to get started.
The sourced binary manages the different installed environments and their configurations, acting as a wrapper around Docker Compose.
The whole architecture is based on Docker containers, orchestrated by Docker Compose and managed by sourced.
Components of source{d}
source{d} CE relies on different components to handle different use cases and to cover different functionalities. Each component is implemented as a running Docker container.
bblfsh: parses source code into UASTs using Babelfish; you can learn more about it in our Babelfish UAST guide.
gitbase: runs gitbase, a SQL database interface to Git repositories.
gitcollector: is responsible for fetching repositories from the organizations used to initialize source{d} CE. It uses gitcollector.
ghsync: is responsible for fetching repository metadata from the organizations used to initialize source{d} CE. It uses ghsync.
metadatadb: runs the PostgreSQL database that stores the repository metadata (users, pull requests, issues...) extracted by ghsync.
postgres: runs the PostgreSQL database that stores the state of the UI (charts, dashboards, users, saved queries and such).
sourced-ui: runs the source{d} CE web interface. This component queries data from bblfsh, gitbase, metadatadb and postgres.
Some of these components can be accessed from the outside, as described in the Docker Networking section.
Docker Set Up
In order to make this work in the easiest way, some design decisions were made:
Isolated Environments
Read more in Working With Multiple Data Sets.
Each dataset runs in an isolated environment, and only one environment can run at a time. Each environment is defined by one docker-compose.yml and one .env, stored in ~/.sourced.
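As a rough illustration of the layout described above, the following sketch walks a directory and reports every folder that holds both a docker-compose.yml and a .env file. The helper name find_environments is hypothetical, and the exact directory structure under ~/.sourced is an assumption; only the two file names and the root directory come from this page.

```python
from pathlib import Path


def find_environments(root):
    """Return directories under `root` holding both environment files.

    Hypothetical helper: it assumes nothing about how sourced names or
    nests the environment folders, it only looks for the file pair.
    """
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    found = []
    for compose_file in sorted(root.rglob("docker-compose.yml")):
        env_dir = compose_file.parent
        # An environment needs its .env next to the compose file.
        if (env_dir / ".env").is_file():
            found.append(env_dir)
    return found


if __name__ == "__main__":
    for env_dir in find_environments("~/.sourced"):
        print(env_dir)
```

This only inspects files on disk; it does not tell which environment is currently running.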
Docker Naming
All the Docker containers from the same environment share the same prefix, srcd-<HASH>_, followed by the name of the service running inside. For example, srcd-c3jjlwq_gitbase_1 and srcd-c3jjlwq_bblfsh_1 contain gitbase and Babelfish for the same environment.
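The naming scheme can be illustrated with a small parser. This is a minimal sketch: the pattern is inferred from the two example names above, the trailing number is assumed to be the instance index appended by Docker Compose, and the function names are hypothetical.

```python
import re

# Inferred from examples such as "srcd-c3jjlwq_gitbase_1".
# Assumption: the hash itself never contains an underscore.
CONTAINER_NAME = re.compile(r"^(srcd-[^_]+)_(.+)_(\d+)$")


def parse_container_name(name):
    """Split a container name into (prefix, service, index), or return None.

    The returned prefix omits the trailing underscore separator.
    """
    match = CONTAINER_NAME.match(name)
    if match is None:
        return None
    prefix, service, index = match.groups()
    return prefix, service, int(index)


def same_environment(name_a, name_b):
    """Return True when both names parse and share the same prefix."""
    parsed_a = parse_container_name(name_a)
    parsed_b = parse_container_name(name_b)
    if parsed_a is None or parsed_b is None:
        return False
    return parsed_a[0] == parsed_b[0]


if __name__ == "__main__":
    print(parse_container_name("srcd-c3jjlwq_gitbase_1"))
    print(same_environment("srcd-c3jjlwq_gitbase_1", "srcd-c3jjlwq_bblfsh_1"))
```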
Docker Networking
To let the containers communicate with each other, all of them are attached to the same single bridge network. The network name has the same prefix as the containers inside the same environment, e.g. srcd-c3jjlwq_default.
Some environment services can be accessed from the outside, using their exposed port and connection values:
bblfsh:
    port: 9432
gitbase:
    port: 3306
    database: gitbase
    user: root
metadatadb:
    port: 5433
    database: metadata
    user: metadata
    password: metadata
sourced-ui:
    port: 8088
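The connection values above can be collected in one place and turned into endpoints or connection URLs. The sketch below is illustrative only: the host localhost assumes the environment runs on the local machine, the helper names are hypothetical, and the URL scheme is left to the caller because it depends on the client library in use.

```python
# Values copied from the list above. HOST is an assumption.
HOST = "localhost"

EXPOSED_SERVICES = {
    "bblfsh": {"port": 9432},
    "gitbase": {"port": 3306, "database": "gitbase", "user": "root"},
    "metadatadb": {
        "port": 5433,
        "database": "metadata",
        "user": "metadata",
        "password": "metadata",
    },
    "sourced-ui": {"port": 8088},
}


def endpoint(service, host=HOST):
    """Return "host:port" for an exposed service."""
    return f"{host}:{EXPOSED_SERVICES[service]['port']}"


def database_url(service, scheme, host=HOST):
    """Build scheme://user[:password]@host:port/database for a database service."""
    values = EXPOSED_SERVICES[service]
    if "database" not in values:
        raise ValueError(f"{service} does not expose a database")
    credentials = values["user"]
    if "password" in values:
        credentials += ":" + values["password"]
    return f"{scheme}://{credentials}@{host}:{values['port']}/{values['database']}"


if __name__ == "__main__":
    print(endpoint("sourced-ui"))
    print(database_url("metadatadb", "postgresql"))
```

For metadatadb, which runs PostgreSQL, a postgresql scheme is the natural choice. For gitbase, a mysql scheme is an assumption that presumes a MySQL-compatible client, in line with port 3306.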
Persistence
To prevent losing data when restarting services or upgrading containers, the data is stored in volumes. These volumes share the same prefix as the containers in the same environment, e.g. srcd-c3jjlwq_gitbase_repositories.
These are the most relevant volumes:
gitbase_repositories
, stores the repositories to be analyzedgitbase_indexes
, stores the gitbases indexesmetadata
, stores the metadata from GitHub pull requests, issues, users...postgres
, stores the dashboards and charts used by the web interface
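Because the network and the volumes reuse the environment prefix, their full names can be derived from it. The following is a minimal sketch based on the examples on this page (srcd-c3jjlwq_default and srcd-c3jjlwq_gitbase_repositories); the function names are hypothetical.

```python
# Volume names and descriptions taken from the list above.
VOLUMES = {
    "gitbase_repositories": "repositories to be analyzed",
    "gitbase_indexes": "gitbase indexes",
    "metadata": "metadata from GitHub pull requests, issues, users",
    "postgres": "dashboards and charts used by the web interface",
}


def network_name(prefix):
    """Return the bridge network name for an environment prefix."""
    return f"{prefix}_default"


def volume_names(prefix):
    """Return the full volume names for an environment prefix."""
    return [f"{prefix}_{name}" for name in VOLUMES]


if __name__ == "__main__":
    prefix = "srcd-c3jjlwq"  # example prefix used throughout this page
    print(network_name(prefix))
    for name in volume_names(prefix):
        print(name)
```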